Archive for the ‘Regular Expressions’ Category

Read a CSV file with Regular Expressions in .NET

Tuesday, June 23rd, 2009

Here’s how you can read a CSV file using regular expressions in .NET:

using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Text.RegularExpressions;

public static DataTable GetDataTableFromCsvFile(string file)
{
    // Where the CSV data goes
    DataTable dt = new DataTable("CsvData");

    // The pattern used to parse the CSV:
    // matches only the commas that are not inside double quotes
    const string csvPattern = ",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))";
    Regex csvRegex = new Regex(csvPattern);

    // Read all lines in the file
    // (not great for large files)
    string[] fileLines = File.ReadAllLines(file);

    // Get the column headers
    // (assumes first row has headers and that
    //  each column contains string values and
    //  that each column name is unique)
    Dictionary<int, string> headers = new Dictionary<int, string>();
    string[] headerValues = csvRegex.Split(fileLines[0]);

    for (int i = 0; i < headerValues.Length; i++)
    {
        headers.Add(i, headerValues[i]);
        dt.Columns.Add(new DataColumn(headerValues[i], typeof(string)));
    }

    // Then add the rest of the lines
    for (int k = 1; k < fileLines.Length; k++)
    {
        DataRow dr = dt.NewRow();

        string line = fileLines[k];
        string[] cols = csvRegex.Split(line);

        for (int i = 0; i < cols.Length; i++)
        {
            string header = headers[i];
            string data = cols[i];

            // Remove quotes around the field
            if (data.Length > 1 && data.StartsWith("\"") && data.EndsWith("\""))
                data = data.Remove(data.Length - 1, 1).Remove(0, 1);

            dr[header] = data;
        }

        dt.Rows.Add(dr);
    }

    return dt;
}

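Of course, this is only a starting point. You will still need to add error handling, for example for a missing or empty file, duplicate column names, or a row that has more fields than the header row.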
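To see what the split pattern does on its own, here is a minimal sketch that runs it against a single line containing a quoted comma. The class name `CsvLineDemo` and the helper `SplitCsvLine` are made up for this illustration; only the pattern and the quote-stripping step come from the method above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CsvLineDemo
{
    // Same pattern as in the post: a comma that is followed by an
    // even number of double quotes, i.e. a comma outside any quoted field.
    private static readonly Regex CsvRegex =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");

    // Hypothetical helper: splits one CSV line and strips the
    // quotes that surround a field.
    public static string[] SplitCsvLine(string line)
    {
        string[] cols = CsvRegex.Split(line);

        for (int i = 0; i < cols.Length; i++)
        {
            string data = cols[i];
            if (data.Length > 1 && data.StartsWith("\"") && data.EndsWith("\""))
                cols[i] = data.Substring(1, data.Length - 2);
        }

        return cols;
    }

    public static void Main()
    {
        // The comma inside "Smith, John" must not split the field.
        foreach (string col in SplitCsvLine("1,\"Smith, John\",Seattle"))
            Console.WriteLine(col);
        // prints:
        // 1
        // Smith, John
        // Seattle
    }
}
```

Note that the sketch, like the method above, works one line at a time, so a quoted field that contains a line break would still be split across rows.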

Hope this helps!

Regular Expression (RegEx) to Find Whole Words in a String

Thursday, July 31st, 2008

Ever want to match and replace whole words within a string?  Regular expressions (System.Text.RegularExpressions) make it a one-line operation:

Regex.Replace(inputText, @"\b" + Regex.Escape(wordToReplace) + @"\b", replacementText, RegexOptions.IgnoreCase);

This pattern uses “word boundaries” (\b) as your delimiters for matching text.  Regex.Escape keeps any special characters in the word, such as a period or plus sign, from being read as pattern syntax.
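As a quick check of the word-boundary behavior, here is a small sketch. The class `WholeWordDemo` and the helper `ReplaceWholeWord` are names invented for this example; the helper also passes the word through `Regex.Escape` so that any special characters in it are matched literally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeWordDemo
{
    // Hypothetical helper wrapping the one-liner from the post.
    public static string ReplaceWholeWord(
        string inputText, string wordToReplace, string replacementText)
    {
        // \b on both sides means the word cannot be part of a longer word.
        string pattern = @"\b" + Regex.Escape(wordToReplace) + @"\b";
        return Regex.Replace(inputText, pattern, replacementText,
                             RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // "cat" and "Cat" are replaced; "category" is left alone.
        Console.WriteLine(
            ReplaceWholeWord("The cat sat in the Cat category.", "cat", "dog"));
        // prints: The dog sat in the dog category.
    }
}
```

One thing to keep in mind: the replacement text is a replacement pattern, so a `$` in it has special meaning and may need to be doubled.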

Regex to Clean Out HTML from Text

Thursday, October 18th, 2007

Ahh, the power (and sometimes slight confusion) of regular expressions.  This seems to work well to remove HTML from text:

string htmlstring = "<p>Some chunk o' <b>HTML-infested</b> text</p>";

Regex cleanOutHtml =
    new Regex(@"</?\w+((\s+\w+(\s*=\s*(?:"".*?""|'.*?'|[^'"">\s]+))?)+\s*|\s*)/?>");

string onlytext = cleanOutHtml.Replace(htmlstring, "");

Weee there you have it.
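For a quick experiment with the same Replace-based approach, here is a minimal sketch. It deliberately swaps in a much simpler stand-in pattern, `<[^>]+>`, which is an assumption of this example and not the pattern from the post; the names `StripHtmlDemo` and `StripTags` are made up as well.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StripHtmlDemo
{
    // Simplified stand-in pattern (an assumption for this sketch):
    // matches everything from a '<' up to the next '>'.
    private static readonly Regex SimpleTag = new Regex(@"<[^>]+>");

    // Hypothetical helper: removes every match of the tag pattern.
    public static string StripTags(string html)
    {
        return SimpleTag.Replace(html, "");
    }

    public static void Main()
    {
        Console.WriteLine(StripTags("<p>Hello, <b>world</b>!</p>"));
        // prints: Hello, world!
    }
}
```

The simple pattern trips over attribute values that contain a `>`, and neither version decodes entities such as `&amp;`, so for anything beyond light cleanup an HTML parser is the safer choice.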