Sunday, August 4, 2013

Regular Expression to Replace Specific character between 2 immediate letters or numbers?

Wizend you are right: This data was most likely entered by a plumber.


I found the following REGEX which appeared to work extremely well: Knock on wood .


I have tested it by entering really dirty data and it work SSIS is no longer breaking the Insert Jobs.



Basically it turns any sequential quotes into single quotes. Do """"ABC"""","""ABC""" will turn into.

"'ABC'","'ABC'"


Here is the full code:




static void AddComma(string s, TextWriter writer)
{

foreach (var line in s.Split(new string[] { Environment.NewLine}, StringSplitOptions.None))
{
foreach (var t in line)
{
writer.Write(t);
}
writer.WriteLine();
}
writer.Flush();
}

static void Main(string[] args)
{
TextReader reader = new StreamReader(@"C:\Folder\test.txt");
string a = reader.ReadToEnd();
reader.Close();

FileStream aFile = new FileStream(@"C:\Folder\test.csv", FileMode.Create);

AddComma(Syntax.IlligalSyntax(a), new StreamWriter(aFile));
aFile.Close();
}



The I added a class with the method returning the regex just for kicks:




class Syntax
{
public static string IlligalSyntax(string syntaxInput)
{
string pattern = @"(?<!^\s*|,)""(?!,""|\s*$)";
string result = Regex.Replace(syntaxInput, pattern, "'", RegexOptions.Multiline);
return result;
}
}

The entire code is placed in an BI/SSIS Script Task (Control Flow) and is run to populate the table.



I am going to keep testing this and see but so far so good.



Two !@#$% weeks to get this going.



Hopefully someone from this thread can use this:



http://social.msdn.microsoft.com/Forums/sqlserver/en-US/a1cea026-545d-4548-9358-97a56fca90f2/ssis-bug-when-handling-csv-files-with-commas-embedded-in-quoted-strings


Thanks a million my friends:



OG


No comments:

Post a Comment