
Managing String Manipulation with Regular Expressions

Posted in Technology by UpperStrata on December 23, 2009

Searching for particular strings in data is a task most developers have faced. The standard String methods in the Microsoft .NET Framework are sufficient, but they become tedious in certain situations, particularly in projects that require a great deal of string manipulation and processing. One simple example:
 
// example of finding values in between patterns
string origStr = "<Voyage ID=\"REG080910\"><Sync Code=\"P\"/><Sync Code=\"E\"/></Voyage><Voyage ID=\"INS071205\" />";
string tempStr = origStr;
string foundStr = "";
int cntStart = 0;
int cntEnd = 0;
 
do
{
    cntStart = tempStr.IndexOf("ID=\"");
    if (cntStart < 0) break;
    cntStart += 4; // skip past ID=" to the first character of the value
    cntEnd = tempStr.IndexOf("\"", cntStart); // closing quote of the value
    if (cntEnd < 0) break;
    foundStr = tempStr.Substring(cntStart, cntEnd - cntStart); // value between the quotes, whatever its length
    // do something with foundStr
    tempStr = tempStr.Substring(cntEnd + 1); // continue searching after the closing quote
} while (!String.IsNullOrEmpty(tempStr));
 
This process is often repeated ad nauseam to retrieve particular strings. In projects that require a great deal of string manipulation, this type of code quickly becomes unmanageable and unmaintainable. A more elegant solution is required.
 
Enter Regular Expressions
 
Because this kind of string manipulation is driven by patterns, regular expressions are a natural foundation for a solution. If you are unfamiliar with regular expressions, a great application is available to help you learn.
The regular expression “Named Groups” concept is a very useful tool in matching particular patterns within strings. The “Named Groups” syntax is as follows:
 
(?<groupName>Pattern)
An implementation example follows below:
// requires: using System.Text.RegularExpressions;
Regex pattern = new Regex("<Voyage ID=\"(?<voyage>[a-z0-9]*)\"|<PortCode>(?<port>[a-z0-9]*)", RegexOptions.IgnoreCase);
 
string sampleStr = "<Voyage ID=\"REG080910\"><Sync Code=\"P\"/><Sync Code=\"E\"/></Voyage><Voyage ID=\"INS071205\" />";
 
MatchCollection matches = pattern.Matches(sampleStr);
foreach (Match m in matches)
{
    // m.Groups["groupName"].Value retrieves the matched pattern value
    if (m.Groups["voyage"].Success)
    {
        // do something with m.Groups["voyage"].Value
    }
 
    if (m.Groups["port"].Success)
    {
        // do something with m.Groups["port"].Value
    }
}
 
The above example regular expression has two named groups: “voyage” and “port”. In plain language, the regular expression matches either of the following and captures the alphanumeric value in the corresponding group:
 
<Voyage ID="some alphanumeric"
and
<PortCode>some alphanumeric
 
The MatchCollection returned by the regular expression can be enumerated and processed by checking the Success property of each named group. As you can see, this approach is clean and easy to read. It definitely beats determining lengths, calculating positions, and writing extraneous looping code to obtain the required data.
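To make the enumeration concrete, here is a minimal, self-contained sketch that reuses the pattern from the article and collects each captured value together with the name of the group that matched. The class name VoyageParser, the Extract helper, and the <PortCode> element in the sample input are assumptions added for illustration; they are not part of the original example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VoyageParser
{
    // Same pattern as in the article: each alternative captures into its own named group.
    private static readonly Regex Pattern = new Regex(
        "<Voyage ID=\"(?<voyage>[a-z0-9]*)\"|<PortCode>(?<port>[a-z0-9]*)",
        RegexOptions.IgnoreCase);

    // Returns (group name, captured value) pairs in the order they appear in the input.
    public static List<KeyValuePair<string, string>> Extract(string input)
    {
        var results = new List<KeyValuePair<string, string>>();
        foreach (Match m in Pattern.Matches(input))
        {
            // Only one alternative can match at a time, so at most one group succeeds per match.
            if (m.Groups["voyage"].Success)
                results.Add(new KeyValuePair<string, string>("voyage", m.Groups["voyage"].Value));
            else if (m.Groups["port"].Success)
                results.Add(new KeyValuePair<string, string>("port", m.Groups["port"].Value));
        }
        return results;
    }

    public static void Main()
    {
        // Hypothetical input: the <PortCode> element is an assumption added for illustration.
        string sample = "<Voyage ID=\"REG080910\"><PortCode>LAX</PortCode></Voyage><Voyage ID=\"INS071205\" />";
        foreach (var pair in Extract(sample))
            Console.WriteLine(pair.Key + ": " + pair.Value);
    }
}
```

Run against the hypothetical sample above, this should list the voyage REG080910, the port LAX, and the voyage INS071205, in that order. Returning pairs rather than acting inside the loop is only one possible design; it keeps the matching logic separate from whatever the caller does with the values.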
 
Bonus
 
Regular expressions are common to many programming languages, so this approach ports easily. Be careful, however: .NET's syntax for “Named Groups” differs slightly from that of some other programming languages. Python, for example, writes a named group as (?P<groupName>Pattern). The difference usually amounts to a few characters and should not deter a developer from trying out this feature. You can read more about this here: http://www.regular-expressions.info/named.html
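As a small illustration of how little the syntax varies, .NET itself accepts two spellings of a named group: angle brackets, as used throughout this post, and single quotes. The sketch below checks that both capture the same value. The class name NamedGroupSyntax and the FirstVoyage helper are hypothetical names chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupSyntax
{
    // Returns the value captured by the "voyage" group, or null when the pattern does not match.
    public static string FirstVoyage(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["voyage"].Value : null;
    }

    public static void Main()
    {
        string input = "<Voyage ID=\"REG080910\">";

        // Angle-bracket form, as used in the article.
        string anglePattern = "ID=\"(?<voyage>[a-z0-9]*)\"";
        // Single-quote form, which .NET treats as equivalent.
        string quotePattern = "ID=\"(?'voyage'[a-z0-9]*)\"";

        Console.WriteLine(FirstVoyage(anglePattern, input));
        Console.WriteLine(FirstVoyage(quotePattern, input));
    }
}
```

Both calls should print REG080910. When porting a pattern to another language, the group name and the inner pattern normally stay the same and only the delimiters around the name change.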

 
