Comparing Strings in .NET 2.0

Last updated May 19, 2006.

Sometimes the seemingly simplest things turn out to be the most difficult. For example, one would think that comparing strings in a program would be simple. It turns out that things can get ugly very quickly, and prior to .NET 2.0 there wasn’t a foolproof way to eliminate the problem.

The designers of the .NET Framework were very careful to include support for different cultures, and to implement the string comparison rules for each culture. This works very well when comparing culture-sensitive strings. But very often in our programs we want to compare culture-invariant strings: strings that must compare identically regardless of the culture. Handled incorrectly, such comparisons can produce undesirable results.

The "Turkish I" problem

For example, "everybody knows" that if you compare the strings "file" and "FILE" without regard to case, they’ll compare equal, right? That is:

if (string.Compare("file", "FILE", true) == 0)
{
  Console.WriteLine("’file’ is equal to ’FILE’");
}

That will work every time, right? Not quite. As it turns out, in Turkish the capital form of "i" is İ (\u0130), a "capital I with a dot." Turkish also has a lowercase "i without a dot," ı (\u0131), which capitalizes to "I".

The above is known as the "Turkish-I problem," and is perhaps the best-known example of how culture-sensitive comparisons on culture-invariant strings can produce incorrect results. The rules for capitalizing i or lowercasing I differ among cultures. In the above code, if the culture were Turkish ("tr-TR"), the two strings would not compare equal. You might think that’s a minor problem, but consider what would happen if you wanted to prevent file URLs (i.e. file://) from being used in a program. Here’s how you might first write a method to check for such URLs:

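static bool IsFileUrl(string path)
{
  // Compare the first five characters of path against "FILE:", ignoring case.
  // This overload uses the current culture for the comparison.
  return (string.Compare(path, 0, "FILE:", 0, 5, true) == 0);
}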

That would work in most cases, but in places where the capital of "i" is not "I", somebody could slip a "file:" URL past you with possibly serious security consequences. Why? Because String.Compare uses the current culture for comparisons.
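The casing difference behind this can be observed directly. The following is a minimal sketch, assuming the runtime has Turkish culture data available (it will not behave this way in an invariant-globalization configuration); the class name TurkishIDemo and the helper UpperIn are hypothetical names used only for illustration.

```csharp
using System;
using System.Globalization;

static class TurkishIDemo
{
    // Hypothetical helper: uppercases text using the casing rules of the named culture.
    public static string UpperIn(string text, string cultureName)
    {
        return text.ToUpper(new CultureInfo(cultureName));
    }

    static void Main()
    {
        string turkish = UpperIn("file", "tr-TR");
        string english = UpperIn("file", "en-US");

        // Under Turkish rules "i" uppercases to U+0130, so the result is not "FILE".
        Console.WriteLine(turkish == "FILE");   // False
        Console.WriteLine(english == "FILE");   // True
        Console.WriteLine((int)turkish[1]);     // 304, that is, U+0130
    }
}
```

Any check that relies on current-culture casing inherits this behavior, which is why the culture must be taken out of the comparison for strings such as URL schemes.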

How to Compare Culture-Invariant Strings

In versions of the .NET Framework prior to 2.0, the recommendation was to use the invariant culture for comparing culture-invariant strings. That way, you always knew that strings would be compared the same way, regardless of the current culture. The solution in .NET 1.1, then, was:

static bool IsFileUrl(string path)
{
  // Requires: using System.Globalization;
  return (string.Compare(path, 0, "FILE:", 0, 5, true,
      CultureInfo.InvariantCulture) == 0);
}

That works fine when comparing ASCII strings, but InvariantCulture still performs a linguistic comparison, and it will sometimes make linguistic decisions that are inappropriate for strings that should be treated as plain arrays of bytes. This can happen when comparing file names, cookies, and any other strings that can contain Unicode characters.
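One such linguistic decision is treating different character sequences as the same text. The sketch below is illustrative only: it compares the letter é stored as a single character with the same letter stored as "e" plus a combining accent, using calls that were already available before .NET 2.0. The class and member names are hypothetical, and the results assume a runtime with normal culture support.

```csharp
using System;
using System.Globalization;

static class InvariantVersusOrdinal
{
    // The same visible letter, stored two different ways.
    public const string Precomposed = "\u00E9";   // single code point
    public const string Decomposed = "e\u0301";   // "e" followed by a combining acute accent

    // Linguistic comparison under the invariant culture.
    public static bool EqualLinguistically()
    {
        return string.Compare(Precomposed, Decomposed, false,
            CultureInfo.InvariantCulture) == 0;
    }

    // Character-code comparison with no linguistic rules applied.
    public static bool EqualOrdinally()
    {
        return string.CompareOrdinal(Precomposed, Decomposed) == 0;
    }

    static void Main()
    {
        Console.WriteLine(EqualLinguistically()); // True: treated as the same text
        Console.WriteLine(EqualOrdinally());      // False: the stored characters differ
    }
}
```

For a file name or a cookie value, the second answer is usually the one you want, because the underlying system distinguishes the two strings.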

To alleviate these problems, .NET 2.0 introduces a new enumeration, StringComparison:

public enum StringComparison 
{
   CurrentCulture,
   CurrentCultureIgnoreCase,
   InvariantCulture,
   InvariantCultureIgnoreCase,
   Ordinal,
   OrdinalIgnoreCase
}

New overloads of String.Compare and String.Equals allow you to specify which StringComparison value to use in the comparison. The new ordinal values ignore the features of natural languages, and instead compare the strings' character codes directly, much like a byte-by-byte comparison. For the case-insensitive variant, OrdinalIgnoreCase, the invariant culture’s character tables and casing rules are used to fold case before that comparison.
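As a rough illustration of these overloads, the sketch below calls String.Equals and String.Compare with ordinal values while the thread's culture is set to Turkish. SameToken and OrdinalOrder are hypothetical helper names, not part of the Framework, and the example assumes Turkish culture data is installed.

```csharp
using System;
using System.Globalization;
using System.Threading;

static class OrdinalComparisonDemo
{
    // Hypothetical helper: case-insensitive match for non-linguistic tokens.
    public static bool SameToken(string a, string b)
    {
        return string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }

    // Hypothetical helper: -1, 0, or 1 according to character-code order.
    public static int OrdinalOrder(string a, string b)
    {
        return Math.Sign(string.Compare(a, b, StringComparison.Ordinal));
    }

    static void Main()
    {
        // Ordinal results do not depend on the current culture, Turkish included.
        Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");

        Console.WriteLine(SameToken("file", "FILE"));                               // True
        Console.WriteLine(string.Equals("file", "FILE", StringComparison.Ordinal)); // False
        Console.WriteLine(OrdinalOrder("B", "a"));   // -1, because 'B' (66) precedes 'a' (97)
    }
}
```

Note that ordinal ordering places every uppercase ASCII letter before every lowercase one, so it suits equality checks and internal keys better than lists shown to users.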

The correct way to write our IsFileUrl method is with one of the new String.Compare overloads, like this:

static bool IsFileUrl(string path)
{
  // Compare the first five characters of path against "FILE:" by character code, ignoring case.
  return (string.Compare(path, 0, "FILE:", 0, 5,
      StringComparison.OrdinalIgnoreCase) == 0);
}

The other benefit of using the ordinal comparisons is that they are fast: much faster than the culture-sensitive comparisons, and faster even than the invariant culture comparisons. You should use them for all culture-agnostic string comparisons.

It’s important to note that the default comparison for String.Compare is the current culture. The default for String.Equals (including the == operator) is ordinal.
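The difference between the two defaults can be seen with a pair of strings that differ only by a soft hyphen (U+00AD), a character that culture-sensitive comparisons ignore. This is a small sketch, assuming .NET Framework 4 or later (or a current .NET runtime with culture support); the class and member names are hypothetical, and the culture is pinned to the invariant culture only to make the run repeatable.

```csharp
using System;
using System.Globalization;
using System.Threading;

static class DefaultComparisonDemo
{
    public const string WithSoftHyphen = "co\u00ADop"; // contains U+00AD
    public const string Plain = "coop";

    // String.Compare with no comparison argument: culture-sensitive.
    public static bool CompareSaysEqual()
    {
        return string.Compare(WithSoftHyphen, Plain) == 0;
    }

    // String.Equals with no comparison argument: ordinal.
    public static bool EqualsSaysEqual()
    {
        return WithSoftHyphen.Equals(Plain);
    }

    static void Main()
    {
        // Pin the culture so the culture-sensitive result is repeatable.
        Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture;

        Console.WriteLine(CompareSaysEqual());      // True: the soft hyphen is ignored
        Console.WriteLine(EqualsSaysEqual());       // False: the characters differ
        Console.WriteLine(WithSoftHyphen == Plain); // False: == is ordinal as well
    }
}
```

Because the two defaults can disagree about the same pair of strings, passing an explicit StringComparison value makes the intent of each comparison visible at the call site.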

More Information

There is way more to comparing two strings than meets the eye. Naïve comparisons can cause no end of trouble and make your program fail in odd ways at unexpected times. Following the recommendations in the article "New Recommendations for Using Strings in Microsoft .NET 2.0" will help keep you out of trouble.
