Hello, C#

This chapter is from the book

1.7 The string Type

Once we have read a line of text, we need to separate it into the individual words. The simplest method of doing that is to use the Split() method of string—for example,

string text_line;
string [] text_words;

while (( text_line = freader.ReadLine() ) != null )
{
   text_words = text_line.Split( null );
   // ...
}

Split() returns an array of string elements separated by a set of characters indicated by the user. If Split() is passed null, as it is in our example, it separates the elements of the original string using white space, such as a blank character or a tab. For example, the string

A beautiful fiery bird, he tells her, magical but untamed.

is split into an array of 10 string elements. Three of them, however—bird, her, and untamed—retain their adjacent punctuation. One strategy for removing the punctuation is to provide Split() with an explicit array of characters with which to separate the string—for example,

char [] separators = {
  ' ', '\n', '\t', // white space
  '.', '\"', ';', ',', '?', '!', ')', '(', '<', '>', '[', ']'
};

text_words = text_line.Split( separators );

A character literal is placed within single quotation marks. The new-line and tab characters are represented by the two-character escape sequences \n and \t. Each sequence represents a single character. The double quotation mark is written here in its escaped form (\"). Within a character literal the escape is optional, because '"' is also legal; it is required only inside a string literal, where an unescaped double quotation mark would end the literal.
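As a rough illustration of the two ways of calling Split() described above, the following sketch splits the sample sentence first on white space and then on the explicit separator array. The class and method names (SplitDemo, SplitOnWhiteSpace, SplitOnSeparators) are hypothetical and not from the book. Two details are assumptions added here: the cast on null, which selects the char[] overload because a bare null may be reported as ambiguous on newer .NET versions, and the StringSplitOptions.RemoveEmptyEntries argument, which drops the empty strings produced wherever two separators are adjacent (for example, the comma and blank after bird).

```csharp
using System;

public static class SplitDemo
{
    // Hypothetical helper: split on white space only, as Split( null ) does.
    public static string[] SplitOnWhiteSpace(string line)
    {
        // The cast picks the char[] overload explicitly (assumption: needed on newer .NET).
        return line.Split((char[])null);
    }

    // Hypothetical helper: split on the separator set used in the text.
    public static string[] SplitOnSeparators(string line, StringSplitOptions options)
    {
        char[] separators = {
            ' ', '\n', '\t', // white space
            '.', '\"', ';', ',', '?', '!', ')', '(', '<', '>', '[', ']'
        };
        return line.Split(separators, options);
    }

    public static void Main()
    {
        string line = "A beautiful fiery bird, he tells her, magical but untamed.";

        string[] raw = SplitOnWhiteSpace(line);
        Console.WriteLine(raw.Length);   // 10
        Console.WriteLine(raw[3]);       // bird,   (punctuation retained)

        // Adjacent separators yield empty elements unless they are removed.
        string[] all = SplitOnSeparators(line, StringSplitOptions.None);
        Console.WriteLine(all.Length);   // 13 (three of them are empty strings)

        string[] words = SplitOnSeparators(line, StringSplitOptions.RemoveEmptyEntries);
        Console.WriteLine(words.Length); // 10
        Console.WriteLine(words[3]);     // bird
    }
}
```

Whether empty entries should be kept depends on the application; for a word count they are almost certainly noise.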

The string type supports the subscript operator ([]), but for read operations only. Indexing begins at zero and extends to Length-1. For example,

for ( int ix = 0; ix < text_line.Length; ++ix )
   if ( text_line[ ix ] == '.' )   // OK: read access
      text_line[ ix ] = '_';       // error: no write access

The string type does support the foreach loop for iterating across the characters of its string, but, like the subscript operator, only for read access: the iteration variable cannot be assigned to. The reason is the somewhat nonintuitive immutable aspect of a string object. Before we make sense of what that means, let me briefly explain the C# for loop statement.

The for loop consists of the following elements:

for ( init-statement; condition; expression )
   statement

init-statement is executed once before the loop is executed. In our example, ix is initialized to 0 before the loop begins executing.

condition serves as the loop control. It is evaluated before each iteration of the loop. For as long as condition evaluates to true, statement is executed. statement can be either a single statement or a statement block. If condition evaluates to false on the first iteration, statement is never executed. In our example, condition tests whether ix is less than text_line.Length, that is, the number of characters contained within the string. While ix continues to index a valid character element, the loop continues to execute.

expression is evaluated after each iteration of the loop. It is typically used to modify the objects initialized within init-statement and tested in condition. If condition evaluates to false on the first iteration, expression is never executed. In our example, ix is incremented following each loop iteration.
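The three elements of the for loop can be seen at work in a small sketch that counts how often a character occurs in a string. The names ForLoopDemo and CountChar are hypothetical, introduced only for this illustration. The second call shows the case in which condition is false on the first test, so neither statement nor expression is ever executed.

```csharp
using System;

public static class ForLoopDemo
{
    // Hypothetical helper: count the occurrences of target within text.
    public static int CountChar(string text, char target)
    {
        int count = 0;

        // init-statement: ix = 0; condition: ix < text.Length; expression: ++ix
        for (int ix = 0; ix < text.Length; ++ix)
        {
            if (text[ix] == target)   // read access through the subscript operator
                ++count;
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountChar("a.b.c.", '.'));  // 3
        Console.WriteLine(CountChar("", '.'));        // 0: the loop body never runs
    }
}
```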

The reason we cannot write to the individual characters of the underlying string is that string objects are treated as immutable. Whenever it seems as if we are changing the value of a string object, what has actually happened is that a new string object containing those changes has been created.

For example, to do an accurate occurrence count of words within a text file, we'll want to recognize A and a as being the same word. One way to do that is to change each string to all lowercase before we store it:

while (( text_line = freader.ReadLine() ) != null )
{
   // oops: this doesn't work as we intended ...
   text_line.ToLower();
   text_words = text_line.Split( null );

   // ...
}

ToLower() correctly produces a copy of text_line in which all uppercase characters have been changed to lowercase. (There is a ToUpper() method as well.) The problem is that this new representation is stored in a new string object that is returned by the call to ToLower(). Because we do not assign the new string object to anything, it's just tossed away. text_line is not changed, and it won't change unless we reassign the return value to it, as shown here:

text_line = text_line.ToLower();
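The difference between discarding and reassigning the return value can be checked directly. The following is a minimal sketch; the class name ToLowerDemo, the helper LowerCopy, and the sample text are hypothetical. The ReferenceEquals call at the end is there only to show that the lowercase version lives in a separate string object and that the original object is untouched.

```csharp
using System;

public static class ToLowerDemo
{
    // Hypothetical helper: return the lowercase copy of a line.
    public static string LowerCopy(string line)
    {
        return line.ToLower();
    }

    public static void Main()
    {
        string text_line = "A Beautiful Bird";

        text_line.ToLower();                   // return value discarded
        Console.WriteLine(text_line);          // A Beautiful Bird

        string original = text_line;           // second reference to the same object
        text_line = LowerCopy(text_line);      // text_line now refers to the new object

        Console.WriteLine(text_line);          // a beautiful bird
        Console.WriteLine(original);           // A Beautiful Bird
        Console.WriteLine(object.ReferenceEquals(original, text_line));  // False
    }
}
```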

To minimize the number of new string objects generated when we are making multiple modifications to a string, we use the StringBuilder class. See Section 4.9 for an illustration.
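As a brief sketch of the idea (not the illustration from Section 4.9), the code below revisits the earlier attempt to overwrite each period with an underscore. It assumes the System.Text.StringBuilder class, whose indexer permits write access, so the characters are modified in a single buffer and only one new string is created at the end. The names StringBuilderDemo and ReplacePeriods are hypothetical.

```csharp
using System;
using System.Text;

public static class StringBuilderDemo
{
    // Hypothetical helper: replace each period in text with an underscore.
    public static string ReplacePeriods(string text)
    {
        StringBuilder buffer = new StringBuilder(text);

        for (int ix = 0; ix < buffer.Length; ++ix)
        {
            if (buffer[ix] == '.')
                buffer[ix] = '_';    // OK: StringBuilder allows write access
        }

        return buffer.ToString();    // one new string object for all the changes
    }

    public static void Main()
    {
        Console.WriteLine(ReplacePeriods("a.b.c."));  // a_b_c_
    }
}
```

For a single substitution such as this one, the string type's own Replace() method would do; the buffer pays off when several different modifications are applied to the same text.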
