Using the String Object
The String object is probably the most widely used and also the most complex object available in JScript .NET. The semantics surrounding this object are fairly complex because there is only one object that is converted on-the-fly between a JScript String object and a CLR String object. All the functions and properties of both objects are available to programmers, and so are all legacy JScript semantics. In other .NET languages, it is very easy to cause exceptions by using a String object. In JScript .NET, you very rarely see this because the compiler does conversions, casting, and initializations for you. This section examines the important aspects of String objects.
Declaring Strings
Two types in JScript (String and System.String) evaluate to the same String object. This is because a JScript String object and a CLR String object both map to a CLR String object internally. Then, special handler functions are called whenever a JScript String object function is called. Therefore, the programmer sees great performance from CLR String objects as well as great flexibility because existing code that works on the String object continues to work. Here's an example of calling both JScript String methods and CLR String methods on the same String object:
var MyString:String = "Hello";
MyString.toString();
MyString.ToString();
This example calls both toString(), which is a JScript function, and ToString(), which is a CLR function. For the toString() function, a special call is made into a JScript helper library, and the string is returned according to the rules of JScript. The ToString() function, on the other hand, is called directly on the String object and returns according to CLR rules. The two functions behave identically, but they take different code paths. To keep things simple, let's assume there is only one String object and that it simply has two sets of functionality that happen to overlap in certain areas. Where functionality overlaps, we'll be using the CLR versions because they map directly to the underlying object and should be a bit quicker than the JScript versions.
String Manipulation
The String object hosts quite a few methods for manipulating existing strings, creating new strings, and searching and replacing text when parsing strings. Note that any modification to a String object results in a new String object being created, because all String objects are immutable and thread safe. A String object that you are working with will never change underneath you.
All the string manipulation functions can be grouped into several categories:
The JScript set of functions: These are functions that retain their meanings from JScript and still exist as part of the JScript language.
The instance methods of the String object: These are methods that you can call when you have an existing String object and want to perform some operations on it.
The static String object methods: These methods accept parameters and return a new string as a result.
Each of the following sections provides short descriptions of these functions and a brief example. Some of the examples include output, but in most you should run the code yourself to see the results.
JScript Functions and Properties
Each of the functions and properties discussed in this section exists for compatibility with legacy JScript code. The only nonlegacy property is the length property, which gives the length of a string, in characters. All the old JScript functions that operate on the String object exist in JScript .NET. This can be somewhat strange because in many cases these functions were used when interacting with Hypertext Markup Language (HTML) and return an HTML string result. The following functions are some of the commonly used HTML methods of the String object:
anchor: The anchor method surrounds the current string with an <a> (anchor) tag and returns the result as a new string. The string parameter passed to the anchor method becomes the value of the name attribute.
big: The big method surrounds the current string with a <big> tag, which increases the size of the string in HTML by one font size, and returns the result as a new string.
blink: The blink method surrounds the current string with a <blink> tag, which causes text to blink in browsers that support the tag, and returns the result as a new string.
bold: The bold method surrounds the current string with a <b> tag and returns the result as a new string.
These are just a few of the HTML-related methods, and they have all been part of the JScript standard for quite some time and are well documented both in other books on JScript and on the Web. At this point we will skip the rest of the HTML functions and move on to functions that have CLR equivalents or that can be used for more generic programming than just HTML.
This next set of functions is very helpful in retrieving pieces of the current string:
charAt: Sometimes you might want only a character, the numeric value of a character, or only a portion of a string. The charAt method takes the 0-based index of the character to return within the string. If the offset in the string doesn't exist, the method simply returns an empty string.
charCodeAt: The charCodeAt method behaves identically to the charAt method, except that it returns the numeric representation of the character rather than a string. If the offset does not exist, charCodeAt returns the value NaN.
fromCharCode: The fromCharCode method is related to charCodeAt, and turns a list of character codes into a string. This is generally helpful when you're trying to shift the values of characters up or down a few places to perform a primitive form of cryptography.
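The out-of-range behavior described above is easy to confirm. The following is a minimal sketch that uses only the legacy JScript methods and untyped var declarations, so it should behave the same under JScript .NET and any ECMAScript engine. The helper name charSummary is made up for illustration.

```javascript
// charSummary is a hypothetical helper, not part of JScript.
// It reports what charAt and charCodeAt return for a given index.
function charSummary(text, index) {
    return {
        ch: text.charAt(index),        // "" when index is out of range
        code: text.charCodeAt(index)   // NaN when index is out of range
    };
}

var inRange = charSummary("Hi", 0);    // ch is "H", code is 72
var outOfRange = charSummary("Hi", 9); // ch is "", code is NaN

// fromCharCode goes the other way: character codes in, string out.
var rebuilt = String.fromCharCode(72, 105); // "Hi"
```

Because NaN never compares equal to anything, including itself, test the result of charCodeAt with isNaN rather than with an equality operator.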
Listing 3.5 demonstrates each of these functions in use. Note that the Caesar shift functionality (which is a form of simple encryption) of the last part of the listing doesn't work very well, but it handles the encryption (if you can call it that) of Unicode characters outside of the English character set.
Listing 3.5 Character Manipulation in Strings
import System;

var MyString:String = "Getting Character values from Strings";

// Here we are going to get some characters out of the string
for(var i = 0; i < 7; i++) {
    System.Console.Write(MyString.charAt(i));
}
System.Console.WriteLine();

// Now lets print their Unicode values
for(var i = 0; i < 7; i++) {
    System.Console.Write(MyString.charCodeAt(i) + ",");
}
System.Console.WriteLine();

// Here we are going to do a modified Caesar shift
// Modified because it will use computer characters
// and won't wrap the alphabet (a won't wrap to z)
var MyShift:Int32 = 5; // Lets shift by 5 characters
for(var i = 0; i < MyString.length; i++) {
    System.Console.Write(
        String.fromCharCode(MyString.charCodeAt(i) + MyShift)
    );
}
System.Console.WriteLine();
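Since the shift in this listing never wraps, applying the opposite shift recovers the original text. A small sketch of that round trip follows, written in untyped JScript so it should also run in any ECMAScript engine. The function name shiftString is an assumption made for this example.

```javascript
// shiftString is a hypothetical helper: it adds `shift` to every
// character code and does not wrap around the alphabet.
function shiftString(text, shift) {
    var result = "";
    for (var i = 0; i < text.length; i++) {
        result += String.fromCharCode(text.charCodeAt(i) + shift);
    }
    return result;
}

var encoded = shiftString("Getting", 5);   // "Ljyynsl"
var decoded = shiftString(encoded, -5);    // "Getting"
```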
The character code methods in Listing 3.5 are great for working with single characters when character offsets and values are known. To find character offsets and search strings, you use the indexOf and lastIndexOf methods. These methods find the first or last occurrence of a substring, starting at a given index. The indexOf method takes the substring to search for and an optional beginning index. If the beginning index is omitted, the search starts at 0, the first character in the string. The result is the offset at which the substring begins within the current string. If the substring is not found, the result is -1. The lastIndexOf method takes the same parameters but searches backward from the end of the string, so its optional second parameter defaults to the last character rather than 0. Both methods report their results in the same way: an offset from the start of the string, or -1.
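The optional beginning index is what makes it possible to walk through every occurrence of a substring. Here is a minimal sketch of that loop in untyped JScript; findAll is a hypothetical helper name, not a built-in method.

```javascript
// findAll is a hypothetical helper: it returns every offset at which
// `needle` occurs in `text`, including overlapping matches.
function findAll(text, needle) {
    var offsets = [];
    if (needle.length === 0) {
        return offsets;                     // avoid looping forever on ""
    }
    var at = text.indexOf(needle);          // search from index 0
    while (at !== -1) {
        offsets.push(at);
        at = text.indexOf(needle, at + 1);  // resume just past the last hit
    }
    return offsets;
}

var sample = "Parsing strings and obtaining substrings";
var hits = findAll(sample, "string");       // [8, 33]
var last = sample.lastIndexOf("string");    // 33
var missing = sample.indexOf("notinstring"); // -1
```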
To retrieve substrings, you use another set of functions. The substr and substring methods provide different ways of getting part of the current string:
substr: The substr method takes the beginning index of the string to retrieve and an optional length. If the index is past the end of the string, an empty string is returned. If the length is 0 or negative, an empty string is returned. If the length proceeds past the end of the string or the length is not specified, the function retrieves a substring from the offset to the end of the string.
substring: The substring method behaves identically to substr, except that in place of the length, it takes the ending index. If you make a mistake and pass in a higher starting index than ending index, JScript automatically swaps the parameters for you. The string returned includes the starting index up to, but not including, the ending index.
Finally, a less well-known method, slice, has the same parameters as substring as well as the same return results for normal operation. The one difference is that the slice method has a different set of rules for treating starting and ending values that are out of bounds for the current string. These rules allow you to index from the end of the string by specifying negative values for either the starting or ending indexes.
Some of these methods might seem strange. Look at Listing 3.6 for an example of their use.
Listing 3.6 Capturing Substrings of a String
var MyString:String = "Parsing strings and obtaining substrings";
var MyStartIndex:Int32;
var MyEndIndex:Int32;

// Success and Failure of indexOf
System.Console.WriteLine("Index of (substring) and (notinstring)");
System.Console.WriteLine(MyString.indexOf("substring") + "," +
    MyString.indexOf("notinstring"));
System.Console.WriteLine();

// indexOf versus lastIndexOf
System.Console.WriteLine("Index of (string) and Last Index of (string)");
System.Console.WriteLine(MyString.indexOf("string") + "," +
    MyString.lastIndexOf("string"));
System.Console.WriteLine();

// substr Semantics
System.Console.WriteLine("The method substr");
System.Console.WriteLine(MyString.substr(8));
System.Console.WriteLine(MyString.substr(8,5));
System.Console.WriteLine(MyString.substr(8,100));
System.Console.WriteLine(MyString.substr(8,-1));
System.Console.WriteLine();

// substring versus slice
System.Console.WriteLine("The method substring versus slice");
System.Console.WriteLine(MyString.substring(8,14) + "," + MyString.slice(8,14));
System.Console.WriteLine(MyString.substring(8) + "," + MyString.slice(8));
System.Console.WriteLine(MyString.substring(8,-1) + "," + MyString.slice(8,-1));
System.Console.WriteLine(MyString.substring(-5,8) + "," + MyString.slice(-5,8));
System.Console.WriteLine();
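The differences among substr, substring, and slice are easiest to see on a short string where the results can be checked by eye. The following sketch uses untyped JScript and a sample string chosen for this example; the expected values in the comments follow the standard ECMAScript rules for these methods.

```javascript
var text = "Hello World";           // length 11

// substr(start, length)
var a = text.substr(6, 3);          // "Wor"
var b = text.substr(6, 100);        // "World" (length clipped to the end)
var c = text.substr(6, -1);         // ""      (negative length)

// substring(start, end): out-of-order arguments are swapped
var d = text.substring(6, 3);       // "lo "   (same as substring(3, 6))
var e = text.substring(-5);         // "Hello World" (negative start becomes 0)

// slice(start, end): negative values count back from the end
var f = text.slice(6, 3);           // ""      (no swapping)
var g = text.slice(-5);             // "World"
var h = text.slice(0, -6);          // "Hello"
```

The last three lines show why slice is worth knowing: it is the only one of the three that lets you index from the end of the string.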
There are only a few remaining functions: the replacement functions and a few miscellaneous, useful functions. The replace function in JScript runs against the current String object. It replaces a regular expression pattern with a replacement string. The regular expression pattern can be inline, or it can be a regular expression object defined by JScript.
NOTE
Chapter 12, "Regular Expressions," discusses replacement strings and regular expressions.
The replacement string can contain replacement text or escape characters that can be used to include portions of the matched text in the replacement text. (You'll learn about the advanced portions of this parameter, including several ways of manipulating the replacement string, in Chapter 11. At this point, we will use very basic strings as replacements.)
The two remaining functions we'll explore here are toLowerCase and toUpperCase. These functions perform the roles their names suggest. The toLowerCase function returns a copy of the current string with all characters converted to their lowercase equivalents. The toUpperCase function returns a copy of the current string with all characters converted to their uppercase equivalents. Listing 3.7 uses the replace function with a simple set of case-sensitive and case-insensitive regular expressions to match and perform replacements against strings that are modified with the toUpperCase and toLowerCase functions.
Listing 3.7 Basic JScript Regular Expressions
import System;

var reUpperCase = /MATCH/g;
var reLowerCase = /match/g;
var reNoCase = /MaTcH/ig;
var strCamelCase:String = "Where Is The Match In This String?";

// Let's try to Replace Match with Sasquatch in the string
Console.WriteLine(strCamelCase.replace(reUpperCase, "Sasquatch"));
Console.WriteLine(strCamelCase.replace(reLowerCase, "Sasquatch"));
Console.WriteLine(strCamelCase.replace(reNoCase, "Sasquatch"));
Console.WriteLine();

// Now let's make a lowercase version of the string and do it again
var strLowerCase:String = strCamelCase.toLowerCase();
Console.WriteLine(strLowerCase.replace(reUpperCase, "Sasquatch"));
Console.WriteLine(strLowerCase.replace(reLowerCase, "Sasquatch"));
Console.WriteLine(strLowerCase.replace(reNoCase, "Sasquatch"));
Console.WriteLine();

// And one final time, with the uppercase string
var strUpperCase:String = strCamelCase.toUpperCase();
Console.WriteLine(strUpperCase.replace(reUpperCase, "Sasquatch"));
Console.WriteLine(strUpperCase.replace(reLowerCase, "Sasquatch"));
Console.WriteLine(strUpperCase.replace(reNoCase, "Sasquatch"));
Console.WriteLine();
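The listing uses inline regular expression literals throughout. The pattern passed to replace can equally be a regular expression object built at run time, which is the other form the text mentions. A short sketch in untyped JScript, with variable names chosen for this example:

```javascript
var source = "Where Is The Match In This String?";

// Inline literal, case-sensitive: "match" does not occur, so nothing changes.
var unchanged = source.replace(/match/g, "Sasquatch");

// Regular expression object built at run time with the i and g flags.
var pattern = new RegExp("match", "ig");
var replaced = source.replace(pattern, "Sasquatch");
// replaced is "Where Is The Sasquatch In This String?"

// replace returns a new string; the original is left untouched.
var stillOriginal = (source === "Where Is The Match In This String?");
```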
CLR String Instance Methods and Properties
This section covers all the instance methods and properties that you can use with a String object. It also discusses the set of methods and properties that you can use without an instance of a string.
Let's start by examining all the different ways you can get an instance of a String object. JScript allows you to create a string from a string literal in code. But the CLR can create strings from arrays of characters, from string literals, or from a single character repeated many times. These methods of creating String objects are best demonstrated via code. The following example uses several String constructors to create strings from character arrays:
import System;

// Creation of a literal string
var LiteralString:String = "Hello World";
Console.WriteLine(LiteralString);

// Creation of a Character string created from the LiteralString
var CharString1:String = new System.String(LiteralString.ToCharArray());
Console.WriteLine(CharString1);

// Creation of a Character string using only parts of the LiteralString
// 0 is the offset to begin in the array, and 5 is the length.
var CharString2:String = new System.String(LiteralString.ToCharArray(), 0, 5);
Console.WriteLine(CharString2);

// Creation of a Character string by repeating a single character
// 20 means to repeat the character 20 times
var CharString3:String = new System.String('0', 20);
Console.WriteLine(CharString3);
This example converts a string literal to a character array and uses the character array to then create a String object. The method used in this example is simpler than creating a character array from scratch.
This example creates two "Hello World" strings, one "Hello" string, and a string of 20 zeros. Now that you have some String objects to play with, you can begin calling some of the methods and properties. To begin, you can get the length of any string in characters (not in bytes), by calling the Length property. No matter how the string is stored, you can always get the number of characters by using this property. You can then get individual characters from the String object by using the Chars property, which is an indexed property (that is, you can treat it just like an array). Simply passing in an array index retrieves a character. Make sure you use numbers in the range of 0 to Length -1, or you will get a runtime error, specifying that an invalid parameter was passed into the property. This is one of the many features of the strongly typed CLR datatypes.
All the string manipulation functions discussed earlier in the chapter are also made available here, in their CLR variants. The string constructors that use a character array to build a string and the Chars property are directly equivalent to the JScript character code functions.
The string search functions consist of the familiar IndexOf and LastIndexOf functions. Both of these functions can take a substring to search for, a substring and a beginning offset, or a substring with a beginning offset and a count of character positions to examine. The major difference between the CLR functions and their JScript counterparts lies in the ability to search for a single character and, through the related IndexOfAny and LastIndexOfAny functions, for any one of a set of characters (for instance, a or b or c as opposed to the string "abc"). All versions of these functions, whether taking a substring, a character, or an array of characters, optionally accept a starting offset and a count. If a string or character match is found, the offset within the string is returned; otherwise, -1 is returned. (This is the same behavior as in the JScript functions.)
The CLR offers the functions StartsWith and EndsWith, which are shortcuts for comparing substrings against the beginning or end of a given string. The return value is a Boolean true or false, depending on whether the string matched. However, these functions are often useful only after the string has been fully normalized (that is, any whitespace at the beginning or end of the string has been removed so that the string match can happen). To aid in removing extra characters, the CLR provides the trim functions. The Trim function trims whitespace from the beginning and ending of a string, or it trims an array of characters from the string instead of trimming just whitespace. The TrimStart and TrimEnd functions take an array of characters or a null value to indicate whitespace. With this set of functions, you can trim any given number of characters or amount of whitespace from the beginning and end of a string.
The CLR also has padding functions. You can use the PadLeft and PadRight functions to add padding to each side of a string. To use either function, you simply pass in the length you want the string to be, and the function will pad the string with the needed spaces. You can pass an optional parameter in the form of a single character to specify padding with something other than a space. For example, later in this section, we'll be using an uppercase A instead of a space character to pad strings.
The CLR also provides the Substring function. Remember that JScript has three separate functions for getting substrings, each of which treats parameters in a different way. The CLR needs only one function for obtaining substrings, so if you want JScript-like functionality with multiple functions that do nearly the same thing, use the JScript functions. But be warned that they simply map back into the single CLR Substring function in the end. The Substring function takes just a starting index and returns up to the end of the string. It optionally takes a length parameter and returns the number of characters specified in the length parameter rather than returning to the end of the string. Note that if the starting index plus the length goes beyond the end of the string, the program throws a runtime exception. So in many cases, this function is not as nice as the JScript wrappers.
Listing 3.8 demonstrates all the functionality of the methods talked about so far.
Listing 3.8 CLR String Methods and Character Arrays
import System;

var Searchable:String = "This is our searchable string!";
var CharString:String = "aeiou";

// Demonstrates the basic IndexOf and LastIndexOf functions
Console.WriteLine(Searchable.IndexOf("is"));
Console.WriteLine(Searchable.LastIndexOf("is"));
Console.WriteLine();

// Using Character Arrays. Will return the offsets of the first and last vowels.
// Searching for any character in an array uses IndexOfAny and LastIndexOfAny.
Console.WriteLine(Searchable.IndexOfAny(CharString.ToCharArray()));
Console.WriteLine(Searchable.LastIndexOfAny(CharString.ToCharArray()));
Console.WriteLine();

// Advanced indexes for IndexOf and LastIndexOf functions
// These will return the second occurrence of the string is from the
// beginning and from the end using the first search for the beginning
// offset of the second search.
Console.WriteLine(Searchable.IndexOf("is", Searchable.IndexOf("is") + 1));
Console.WriteLine(
    Searchable.LastIndexOf("is", Searchable.LastIndexOf("is") - 1)
);
Console.WriteLine();

// Set up variables for use with EndsWith and StartsWith
var NonPaddedString:String = "Match me in the beginning and the end.";
var PaddedString:String = "AAAAMatch me in the beginning and the end.AAAA";
Console.WriteLine(PaddedString.StartsWith("Match"));
Console.WriteLine(PaddedString.EndsWith("end."));
Console.WriteLine();
Console.WriteLine(NonPaddedString.StartsWith("Match"));
Console.WriteLine(NonPaddedString.EndsWith("end."));
Console.WriteLine();

// Trim Some variables
var TrimString:String = PaddedString.Trim("A".ToCharArray());
var TrimEndString:String = PaddedString.TrimEnd("A".ToCharArray());
var TrimStartString:String = PaddedString.TrimStart("A".ToCharArray());
Console.WriteLine(TrimString);
Console.WriteLine(TrimEndString);
Console.WriteLine(TrimStartString);
Console.WriteLine();

// Pad some variables
var PadLeftString:String =
    NonPaddedString.PadLeft(NonPaddedString.Length + 4, 'A');
var PadRightString:String =
    NonPaddedString.PadRight(NonPaddedString.Length + 4, 'A');
Console.WriteLine(PadLeftString);
Console.WriteLine(PadRightString);
Console.WriteLine();

// And finally some Substrings
var Substring1:String = PaddedString.Substring(4);
var Substring2:String = PaddedString.Substring(4, PaddedString.Length - 8);
Console.WriteLine(Substring1);
Console.WriteLine(Substring2);
Console.WriteLine();
Another set of instance functionality that you should understand is the conversion functions, which behave identically to the JScript to* functions. The CLR maintains a ToUpper method and a ToLower method similar to the JScript equivalents. You've already seen the ToCharArray function, which allows you to convert a string into an array of its constituent characters. This comes in handy when you need to create character arrays from strings on-the-fly, as we've done several times in this chapter.
You also need a set of functions that work on the current string to add, remove, and insert substrings. The CLR has instance methods for removing and inserting strings, but it doesn't have one for appending or concatenating strings together. However, you can use the Insert method to make up for this shortcoming. The Insert method works on the current string and takes a starting index along with the substring to be inserted. You can use this function for appending a substring by setting the starting index to the very end of the current string. An associated Remove method takes the starting index and a length to specify a substring to be removed. Together, these two functions are capable of performing most string maintenance tasks.
You can compare any String object to any other String object by using the CompareTo method. CompareTo is not specific to strings: it is available on any CLR object that implements the IComparable interface.
The following is a fairly simple code example that shows how to do basic string manipulation by using the Insert and Remove methods:
import System;

var TargetString:String = "Here is the final string!";
var StartString:String = "is the wrong stuff!";

// Lets go ahead and start the string off correctly
StartString = StartString.Insert(0, "Here ");
Console.WriteLine(StartString);
Console.WriteLine(TargetString.CompareTo(StartString));

// We need to get rid of that ending
StartString = StartString.Remove(12, 12);
Console.WriteLine(StartString);
Console.WriteLine(TargetString.CompareTo(StartString));

// Now lets finish this bad boy off
StartString = StartString.Insert(StartString.Length, "final string!");
Console.WriteLine(StartString);
Console.WriteLine(TargetString.CompareTo(StartString));
In this example, notice how the result of the CompareTo method changes as the string is edited, until the two strings are equal. A negative value (typically -1) means that the current string sorts before the string being compared to, a positive value (typically 1) means that it sorts after, and 0 means the two strings are equal.
CLR String Static Methods and Properties/Fields
There are only a couple helpful CLR string static methods and properties worth mentioning, so this section is short and to the point. There is only one field of note: the Empty field. The Empty field returns an empty string, and it is mainly used for comparing the current string against an empty string or for initializing string variables with nothing in them so that you can add to them later, using the string modification methods.
It can be useful to be able to print out a list of strings delimited with spaces, tabs, or commas. The Join method makes this pretty easy. Given a separator string and an array of strings to join, the Join method returns a single string in which the elements of the array are separated by the separator string.
What if you want to have a string returned in a special format? In this case, you need to use the Format method of the String object. The Format method enables you to pass in all sorts of formatting parameters, along with a number of objects. This method works on strings, numbers, and many other types of objects. The actual formatting string can be quite complex, but the general syntax is a string containing special formatting items. These formatting items begin and end with braces, in the form {index,alignment:formatoptions}, where index is the zero-based position of the object to format in the argument list, the optional alignment is the minimum field width (negative to left-align, positive to right-align), and the optional formatoptions is a format specifier that is passed to the object, allowing it to format itself (or not format itself, as the case may be).
Finally, the Compare method behaves identically to the CompareTo instance method. However, Compare lets you pass in the two strings to be compared rather than work against a string instance. This is handy when you need to compare two parameters to a function or compare command-line options. Listing 3.9 shows examples of formatting, joining, and comparing some numbers and converting them to strings. Numbers format very well and support quite a few format specifiers, so they work best when demonstrating the Format function.
Listing 3.9 String Comparisons and Formatting
import System;

// We need an array of Integers
var i:Int32 = 0;
var IntArray:Int32[] = new Int32[6];
var StringArray:String[] = new String[6];
var ObjectArray:Object[] = new Object[6];

IntArray[0] = 1;
IntArray[1] = -1;
IntArray[2] = 2;
IntArray[3] = -2;
IntArray[4] = 100000;
IntArray[5] = -100000;

// Let's join some Strings together
for(i = 0; i < 6; i++){
    // Lets cast to a string
    StringArray[i] = String(IntArray[i]);
    // And an Object
    ObjectArray[i] = Object(IntArray[i]);
}
Console.WriteLine(String.Join("|sep|", StringArray));
Console.WriteLine();

// Let's compare some strings together
for(i = 0; i < 6; i++,i++){
    // Print out the results of string comparisons
    Console.WriteLine(String.Compare(StringArray[i], StringArray[i+1]));
    Console.WriteLine(String.Compare(StringArray[i], StringArray[i]));
}
Console.WriteLine();

// Let's finish up with the formatting
var FormatString:String = "Hex the number\n{0:X}\n" +
    "Left align 20 characters\n{4,-20}\n" +
    "Right align 20 characters\n{5,20}\n";
Console.WriteLine(String.Format(FormatString, ObjectArray));
Console.WriteLine();
Using StringBuilder Instead of String
Now that we have explored the greatness of the String object, let's talk about what isn't so great about it. The problem with a String object is that every time it is modified, a brand new string is created. This happens so that any piece of a program that is working on or examining the older version of the string doesn't have it magically change. It's a nice feature of the CLR to make strings immutable and thread-safe.
Any process that concatenates many values onto a single string, or that appends even a few items to a very large string, pays for a complete copy of the old String object every time a modification is made. For this kind of work you want something faster, and the CLR StringBuilder class is exactly what you need. The StringBuilder class resides in the System.Text namespace and has only a few methods and properties that are of interest to us.
To add to the current string, you use the StringBuilder class's Append method. This method is overloaded and can take a piece of a string, an integer value, or even a Boolean variable. The AppendFormat method allows you to specify a formatting string along with the variables to be formatted. This is similar to the format string that is used elsewhere in the book with the Console methods. For more information on formatting, take a look at the .NET framework SDK documentation.
You can insert and remove values from a StringBuilder. The Insert method takes a character offset and inserts, at that position, either a string you've passed in or another value. The Insert method doesn't have a formatted version, so you have to do any formatting of the string beforehand. The Remove method takes both a starting index and a length for the string to be removed.
StringBuilder also includes some utility methods. The Replace methods replace a given string or character with another. These are highly specialized and extremely quick replace functions. A more powerful alternative would be to use regular expressions, but for simple character replacement, the Replace methods are fastest. Another important utility function is ToString(). Though this method is provided by every object in the CLR, its use here is extremely important because it is how you turn a StringBuilder into a string that you can use elsewhere in an application.
Some of the properties of the StringBuilder object are also pretty neat. Each StringBuilder object has a Capacity property and a MaxCapacity property, which control how much space is allocated for the current string and how much space can be allocated as the string grows, respectively. If you know a string is going to grow to a certain size, it can be extremely helpful to set the capacity in the beginning. This prevents the StringBuilder object from having to grow as you expand the string and enables you to preallocate the space needed. You can pass an indexed property, Chars, the index of a character to return. It behaves like the charAt function of the JScript string. Finally, you can use the Length property to find the number of characters currently loaded into the StringBuilder object.
Listing 3.10 shows how to use the StringBuilder object.
Listing 3.10 Using StringBuilder to Create Strings
import System;
import System.Text;

BuildString();

function BuildString() {
    var MySB:StringBuilder = new StringBuilder();

    // Appending normal text
    MySB.Append("Lets start with this\nLittle bit of text\n\n");

    // Appending Formatted text
    MySB.Append("Here we are formatting some numbers\n");
    MySB.AppendFormat("{0:X}, {1,10}, {2}\n\n", 500, 300, 10);

    // Using the Insert method
    MySB.Insert(0, "This text is getting inserted\n\n");

    // Removing some text
    MySB.Insert(0, "This is getting inserted to be removed\n");
    MySB.Remove(0, "This is getting inserted to be removed\n".Length);

    // Let's make numbers a bit bigger!
    MySB = MySB.Replace("numbers", "NUMBERS");

    // And Finally lets examine the Properties
    MySB.AppendFormat("Capacity: {0}\n", MySB.Capacity);

    // Let's set a higher capacity
    MySB.Capacity = MySB.MaxCapacity / 100;
    MySB.AppendFormat("New Capacity: {0}\n", MySB.Capacity);
    MySB.AppendFormat("Max Capacity: {0}\n", MySB.MaxCapacity);
    MySB.AppendFormat("Length: {0}\n", MySB.Length);
    MySB.AppendFormat("Some Chars: {0} {1} {2}\n",
        MySB.Chars(0), MySB.Chars(1), MySB.Chars(2));

    // Notice the use of ToString()!
    Console.WriteLine(MySB.ToString());
}