Pascal Strings

I've included this section on strings because this feature of the language has a number of very confusing aspects. Under normal circumstances, Pascal strings are very easy to use. However, there happen to be a number of different kinds of Pascal strings, and that proliferation of types really cries out for a clear explanation.

Object Pascal has four different kinds of strings: ShortStrings, AnsiStrings, PChars, and WideStrings. All Object Pascal strings except WideStrings are, at heart, little more than an array of Char. A WideString is an array of WideChars. A Char is 8 bits in size, while a WideChar is 16 bits in size (the native Linux wide character is 32 bits). I will explain more about WideStrings and WideChars at the end of this section on strings.

The following code fragment gives you examples of the types of things you can do with a Char or a String. The code explicitly uses AnsiStrings, but most of it would work the same regardless of whether the variables S and S2 were declared as ShortStrings, PChars, or AnsiStrings. Of course, I will explain the differences among these three types later in this section. Here is the example:

var
  a, b: Char;
  S, S2: String;
begin
  S := 'Sam';     // Valid: Set a string equal to a string literal
  S := '';        // Valid: Set a string equal to an empty string literal
  S := '1';       // Valid: Set a string equal to a single character
  a := '1';       // Valid: Set a Char equal to a character literal
  b := a;         // Valid: Set a Char equal to a Char
  // a := 'Sam';  // Invalid: You can't set a Char equal to a string
  a := #65;       // Valid: Set a Char equal to a character code literal
  a := Char(10);  // Valid: Set a Char equal to an integer converted to a Char
  a := S[1];      // Valid: Set a Char equal to the first Char in a non-empty string
  S2 := 'Sam'#10; // Valid: Set a string equal to a string with a Char appended
  S := S + S2;    // Valid: Concatenate two strings
  if (S = S2) then
    ShowMessage('S and S2 contain equivalent strings');
  if (S > S2) then
    ShowMessage('S would appear in a dictionary after S2');
end;

The Pascal language originated in Europe, so strings follow the traditional European syntax and are set off with single rather than double quotes. The code shown here declares two Chars and two Strings. The first statement after the begin sets the String equal to a string literal that contains three letters. You can also set a String equal to a string literal that contains a single character or no characters. You can set a Char equal to a single character such as a, b, A, or B. You cannot set a Char equal to a string such as Sam. You can, however, set a Char equal to the first character in a String, as in a := S[1]. You can also set a Char equal to the character whose code is 65 by writing this syntax: a := #65. In the standard ANSI character set, character 65 is a capital A, so this is equivalent to setting a Char equal to the letter A: a := 'A';. The expression Char(10) is equivalent to the expression #10. Both expressions reference ANSI character 10, which is the linefeed character. It is also legal to append or insert characters into a string using the following syntax: S2 := 'Sam'#10;. This adds a linefeed to the end of the string. Notice that the character is appended outside the closing quote.
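
The character-code rules just described can be checked with a small console program. This is only a sketch: it assumes a compiler with an 8-bit Char (Kylix, Delphi 6, or Free Pascal in Delphi mode), and the program name is made up for illustration.

```pascal
program CharLiterals;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

var
  a: Char;
  S: String;
begin
  a := #65;                  // The character whose code is 65
  WriteLn(a = 'A');          // TRUE: #65 and 'A' are the same Char
  WriteLn(Char(10) = #10);   // TRUE: both denote the linefeed character
  S := 'Sam'#10;             // Linefeed appended outside the closing quote
  WriteLn(Length(S));        // 4: three letters plus the linefeed
  WriteLn(Ord(S[4]));        // 10: the code of the appended character
end.
```

Ord is the inverse of the Char() typecast: it turns a character back into its numeric code.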

C/C++, JAVA NOTE

In Java or C++ you would write "Sam\n" rather than 'Sam'#10. The two statements are equivalent.

Studying the examples in this section should give you some sense of how to use strings in your programs. Notice that in one of the examples, you can use the + operator to concatenate two strings. You can also use the < and > operators to test whether one String sorts before or after another String, and you can use the = operator to test whether two Strings contain identical sequences of characters.

JAVA NOTE

The = operator in Pascal does the same thing as the String.equals method does in Java. You are not testing to see whether the strings point at the same memory; you are testing to see whether they point at strings that contain the same sequences of characters.
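
A minimal sketch of this distinction follows. It assumes long strings are enabled (the default) and that the program is built as a console application. The variable names are arbitrary, and the Pointer() typecast is used only to peek at where each string lives.

```pascal
program CompareStrings;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

var
  S, S2, Other: String;
begin
  S := 'Sam';
  S2 := 'Sa';
  S2 := S2 + 'm';                     // Built separately from S at run time
  Other := 'Fred';
  WriteLn(S = S2);                    // TRUE: same characters
  WriteLn(Pointer(S) = Pointer(S2));  // FALSE: different memory
  WriteLn(S > Other);                 // TRUE: 'Sam' sorts after 'Fred'
end.
```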

ShortStrings

The ShortString is the oldest kind of Pascal string, and it is rarely in use today. A ShortString is essentially a glorified array of Char with a maximum size of 256 bytes. The first byte, the length byte, designates the length of the string. ShortStrings are not null-terminated; their length is determined only by the length byte. Remember that the length byte takes up 1 of the 256 bytes in the string, so the longest possible ShortString contains 255 characters. The limitation on the length of a ShortString exists because the length byte is 8 bits in size, and you can fit only 256 possible values (0 through 255) in 8 bits.

NOTE

ShortStrings are used mostly for backward compatibility with old Pascal code. However, you might use a ShortString if you need to be sure that a block of memory has a prescribed size. For instance, you know that ShortStrings are usually 256 bytes long, so if you want to create an array of 4 Strings and you want to be sure that it occupies exactly 1,024 bytes of memory, regardless of the length of each string (and assuming that each string is 255 characters in length or less), you might decide to use ShortStrings rather than AnsiStrings. ShortStrings can also be useful in variant records, as described in the later section of this chapter titled "Variant Records."
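
The size arithmetic in this note can be confirmed with SizeOf. The following is a small sketch; the type name TNames is invented for the example.

```pascal
program ShortStringSize;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  TNames = array[0..3] of ShortString;  // Hypothetical type for illustration
begin
  WriteLn(SizeOf(ShortString));  // 256: one length byte plus 255 characters
  WriteLn(SizeOf(TNames));       // 1024: four fixed-size blocks of 256 bytes
end.
```

The sizes do not depend on what the strings currently hold, which is exactly what makes the type useful when a fixed memory footprint matters.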

Here is the syntax for using a ShortString:

var
  S: ShortString;
begin
  S := 'Hello';
end;

This string is represented in memory as follows: [#5][H][e][l][l][o]. The first byte of the string, which the user never sees, represents the length of the string. The following five bytes contain the characters themselves; the rest of the 256-byte block is unused.
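
Although you would not normally touch it, the length byte of a ShortString can be read as element 0 of the string. The sketch below assumes a console program; indexing position 0 is meaningful only for ShortStrings, not for AnsiStrings.

```pascal
program LengthByte;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

var
  S: ShortString;
begin
  S := 'Hello';
  WriteLn(Ord(S[0]));  // 5: the hidden length byte
  WriteLn(Length(S));  // 5: the normal way to ask for the length
  WriteLn(S[1]);       // H: the characters start at index 1
end.
```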

You can also declare a ShortString like this:

var
  S: String[10];

This string can hold at most 10 characters rather than 255. More commonly, you might declare a type of string that is a custom length and then reuse that type throughout your program:

type
  String5 = String[5];
  String15 = String[15];
var
  S5: String5;
  S15: String15;

The compiler appears not to object to you assigning strings longer than 5 or 15 characters to variables of the types declared previously. However, the string that you create will hold only the appropriate number of characters. The others will be discarded.
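
Here is a short sketch of that truncation. The source string is assigned through an ordinary String variable (Long, a name chosen for the example) so that the assignment happens at run time; some compilers may still emit a warning.

```pascal
program Truncation;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  String5 = String[5];
var
  Long: String;
  S5: String5;
begin
  Long := 'Hello, world';
  S5 := Long;              // Only the first 5 characters fit
  WriteLn(S5);             // Hello
  WriteLn(Length(S5));     // 5
  WriteLn(SizeOf(S5));     // 6: five characters plus the length byte
end.
```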

Again, I want to stress that ShortStrings are not in common use today. In Java parlance, one might even say that they are deprecated, although I doubt that they will ever cease to be a part of the language.

AnsiStrings

AnsiStrings are also known as long strings. On 32-bit platforms, the maximum length for an AnsiString is 2GB. This type is the native Object Pascal string and the kind that you will use in most programs.

If you declare a variable as a String, it is assumed to be an AnsiString. In other words, if you do not specify that a string is an AnsiString, a ShortString, or a custom string such as String[10], you can assume that it is an AnsiString. The one exception to this rule occurs if you explicitly turn off the $H directive, where H can be thought of as standing for "huge" strings. In such cases, all strings are assumed to be ShortStrings unless explicitly declared otherwise. If you place the {$H-} directive at the top of a module, that entire module will use ShortStrings by default. If you deselect Project, Options, Compiler, Huge Strings from the menu, your entire program will use ShortStrings by default.
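
One way to see the effect of the directive is to ask for the size of a String variable. This is a sketch under the assumption that the module is compiled as a console program. With {$H-} in force, String means ShortString, so the variable occupies 256 bytes. Without the directive, it would be the size of a pointer.

```pascal
program HugeStringsOff;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$H-}  // Turn off long ("huge") strings for this module

var
  S: String;  // A ShortString here, because of {$H-}
begin
  S := 'Hello';
  WriteLn(SizeOf(S));  // 256
end.
```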

NOTE

When using the default key mappings, you can press Ctrl+O+O (that's the letter O) to get a list of all the compiler directives for the current module.

When a CLX method needs to be passed a string, it almost always expects to be passed an AnsiString. The AnsiString is the native type expected by CLX controls. Despite the simplicity of this statement, there are some twists and turns to it. As a result, I will discuss this in more depth both in this section and in the section "PChars."

An AnsiString is a pointer type, although you should rarely, if ever, need to explicitly allocate memory for it. The compiler notes the times when you make an assignment to a string, and it calls routines at that time for allocating the memory for the string. (Many of these routines are in System.pas, and you can step right into them with the debugger on some versions of Kylix.)

NOTE

You will find that many of the routines in the System unit use Assembly language. In general, they follow one of two different formats:

procedure Foo;
asm
  mov eax, 1
end;

procedure FooBar;
var
  X: Integer;
begin
  X := 7;
  asm
    mov eax, X;
  end;
end;

Procedure Foo uses asm where a normal Pascal procedure would use begin. In this type of procedure, all the code is written in Assembler until the closing end statement. The second example embeds an asm statement in a begin..end block. Both syntaxes are valid. When using the debugger, after starting your program, choose View, Debug Windows, CPU to step through your code. I will talk more about debugging in Chapter 5. However, I am not going to say anything more about Assembler in this book. Use System.pas as a reference if you are interested in this technology.

The only time that you might need to allocate memory for an AnsiString is if you are going to pass it to a routine that does not know about AnsiStrings; that is, when you are passing it to a routine written in some language other than Pascal or when you are passing it to some exceptionally peculiar Pascal routine. In such a case, you would normally want to pass a PChar rather than an AnsiString. But it is possible to pass an AnsiString to such a routine; you allocate memory for it first and then pass it. (Use the SetLength routine to allocate memory for an AnsiString, as described at the very end of this section.)

Routines that take PChars are generally routines that are written in some other language, such as C or C++. If you pass an AnsiString into such a routine and you expect it to pass the string back with a new value in it (passing by reference), you probably need to allocate memory for the string before passing it. If you are passing an AnsiString into an Object Pascal routine, you can assume that the compiler will know how to allocate memory for it. In your day-to-day practice as an Object Pascal programmer, you should never need to think about allocating memory for an AnsiString. The cases when you need to do it are very rare and are not the type that beginning or intermediate-level programmers are ever likely to encounter.
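
The following sketch illustrates the rare case just described. GetGreeting is a hypothetical stand-in for a C-style routine that writes into a caller-supplied buffer; it is written in Pascal here only so that the example is self-contained. The buffer size of 64 is likewise an arbitrary choice.

```pascal
program FillFromC;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

// Hypothetical routine: copies a null-terminated string into Buffer,
// which must have room for BufferSize characters including the #0.
procedure GetGreeting(Buffer: PChar; BufferSize: Integer);
begin
  StrLCopy(Buffer, 'Hello from C', BufferSize - 1);
end;

var
  S: String;
begin
  SetLength(S, 64);                      // Allocate room for 64 characters
  GetGreeting(PChar(S), Length(S) + 1);  // +1 for the terminating #0
  SetLength(S, StrLen(PChar(S)));        // Trim S to what was written
  WriteLn(S);                            // Hello from C
end.
```

The second call to SetLength matters: without it, Length(S) would still report 64 even though only the first 12 characters are meaningful.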

AnsiStrings are null-terminated. This means that the end of the string is marked with #0, the first character in the ANSI character set. This is the same way that you mark the end of a string in C/C++. AnsiStrings are different from C/C++ strings, however, because they are usually prefaced with two 32-bit values; one value holds the length of the string, and the other holds the reference count for the string. The only time that an AnsiString is not prefaced by these values is when the string variable references a 0-length string. As a programmer, you will almost certainly never have an occasion to explicitly reference either of these values.

It is a simple matter to understand the 32-bit value that holds the length of the string. It is similar to the length byte in a ShortString, except that it is 32 bits in size rather than 8 bits, so it can reference a very large string. What is the point, though, of the 32-bit value used for reference counting?

Reference counting is a means of saving memory and decreasing the time necessary to make string assignments. If you assign one string to another, the two then contain the same values, so it is thriftiest to have them both point at the same memory. If possible, Object Pascal will do this by default. (You can override this behavior, as explained later in this section in the note on the UniqueString procedure.) When reference counting, the compiler simply points the second string at the memory allocated for the first string and then increments the reference count of that memory. Consider the following code fragment:

var
  Sam: String;
  Fred: String;
begin
  Sam := 'Look at all beings with the eyes of compassion. -- Lotus Sutra';
  Fred := Sam; // Reference count incremented, no memory allocated for chars.
  Fred := 'Learn to ' + Fred; // Strings not equal, memory must be allocated.
end;

When you set Sam equal to the quote from the Lotus Sutra, the compiler allocates sufficient memory for the variable Sam. When you set Fred equal to Sam, no new memory for character values is allocated. Instead, the reference count for the string is incremented and Fred is pointed at the same string as Sam. This kind of assignment is very fast and also saves memory. In short, you avoid both the extra memory consumed by allocating memory for the characters in the string and also the extra time required to copy the memory from one location in memory to another.

So far, so good. But what happens if you change one of the values that either variable addresses? That is what happens in the third statement of the code fragment. When you change the value of Fred in the last line of the method, new memory is allocated for Fred and the reference count for the original string is decremented by 1. At this point, Fred and Sam point at two entirely separate strings.
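
You can observe this sharing, without touching the reference count itself, by comparing the addresses that the two variables hold. The sketch below assumes a console program. Sam is built at run time so that it is an ordinary heap-allocated string, and the Pointer() typecast is used purely for inspection.

```pascal
program SharedStrings;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

var
  Sam, Fred: String;
begin
  Sam := 'Look at all beings';
  Sam := Sam + ' with the eyes of compassion.';  // Built at run time
  Fred := Sam;                                   // No characters copied
  WriteLn(Pointer(Fred) = Pointer(Sam));         // TRUE: same memory
  Fred := 'Learn to ' + Fred;                    // New memory for Fred
  WriteLn(Pointer(Fred) = Pointer(Sam));         // FALSE: separate strings
  WriteLn(Sam);                                  // Sam is unchanged
end.
```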

NOTE

You can use the UniqueString procedure to force a string to have a reference count of 1, even if it would normally have a higher count.
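
As a brief sketch of UniqueString in action, the program below gives Fred its own copy of the characters that it previously shared with Sam. The variable names follow the earlier fragment, and the Pointer() comparison is again used only to show where each string lives.

```pascal
program MakeUnique;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

var
  Sam, Fred: String;
begin
  Sam := 'abc';
  Sam := Sam + 'def';                     // Built at run time
  Fred := Sam;                            // Fred shares Sam's memory
  UniqueString(Fred);                     // Force Fred to own its own copy
  WriteLn(Pointer(Fred) = Pointer(Sam));  // FALSE: no longer shared
  WriteLn(Fred = Sam);                    // TRUE: contents still match
end.
```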

I want to stress that all these complicated machinations mean that you normally don't have to think about string memory allocation at all. You can just use a string type in a manner similar to the way you would use an Integer type. The compiler handles the allocation, and you don't have to think about it. However, it helps to know the inner workings of the AnsiString type, both so that you know what happens in unusual cases and so that you can design your code to be as efficient as possible.

Strings are generally allocated for you automatically. However, you can use the SetLength procedure to set or reset the length of a string:

var
  S: string;
begin
  SetLength(S, 10);  // S now has room for 10 characters
  SetLength(S, 12);  // Grow S to 12 characters; the first 10 are preserved
end;

Many routines built into the Object Pascal language can help you work with strings. In particular, see the FmtStr and Format functions. You might also want to browse the entire SysUtils unit and become familiar with the many useful routines found there. Also see the LCodeBox unit that ships with this book.
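
To give a feel for these routines, here is a small sketch that combines SetLength with Format and FmtStr. It assumes a console program, and the strings and numbers are arbitrary examples.

```pascal
program FormatDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

var
  S, Msg: String;
begin
  S := 'Kylix';
  SetLength(S, 3);                                // Shrink S to 'Kyl'
  Msg := Format('%s has %d characters', [S, Length(S)]);
  WriteLn(Msg);                                   // Kyl has 3 characters
  FmtStr(Msg, '%d + %d = %d', [2, 3, 2 + 3]);     // Same idea, result in Msg
  WriteLn(Msg);                                   // 2 + 3 = 5
end.
```

Format returns the formatted string as a function result, while FmtStr places it in its first parameter.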

PChars

A PChar is a standard null-terminated string and is structurally exactly like a C string. In fact, this type was created primarily to provide compatibility with C libraries. In particular, it was created for compatibility with the Windows API, which is written in C. It has proven to be a generally useful type, and it will come in handy when you are calling functions from the Linux C libraries such as Libc.

NOTE

To call most of the routines in the Libc library, just add Libc to your uses clause and go to work. This process is described in more depth in Chapter 6, "Understanding the Linux Environment."

The native Object Pascal string type is known as a String—or, more properly, as a long string or AnsiString. However, in most cases you are free to use either the native String type or the PChar type. Both types of strings are null-terminated. The difference between them is that a Pascal string has data placed in front of the String that determines the string's length and its reference count.

In most cases in a Kylix program, you should use the AnsiString type. A Kylix control such as a TEdit would never expect you to pass it a PChar. However, it is usually legal, but unorthodox, to pass it a PChar. This is confusing enough that an example might be helpful. Consider the following block of code:

procedure TForm1.Button1Click(Sender: TObject);
var
  Sam: PChar;
begin
  Sam := 'Fred';
  Edit1.Text := Sam;
end;

This code will compile and run without error. In short, it is legal to assign a PChar to a property that is declared to be of type AnsiString. (Actually the Text property is declared to be of type TCaption, but TCaption is declared to be of type String.)

NOTE

CLX is built on top of the C++ library called Qt. As a result, many of the controls in CLX ultimately end up working with native C strings, or a C String object. However, none of that is any concern to us as Pascal programmers. CLX is expecting AnsiStrings and, when you work with CLX controls, you should use the native String type.

You can assign a PChar to a string directly. However, if you assign a String to a PChar, you need to typecast it:

var
  S: string;
  P: PChar;
begin
  P := PChar(S);
end;

As you recall, an AnsiString is simply a PChar with some data in front of it. This data appears at a negative offset from the pointer to the AnsiString. As a result, typecasting the AnsiString as a PChar is really just a confirmation that from the pointer to the AnsiString and onward, an AnsiString is nothing more than a PChar. You will use this typecasting technique quite often if you need to pass AnsiStrings to routines written in C that are expecting a regular C string rather than an AnsiString.
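
The sketch below shows the typecast at work. It assumes a console program and a non-empty string. It checks that the PChar sees the same characters as the AnsiString and that a #0 follows the last character.

```pascal
program CastToPChar;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

var
  S: String;
  P: PChar;
begin
  S := 'Hello';
  P := PChar(S);               // Same memory, viewed as a C string
  WriteLn(StrLen(P));          // 5: matches Length(S)
  WriteLn(Ord(P[Length(S)]));  // 0: the null terminator after the 'o'
  WriteLn(P);                  // Hello
end.
```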

Once the decision was made to make PChars part of Object Pascal, there needed to be a set of routines to help you work with such strings. These routines are based closely on the functions you would use for manipulating strings in a C/C++ program. For instance, these routines have names such as StrLen, StrCat, StrPos, and StrScan. Again, you should look in the SysUtils unit for more information on these routines. You will find that there are dozens of such routines and that they are quite flexible and powerful.
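
A few of these routines are sketched below, working on a fixed-size character buffer in much the same way you would in C. The buffer size and the strings are arbitrary choices for the example, and as in C, it is up to you to make sure that the buffer is large enough.

```pascal
program PCharRoutines;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

var
  Buffer: array[0..63] of Char;  // Room for 63 characters plus the #0
  Start, Found: PChar;
begin
  StrCopy(Buffer, 'Hello');          // Like strcpy
  StrCat(Buffer, ', Linux');         // Like strcat
  WriteLn(StrLen(Buffer));           // 12
  Found := StrPos(Buffer, 'Linux');  // Like strstr: pointer to the match
  WriteLn(Found);                    // Linux
  Start := Buffer;
  Found := StrScan(Buffer, ',');     // Like strchr: pointer to the comma
  WriteLn(Found - Start);            // 5: zero-based offset of the comma
end.
```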

WideStrings

WideStrings are very much like AnsiStrings, except that they point at wide characters of 16 bits rather than normal Chars of 8 bits. These large characters, known as WideChars, are a means of manipulating Unicode characters. Unicode in particular, and WideChars in general, provide a means for working with large character sets that will not fit in the 256 values that an 8-bit Char can represent. For instance, the kanji character sets from Asia have thousands of characters in them. You can't capture them using standard AnsiStrings; instead, you must use WideStrings.

NOTE

In Windows, the native wide character type (WCHAR) is 16 bits in size. In Linux, wide characters are 32 bits in size. The Kylix team decided to reuse the 16-bit WideChar routines already in place for Windows rather than to rewrite the routines explicitly for the 32-bit Linux WideChar. As a result, your programs work with 16-bit WideChars, even though Linux defaults to 32-bit WideChars. Unless we are invaded from Alpha Centauri, where very large character sets are in common use, you should find that 16-bit WideChars are large enough for all practical purposes.

Starting with Kylix and Delphi 6, WideStrings are reference counted just as AnsiStrings are reference counted. In fact, you use a WideString exactly as you would use an AnsiString:

procedure TForm1.Button1Click(Sender: TObject);
var
  S: WideString;
begin
  S := 'Sam';
  Edit1.Text := S;
end;

This example shows that you can convert an AnsiString to a WideString and also convert a WideString to an AnsiString through the simple use of the assignment operator. In Kylix and Delphi 6, code based on WideStrings is actually quite efficient. If you have good reason to use WideStrings, go ahead and use them. The compiler handles them quite easily.
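
The same round trip can be sketched in a console program, without a form or an edit control. The example sticks to plain ASCII text so that the result does not depend on the system's character set conversion. Some compilers may warn about the implicit conversions.

```pascal
program WideRoundTrip;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

var
  A, Back: AnsiString;
  W: WideString;
begin
  A := 'Sam';
  W := A;                     // AnsiString converted to WideString
  WriteLn(Length(W));         // 3: length is counted in WideChars
  WriteLn(SizeOf(WideChar));  // 2: each WideChar is 16 bits
  Back := W;                  // WideString converted back to AnsiString
  WriteLn(Back = A);          // TRUE: the text survives the round trip
end.
```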

This is the end of the section on Strings. Next up are typecasts, a technology used very widely in Kylix programs. After that, we will look at the array and record types, and then we'll take a quick tour of Object Pascal pointers.
