
Surrogate Characters

Surrogate characters are typically referred to as surrogate pairs. A surrogate pair is a combination of two 16-bit code units that together represent a single code point. To make the detection of surrogate pairs easy, the Unicode standard has reserved the range from U+D800 to U+DFFF for the use of UTF-16. No characters are assigned to code point values in this range. When programs see a 16-bit value that falls in this range, they know immediately (zip! zip!) that they have encountered one half of a surrogate pair.

This reserved range is composed of two parts:

  • High surrogates—U+D800 to U+DBFF (total of 1,024 code points)
  • Low surrogates—U+DC00 to U+DFFF (total of 1,024 code points)

A lone surrogate is invalid in UTF-16; surrogates are always written in pairs, with the high surrogate followed by the low.
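The two ranges make the check mechanical. The following is a minimal sketch in plain C (which also compiles inside an Objective-C file) of how a program might classify 16-bit units and reject lone surrogates. The function names is_high_surrogate, is_low_surrogate, and utf16_is_well_formed are hypothetical helpers made up for this illustration, not part of any Apple framework.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* High surrogates: U+D800 through U+DBFF. (Hypothetical helper.) */
static bool is_high_surrogate(uint16_t unit) {
    return unit >= 0xD800 && unit <= 0xDBFF;
}

/* Low surrogates: U+DC00 through U+DFFF. (Hypothetical helper.) */
static bool is_low_surrogate(uint16_t unit) {
    return unit >= 0xDC00 && unit <= 0xDFFF;
}

/* Returns true when every high surrogate is immediately followed by a low
   surrogate and no low surrogate appears on its own. */
static bool utf16_is_well_formed(const uint16_t *units, size_t count) {
    for (size_t i = 0; i < count; i++) {
        if (is_high_surrogate(units[i])) {
            if (i + 1 >= count || !is_low_surrogate(units[i + 1])) {
                return false;  /* high surrogate without its partner */
            }
            i++;  /* skip the low half of the pair */
        } else if (is_low_surrogate(units[i])) {
            return false;  /* low surrogate with no preceding high */
        }
    }
    return true;
}
```

Because the high and low ranges do not overlap, a single comparison tells you which half of a pair you are looking at, even if you start reading in the middle of a buffer.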

With UTF-16 encoding, characters with code points in the ranges U+0000 through U+D7FF and U+E000 through U+FFFF are stored as single 16-bit units. Code points from U+10000 through U+10FFFF are stored as surrogate pairs.

Table 2.6 contains examples of surrogate pairs.

Table 2.6 Examples of Surrogate Pairs

Character       Code Point    Surrogate Pair
017fig01.jpg    U+10000       {U+D800, U+DC00}
017fig02.jpg    U+10E6D       {U+D803, U+DE6D}
017fig03.jpg    U+1D11E       {U+D834, U+DD1E}
017fig04.jpg    U+10FFFF      {U+DBFF, U+DFFF}
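The pairs in Table 2.6 follow from the standard UTF-16 arithmetic: subtract 0x10000 from the code point to get a 20-bit value, add the top 10 bits to 0xD800, and add the bottom 10 bits to 0xDC00. Here is a small sketch of that calculation in plain C. The names encode_surrogate_pair and decode_surrogate_pair are hypothetical and chosen only for this example.

```c
#include <stdint.h>

/* Splits a code point in U+10000..U+10FFFF into a surrogate pair.
   Returns 1 on success, 0 if the code point does not need a pair. */
static int encode_surrogate_pair(uint32_t code_point, uint16_t pair[2]) {
    if (code_point < 0x10000 || code_point > 0x10FFFF) {
        return 0;
    }
    uint32_t offset = code_point - 0x10000;           /* 20-bit value */
    pair[0] = (uint16_t)(0xD800 + (offset >> 10));    /* top 10 bits */
    pair[1] = (uint16_t)(0xDC00 + (offset & 0x3FF));  /* bottom 10 bits */
    return 1;
}

/* Reverses the mapping. Assumes `high` and `low` are a valid pair. */
static uint32_t decode_surrogate_pair(uint16_t high, uint16_t low) {
    return 0x10000 + ((uint32_t)(high - 0xD800) << 10) + (uint32_t)(low - 0xDC00);
}
```

For example, U+1D11E minus 0x10000 is 0xD11E. Its top 10 bits are 0x34 and its bottom 10 bits are 0x11E, which gives {U+D834, U+DD1E}.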

The following code snippet shows you how to get a printout of a surrogate pair when you are given its code point:

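// UniChar (capital U) is the 16-bit UTF-16 code unit type.
UniChar characterArray[2];
// Returns true only when the code point is above U+FFFF and needs a pair.
if (CFStringGetSurrogatePairForLongCharacter(0x10FFFF, characterArray)) {
    NSString *surrogate = [[NSString alloc] initWithCharacters:characterArray length:2];
    NSLog(@"Surrogate: %@", surrogate);
}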

Note that this takes advantage of the CFStringGetSurrogatePairForLongCharacter function, which maps a UTF-32 character to a pair of UTF-16 surrogate characters. We need an array to hold the resulting UTF-16 pair (that’s what characterArray is for), and then the initWithCharacters:length: method of NSString does the rest.

Emoji

I’m making a special callout on the Emoji characters because they are extremely popular, and Apple both uses a special font to represent them and provides a keyboard to input just Emoji characters.

Emoji is the Japanese term for picture characters. They were introduced in the late 1990s by a Japanese mobile phone provider. Shigetaka Kurita created the smiley-faced icons in an effort to retain his company’s customer base by giving text messages more cuteness. The other factor supporting Emoji characters was the ability to give contextual information with a single character. What’s the weather going to be like today? That’s easily presented with a sun or umbrella or cloud Emoji character.

Figure 2.2 shows the first page of the Emoji keyboard.

Figure 2.2 The Emoji keyboard.

Apple Color Emoji is a font available on both iOS and OS X to provide support for the Unicode Emoji characters. Instead of this font having glyphs with black and white outlines, it has full-color, higher-resolution images for each of the nearly 900 glyphs it supports.

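Strong support of Emoji has been a hard target to hit because Emoji historically occupied private use areas of Unicode, where each vendor chose its own code points. Unicode 6.0 gave them standard assignments, most of them above U+FFFF (for example, U+1F539 and U+1F604), so in UTF-16 they are stored as surrogate pairs.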
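One practical consequence is that an Emoji such as U+1F604 lies above U+FFFF, so it occupies two UTF-16 code units, {U+D83D, U+DE04}, even though it is a single character to the user. As a rough sketch in plain C, the hypothetical helper utf16_count_code_points below (a name invented for this example) counts code points rather than code units. It assumes the buffer is well-formed UTF-16.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts code points in a well-formed UTF-16 buffer. Each surrogate pair
   counts once: the high half is counted and the low half is skipped. */
static size_t utf16_count_code_points(const uint16_t *units, size_t count) {
    size_t code_points = 0;
    for (size_t i = 0; i < count; i++) {
        /* A low surrogate is the second half of a pair already counted. */
        if (units[i] < 0xDC00 || units[i] > 0xDFFF) {
            code_points++;
        }
    }
    return code_points;
}
```

The length property of NSString reports UTF-16 code units, so a string holding "Hi" followed by U+1F604 has a length of 4, while a count like the one above yields 3.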
