Understanding the Cocoa Text System
- The Building Blocks of Text
- Laying Out Text
- Laying Out Text Regions
- Paragraph Layout
- The High-Level View
One of the most powerful, and yet least understood, parts of the Cocoa API is the text system. This system encompasses the entire pipeline that begins with a string of characters and ends with a typeset page, view, or window. In this article, we'll take a look at some of the important components and how they fit together.
The Building Blocks of Text
Pretty much any programming language that you use has some kind of string data type. In C, this data type is very primitive; it's an array (which is really just a blob of memory) containing bytes, each of which represents an 8-bit character in some character set.
The character set is important. It defines the mapping from numbers, such as 65, to characters, such as "the uppercase Latin letter A." Without this mapping, strings are just ordered sequences of numbers. With it, each value has some meaning. In Cocoa, you typically store text in NSString objects, which use Unicode, with each value being a UTF-16 code unit.
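As a rough illustration of the mapping described above, the following Swift sketch converts the number 65 to a character and then compares the length an NSString reports with the number of user-visible characters. The sample string is an arbitrary choice for this example; the point is only that NSString counts 16-bit values, so a character outside the Basic Multilingual Plane takes two of them.

```swift
import Foundation

// Unicode assigns the number 65 to LATIN CAPITAL LETTER A.
let scalar = Unicode.Scalar(65 as UInt8)
let letter = Character(scalar)

// "A" followed by U+1F600, a character that needs two 16-bit values in UTF-16.
let text = "A\u{1F600}"

// NSString reports its length in 16-bit values, not in user-visible characters.
let utf16Length = (text as NSString).length

// Swift's String counts user-visible characters (grapheme clusters) instead.
let characterCount = text.count

print(letter, utf16Length, characterCount)
```

The two counts differ (3 versus 2 here), which is worth remembering whenever an NSRange computed from one API is passed to another.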
Characters are quite abstract. When you read this sentence, you're not seeing a string of characters; you're reading a sequence of glyphs. A glyph is a concrete representation of a character. The mapping from characters to glyphs is far from trivial. Consider a word such as flow. In English, this word is typically rendered with three glyphs. The last two letters have one glyph each, but the two characters fl are often rendered as a single ligature.
In some situations, one character may be represented by two glyphs. For example, consider the word Étoilé. You may have glyphs for the accented letters, or you may form them by combining an accent glyph with the glyphs for the letters E and e. To complicate things further, some of these mappings are context-dependent. The fl sequence may be rendered as a ligature or as two separate glyphs, depending on where those letters appear in a word.
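The accented-letter case has a close analogue at the character level, which is easier to inspect than glyphs. The sketch below is an illustration only: it shows the two Unicode spellings of the same word, not what any particular font does when it draws them. Glyph selection still happens later, in the font and the layout manager.

```swift
import Foundation

// Precomposed: each accented letter is a single code point.
let precomposed = "\u{00C9}toil\u{00E9}"

// Decomposed: a base letter followed by U+0301 COMBINING ACUTE ACCENT.
let decomposed = "E\u{0301}toile\u{0301}"

// Swift compares strings by canonical equivalence, so these are equal.
let sameText = (precomposed == decomposed)

// The stored 16-bit values differ: 6 for the precomposed form, 8 for the decomposed one.
let precomposedUnits = (precomposed as NSString).length
let decomposedUnits = (decomposed as NSString).length

// Both spellings contain six user-visible characters.
let visibleCharacters = decomposed.count

print(sameText, precomposedUnits, decomposedUnits, visibleCharacters)
```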
The first job of the text system is converting strings to glyph runs. By itself, a string doesn't provide enough input for this translation. In Cocoa, the NSAttributedString class is used to provide the extra metadata. This class is incredibly simple and yet powerful. It wraps a string, allowing you to tag arbitrary ranges in the string with dictionaries. Although it's heavily used for storing presentation attributes, nothing stops you from storing any sort of arbitrary data in an attributed string object. This capability can be very powerful if you're handling a structured text format, such as XML, that contains metadata not directly related to display.
An attributed string used for display typically indicates a font, along with some style flags for each range in the string.
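A minimal sketch of both uses might look like the following. The attribute key `com.example.xmlElement`, its value, and the sample string are hypothetical names invented for this example. The font attribute is only applied where AppKit is available.

```swift
import Foundation
#if canImport(AppKit)
import AppKit
#endif

let storage = NSMutableAttributedString(string: "Hello, text system")
let firstWord = NSRange(location: 0, length: 5)

// Hypothetical custom key: arbitrary metadata attached to a range of the string.
let xmlElementKey = NSAttributedString.Key("com.example.xmlElement")
storage.addAttribute(xmlElementKey, value: "greeting", range: firstWord)

#if canImport(AppKit)
// Presentation attribute: ask for the first word to be drawn in a bold font.
storage.addAttribute(.font, value: NSFont.boldSystemFont(ofSize: 12), range: firstWord)
#endif

// Read back the attribute dictionary that applies at character index 0.
var effectiveRange = NSRange(location: 0, length: 0)
let attributes = storage.attributes(at: 0, effectiveRange: &effectiveRange)
let tag = attributes[xmlElementKey] as? String

// Index 6 is outside the tagged range, so the custom attribute is absent there.
let untagged = storage.attribute(xmlElementKey, at: 6, effectiveRange: nil)

print(tag ?? "none", untagged == nil)
```

Using a reverse-DNS style string for a custom key is a common way to avoid clashing with keys defined by the frameworks.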
The NSLayoutManager class, one of the core components of the text system in Cocoa, is responsible for mapping from characters to glyphs. In Cocoa, glyphs are stored in runs, which are basically arrays of numbers indicating the glyph index in a particular font. The font is then responsible for providing the drawing commands and metrics for the particular glyph when it's drawn.
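To see this mapping in practice, the sketch below wires up a text storage, a layout manager, and a text container by hand, then lists the glyph generated for each position. The sample string, font, and container size are arbitrary choices for this example. Whether the fl pair comes out as one glyph or two depends on the font, so no particular output is assumed. The code requires AppKit and does nothing elsewhere.

```swift
#if canImport(AppKit)
import AppKit

// The text storage holds the attributed string that the layout manager reads from.
let textStorage = NSTextStorage(
    string: "flow",
    attributes: [.font: NSFont.systemFont(ofSize: 24)]
)
let layoutManager = NSLayoutManager()
let textContainer = NSTextContainer(size: NSSize(width: 400, height: 100))

layoutManager.addTextContainer(textContainer)
textStorage.addLayoutManager(layoutManager)

// Asking for the container's glyph range forces glyph generation and layout.
let glyphRange = layoutManager.glyphRange(for: textContainer)

for glyphIndex in glyphRange.location..<NSMaxRange(glyphRange) {
    // The glyph ID is an index into the font, not a character code.
    let glyphID = layoutManager.cgGlyph(at: glyphIndex)
    let characterIndex = layoutManager.characterIndexForGlyph(at: glyphIndex)
    print("glyph \(glyphIndex): id \(glyphID), first character index \(characterIndex)")
}
#endif
```

If the number of glyphs printed is smaller than the number of characters, the font has substituted a ligature for part of the string.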