Although the standard attributes that AppKit provides are related to presentation, you can store any object type in an attributed string. One thing that I've done with this facility is implement syntax highlighting in SourceCodeKit. This uses the clang APIs to parse Objective-C code and tag ranges with semantic attributes. clang is the new C-family front end for LLVM, designed to be used in various tools beyond the compiler. It provides a stable C API for inspecting source files.
This API lets you work at two levels, one for the tokenizer and one for the parser. The tokenizer (also called the lexer) converts a stream of text into a stream of tokens. These tokens have a type, such as keyword, punctuation, or identifier. Most editors that claim to provide syntax highlighting actually provide only lexical highlighting: they highlight the text based on the types of the tokens.
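To make the idea of lexical classification concrete, here is a minimal sketch of a toy tokenizer in plain C. It is not clang's lexer and does not use the clang API; the names TokenKind, Token, and toy_tokenize are hypothetical, and the keyword list is deliberately tiny. The point is only that a lexer assigns each token a type from its spelling, with no knowledge of what the identifier refers to.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Token categories, loosely modelled on what a lexer reports (illustrative only). */
typedef enum { TOK_KEYWORD, TOK_IDENTIFIER, TOK_LITERAL, TOK_PUNCTUATION } TokenKind;

typedef struct {
    TokenKind kind;
    size_t start;   /* offset of the first character in the source text */
    size_t length;  /* number of characters in the token */
} Token;

/* A deliberately small keyword list; real C has many more. */
static const char *const kKeywords[] = { "int", "return", "typedef", "if", "else", "while" };

static int is_keyword(const char *s, size_t len) {
    for (size_t i = 0; i < sizeof kKeywords / sizeof kKeywords[0]; i++) {
        if (strlen(kKeywords[i]) == len && strncmp(kKeywords[i], s, len) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Splits src into at most max tokens and returns the number written to out. */
size_t toy_tokenize(const char *src, Token *out, size_t max) {
    size_t n = 0, i = 0;
    while (src[i] != '\0' && n < max) {
        unsigned char c = (unsigned char)src[i];
        if (isspace(c)) { i++; continue; }

        size_t start = i;
        TokenKind kind;
        if (isalpha(c) || c == '_') {
            /* Identifier or keyword: letters, digits, and underscores. */
            while (isalnum((unsigned char)src[i]) || src[i] == '_') i++;
            kind = is_keyword(src + start, i - start) ? TOK_KEYWORD : TOK_IDENTIFIER;
        } else if (isdigit(c)) {
            /* Decimal integer literal only; no floats, suffixes, or hex. */
            while (isdigit((unsigned char)src[i])) i++;
            kind = TOK_LITERAL;
        } else {
            /* Everything else is treated as one-character punctuation. */
            i++;
            kind = TOK_PUNCTUATION;
        }
        out[n].kind = kind;
        out[n].start = start;
        out[n].length = i - start;
        n++;
    }
    return n;
}
```

Given the input typedef int my_int; my_int x = 42; this sketch tags both occurrences of my_int as plain identifiers. Nothing at the token level reveals that my_int names a type, which is the limitation of purely lexical highlighting.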
The parser provides higher-level information. For example, it can tell the difference between the declaration of a type, the declaration of a variable, and the use of a variable. Figure 1 shows the difference between the two approaches. The top window is highlighted using SourceCodeKit's syntax highlighting, while the bottom window uses Vim's lexical highlighting. Notice that the syntax highlighter can spot macros, types defined in typedef statements, and so on.
Figure 1: SourceCodeKit's syntax highlighting (top) compared with Vim's lexical highlighting (bottom).
SourceCodeKit uses an NSAttributedString to store all of these attributes. clang parses the source file, and then SourceCodeKit iterates over the token stream, attaching the clang-supplied attributes.
These are not presentation attributes. For example, the token type, provided by the lexer, is added like this:
    [source addAttribute: kSCKTextTokenType
                   value: TokenTypes[clang_getTokenKind(tokens[i])]
                   range: range];
This code sets the kSCKTextTokenType attribute to something like SCKTextTokenTypeKeyword, indicating a keyword. Something else can then consume this string and perform some processing for keywords. The semantic type, identified by the parser, is added in a similar way.
Why not set presentation markup directly? Because not every consumer will want it, and some will want more than just fonts. For example, an editor might want to provide different buttons, depending on the token under the cursor. If the cursor is on a variable, you might want to be able to jump to where the variable is declared. You might want to provide a floating inspector showing the type and scope of the variable and a list of all of the locations where it's referenced. None of these options is possible if you create the presentation attributes directly.
The other advantage of separating the presentation and semantic markup on the string is that this separation allows us to construct consistent presentation from different parsers. For example, we can easily add parsers for other languages that will annotate the attributed string with the same semantic markup elements and then have the syntax highlighted automatically with the same style rules.
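The separation described in the last two paragraphs can be sketched in plain C. This is an illustrative model, not SourceCodeKit's code or NSAttributedString's API: SemanticKind, TaggedRange, Style, and the three functions are hypothetical names, and the colors are arbitrary. Ranges carry only semantic tags; one consumer maps tags to a style, while another uses the same tags to decide which editor actions make sense at the cursor.

```c
#include <stddef.h>

/* Semantic tags attached to ranges of text (illustrative names). */
typedef enum {
    SEM_NONE,
    SEM_KEYWORD,
    SEM_TYPE_REF,
    SEM_VARIABLE_REF,
    SEM_MACRO
} SemanticKind;

/* A range of the source text and the semantic tag that applies to it. */
typedef struct {
    size_t start;
    size_t length;
    SemanticKind kind;
} TaggedRange;

/* Presentation is kept apart from the tags themselves. */
typedef struct {
    unsigned rgb;  /* 0xRRGGBB text color */
    int bold;      /* nonzero for bold text */
} Style;

/* Returns the tag covering offset, or SEM_NONE if no range covers it. */
SemanticKind semantic_at(const TaggedRange *ranges, size_t count, size_t offset) {
    for (size_t i = 0; i < count; i++) {
        if (offset >= ranges[i].start && offset - ranges[i].start < ranges[i].length) {
            return ranges[i].kind;
        }
    }
    return SEM_NONE;
}

/* Consumer 1: a highlighter that turns semantic tags into presentation.
   Any parser emitting the same tags gets the same styling. */
Style style_for(SemanticKind kind) {
    Style s = { 0x000000, 0 };  /* default: plain black text */
    switch (kind) {
    case SEM_KEYWORD:      s.rgb = 0x0000AA; s.bold = 1; break;
    case SEM_TYPE_REF:     s.rgb = 0x007700; break;
    case SEM_VARIABLE_REF: s.rgb = 0x333333; break;
    case SEM_MACRO:        s.rgb = 0xAA5500; break;
    case SEM_NONE:         break;
    }
    return s;
}

/* Consumer 2: an editor feature that ignores presentation entirely and
   only asks whether a "jump to declaration" action applies. */
int can_jump_to_declaration(SemanticKind kind) {
    return kind == SEM_VARIABLE_REF || kind == SEM_TYPE_REF;
}
```

An editor built this way would call semantic_at with the cursor offset and enable or disable its buttons from the result, while the highlighter calls style_for on each range. If the ranges stored only a color, the second consumer would have nothing to work with.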