Home > Articles

  • Print
  • + Share This
This chapter is from the book


Since you’re programming in Objective-C, it is likely that objects are going to be your major data-structuring mechanism.

Use C inside the objects. The messaging interface hides the representation, and users are none the wiser. Try to avoid using fine-grain, semantic-free objects to implement the coarse-grain, semantics-bearing objects.


Accessors are methods that just read or write an object’s internal data, corresponding roughly to memory read and write instructions. According to good object-oriented style, attributes of an object should not be accessed directly, certainly not from outside the object, but preferably also from within. Objective-C 2.0 properties handle the burden of creating accessors.

However, accessors should also at least be minimized and ideally should be eliminated altogether, because they turn objects from intelligent agents that respond to high-level requests for service to simple data-bearing structures with a higher cost of access. Apart from a cleaner design, passing high-level requests into an object also makes sense from a performance point of view because this means the transaction costs of a message send is paid only once, at which point the method in question has access to all parameters of the message and the object’s instance variables, instead of using multiple message sends to gather one piece of data from the object at a time.

Of course, in reality, accessors or property definitions are a common feature of Objective-C programs, partly because program architecture deviates from object-oriented ideals and partly because accessors for object references in Objective-C are also needed to help with reference counting, as shown in Example 3.5.

Example 3.5 Object accessors need to maintain reference counts

-(void)setInteger:(int)newInteger {
-(void)setObject:(id)newObject {
     [newObject retain];
     [_object release];

As with other repetitive boilerplate, it makes sense to automate accessor generation, for example, by using Xcode macros, preprocessor macros that generate the accessor code. Alternately, the language can take over: Since Objective-C 2.0 properties can automatically synthesize accessors and with Automatic Reference Counting (ARC), the actual reference counting code was moved from the accessors to the code-generation of all variable access.

A caveat with using properties for generating accessors is that the generated code is not under user control, with the default atomic read accessors up to five times slower than a straightforward implementation, because they retain and autorelease the result, place a lock around the read in case of multithreaded access, and finally need to wrap all of that in an exception handler in order to release the lock in case of an exception. An alternative is the accessor macros shown in Example 3.6. These macros generate the correct accessor code just like properties. However, this generation is under user control, meaning not only that you get to decide what code gets run, but also that you can (a) change your mind and (b) extend the idea further without having to modify the compiler, as I will show later.

Example 3.6 Accessor macros

#if !__has_feature(objc_arc)
#define ASSIGN_ID(var,value)  {    id tempValue=(value);      if ( tempValue!=var) {         if ( tempValue!=(id)self )           [tempValue retain];         if ( var && var!=(id)self)           [var release];         var = tempValue;       }   }
#define ASSIGN_ID(var,value)    var=value

#if !__has_feature(objc_arc)
#define AUTORELEASE(x)  ([(x) autorelease])
#define AUTORELEASE(x)  (x)

#define setAccessor( type, var,setVar ) -(void)setVar:(type)newVar {     ASSIGN_ID(var,newVar);} 
#define readAccessorName( type, var , name )-(type)name { return var; }

#define readAccessor( type, var ) readAccessorName( type, var, var )

#define objectAccessor( objectType, var, setVar )     readAccessor( objectType*, var )    setAccessor( objectType*, var,setVar )

In OS X 10.11, the slowdown has apparently been reduced to around 35%, with or without ARC enabled.

Due to the pervasiveness of accessors, this overhead is serious enough that teams at Apple sped up whole programs by more than 10% just by switching properties from atomic to nonatomic. An improvement of 10% may not seem much when we are frequently talking about improvements of 10 to 100 times, but it is actually huge when we are talking about the whole program, where significant engineering effort is often expended for single-digit percentage improvements. And here we get double digits with a single change that had no other effect. So why does atomic exist? And why is it the default?

The idea was to protect against code such as that shown in Example 3.7. This code has a stale reference to an object instance variable that was actually released when the pointer went stale, similar to some early Unix malloc() implementations having a free() function that delayed freeing its memory until the next call to malloc(), in essence avoiding a potential crash in buggy code such as that in Example 3.7.

Example 3.7 Stale pointer reference

  id myWindowTitle=[window title];
  [window setTitle:@"New Window title"];  // windowTitle goes stale
  [self reportTitle:myWindowTitle];         // crashes pre-ARC

The crash will occur if title is held onto by the window, and only by the window, because in that case setTitle: will release the title and the reference to this object in myWindowTitle will not only be stale, that is, no longer pointing to the window’s title—but also invalid. Having auto-releasing accessors such as the ones provided by the atomic keyword will prevent a crash in this case, but at the cost of hiding the fact that the reference has, in fact, gone stale. I can see two potential reasons for writing this code. The first is that of a simple but slightly premature optimization if the title is used several times and we don’t want to go fetch it from the window every time. In this case the code is simply wrong, because you’d actually want to get the new value from the window after it was set, and atomic in this case just masks the incorrect code. A crash would alert the programmer to the fact that the logic is amiss. The second case is that in which the programmer actually intended to stash away the old value. In this case, the code is also plain buggy, because the programmer is well aware that the new value will make the old value invalid—that’s why they are stashing it! The corrected code in Example 3.8 not only doesn’t need atomic, it also makes the intention of the code clear.

Example 3.8 Non-stale pointer reference

  id oldWindowTitle=[[window title] retain] autorelease];
  [window setTitle:@"New Window title"];
  [oldWindowTitle doSomething];   // clear that we want old title

Note that ARC also prevents the crash, and therefore also hides the staleness of the pointer, just like atomic did—by aggressively retaining objects even when they are stored into local variables. The advantage is that you don’t have to think as much about the lifetime of your objects. The disadvantage is that you don’t have to think as much about the lifetime of your objects, and you get significantly more reference-counting traffic, which impacts performance.

So while it is unclear whether atomic would be beneficial at all even if there were no performance penalty, the significant slowdown in a very common operation makes it highly questionable at best. The fact that the collection classes do not support this pattern (for performance reasons) and iOS’s UIKit explicitly sets nonatomic for over 99% of its property declarations shows that Apple itself is not of one mind in this case.

Even slower than atomic accessors is access via key-value coding (KVC): A call such as [aTester valueForKey:@“attribute”] is not only more verbose than the equivalent direct message send [aTester attribute], and not only more error prone because the compiler cannot check the validity of the string passed to valueForKey:, it is also 20 times slower. If runtime parameterization of the value to get is required, using [aTester performSelector:@selector (attribute)]; is only twice as slow as a straight message send and 10 times faster than valueForKey:.

You might expect from these basic performance parameters that technologies built on top of KVC such as key-value observing (KVO) and Cocoa Bindings can’t be too speedy, and you’d be right: Adding a single KVO observer adds a factor of 100 to the time of a basic set accessor (600 ns vs. 6 ns) and a single binding a factor of 150 (900 ns).

KVO and bindings also do not protect against cascading update notifications, which can lead to at least quadratic performance if there are transitive dependencies (b depending on a and c depending on both a and b will result in c being evaluated twice), and can lead to infinite recursion and crashes if there are dependency loops. So for larger data sets or complex dependencies, it is probably a good idea to investigate using a proper constraint solver in the tradition of the 1978 Xerox PARC ThingLab or later developments such as DeltaBlue, Amulet, or Cassowary. In fact, it appears that Cassowary was adopted by Apple for Mountain Lion’s auto-layout mechanism.

Public Access

When sending messages to access instance variables is too slow, those instance variables can be made @public. In this case, access time is essentially the same as for a C struct, (non-fragile instance variables mean that the offset is looked up in the class instead of being hard-coded at compile-time, slightly affecting the result) but then again so is safety and encapsulation: none of either. The case can therefore be made that if @public access is required, one should use a struct instead. In fact, there are some additional benefits to a struct, mainly that it can be allocated on the stack in an auto variable, passed by value to a function, or directly embedded into another object or struct or array, whereas an Objective-C object must be expensively allocated on the heap and can only be accessed indirectly via pointer.

However, there are also some benefits to keeping such an open object a true Objective-C object—namely, it can have additional functionality attached to it, access can be granted or denied on a per-field basis, and it may be used compatibly with other objects that are not aware of its publicly accessible instance variables. As an example, the PostScript interpreter mentioned in Chapter 1 uses a string object that has all its instance variables public, shown in Example 3.9, but at the same time can be used largely interchangeably with Cocoa NSString objects.

Example 3.9 Full public string object definition

@interface MPWPSString : MPWPSCompoundObject
    unsigned char *bytes;
    unsigned   length,capacity;

Of course, breaking encapsulation this way makes evolution of the software harder and should be considered as a last resort when all other techniques have been tried, performance is still not adequate, and careful measurement has determined that the access in question is the bottleneck.

Object Creation and Caching

As we have seen so far, object creation is expensive in Objective-C, so it is best to use objects as fairly static and coarse-grained entities that exchange information via messages, preferably with mostly scalar/primitive arguments. If complex arguments cannot be avoided and high rates of creation/exchange need to be maintained, both Objective-C and Swift can resort to structs instead of objects, which like primitives can be stack allocated, allocated in groups with a single malloc(), and passed by value as well as by reference. However, this often means either switching between objects and structs when modeling the problem domain, or even foregoing object modeling altogether.

Another option to lessen or even eliminate the performance impact of object creation when high rates of (temporary) object creation cannot be avoided, is object caching: reusing objects that have already been allocated. The advantage of object caching over using structs is that performance considerations do not interfere with the modeling of the problem domain and all the code involved. Instead, a pure performance fix can be applied if and when performance turns out to be a problem.

Table 1.2 shows that reusing just one object instead of allocating a new one, we have not only saved some memory, but also CPU time equivalent to approximately 50 message sends, allowing us to use objects where otherwise we might have had to revert to C for performance reasons. Object reuse was common in object-oriented languages until generation-scavenging copying garbage collectors with “bump pointer” allocation came online that made temporary objects extremely cheap. Alas, C’s memory model with explicit pointers makes such collectors that need to move objects nigh impossible, so object reuse it is!

In order to reuse an object, we have to keep a reference to it in addition to the reference we hand out, for example, in an instance variable or a local collection. We can either do this when we would have otherwise deallocated the object, or we can keep a permanent reference. Then, when it comes time to create another object of the desired class, we check whether we already have a copy of it and use that already allocated copy instead.

Mutability and Caching

When is it safe to reuse an object? Immutable value objects, for example, can be reused as often as desired, because different copies of the same value object are supposed to be indistinguishable. Foundation uses this strategy in a number of places for some global uniquing. Small number objects are kept in a cache once allocated, and constant string objects are merged by the compiler and linker and shared.

In order to cache objects behind the client’s back, these objects must be immutable, because sharing between unwitting clients becomes impossible if changes made by one client become visible to another. However, immutability forces creating a new object on every change, and creating a new (uncached) number object every time a new value is needed is around 30 to 40 times more expensive than just setting a new value, even if done safely via an accessor. So how can we reuse mutable objects?

One way, chosen by the UIKit for table cells, is to have a documented API contract that guarantees reusability. Another is to take advantage of the Foundation reference counting mechanism, which we use to track if the only reference left to the object is the one from the cache, in which case the object can be reused. Instead of using the 1→0 transition to see whether the object needs to be deallocated, we use the RC = 1 state to see whether the object can be reused, because the cache is keeping a single reference. Table 3.4 summarizes this information.

Table 3.4 Reference counts for object caching




unused action


RC > 0

RC = 0


object caching

RC > 1

RC = 1


Example 3.10 shows how this reference-count-aware1 cache can be implemented, though the actual implementation that’s part of a generic object-cache class is much more heavily optimized. The instance variables referenced here are defined in Example 3.18 and discussed in detail in the “IMP Caching” section in this chapter.

Example 3.10 Circular object cache implementation

    id obj;
    if ( objIndex >= cacheSize ) {
    if ( obj == nil || [obj retainCount] > 1 ) {
        if ( obj != nil ) {
            [obj release];      //--- removeFromCache
        obj=[[objClass alloc] init];
    } else {
        obj=[obj reinit];
    return obj;

The MPWObjectCache keeps a circular buffer of objects in its cache that’s a C array of ids. When getObject2 method is called to create or fetch an object, it looks at the current location and determines whether it can reuse the object or needs to allocate a new one and then bump the location. It assumes objects know how to reinitialize themselves from an already initialized but no longer valid state. The circular buffer structure gives objects that were vended by the cache some time before we try to reuse them, similar to the young space in a generation-scavenging collector. At around 9.5 ns per pop, allocating from the (optimized) object cache is around 15 times faster than straight object allocation, so this is a very worthwhile optimization.

Wraparound of the index is handled via an if-check rather than a modulo operation, because a modulo is a division, and as we saw earlier in this chapter, division is one of the few arithmetic operations that is still fairly slow even on modern CPUs. A different way of implementing a modulo would be by and-ing the low bits of the index, but that would restrict the cache size to powers of 2. Finally, there are many variations of probing and retirement policies that will have different performance characteristics, for example, attempting at least n consecutive slots or using random probing. So far, this very simple algorithm has proved to be the best balance for a wide variety of use-cases.

Another potential way of using reference counts is to stick to the 1→0 transition the way traditional reference counting does and then override dealloc to enqueue the object in a global cache instead of deallocating regularly. However, that sort of approach, unlike the object cache presented here, couples the target class tightly to the caching behavior and requires use of a global cache. I therefore recommend against that type of global cache, despite the fact that it is quite popular. Not requiring locking, scoping the cache to the lifetime of another object and the specific circumstance of that object’s use patterns are a large part of what makes object caching via a cache object powerful and fast.

Lazy Evaluation

Another use of caching is lazy evaluation of properties. When a message requests a property of an object that is expensive to compute and may not even be always needed, the object can delay that computation until the property is actually requested instead of computing the property during object initialization. Alternately, the result of an expensive computation can be cached if it is likely that the computation will be used in the future and if it is known that the parameters of the computation haven’t changed.

Lazy accessors have become common enough in my code that they warrant a specialized accessor macro, shown in Example 3.11.

Example 3.11 Lazy accessor macro

#define lazyAccessor( type, var ,setVar, computeVar )              readAccessorName( type,var, _##var )         setAccessor( type, var, setVar ) -(type)var {         if ( ![self _##var] ) {                 [self setVar:[self computeVar]];         }         return [self _##var]; } \

The accessor builds on the macros from Example 3.6 but also has a parameter computeVar that defines the message to be sent to compute the result. When the getter is called, it checks whether it has a result. If it has a result, it just returns it; if not, it calls the computeVar method and then stores the result before returning it. Another less frequent accessor macro is the relay accessor that simply forwards the request to an instance variable.

Caching Caveats

With all this caching going on, it is important to remember that caching isn’t without pitfalls of its own. In fact, a friend who became professor for computer science likes to ask the following question in his exams: “What is a cache and how does it slow down a computer?”

In the worst case of a thrashing cache with a hit rate of 0%, the cache simply adds the cost of maintaining the cache to the cost of doing the non-cached computation, and an easy way of reaching a 0% hit rate with the very simple cache policy used so far is invalidating a cache item just before it is needed again, for example, by having a linear or circular access pattern and a cache size that is smaller than the working set size, even by just a single item.

Additionally, caches use up memory by extending the lifetime of objects, and therefore increase the working set size, making it more likely to either push working-set items to a slower memory class (L1 cache to L2 cache, L2 cache to main memory, main memory to disk...) or even run out of memory completely on iOS devices, resulting in the process being killed. Global, transparent caches like CoreFoundation’s CFNumber cache fare worst in this regard, because they have effectively no application-specific information helping them determine an appropriate size, leading to caches with arbitrary fixed sizes, like 12.

In addition, they can have puzzling bugs and side effects that, because of their transparent nature, are hard for clients to work around. Example 3.12 demonstrates how different constant strings and number objects allocated by completely different means but with the same numeric value turn out to be the same actual object, as shown by logging the object pointer with the “%p” conversion directive.

Example 3.12 Globally uniqued Foundation string and number objects in 32 bit

#import <Foundation/Foundation.h>

NSString *b=@"hello world";
int main( int argc, char *argv[] ) {
    NSString *a=@"hello world";
    printf("NSStrings a=%p=b=%p\n",a,b);
    for ( int i=1; i<15; i++) {
         NSNumber *c=[NSNumber numberWithLongLong:i];
         CFNumberRef d=CFNumberCreate(NULL,  kCFNumberIntType,&i);
         printf("%d NSNumber: %p type: %s                  CFNumberCreate: %p type: %s\n",
                 i,c,[c objCType],d,[(id)d objCType]);
    return 0;
cc -Wall -m32 -o uniqueobjs uniqueobjs.m -framework Foundation
NSStrings a=0x6b014=b=0x6b014
11 NSNumber: 0x78e7ac30 type: i CFNumberCreate 0x78e7ac30 type: i
12 NSNumber: 0x78e7ac40 type: i CFNumberCreate 0x78e7ac40 type: i
13 NSNumber: 0x78e7ac60 type: q CFNumberCreate 0x78e7ab90 type: i
14 NSNumber: 0x78e7aba0 type: q CFNumberCreate 0x78e7abb0 type: i

At the time this test was run, the cutoff for the cache was 12, requests up to that value get a globally unique, cached object, whereas values larger than that result in an allocation. Also note that the objCType of all cached values is “i,” a 32-bit integer, despite the fact that we specifically asked for a long long, type code “q”. Once outside the cacheable area, the requested type is honored.

The reason for this odd behavior is that the cache used to always cache the first object created for a specific value, regardless of the type requested. So if the first request for the integer 5 was for a long long, then all subsequent requests for a “5” would return that long long NSNumber. However, this could and did break code that was not expecting a “q” (long long) type code in its NSNumber objects, for example, object serializers that used the type code and did not handle the “q” code! This bug was fixed by ignoring the requested type-code for the cached numbers and using “i” instead, which is in fact just as incorrect as the other case, but in practice appears to cause fewer problems. On the 64-bit runtimes, the cache is disabled because all these small integers are implemented as tagged pointers.

Another pitfall is the use of NSDictionary or NSSet instances to cache de-duplicate string objects. While they may reduce peak memory usage in some cases, they can also increase memory usage by unnecessarily and arbitrarily extending the lifetime of the stored strings. Furthermore, the fact that NSString objects have to be created before they can be tested for membership means that the CPU cost has already been paid before the cache is tested, so the cache actually increases CPU use. The way to improve this situation is to create a cache that can be queried using C-strings, either with a custom cache or with a custom set of callbacks for CFDictionary.

Pitfall: Generic (Intermediate) Representations

One of the fun features of the NeXTStep system that is the ancestor of OS X and iOS was its programmable graphics and window system based on DisplayPostscript. Just as the transition to OPENSTEP brought us Foundation with the basic object model of NSString, NSNumber, NSDictionary, and NSArray, I happened to be working on a kind of “PostScript virtual machine” that redefined PostScript operators to return graphical objects in a structured format rather than paint them on the screen, similar to the “distillery” code that to the best of my knowledge still powers Adobe’s Acrobat Distiller PostScript to PDF converter to this day.

As I looked at my fresh install of OPENSTEP, I noticed that the binary object sequence (BOS) format created by the interpreter’s printobject command included numbers, dictionary, arrays, and strings, mapping perfectly onto the data types provided by the brand new Foundation framework! So all I had to do was create a generic mapper to convert BOS format to Foundation, access the information encoded in those Foundation objects, and use that information to populate my domain objects, which included paths, images, text, and various graphics state parameters such as colors, transformation matrices, font names, and sizes.

While this approach allowed me to construct a prototype graphical object reader reasonably quickly, the performance was “majestic.” In a complete surprise, the limiting factor was neither the PostScript procedures that had to emulate the drawing commands and produce output, nor the serialization operator in the PostScript interpreter or the deserialization code, or even the somewhat pokey byte-oriented communications channel. No, the major limiting factor was the creation of Foundation objects, a factor I never would have thought of. After the shock of my disbelief wore off, I replaced the parts that had converted the BOS to Foundation objects with a simple cover object that kept the original data “as-is” but was able to access parts using a generic messaging interface. The parser then accessed this messaging interface instead of converted objects, and performance improved threefold.

This was the first time I learned the lesson that generic intermediate object representations, also known as data transfer objects, are just a Bad Idea, at least if you care about performance and are using Objective-C. While the general principle holds true in other languages, Objective-C drives that message home with a particular vengeance because of the 1:5:200 performance ratio between basic machine operations, messaging, and object allocation.

Of course, I had to relearn that lesson a couple of times before it finally managed to stick, but the reason why it is true is actually pretty simple: a generic representation will usually have significantly more objects than a final object representation because it needs to use dictionaries (object header + key and value storage) instead of plain old Objective-C objects (somehow the “POOO” acronym as analogous to Java’s Plain Old Java Objects [POJO] never caught on), object keys where objects can use instance variable offsets, and object values where objects can use simple scalar primitive types. So not only will you be creating objects that are significantly more expensive individually, but you will also need to create many more of these objects. Multiplying out these two factors makes generic intermediate object representations pretty deadly for performance in Objective-C and Swift.

Alas, Apple also makes this anti-pattern extremely convenient, so it has become pretty much the default for accessing any sort of serialized representation. A typical example is JSON parsing, with the only method directly supported by the frameworks being converting the JSON to and from in-memory property lists, that is, Foundation collections, NSNumber, and NSString objects. Even the plethora of Swift JSON “parsing” examples that have sprung up on the Internet essentially all first call NSJSONSerialization to do the actual parsing and generation.

Arrays and Bulk Processing

When dealing with a collection of entities accessed by integer indexes, Foundation NSArray improves on plain C arrays (array[index]) with a number of convenient services: automatic handling of memory management (retain / release), automatic growth, sorting, searching, subarray generation, and a memory model that allows efficient addition and removal of objects at both ends of the array.

What’s missing is a set of arrays of primitives types such as float, double, or int with similar sets of services, as it should be clear by now that wrapping the scalar values in NSNumber objects and sticking those in an NSArray will not perform particularly well. Such a wrapper is easy to write, and a number of them exist; for example, the author’s MPWRealArray, the arrays in FScript, or SMUGRealVector.

As Figure 3.1 shows, the performance benefits of having a homogenous collection of scalars are overwhelming: Summing the values in an array filled with 10,000 numbers is 5 times faster than summing NSNumbers in an NSArray even if the individual numbers are accessed via a messages send, and 17 times faster when the array is asked to perform the operation in bulk.

Figure 3.1

Figure 3.1 Time to create and sum a numeric 10,000-element array (microseconds)

The differences are even more pronounced for creating such an array and filling it with the values from 1 to 10,000: The homogenous array is 20 times faster than creating the same values as an NSArray of NSNumbers, even when every real value is added inefficiently using a separate message send. Moving to bulk operations, where the loop is executed inside the real array, takes the difference to a factor of 270.

Better yet, representing the numbers as a contiguous C array of floats allows us to use the vector processing tools built into OS X such as vDSP library. Using vDSP functions, summing using the MPWRealArray code in Example 3.13 becomes yet another 10 times faster than even the bulk processing scalar code, bringing the performance relative to the NSArray + NSNumber combination to 1.3 μs.

Example 3.13 Summing using vDSP

@interface MPWRealArray : NSObject
    int capacity;
    NSUInteger count;
    float *floatStart;

    float theSum=0;
    vDSP_sve ( floatStart, 1, &theSum, count );
    return theSum;

This takes the time for a single addition to only 0.13 ns, showing off the true power of our computing buzz saws. Creation is also another 4 times faster when adding vector functions bulk real processing; at 3.5 μS for the entire array, this is now 1,000 times faster than creating an equivalent NSArray of NSNumbers.

To bring this into perspective, a factor of 1,000 is close to the difference between the clock speed of a Mac Pro and that of the author’s Apple ][+ with its 1-MHz 6502 processor!

Of course, you don’t have to use an array object. If performance is critical, you can also always use a plain C array, which can store any type of primitive, struct, object. However, this method is without the conveniences of growability, taking care of reference counting, safe access, or convenience methods.

Swift crosses the NSArray of Foundation with plain C arrays to get the Swift array type: It provides most or all of the conveniences of an array object like NSArray or MPWRealArray, while at the same time using generics to be applicable to all types like a plain C array. The big advantage is that the temptation to use an array of NSNumber objects or similar when you just wanted to store some integers or reals has lessened dramatically. You just write [Int] when you want an array of integers, or with type inference provide a literal array of integers. Access times are reasonable and you still get growability and bounds checking.

The downside is that while it is harder to get unreasonably bad performance, it is also currently hard to get the best possible performance. In my tests, the Swift summation code was around 4 to 5 times slower than the equivalent Objective-C code, though that is probably going to change as the optimizer is improved. Speaking of the optimizer, unoptimized Swift array code was a shocking 1,000 times slower than reasonably optimized Objective-C code, on par with the object-based code provided here as a counterexample despite using primitives.

It is not exactly clear how Swift manages to be this slow without the optimizer, the typical factor for C code being in the 3 to 4 range, and Objective-C often not affected much at all. What is clear is that with this type of performance difference, unoptimized debug builds are probably out of the question for any code that is even remotely performance sensitive: when a task taking 100 ms optimized would take almost 2 minutes in a debug build, you can’t really debug.


The use of strings as keys mentioned in the “Strings” section of this chapter is usually in conjunction with some sort of key-value store, and in Cocoa this is usually an NSDictionary. An NSDictionary is a hash table mapping from object keys to object values, so features average case constant O(k) read access time.3

However, the generic nature of the keys means that, as Table 3.5 and Table 3.6 show, k is a relatively large number in this case, 23 to 100 ns or 10 to 50 times slower than a message-send. Furthermore, NSDictionary requires primitive types to be wrapped in objects, with the performance consequences that were discussed in the “Primitive Types” section in this chapter.

Table 3.5 Cost of dictionary lookups by type of stored and lookup key, large key

Time to lookup by lookup key type (ns)

Stored keys

Constant string

Regular string

Mutable string


Constant string





Regular string





Table 3.6 Cost of dictionary lookups by type of stored and lookup key, small key

Time to lookup by lookup key type (ns)

Stored keys

Constant string

Regular string

Mutable string


Constant string





Tagged string





Table 3.5 shows the more general cost of dictionary access, which is around 80 ns per read if you don’t have hash collisions (when two or more keys map onto to the same slot in the hash table). A single collision adds another 20 ns or so. The only time that deviates from the roughly 80 ns standard is when you have constant strings both as the keys of the dictionary and the key to look up. In this case, the lookup can be more than twice as fast, probably due to the fact that constant strings can be compared for equality using pointer equality due to being uniqued.

For small keys up to 7 characters, the tagged pointer optimization introduced in OS X 10.10 also helps. As with constant strings, pointer comparison is sufficient here because the value is stored in the pointer, but only if both strings are of the same type, either both tagged pointers or both constant strings. Table 3.6 shows this effect: When the key classes match, both constant strings and tagged pointer strings take around 22 ns for a single lookup, but there is no benefit if the classes do not match.

So in order to get optimized dictionary performance, you need to make sure that the class of the key used to store the value into the dictionary and the class of the key used to retrieve the value match. If a string literal (@“key”) was used to store the value, it is best if a string literal is used to retrieve it.

If you cannot use a string literal on retrieval, your keys are short enough to fit in a tagged pointer string. And if you retrieve values more often than you store them, it may be helpful to convert the keys you use to store the values to tagged pointer strings as was shown in Example 3.3: first do a mutable copy of the original key and then a copy of the mutable copy. This will make your retrievals 2 to 4 times faster, depending on the circumstances. (The detour via a mutable copy is necessary because all immutable strings, including constant strings, will just return self when asked to copy themselves.)

Even with these optimizations for small and constant strings, it is therefore best to look for alternatives to NSDictionary when there is a chance that performance may be relevant, unless what is needed exactly matches the capabilities of NSDictionary. The vast majority of dictionary uses are much more specialized, for example, using only fixed strings as keys, and often having only a bounded and small set of relevant or possible keys. The XML parser in Chapter 4 uses two types of specialized dictionaries: MPWXMLAttributes for storing XML attributes that supports XML semantics such as ordering and multiple values for a key and is tuned for those use-cases, and the MPWSmallStringTable that maps directly from a predefined set of C strings to objects.

MPWSmallStringTable does not use a hash-table but operates directly on the byte-by-byte character representation, trying to eliminate nonmatching strings as quickly as possible. While it is also approximately 4 times faster than NSDictionary for the small constant string cases that NSDictionary is specially optimized for, its main use is in dealing with externally generated string values, and for this use-case it is anywhere from 5 to 15 times faster than NSDictionary.

Swift dictionaries, which are and use value types, and which benefit from generics, are obviously faster than heavyweight NSDictionary objects that use slow objects, right? Alas, that is currently not the case: In all my tests, Swift Dictionary access was significantly slower than even NSDictionary. For example, a [String:Int] map, which maps two value types and is therefore unencumbered by any legacy Objective-C objects and “slow” dynamic dispatch, took anywhere from 140 ns to 280 ns per lookup, depending mostly on whether the key was a string literal or was provided externally, respectively. This slowdown of 3 to 7 times, compared to NSDictionary (and 17 to 25 times compared to MPWSmallStringTable) was largely independent of compiler flags, though as typical of Swift, compiling without any optimization causes a significant slowdown.

The easiest alternative to NSDictionary is to just define objects and use plain messaging to access their contents, especially when the dictionary in question has a reasonably small and mostly fixed set of keys. Not only is the first line of Example 3.14 anywhere from 10 to 100 times faster, it is also a cleaner design because message names are scoped by their class, whereas dictionary keys pollute the global namespace and must therefore use unwieldy long names.

Example 3.14 One of these is 100 times faster

[myParagraph setLeading: 10.0];
[myParagraph setAttribute:[NSNumber numberWithFloat:10.0]

Why might one prefer to use a dictionary instead of an object? With the nonfragile instance variables of the Objective-C 2.0 64-bit runtime and associated storage, future-proofing against additional required instance variable is no longer an issue. Potentially sparsely populated objects can be handled by partitioning into one or more subobjects to which the corresponding message are delegated and that are allocated as needed.

As long as clients are provided with a messaging interface, the implementation can be varied and optimized to fit. While it is tempting to provide a key-value based interface instead, the flexibility it appears to offer is an illusion. Once an NSDictionary-like key-value interface is provided to clients, the performance characteristics are pretty much locked in, because mapping from NSString external keys to messages or instance variable offsets internally is just about as costly in terms of CPU usage as an NSDictionary proper. So instead, if an NSDictionary-based internal representation is desired, it can and probably should be wrapped in an object that maps its accessor messages to the dictionary.

The Macro in Example 3.15 allows you to add a messaging interface to a key in a dictionary either statically by writing dictAccessor( var, setVar, [self_myDict] ) in your implementation, where var is the key and [self_myDict] is an expression that returns the dict to be used, or dynamically at runtime, using the imp_implementationWithBlock() function to turn a block into a method implementation.

  • + Share This
  • 🔖 Save To Your Account