
String Theory

A huge class of cargo cult programming mistakes comes from taking a pattern from one language and trying to apply it in another. I recently came across something like the following function, meant to test two strings for equality in C:

int streq(char *a, char *b)
{
    if (strlen(a) != strlen(b))
    {
        return 0;
    }
    return strcmp(a, b) == 0;
}

This is semantically valid and, at first glance, doesn't even look too ridiculous. It first checks whether the two strings are the same length, then goes on to compare them. The length check even seems sensible when you consider the semantics of strcmp(): it has to return the ordering of the two strings, so it must walk along their common prefix and cannot bail out early just because the strings have different lengths and are therefore obviously unequal.

It becomes clear that it is nonsense, however, when you remember how C implements strings. In C, a string is nothing more than a NUL-terminated array of characters. Its length is not stored anywhere; it must be calculated by iterating over the entire string until you encounter a 0 byte. The first line of the function body calls strlen() on both strings, so it must scan the entire length of each, comparing every byte with 0. In contrast, the strcmp() call only needs to scan the common prefix. In the worst case, it will scan the entire length of the shorter string; in the best case, it will only check the first byte of each.
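A minimal sketch of the simpler alternative, assuming all you need is an equality test: drop the length check and let strcmp() do the work. The name streq_simple is hypothetical, chosen here only to distinguish it from the function above.

```c
#include <string.h>

/*
 * Hypothetical replacement for the streq() function discussed above.
 * strcmp() stops at the first differing byte (or at the terminator),
 * so no separate length check is needed.
 */
int streq_simple(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

With this version, comparing "abc" with "abcd" stops at the fourth byte, where the terminator of the first string differs from the 'd' in the second, without first measuring either string.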

So, obviously bad code, but why is this an example of cargo cultism? C's implementation of strings is quite unusual. Strings in most other languages (including C++'s std::string and Objective-C's NSString) store their length, so comparing lengths first is a constant-time check that can rule out unequal strings before any characters are examined. In almost any other language, this design would be fast. In C, it isn't.
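To see why the pattern pays off when the length is stored, here is a sketch of a length-carrying string written in C. The struct lstring type and the lstring_eq function are hypothetical, invented for illustration; they only loosely model what types such as std::string or NSString do and are not taken from any real library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical string type that stores its length alongside its bytes. */
struct lstring
{
    size_t length;     /* number of bytes, known without scanning */
    const char *bytes; /* character data; no terminator required */
};

/* Returns non-zero if a and b hold the same bytes. */
int lstring_eq(const struct lstring *a, const struct lstring *b)
{
    /* Constant-time check: no scan is needed to learn the lengths. */
    if (a->length != b->length)
    {
        return 0;
    }
    /* The lengths match, so compare exactly that many bytes. */
    return memcmp(a->bytes, b->bytes, a->length) == 0;
}
```

Here the length comparison costs two loads and a compare, so it is a cheap early exit. The same check in front of strcmp() costs two full scans, which is the whole problem with the original function.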

While this is quite an extreme example, it's worth remembering when you move to a new programming language. The approaches that made sense in one often don't make sense in another. This is especially true when switching language paradigms: something that makes sense in C probably won't in Haskell or Erlang.
