InformIT

A Taste of Erlang, a Dynamic, Asynchronous Message-Passing Language

Date: Jun 30, 2006

Article is provided courtesy of Prentice Hall Professional.

Erlang is a descendant of Prolog, intended for high-availability, high-performance applications. David Chisnall explains why he likes it, despite Erlang's occasional minor irritations.

Regular readers will know that I find most so-called "modern" programming languages to be somewhat underwhelming. Occasionally, though, one comes along that provides a nice contrast. One of these is Erlang.

Erlang was originally designed by Ericsson for high-availability, high-performance applications, such as telephone switchboards. Since Ericsson’s market is telecommunications, not programming languages, they released it as Free Software, including an extensive runtime library. Improvements have been made by others, including the University of Uppsala, which contributed most of the High-Performance Erlang (HiPE) branch.

Being commercially supported Free Software is a good start. The fact that Erlang has already been used in real-world projects of significant scale is another point in its favor. This article is not intended as an in-depth tutorial on Erlang, but rather as a taste of what’s possible with the language.

‘Scalable’ Is the New ‘Fast’

It used to be that if your code ran quite quickly, a year later you could run it twice as fast for the same amount of money. These days, you’re much less likely to get a chip that’s twice as fast, but you may get one with twice as many cores. If your code is highly parallel, you can just spread it out a bit more.

To provide some context, the machine I was using as my testbed for the last project I wrote in Erlang was a 64-processor MIPS box, already several years old. A pair of Sun T1 chips can have as many active contexts as this machine, and eight of them can have as many contexts actually executing at once, and do so faster. Within a few years, it seems likely that laptops will emerge with abilities similar to those of this old behemoth.

Two features of Erlang, both of which are built into the language, make it especially suited to writing scalable applications: process creation and message passing.

Erlang is interpreted on all platforms, but can be just-in-time compiled on x86 and SPARC. In the worst case, algorithms implemented in Erlang run at about one-tenth the speed of their C counterparts. Distributed algorithms can run faster. Yet Another Web Server (YAWS), written in Erlang, has been shown to be able to handle more concurrent connections than Apache on the same machine, and the ejabberd XMPP server is very well-regarded in the community.

First Glances

Anyone who has used Prolog will find Erlang syntax eerily familiar. This is no accident; the first version of Erlang was written as a meta-language on top of Prolog. Some traits inherited from Prolog include atoms (textual identifiers), variables starting with a capital letter, statements being terminated with a full stop, and the list syntax.

There are two data structures in Erlang: lists and tuples. Lists may be of any length, while tuples have a fixed size. Strings are implemented as lists of integer character codes (ASCII values, for plain English text). As with most Prolog dialects, there is some syntactic sugar for dealing with strings. Compare the following two representations of the same string:

"This is a string"
[84,104,105,115,32,105,115,32,97,32,115,116,114,105,110,103]
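The equivalence is easy to check for yourself. The following is a minimal sketch; the module name string_demo and both function names are made up for this illustration.

```erlang
-module(string_demo).
-export([same/0, first_char/0]).

%% A double-quoted string and the list of its character codes are the same term.
same() ->
    "This is a string" =:= [84,104,105,115,32,105,115,32,97,32,115,116,114,105,110,103].

%% $T is the character literal for the code 84; hd/1 returns the head of a list.
first_char() ->
    hd("This is a string") =:= $T.
```

Both functions return true, because the quoted form is only shorthand for the list.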

One slightly unusual feature of Erlang is that it’s a single-assignment language. Once a variable has been bound to a value, it cannot be modified. This is how a modern compiler works internally (a programmer-visible variable will be represented by a number of variables in an intermediate representation), but it’s relatively unusual for this to be exposed in the language.
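A small sketch of what single assignment means in practice, assuming a hypothetical module single_assignment with a helper try_rebind/1 invented for this example. Because = is a pattern match rather than an assignment, "assigning" to a bound variable succeeds only if the new value equals the old one.

```erlang
-module(single_assignment).
-export([try_rebind/1]).

try_rebind(New) ->
    X = 1,
    try
        %% X is already bound, so this is a match of 1 against New.
        X = New,
        same_value
    catch
        %% A failed match raises a badmatch error instead of changing X.
        error:{badmatch, _} -> cannot_rebind
    end.
```

Calling try_rebind(1) returns same_value, while try_rebind(2) returns cannot_rebind.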

The Bit Syntax

One of my favorite parts of the language is the bit syntax. Erlang supports a binary type, which can contain any binary data. This arrangement is very useful when writing network code, because any range of bits can be accessed easily. The bit syntax calls for a list of numbers in double angle brackets. The following example is three bytes long, with the bytes set to 1, 2, and 3, respectively:

<<1,2,3>>

By itself, this isn’t very useful; it’s little more than an array of 8-bit values, after all. Each of these values can have a size specifier appended, however, telling Erlang how many bits the value should occupy. Integers are stored in big-endian order by default, whatever the host architecture, so the following two representations are equivalent:

<<1:32,2,3>>
<<0,0,0,1,2,3>>

Remember that I said "big-endian by default." That’s important. If we’re dealing with network code, we’re likely to want to swap things between network and host byte order. Data pulled off a TCP connection is usually in big-endian (network) order, while an x86 machine is little-endian. In Erlang, an endianness specifier (big, little, or native) on a segment handles the conversion, so the following two forms are equivalent:

<<1:32/little,2,3>>
<<1,0,0,0,2,3>>

On top of this, the bit syntax supports pattern matching, so you can use partially instantiated binary objects in function definitions and case statements.
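The comparisons above, plus one pattern match, can be sketched as a small module. The module name bits_demo and its function names are assumptions for illustration only; the bit syntax itself is built into the language.

```erlang
-module(bits_demo).
-export([sizes/0, endianness/0, split/1]).

%% A size in bits after the colon widens a segment; the default is 8 bits.
sizes() ->
    <<1:32,2,3>> =:= <<0,0,0,1,2,3>>.

%% The little specifier reverses the byte order of that one segment.
endianness() ->
    <<1:32/little,2,3>> =:= <<1,0,0,0,2,3>>.

%% Matching works in the other direction: pull a 32-bit big-endian integer
%% off the front of a binary and keep the remaining bytes as a binary.
split(<<Value:32/big, Rest/binary>>) ->
    {Value, Rest}.
```

For example, bits_demo:split(<<0,0,0,1,2,3>>) returns {1,<<2,3>>}.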

Pattern Matching

Pattern matching is where Prolog-like languages really come into their own, and Erlang is no exception. Most modern languages support some form of polymorphism based on the types of arguments to a function, such as overloading. Erlang selects a function clause based on the values of the arguments.

Consider a function in a calculator that takes three arguments—an operation and two values. It then returns the result of applying the correct operation to the two values. In Erlang, it could be implemented something like this:

evaluate(add,A,B) ->
    A + B
;
evaluate(subtract,A,B) ->
    A - B
;
evaluate(multiply,A,B) ->
    A * B
;
evaluate(divide,A,B) ->
    A / B.

This example is slightly contrived, but it illustrates some of the power of the language. Note the use of the semicolon (;) as a separator between clauses. In Prolog, a semicolon is a logical OR operation, and its meaning here is similar: Try the first clause, and if its pattern doesn’t match, try the next one. Note also the lack of an explicit return statement. Erlang functions (and blocks) return the value of their last expression.
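To try the clauses out, here is a minimal sketch wrapped in a compilable module. The module name calc and the demo/0 helper are additions made for this illustration, and the clauses are written one per line only to keep the listing short.

```erlang
-module(calc).
-export([evaluate/3, demo/0]).

%% One clause per operation; the first argument is matched by value.
evaluate(add, A, B) -> A + B;
evaluate(subtract, A, B) -> A - B;
evaluate(multiply, A, B) -> A * B;
evaluate(divide, A, B) -> A / B.

%% Call each clause once and collect the results in a list.
demo() ->
    [evaluate(add, 2, 3),
     evaluate(subtract, 2, 3),
     evaluate(multiply, 2, 3),
     evaluate(divide, 6, 3)].
```

Here calc:demo() returns [5,-1,6,2.0]; the / operator always produces a float. Calling evaluate with an atom that no clause handles, such as modulo, raises a function_clause error rather than returning a value.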

Pattern matching really starts to get fun when combined with the binary type. Consider an application that receives packets from a network and then processes them. The first four bytes in a packet might be a packet type identifier in network byte order. In Erlang, you would just need a single processPacket function that could convert this into a data structure for internal processing. It would look something like this:

processPacket(<<1:32/big,RestOfPacket/binary>>) ->
    % Process type one packets
    % (the /binary specifier lets RestOfPacket match all remaining bytes)
    ...
;
processPacket(<<2:32/big,RestOfPacket/binary>>) ->
    % Process type two packets
    ...
.
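A runnable variant of the same idea is sketched below. Everything in it is hypothetical: the module name packets, the snake_case function name, the returned tuples, and the catch-all clause are assumptions standing in for the elided processing code.

```erlang
-module(packets).
-export([process_packet/1]).

%% Assumed packet layout: a 32-bit big-endian type tag, then the payload.
%% The /binary specifier makes Payload match all remaining bytes.
process_packet(<<1:32/big, Payload/binary>>) ->
    {type_one, Payload};
process_packet(<<2:32/big, Payload/binary>>) ->
    {type_two, Payload};
%% Any other binary falls through to this clause.
process_packet(Other) when is_binary(Other) ->
    {unknown, Other}.
```

With these definitions, packets:process_packet(<<0,0,0,1,"abc">>) returns {type_one,<<"abc">>}, and a binary with an unrecognized tag comes back tagged as unknown.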

Process Creation

As I mentioned earlier, Erlang is designed for highly concurrent systems. The spawn function starts a new process that executes a given function in parallel with its creator. The ! primitive is used to send messages. The syntax is very simple:

Process ! message.

This instruction sends message to the specified Process, a pid value returned by spawn. Note that the syntax is the same, irrespective of whether Process is running on the local machine, another machine in a cluster, or another machine somewhere on the Internet. The message can be any type understood by Erlang, but it’s common to send a tuple, the first value of which is the type of the message. This technique allows pattern matching to be used to filter the messages.
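As a rough illustration of spawn and ! working together, consider the sketch below. The module name spawn_demo, the message tags greet and greeting, and the one-second timeout are all choices made for this example.

```erlang
-module(spawn_demo).
-export([run/0]).

run() ->
    Parent = self(),
    %% spawn/1 runs the fun in a new process and returns that process's pid.
    Child = spawn(fun() ->
        receive
            {greet, Name} -> Parent ! {greeting, "hello " ++ Name}
        end
    end),
    %% The first tuple element is an atom naming the message type.
    Child ! {greet, "world"},
    %% Wait for the reply, giving up after one second.
    receive
        {greeting, Text} -> Text
    after 1000 ->
        timeout
    end.
```

Calling spawn_demo:run() returns "hello world" once the child process has replied.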

The receive statement is used to get a message from the message queue; the first queued message that matches one of its patterns is taken. It supports the same pattern-matching syntax as the rest of Erlang. The following snippet shows a fragment of code used to reply to pings and log ping replies:

receive
    {ping, {Sender,Sent}} ->
       Sender ! {ack, {Sent, now()}}
    ;
    {ack, {Sent, Received}} ->
       logRoundTrip(Sent, Received)
end

As before, the message type is used for pattern matching. Recall that variables begin with capital letters; the values of Sender, Sent, and Received are filled in when a message matching the stated part of the pattern is received.
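The fragment can be fleshed out into a complete round trip, sketched here under a few assumptions: the module name ping_demo and its functions are invented, erlang:monotonic_time/1 is swapped in for now() so that the two timestamps can simply be subtracted, and the elapsed time is returned instead of being passed to a logging function.

```erlang
-module(ping_demo).
-export([run/0, responder/0]).

%% Answer pings forever, echoing the sender's timestamp plus our own.
responder() ->
    receive
        {ping, {Sender, Sent}} ->
            Sender ! {ack, {Sent, erlang:monotonic_time(microsecond)}},
            responder()
    end.

run() ->
    Pid = spawn(ping_demo, responder, []),
    Pid ! {ping, {self(), erlang:monotonic_time(microsecond)}},
    Result =
        receive
            {ack, {Sent, Received}} ->
                %% Microseconds between sending the ping and the reply stamp.
                Received - Sent
        after 1000 ->
            timeout
        end,
    %% The responder loops forever, so stop it explicitly.
    exit(Pid, kill),
    Result.
```

A call to ping_demo:run() returns a small non-negative integer number of microseconds, or the atom timeout if no acknowledgment arrives within a second.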

Parting Thoughts

Erlang is far from perfect. The biggest limitation I’ve encountered is that there’s no neat way of doing a remote function call. While asynchronous programming is easier to scale, you sometimes need to send a message and wait for the result before you can proceed. Ideally, you would save the state related to the current message and proceed with processing the next one, but at the very least you should be able to wait for a specific message and then grab it from the message queue while still processing another message.

In Erlang, there’s no sensible way of doing this. You need to create a meta-message-loop that takes the messages from the message queue and then processes them in order, but allows them to be accessed out of order. This is something you might want to do often, which brings me to the second major limitation of Erlang.

There is only a very limited scope for meta-programming. While Lisp programmers look down on C for having very primitive macro support, even C programmers can look down on Erlang’s macro capability, which isn’t useful for much more than defining constants.

Overall, Erlang is a nice language. There are a few minor irritations, but the majority of the language is nice to work with. Anyone looking for a change of style, or a language for a large distributed project, would do well to download Erlang and play with it for a bit. The documentation is relatively good, and should be enough for a new user to get to know the system. Where it isn’t, the Erlang interpreter is very good for trying out things to check your understanding of how they work.
