Languages, Verbosity, and Java

With the new spate of programming languages emerging for the Java virtual machine and other platforms, it's more important than ever that the rules of a language make code clear and concise. But clarity and conciseness don't exactly go hand in hand. Dhanji R. Prasanna compares some of these new languages with a popular mainstay, Java, contrasts their expressiveness, and explains why Java is verbose for a reason.

I learned Java in a short summer course right after graduating from high school. Since then, I have programmed with Java off and on for nearly 12 years, most recently at Google (which I represented on several Java expert groups) and in a short consulting stint at the payments startup Square. I enjoy programming in Java. I'm not one of those engineers who bemoan Java's various idiosyncrasies around the coffee machine (although I occasionally enjoy doing that). I have an unabashed love for the language and platform and all the engineering power it represents.

Java is verbose: full of seemingly unnecessary repetitions; lengthy, overwrought conventions; and general syntax excessiveness. This isn't really news; Java's syntax was modeled on C++, which itself derives from C, a language that's over 30 years old and not particularly known for being concise.

As a platform, however, Java is modern and genuinely competitive. The combination of a robust garbage collector, blazing fast virtual machine, and a battery of libraries for just about every task has made it the perfect launchpad for a plethora of products and new hosted languages. (Interestingly, Google's V8 is following a similar pattern.)

Expressiveness

"ProducerConstructorFactoryFactory" jokes notwithstanding, there is little doubt that the Java language suffers from a poor character-to-instruction ratio. I call this property "expressiveness"—in other words, the number of keys you must press in order to accomplish a simple task. This number is pretty large in Java. It repeatedly violates the "don't repeat yourself" (DRY) principle, and many of its modern features (such as Generics) feel lumbering and unwieldy, making reading and understanding source code a tedious task.

Comprehending unfamiliar source code—perhaps including your own source code after a few weeks of neglect—is probably the most important thing a professional engineer does. So tedium in this task is genuinely painful, and it generally describes an unhealthy state of affairs. As a result, many new languages are designed with the problem of expressiveness in mind. Ruby and Python led this trend in relation to C, and Scala, Fantom, Mirah, Groovy, and so on continue it on the Java virtual machine. They have achieved remarkable results, as you can see by comparing Listing 1 with Listing 2.

Listing 1—Java code for determining whether a string contains numeric characters.

    boolean numeric = false;
    for (int i = 0; i < string.length(); ++i) {
      if (Character.isDigit(string.charAt(i))) {
        numeric = true;
        break;
      }
    }

Listing 2—Equivalent Scala code is much more expressive.

val numeric = string.exists(_.isDigit)

This simplicity is wonderful news for all those insurance companies processing claims for repetitive stress injury (RSI) from programmers. We can do the same thing in far fewer lines of code, and in some cases the savings are over an order of magnitude! So have we solved the verbosity problem? Well, yes and no.
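
For comparison with Listing 2, the same check can be written more compactly in newer Java. The following is only a sketch, assuming Java 8 or later (Listing 1 needs neither streams nor method references); the class name DigitCheck and the method name containsDigit are hypothetical, chosen for illustration.

```java
// A sketch assuming Java 8 or later. DigitCheck and containsDigit are
// hypothetical names, not part of any library.
final class DigitCheck {
    // Returns true if at least one character of the string is a digit.
    static boolean containsDigit(String string) {
        // chars() yields each char as an int; anyMatch stops at the first
        // match, much like the break in Listing 1.
        return string.chars().anyMatch(Character::isDigit);
    }
}
```

It is still longer than the Scala version, but the loop, the flag, and the break are gone.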

Readability

Something that Josh Bloch once said has always stuck with me:

A little redundancy in a language is a good thing. It's important for readability.

Josh is Google's Chief Java Architect, but before that he spent years maintaining Java, was responsible for Java 5 language features, and created wonderful little tools like java.util.Collection and family. (He's also great at quotable little one-liners.)

As I surveyed the landscape of modern programming languages, I was struck by the wisdom of this statement. Many have made incredible strides in expressiveness, but fewer words to read doesn't always mean improved readability. In fact, in some cases expressiveness can be downright bad for readability. Consider the example in Listing 3.

Listing 3—Scala code to sum a list, using the fold operator.

val ls = List(1, 2, 3, 4, 5)
(0 /: ls) {_+_}

This code is gibberish if you don't understand that /: is a symbol that stands for the foldLeft operation, among other things. While it's difficult to read, it's still extremely powerful. This is the most compact way to sum a list (that I can think of) without custom functions. There certainly isn't anything like it in Java. However, even if you understand the symbols, it's not exactly a breeze to read, and this is only a simple example.

The problem is that when you're reading code like this, you must mentally substitute the expanded construction for every compressed symbol (/: -> foldLeft). This requirement has the unintended effect of slowing down your reading speed—especially if Scala isn't your primary day-to-day language.

If you have to go through a lot of code that looks like this, things can get tedious rather quickly. Some people refer to this phenomenon as language density.

Of course, for Scala experts, this is probably quite natural and not at all tedious. My intent is not to criticize Scala, but rather to illustrate the spectrum of syntax—from the very verbose to the very terse—and its concomitant effects on readability.
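
To make the substitution described above concrete, here is roughly what a reader has to unpack Listing 3 into, sketched in Java. The Fold class and its foldLeft helper are hypothetical, written for this illustration and not taken from any library, and the sketch assumes Java 8 or later for the lambda.

```java
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical helper class spelling out a left fold step by step.
final class Fold {
    // Starts from an initial value and combines it with each item in order,
    // carrying the running result forward.
    static <A, B> B foldLeft(List<A> items, B initial, BiFunction<B, A, B> step) {
        B accumulator = initial;
        for (A item : items) {
            accumulator = step.apply(accumulator, item);
        }
        return accumulator;
    }

    // The Java counterpart of (0 /: ls) {_+_}: start at 0, add each element.
    static int sum(List<Integer> numbers) {
        return foldLeft(numbers, 0, (total, n) -> total + n);
    }
}
```

Every name that the Scala symbols leave implicit (the accumulator, the starting value, the combining step) is spelled out here, which is the trade-off this section is about.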

What's interesting is that these languages are solving the verbosity problem, but they're attacking it by improving writability, not necessarily readability.

Grammar and Syntax

Another effect of concise syntax is increasing complexity of the grammar. Java's grammar has an interesting property: Nearly any expression can be matched to a specific production (roughly, parsing rule), free of its surrounding context:

happy happy(happy happy) {
  happy.happy.happy(happy);
}

Anyone familiar with Java will have no trouble identifying each production in this code. It's obviously a method that returns type happy, taking an argument named happy of the same type, and so on. Even the potential ambiguity of the dot-syntax is a non-issue:

happy.happy;

is an incomplete production and thus a syntax error—you don't even need to invoke the compiler.

The neat thing is that a single statement doesn't require any surrounding context to identify which rule it matches, if any. For example:

happy.happy.happy();

is obviously a method invocation on a dereferenced variable. Similarly, referring to a package name and class is clear:

h = happy.happy.class;

Here the leftmost happy is obviously a package, and the rightmost is a class. [1] This preciseness is a remarkable property of the language, and it's still more remarkable that this feature has been preserved across so many versions of Java. It may not be immediately apparent, but this strict adherence to grammatical rigor has many benefits to readability.
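
The fragments above can be assembled into a class that compiles, which is one way to check the claim. This is a sketch only: the count field and the demo method are hypothetical additions so that there is something to observe, and the lowercase class name ignores Java naming conventions on purpose.

```java
// A class, a field, a method, and a parameter that all share one name.
// Each use is resolved by where it appears in the grammar.
final class happy {
    happy happy;   // a field named happy, of type happy
    int count;     // hypothetical counter, added only to observe calls

    // A method named happy that returns type happy and takes a parameter
    // named happy of the same type.
    happy happy(happy happy) {
        count++;
        return happy;   // an expression, so this is the parameter
    }

    // Hypothetical driver for the sketch.
    static int demo() {
        happy outer = new happy();   // type, variable, constructor call
        outer.happy = new happy();   // field assignment
        outer.happy.happy(outer);    // field access, then method invocation
        return outer.happy.count;
    }
}
```

Calling happy.demo() invokes the method once on the inner object, so the counter it returns reflects a single call.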

Of course, Java also benefits from being syntactically similar to C++, which is buried deep within the collective consciousness of most programmers. Even so, Java has readily jettisoned syntax where potential ambiguity was a danger; operator overloading is a good example.
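
As a small illustration of the operator overloading point, consider java.math.BigInteger from the standard library. The names NoOverloading and total are hypothetical; the sketch only shows that arithmetic on a non-primitive type goes through a named method.

```java
import java.math.BigInteger;

// Hypothetical class for illustration.
final class NoOverloading {
    static BigInteger total(BigInteger a, BigInteger b) {
        // return a + b;  // would not compile: + is not defined for BigInteger
        return a.add(b);  // a named method, so this is plainly a method call
    }
}
```

The call is wordier than a + b, but a reader never has to wonder what + means for the types involved.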

Ambiguity

Consider the same dimension in modern (and decidedly more expressive) languages. Many of them fall woefully short, as shown in Listing 4:

Listing 4—Ruby code illustrating syntax ambiguity.

happy.happy.happy

Does this code represent variables being dereferenced? Methods being called? Or something else? It's impossible to tell without the surrounding context. I don't mean to pick on Ruby; this is just a symptom of trading keystrokes for expressive power. In Python, the first happy could even refer to a module (analogous to a Java package). Similar syntactic ambiguities exist in statically typed languages, too.

At face value, this problem isn't really awful. What's so bad about looking around a statement to see what it's about? Nobody reads single statements in isolation, anyway. True, in any individual case it isn't a problem. But taken in aggregate, these decisions lead to a nontrivial increase in reading complexity. Throw in the optional mixture of infix and postfix operators, and things start to get messy.

A bias toward writability also engenders subtle pains. Consider these two code snippets in the popular language CoffeeScript:

http.createServer (request, response) ->
  ...


http.createServer(request, response) ->
  ...

The only difference is a space, but because CoffeeScript lets you invoke functions without parentheses, that space separates working code from a subtle bug with rather large consequences. Both forms are valid, but only the first one creates a server as intended. The space tells CoffeeScript that this is not a function call with two arguments, but a function call whose single argument is a closure that takes two parameters.

Now look at these two examples when compiled into JavaScript:

http.createServer(function(request, response) {
  ...
});


http.createServer(request, response) (function() {
  ...
});

Despite being slightly more verbose, the JavaScript is much clearer to the untrained eye. Even for the trained one, I imagine that spotting the problem is significantly easier in this form. Remember Josh Bloch's words: A little redundancy in a language is a good thing.

Evolution

I love CoffeeScript. I like Ruby. I really enjoy reading about the intricacies of Scala's type system and learning about its dazzling array of brilliant, expressive features. Scala brings many of Haskell's powerful and sometimes obscure features to the mainstream in an accessible, pragmatic fashion. I believe that every one of these languages is an important, bold, and laudable attempt at pushing forward the edge of software engineering.

So what does this mean for verbosity? Are we always going to be stuck with it? Or do we have to trade expressiveness for readability? I'm not nearly so pessimistic. One of my favorite languages, Scheme, is incredibly expressive and readable. As a dialect of Lisp, it has a very simple context-free grammar: everything is a list of symbols, called an S-expression. This approach is concise, and it requires fewer lines than Java does to achieve similar results. Of course, with Lisp being more than 50 years old, the syntax shows some signs of age; it doesn't really work with object-oriented constructions, and there are those parentheses.

But on the whole it's instructive. If Lisp managed such dexterity decades ago, I'm optimistic for the future.

No one should walk away from this article thinking that the modern language diaspora is a bad thing. The evolution-by-degrees we're seeing now is thrilling and exciting. Perhaps one of these languages will gain enough of a foothold that we'll become used to its syntactic idiosyncrasies, reading it with natural ease. Or perhaps there will always be an evolving, frothing frontier to challenge and provoke us—to read better, write better, and create better languages to express ourselves.

Footnotes

[1] This example could also refer to an inner and outer class combination, but that serves effectively the same purpose (namespacing) as described.

Read Dhanji R. Prasanna at http://rethrick.com/about or find him on Twitter at http://twitter.com/dhanji.
