
Item 6: Learn the Limits of Semicolon Insertion

One of JavaScript’s conveniences is the ability to leave off statement-terminating semicolons. Dropping semicolons results in a pleasantly lightweight aesthetic:

function Point(x, y) {
    this.x = x || 0
    this.y = y || 0
}

Point.prototype.isOrigin = function() {
    return this.x === 0 && this.y === 0
}

This works thanks to automatic semicolon insertion, a program-parsing technique that infers omitted semicolons in certain contexts, effectively “inserting” them into the program for you. The ECMAScript standard precisely specifies the semicolon insertion mechanism, so optional semicolons are portable between JavaScript engines.

But similar to the implicit coercions of Items 3 and 5, semicolon insertion has its pitfalls, and you simply can’t avoid learning its rules. Even if you never omit semicolons, there are additional restrictions in the JavaScript syntax that are consequences of semicolon insertion. The good news is that once you learn the rules of semicolon insertion, you may find it liberating to drop unnecessary semicolons.

The first rule of semicolon insertion is:

Semicolons are only ever inserted before a } token, after one or more newlines, or at the end of the program input.

In other words, you can only leave out semicolons at the end of a line, block, or program. So the following are legal functions:

function square(x) {
    var n = +x
    return n * n
}
function area(r) { r = +r; return Math.PI * r * r }
function add1(x) { return x + 1 }

But this is not:

function area(r) { r = +r return Math.PI * r * r } // error
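One way to check the first rule yourself is to ask the engine whether a snippet parses. The sketch below assumes a hypothetical helper, parses, which is not part of the original text. It wraps the Function constructor, which compiles its argument without running it and throws a SyntaxError when the source is invalid.

```javascript
// Hypothetical helper: reports whether `source` parses as a function body.
// The Function constructor compiles the source but does not run it.
function parses(source) {
    try {
        new Function(source);
        return true;
    } catch (e) {
        if (e instanceof SyntaxError) {
            return false;
        }
        throw e;
    }
}

// Semicolons omitted before newlines and before }: legal.
var multiLine = parses("function square(x) {\n    var n = +x\n    return n * n\n}");
// Semicolon omitted only before }: legal.
var beforeBrace = parses("function area(r) { r = +r; return Math.PI * r * r }");
// Semicolon omitted in the middle of a line: not legal.
var midLine = parses("function area(r) { r = +r return Math.PI * r * r }");
```

Under these assumptions, multiLine and beforeBrace are true and midLine is false. Note that some environments (for example, pages with a strict content security policy) disallow the Function constructor, so this check is best run in a console or in Node.js.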

The second rule of semicolon insertion is:

Semicolons are only ever inserted when the next input token cannot be parsed.

In other words, semicolon insertion is an error correction mechanism. As a simple example, this snippet:

a = b
(f());

parses just fine as a single statement, equivalent to:

a = b(f());

That is, no semicolon is inserted. By contrast, this snippet:

a = b
f();

is parsed as two separate statements, because

a = b f();

is a parse error.
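The difference between the two snippets can be observed at runtime. The following is a minimal sketch; the bodies of f and b and the calls array are assumptions added purely to record what gets called.

```javascript
var calls = [];
function f() { calls.push("f"); return 1; }
function b(arg) { calls.push("b"); return "called b"; }

var first;
first = b
(f());
// One statement: first = b(f()), so f runs, then b runs.

var second;
second = b
f();
// Two statements: second = b; f();
```

Afterward, first holds the string returned by b, second holds the function b itself, and calls is ["f", "b", "f"].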

This rule has an unfortunate implication: You always have to pay attention to the start of the next statement to detect whether you can legally omit a semicolon. You can’t leave off a statement’s semicolon if the next line’s initial token could be interpreted as a continuation of the statement.

There are exactly five problematic characters to watch out for: (, [, +, -, and /. Each one of these can act either as an expression operator or as the prefix of a statement, depending on the context. So watch out for statements that end with an expression, like the assignment statement above. If the next line starts with any of the five problematic characters, no semicolon will be inserted. By far, the most common scenario where this occurs is a statement beginning with a parenthesis, like the example above. Another common scenario is an array literal:

a = b
["r", "g", "b"].forEach(function(key) {
    background[key] = foreground[key] / 2;
});

This looks like two statements: an assignment followed by a statement that calls a function on the strings "r", "g", and "b" in order. But because the statement begins with [, it parses as a single statement, equivalent to:

a = b["r", "g", "b"].forEach(function(key) {
    background[key] = foreground[key] / 2;
});

If that bracketed expression looks odd, remember that JavaScript allows comma-separated expressions, which evaluate from left to right and return the value of their last subexpression: in this case, the string "b".
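To see this parse in action, here is a small sketch in which b is an object that happens to have a property named "b". The object's contents and the visited array are assumptions chosen only to make the effect visible.

```javascript
// A comma expression evaluates to its last subexpression.
var last = ("r", "g", "b");

var b = { r: "red", g: "green", b: ["x", "y"] };
var visited = [];
var a;

a = b
["r", "g", "b"].forEach(function(item) {
    visited.push(item);
});
// Parsed as: a = b["b"].forEach(...)
```

Here last is "b", the loop visits the elements of b.b rather than the strings "r", "g", and "b", and a ends up undefined because that is what forEach returns.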

The +, -, and / tokens are less commonly found at the beginning of statements, but such statements are not unheard of. The case of / is particularly subtle: At the start of a statement, it is actually not an entire token but the beginning of a regular expression token:

/Error/i.test(str) && fail();

This statement tests a string with the case-insensitive regular expression /Error/i. If a match is found, the statement calls the fail function. But if this code follows an unterminated assignment:

a = b
/Error/i.test(str) && fail();

then the code parses as a single statement equivalent to:

a = b / Error / i.test(str) && fail();

In other words, the initial / token parses as the division operator!
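The sketch below makes the misparse observable. It assumes a variable named i with a test method, a stand-in that exists only so the division expression can be evaluated without a ReferenceError; Error is the built-in constructor.

```javascript
var b = 10;
var str = "Error: disk full";
var i = { test: function(s) { return 2; } };  // stand-in so that i.test exists
var failed = false;
function fail() { failed = true; }
var a;

a = b
/Error/i.test(str) && fail();
// Parsed as: a = b / Error / i.test(str) && fail();
```

Dividing a number by the Error function yields NaN, so a is NaN and fail is never called, even though str does contain the text "Error".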

Experienced JavaScript programmers learn to look at the line following a statement whenever they want to leave out a semicolon, to make sure the statement won’t be parsed incorrectly. They also take care when refactoring. For example, a perfectly correct program with three inferred semicolons:

a = b    // semicolon inferred
var x    // semicolon inferred
(f())    // semicolon inferred

can unexpectedly change to a different program with only two inferred semicolons:

var x    // semicolon inferred
a = b    // no semicolon inferred
(f())    // semicolon inferred

Even though moving the var statement up one line should produce an equivalent program (see Item 12 for details of variable scope), the fact that b is followed by a parenthesis means that the program is misparsed as:

var x;
a = b(f());

The upshot is that you always need to be aware of omitted semicolons and check the beginning of the following line for tokens that disable semicolon insertion. Alternatively, you can follow a rule of always prefixing statements beginning with (, [, +, -, or / with an extra semicolon. For example, the previous example can be changed to protect the parenthesized function call:

a = b    // semicolon inferred
var x    // semicolon on next line
;(f())   // semicolon inferred

Now it’s safe to move the var declaration to the top without fear of changing the program:

var x    // semicolon inferred
a = b    // semicolon on next line
;(f())   // semicolon inferred
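As a quick check of the defensive prefix, the sketch below runs the reordered statements with a value of b that cannot be called. The names calls and run and the value of b are assumptions for illustration.

```javascript
var calls = [];
function f() { calls.push("f"); }
var b = "not callable";   // calling b would throw a TypeError
var a;

function run() {
    var x    // semicolon inferred
    a = b    // semicolon on next line
    ;(f())   // semicolon inferred
}
run();
```

Because the leading semicolon ends the assignment, run completes without an error: a is "not callable" and f is called exactly once. Without that semicolon, the same lines would attempt b(f()) and throw.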

Another common scenario where omitted semicolons can cause problems is with script concatenation (see Item 1). Each file might consist of a large function call expression (see Item 13 for more about immediately invoked function expressions):

// file1.js
(function() {
    // ...
})()

// file2.js
(function() {
    // ...
})()

When each file is loaded as a separate program, a semicolon is automatically inserted at the end, turning the function call into a statement. But when the files are concatenated:

(function() {
    // ...
})()
(function() {
    // ...
})()

the result is treated as one single statement, equivalent to:

(function() {
    // ...
})()(function() {
    // ...
})();
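The failure can be reproduced by treating each file as a string. In this sketch the file contents, the log array, and the run helper are all hypothetical, and the Function constructor stands in for loading a script.

```javascript
var file1 = "(function() { log.push('one') })()";
var file2 = "(function() { log.push('two') })()";

// Hypothetical helper: runs `source` with a fresh `log` array and records
// the name of any error that is thrown.
function run(source) {
    var log = [];
    try {
        new Function("log", source)(log);
    } catch (e) {
        log.push(e.name);
    }
    return log;
}

var separate = run(file1).concat(run(file2));   // ["one", "two"]
var combined = run(file1 + "\n" + file2);       // ["one", "TypeError"]
```

Loaded separately, both functions run. Concatenated, the first function runs, and then its return value (undefined) is called with the second function as an argument, which throws a TypeError before the second function ever executes.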

The upshot: Omitting a semicolon from a statement requires being aware of not only the next token in the current file, but any token that might follow the statement after script concatenation. Similar to the approach described above, you can protect scripts against careless concatenation by defensively prefixing every file with an extra semicolon, at least if its first statement begins with one of the five vulnerable characters: (, [, +, -, or /. For example:

// file1.js
;(function() {
    // ...
})()

// file2.js
;(function() {
    // ...
})()

This ensures that even if the preceding file omits its final semicolon, the combined results will still be treated as separate statements:

;(function() {
    // ...
})()
;(function() {
    // ...
})()

Of course, it’s better if the script concatenation process adds extra semicolons between files automatically. But not all concatenation tools are well written, so your safest bet is to add semicolons defensively.
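For comparison, here is a minimal sketch of what a careful concatenation step might do. The concatScripts function is a hypothetical helper, not the API of any particular build tool, and the file contents are placeholders.

```javascript
// Hypothetical concatenation helper: joins script sources with an explicit
// semicolon between files so that no file can continue the previous one.
function concatScripts(sources) {
    return sources.join("\n;\n");
}

var file1 = "(function() { log.push('one') })()";
var file2 = "(function() { log.push('two') })()";

var bundle = concatScripts([file1, file2]);
var log = [];
new Function("log", bundle)(log);   // run the bundle with `log` in scope
```

After running the bundle, log is ["one", "two"]. If a file already ends with a semicolon, the extra one is just an empty statement, so adding it unconditionally is harmless.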

At this point, you might be thinking, “This is too much to worry about. I’ll just never omit semicolons and I’ll be fine.” Not so: There are also cases where JavaScript will forcibly insert a semicolon even though it might appear that there is no parse error. These are the so-called restricted productions of the JavaScript syntax, where no newline is allowed to appear between two tokens. The most hazardous case is the return statement, which must not contain a newline between the return keyword and its optional argument. So the statement:

return { };

returns a new object, whereas the code snippet:

return
{ };

parses as three separate statements, equivalent to:

return;
{ }
;

In other words, the newline following the return keyword forces an automatic semicolon insertion, so the snippet parses as a return with no argument followed by an empty block and an empty statement. The other restricted productions are:

  • A throw statement
  • A break or continue statement with an explicit label
  • A postfix ++ or -- operator

The purpose of the last rule is to disambiguate code snippets such as the following:

a
++
b

Since ++ can serve as either a prefix or a suffix, but the latter cannot be preceded by a newline, this parses as:

a; ++b;
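Both the return case and the ++ case can be confirmed at runtime. The function names in this sketch are assumptions for illustration.

```javascript
function withoutNewline() {
    return { };
}

function withNewline() {
    return
    { };
}

function incrementDemo() {
    var a = 1, b = 1;
    a
    ++
    b
    return [a, b];
}

var sameLine = withoutNewline();
var nextLine = withNewline();
var counts = incrementDemo();
```

Here sameLine is an object, nextLine is undefined, and counts is [1, 2]: the ++ is applied to b as a prefix operator, not to a as a postfix operator.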

The third and final rule of semicolon insertion is:

Semicolons are never inserted as separators in the head of a for loop or as empty statements.

This simply means that you must always explicitly include the semicolons in a for loop’s head. Otherwise, input such as this:

for (var i = 0, total = 1 // parse error
     i < n
     i++) {
    total *= i
}

results in a parse error. Similarly, a loop with an empty body requires an explicit semicolon; leaving it off is a parse error:

function infiniteLoop() { while (true) } // parse error

So this is one case where the semicolon is required:

function infiniteLoop() { while (true); }
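The third rule can be checked the same way as the first, by asking the engine whether each snippet parses. The parses helper below is hypothetical. It relies on the Function constructor, which compiles its argument without running it, so the infinite loop is never executed.

```javascript
// Hypothetical helper: reports whether `source` parses as a function body.
function parses(source) {
    try {
        new Function(source);
        return true;
    } catch (e) {
        if (e instanceof SyntaxError) {
            return false;
        }
        throw e;
    }
}

// Newlines in a for loop's head do not trigger semicolon insertion.
var headWithout = parses("for (var i = 0, total = 1\n     i < n\n     i++) {\n    total *= i\n}");
var headWith = parses("for (var i = 0, total = 1; i < n; i++) {\n    total *= i\n}");

// A semicolon is never inserted to create an empty statement.
var bodyWithout = parses("function infiniteLoop() { while (true) }");
var bodyWith = parses("function infiniteLoop() { while (true); }");
```

Under these assumptions, headWithout and bodyWithout are false, while headWith and bodyWith are true.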

Things to Remember

  • Semicolons are only ever inferred before a }, at the end of a line, or at the end of a program.
  • Semicolons are only ever inferred when the next token cannot be parsed.
  • Never omit a semicolon before a statement beginning with (, [, +, -, or /.
  • When concatenating scripts, insert semicolons explicitly between scripts.
  • Never put a newline before the argument to return, throw, break, continue, ++, or --.
  • Semicolons are never inferred as separators in the head of a for loop or as empty statements.