How to Create a Program: a Small Software Project in C++
- Design Specification
- Stubbed Implementation
- Expanding Stubs
- Final Implementation (Not!)
When you write a program, you don't ask, "How can I use a particular language feature?" You ask, "What language feature will help me solve my problem?"
Programmers are rarely totally satisfied with their code. After I wrote the program for my article on polymorphisma simple parser-calculatorI immediately noticed that I could have done it better. I felt that the top-level view of the program was not very clear. The nodes of the parsing tree needed access to components such as the Store (the calculator's "memory"), the symbol table (to print meaningful error messages), and maybe even to the function table (to evaluate built-in functions). I could have made all these objects global and thus known to everybody, or passed references to them to all the nodes of the tree.
The order of construction of these objects was important, so I felt that maybe they should be combined into one high-level object, the Calculator. The temptation to redesign and rewrite the whole program was strong. Maybe after the nth iteration I could come up with something close to the ideal?
Then I decided that I shouldn't do it. Part of making a program is coming up with not-so-good solutions and then improving on them. I would have cheated if I had come up with the best, and then reverse-engineered it to fit the top-down design and implementation process. So here it isthe actual, real-life process of creating a program.
The purpose of the calculator program is to accept arithmetic expressions from the user, evaluate them, and display the results. The expressions are to be parsed by a simple top-down recursive descent parser (if you have no idea what I'm talking about, don't panicit's just a name for what I'm going to describe in detail later).
In principle, the type of parser is an implementation detail and as such would not normally be mentioned in the functional specification. In our case, however, the type of parser (unfortunately) influences the choice of the grammar, which is part of the functional specification.
The parser's goal is to convert the input string into an arithmetic tree. The simple grammar accepted by the parser is defined as follows.
The parser is looking for an expression:
An expression is a term followed by a plus or a minus sign, which is followed by another expression. If an expression doesn't contain any plus or minus signs, it's equal to a term.
A term is a factor multiplied or divided by another term. If a term doesn't contain any multiplication or division operators, it's equal to a factor.
A factor can be any of the following:
An identifier corresponding to a variable
A minus sign followed by a factor (unary minus)
A whole expression enclosed in parentheses
For instance, the expression 1 + x * (2 - y) is parsed as shown in Figure 1.
Figure 1 The parsing of an arithmetic expression. The arrows with numbers correspond to the rules of grammar that are used to make each parsing step.
This grammar is simple and natural, but it has one flawthe arithmetic operators are right-associative instead of being left-associative. That means, for example, that a - b + c will be parsed as a - (b + c), which is not exactly what we'd expect. There are standard ways of fixing this grammar or modifying the parser, but that's beyond the scope of this section. (We'll revisit this issue later.)
Since we want our calculator to be able to use variables, we need a symbol table. A symbol table remembers names of variables. We'll also need to be able to assign values to variables. We can do it by expanding the definition of expression to include the following clause:
An expression can also be a term followed by the equal sign followed by another expression.
Not every term is appropriate on the left side of an assignment. It has to be an lvalue something that can be on the left side of an assignment. In our case, the only allowed type of lvalue will be an identifier corresponding to a variable. Similarly, anything that can be put on the right side of the assignment is called an rvalue. In our case, any expression is a valid rvalue.
You can also derive lvalue from "location value," since it represents a location where a value can be stored, and rvalue from "read value," since a value can be read from it.
We'll also introduce a scanner, which will convert the input string into a series of tokens. It will recognize arithmetic operators, numbers, and identifiers. This way, the parser will have an easier jobmatching sequences of tokens to grammatical productions.