Programming expert Bartosz Milewski explains the use of polymorphism in a simple example of building an arithmetic tree using various kinds of nodes. This example will be used in a later article in the development of a symbolic calculator.
This article is excerpted from Milewski's book, C++ In Action: Industrial-Strength Programming Techniques (Addison-Wesley, ISBN 0-201-69948-6), a book that teaches professional programming in C++.

Parse Tree

This article demonstrates the use of polymorphism in an example of a very useful data structure: an arithmetic tree. (This example will be used in a later article in the development of a symbolic calculator.) An arithmetic expression can be converted into a tree structure whose internal nodes are arithmetic operators and whose leaf nodes are numbers. Figure 1 shows a tree that corresponds to the expression 2 * (3 + 4) + 5. Analyzing it from the root toward the leaves, we first encounter the plus node, whose children are the two terms that are to be added. The left child is a product of two factors. The left factor is the number 2 and the right factor is the sum of 3 and 4. The right child of the top-level plus node is the number 5. Notice that the tree representation doesn't require any parentheses or any knowledge of operator precedence. It uniquely describes the calculation to be performed.

Figure 1 The arithmetic tree corresponding to the expression 2 * (3 + 4) + 5.

We will represent all nodes of the arithmetic tree as objects inheriting from the class Node. The direct descendants of Node are NumNode, which represents a number, and BinNode, which represents a binary operator. For simplicity, let's restrict ourselves to only two classes derived from BinNode: AddNode and MultNode. Figure 2 shows the class hierarchy I have just described. Node and BinNode are abstract classes: classes that cannot be instantiated and only serve as parents for other classes. I'll explain this term in more detail in a moment.

Figure 2 The class hierarchy of nodes.

What are the operations we would like to perform on a node? We want to be able to calculate its value and, at some point, destroy it. The Calc method returns a double as the result of calculating the node's value. Of course, for some nodes the calculation involves recursively calculating the values of their children. This method is const because it doesn't change the node itself. Since each type of node has to provide its own implementation of the Calc method, we make this function virtual. However, there is no "default" implementation of Calc for an arbitrary Node. A virtual function that is declared without an implementation, leaving it to derived classes to supply one, is called a pure virtual function. That's the meaning of = 0 in the declaration of Calc:

    class Node
    {
    public:
        virtual ~Node () {}
        virtual double Calc () const = 0;
    };

A class that has one or more pure virtual functions is called an abstract class, and it cannot be instantiated (no object of this class can be created). Only classes that are derived from it, and that provide their own implementations of all the pure virtual functions, can be instantiated. Notice that our sample arithmetic tree has instances of AddNodes, MultNodes, and NumNodes, but no instances of Nodes or BinNodes.
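
The rule can be checked at compile time. The following is a minimal sketch, assuming a C++11 compiler (for static_assert and std::is_abstract). ConstNode and CalcThroughBase are hypothetical names introduced only for this illustration; they are not part of the article's code.

```cpp
#include <type_traits>

// Abstract base, declared as in the article.
class Node
{
public:
    virtual ~Node () {}
    virtual double Calc () const = 0;   // pure virtual: no default implementation
};

// ConstNode is a hypothetical concrete class: it overrides
// every pure virtual function it inherits.
class ConstNode: public Node
{
public:
    explicit ConstNode (double num) : _num (num) {}
    double Calc () const { return _num; }
private:
    const double _num;
};

// Node node;   // error if uncommented: Node is an abstract class
static_assert (std::is_abstract<Node>::value, "Node is abstract");
static_assert (!std::is_abstract<ConstNode>::value, "ConstNode is concrete");

// A concrete object can still be used through a reference to the abstract base.
double CalcThroughBase (Node const & node)
{
    return node.Calc ();
}
```

Calling CalcThroughBase (ConstNode (2.5)) returns 2.5: the object is a ConstNode, even though the function only knows it as a Node.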

A common rule is that if a class has a virtual function, it probably needs a virtual destructor as well—and once we decide to pay the overhead of a vtable pointer, subsequent virtual functions will not increase the size of the object. So, in such a case, adding a virtual destructor doesn't add any significant overhead.
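
The size claim can be observed with sizeof. This is only a sketch: the C++ standard doesn't require a vtable pointer at all, so the results below describe what mainstream compilers typically do, not a guarantee. Plain, OneVirtual, and TwoVirtuals are hypothetical classes invented for the comparison, and the default member initializers assume C++11.

```cpp
// Hypothetical classes used only to compare object sizes.
class Plain                 // no virtual functions
{
public:
    double Calc () const { return _num; }
private:
    double _num = 0.0;
};

class OneVirtual            // one virtual function
{
public:
    virtual double Calc () const { return _num; }
private:
    double _num = 0.0;
};

class TwoVirtuals           // virtual function plus virtual destructor
{
public:
    virtual ~TwoVirtuals () {}
    virtual double Calc () const { return _num; }
private:
    double _num = 0.0;
};

// On typical implementations the first virtual function
// adds one hidden pointer to every object...
bool FirstVirtualCostsSpace ()
{
    return sizeof (OneVirtual) > sizeof (Plain);
}

// ...and further virtual functions, the destructor included,
// add nothing more per object.
bool SecondVirtualIsFree ()
{
    return sizeof (TwoVirtuals) == sizeof (OneVirtual);
}
```

With the usual vtable-based implementations both functions return true.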

In our case, we can anticipate that some of the descendant nodes will have to destroy their children in their destructors, so we really need a virtual destructor. A destructor can be declared pure virtual, but it must still be given a body, because the destructor of every derived class calls it; and since Calc already makes Node abstract, nothing would be gained. That's why I simply gave it an empty body. (Even though I made it inline, the compiler will create a function body for it, because it needs to put a pointer to it into the virtual table.)

NumNode stores a double value that is initialized in its constructor (see Listing 1). NumNode also overrides the Calc virtual function. In this case, Calc simply returns the value stored in the node.

Listing 1 Class NumNode (Numeric Node)

    #include <iostream>   // for std::cout and std::endl used in Calc

    class NumNode: public Node
    {
    public:
        NumNode (double num) : _num (num) {}
        double Calc () const;
    private:
        const double _num;
    };

    double NumNode::Calc () const
    {
        std::cout << "Numeric node " << _num << std::endl;
        return _num;
    }

BinNode has two children that are pointers to (abstract) Nodes. They are initialized in the constructor and deleted in the destructor—this is why I could make them const pointers (see Listing 2). The Calc method is still a pure virtual function, inherited from Node; only the descendants of BinNode will know how to implement it.

Listing 2 Class BinNode (Binary Node)

    class BinNode: public Node
    {
    public:
        BinNode (Node * pLeft, Node * pRight)
            : _pLeft (pLeft), _pRight (pRight) {}
        ~BinNode ();
    protected:
        Node * const _pLeft;
        Node * const _pRight;
    };

    BinNode::~BinNode ()
    {
        delete _pLeft;
        delete _pRight;
    }

This is where you first see the advantage of polymorphism. A binary node can have children that are arbitrary nodes. Each of them can be a number node, an addition node, or a multiplication node. There are nine possible combinations of children, and it would be silly to make a separate class for each of them.

We have no choice but to accept and store pointers to children as more general pointers to Nodes. Yet, when we call destructors through them, we need to call different functions to destroy different Nodes. For instance, AddNode has a different destructor than NumNode (which has an empty one), and so on. This is why we have to make the destructors of Nodes virtual.
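
To make the effect of the virtual destructor visible, here is a minimal sketch that counts destructor calls. The classes are trimmed copies of the article's hierarchy; the counter g_destroyed and the function DestroyThroughBasePointer are hypothetical additions that exist only for this demonstration.

```cpp
// Counts destroyed nodes; exists only for this demonstration.
static int g_destroyed = 0;

class Node
{
public:
    virtual ~Node () { ++g_destroyed; }   // runs once for every node destroyed
    virtual double Calc () const = 0;
};

class NumNode: public Node
{
public:
    explicit NumNode (double num) : _num (num) {}
    double Calc () const { return _num; }
private:
    const double _num;
};

class BinNode: public Node
{
public:
    BinNode (Node * pLeft, Node * pRight)
        : _pLeft (pLeft), _pRight (pRight) {}
    ~BinNode ()
    {
        delete _pLeft;    // each delete dispatches to the child's own destructor
        delete _pRight;
    }
protected:
    Node * const _pLeft;
    Node * const _pRight;
};

class AddNode: public BinNode
{
public:
    AddNode (Node * pLeft, Node * pRight) : BinNode (pLeft, pRight) {}
    double Calc () const { return _pLeft->Calc () + _pRight->Calc (); }
};

// Builds (1 + 2) + 3, deletes it through a Node pointer,
// and reports how many nodes were destroyed.
int DestroyThroughBasePointer ()
{
    g_destroyed = 0;
    Node * pRoot = new AddNode (
        new AddNode (new NumNode (1.0), new NumNode (2.0)),
        new NumNode (3.0));
    delete pRoot;   // virtual destructor: ~BinNode runs and deletes the children
    return g_destroyed;
}
```

The tree has five nodes (three numbers and two additions), and the function returns 5. If ~Node were not virtual, deleting an AddNode through a Node pointer would be undefined behavior, and in practice the children would most likely never be deleted.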

Notice that the two data members of BinNode are not private; they are protected. This qualification is slightly weaker than private. A private data member or method cannot be accessed from any code outside the implementation of the given class (or its friends), not even from the code of a derived class. If we made _pLeft and _pRight private, we'd have to provide public methods to set and get them. That would be tantamount to exposing them to everybody. By making them protected, we let classes derived from BinNode manipulate them, but, at the same time, bar anybody else from doing so. Table 1 presents a short description of all three access specifiers. Incidentally, the same access specifiers are used in declaring inheritance. By far the most useful type of inheritance is public, and that's what we've been using so far, but if you'd like to restrict the access to the base class, you're free to use the other two types of inheritance.

Table 1 The Meaning of the Three Access Specifiers

    Access Specifier    Who Can Access the Members?
    public              Anybody
    protected           The class itself, its friends, and derived classes
    private             Only the class itself and its friends
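
The three rows of Table 1 can be tried out in a few lines. This is an illustrative sketch only; Base, Derived, and OutsideCode are hypothetical names, and the commented-out lines are the ones a compiler would reject.

```cpp
// Hypothetical classes illustrating the three access specifiers.
class Base
{
public:
    Base () : _open (1), _shared (2), _hidden (3) {}
    int Hidden () const { return _hidden; }   // the class itself may use its private member
    int _open;          // public: anybody
protected:
    int _shared;        // protected: Base, its friends, and derived classes
private:
    int _hidden;        // private: only Base and its friends
};

class Derived: public Base
{
public:
    int SumVisible () const
    {
        // return _hidden;        // error if uncommented: _hidden is private in Base
        return _open + _shared;   // OK: public and protected members are accessible
    }
};

// Code that is neither a member, a friend, nor a derived class.
int OutsideCode (Derived const & d)
{
    // return d._shared;   // error if uncommented: _shared is protected
    return d._open;        // OK: public
}
```

For a default-constructed Derived object, SumVisible returns 3 and OutsideCode returns 1; the private value is reachable from outside only through the public method Hidden.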

The class AddNode is derived from BinNode:

    class AddNode: public BinNode
    {
    public:
        AddNode (Node * pLeft, Node * pRight)
            : BinNode (pLeft, pRight) {}
        double Calc () const;
    };

AddNode provides its own implementation of Calc. This is where you see the advantages of polymorphism again. We let the child nodes calculate themselves. Since the Calc method is virtual, they will do the right thing based on their actual class, and not on the class of the pointer (Node *). The two results of calling Calc are added and the sum returned.

    double AddNode::Calc () const
    {
        return _pLeft->Calc () + _pRight->Calc ();
    }

Notice how the method of AddNode directly accesses its parent's protected data members _pLeft and _pRight. Again, were they declared private, such access would be flagged as an error by the compiler.

For completeness, Listing 3 shows the implementation of MultNode and a simple test program.

Listing 3 Class MultNode (Multiplication Node) and a Simple Test Program

    class MultNode: public BinNode
    {
    public:
        MultNode (Node * pLeft, Node * pRight)
            : BinNode (pLeft, pRight) {}
        double Calc () const;
    };

    double MultNode::Calc () const
    {
        std::cout << "Multiplying\n";
        return _pLeft->Calc () * _pRight->Calc ();
    }

    int main ()
    {
        // ( 20.0 + (-10.0) ) * 0.1
        Node * pNode1 = new NumNode (20.0);
        Node * pNode2 = new NumNode (-10.0);
        Node * pNode3 = new AddNode (pNode1, pNode2);
        Node * pNode4 = new NumNode (0.1);
        Node * pNode5 = new MultNode (pNode3, pNode4);
        std::cout << "Calculating the tree\n";
        // tell the root to calculate itself
        double x = pNode5->Calc ();
        std::cout << "Result: " << x << std::endl;
        delete pNode5; // and all children
    }
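
The same classes can also build the tree from Figure 1. The sketch below repeats the hierarchy in compact form, with the trace output removed, so that it stands on its own. The helper CalcFigure1Tree is a hypothetical addition, not part of the article's code.

```cpp
class Node
{
public:
    virtual ~Node () {}
    virtual double Calc () const = 0;
};

class NumNode: public Node
{
public:
    explicit NumNode (double num) : _num (num) {}
    double Calc () const { return _num; }
private:
    const double _num;
};

class BinNode: public Node
{
public:
    BinNode (Node * pLeft, Node * pRight)
        : _pLeft (pLeft), _pRight (pRight) {}
    ~BinNode () { delete _pLeft; delete _pRight; }
protected:
    Node * const _pLeft;
    Node * const _pRight;
};

class AddNode: public BinNode
{
public:
    AddNode (Node * pLeft, Node * pRight) : BinNode (pLeft, pRight) {}
    double Calc () const { return _pLeft->Calc () + _pRight->Calc (); }
};

class MultNode: public BinNode
{
public:
    MultNode (Node * pLeft, Node * pRight) : BinNode (pLeft, pRight) {}
    double Calc () const { return _pLeft->Calc () * _pRight->Calc (); }
};

// Hypothetical helper: builds the tree for 2 * (3 + 4) + 5 and evaluates it.
double CalcFigure1Tree ()
{
    Node * pSum = new AddNode (new NumNode (3.0), new NumNode (4.0));   // 3 + 4
    Node * pProduct = new MultNode (new NumNode (2.0), pSum);           // 2 * (3 + 4)
    Node * pRoot = new AddNode (pProduct, new NumNode (5.0));           // 2 * (3 + 4) + 5
    double result = pRoot->Calc ();
    delete pRoot;   // deletes the whole tree
    return result;
}
```

CalcFigure1Tree returns 19. The parentheses of the original expression appear nowhere in the code; the shape of the tree alone determines the order of evaluation.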

Do you think you can write more efficient code by not using polymorphism? Think twice! If you're still not convinced, go on a little side trip into the alternative universe of C.