## Parse Tree

This article demonstrates the use of polymorphism in an example of a very useful
data structure: an arithmetic tree. (This example will be used in a later
article in the development of a symbolic calculator.) An arithmetic expression
can be converted into a tree structure whose internal nodes are arithmetic
operators and whose leaf nodes are numbers. Figure
1 shows a tree that corresponds to the expression `2 * (3 + 4) + 5`.
Analyzing it from the root toward the leaves, we first encounter the plus node,
whose children are the two terms that are to be added. The left child is a product
of two factors. The left factor is the number 2 and the right factor is the
sum of 3 and 4. The right child of the top-level plus node is the number 5.
Notice that the tree representation requires neither parentheses nor any
knowledge of operator precedence. It uniquely describes the calculation
to be performed.

**Figure 1** The arithmetic
tree corresponding to the expression `2 * (3 + 4) + 5`.
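As a quick check, we can evaluate the tree of Figure 1 from the leaves up, one operator node at a time. Each step uses only the values of that node's children, and the result is the familiar one:

$$
\begin{aligned}
3 + 4 &= 7 && \text{(lower plus node)}\\
2 \times 7 &= 14 && \text{(times node)}\\
14 + 5 &= 19 && \text{(root plus node)}
\end{aligned}
$$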

We will represent all nodes of the arithmetic tree as objects inheriting from
the class `Node`. The direct descendants of `Node` are `NumNode`,
which represents a number, and `BinNode`, which represents a binary operator.
For simplicity, let's restrict ourselves to only two classes derived from
`BinNode`: `AddNode` and `MultNode`. Figure
2 shows the class hierarchy I have just described. `Node` and `BinNode` are
abstract classes. An abstract class cannot be instantiated; it only serves as a
parent for other classes. I'll explain this term in more detail in a moment.

**Figure 2** The class hierarchy
of nodes.

What are the operations we would like to perform on a node? We want to be
able to calculate its value and, at some point, destroy it. The `Calc`
method returns a `double` as the result of calculating the node's
value. Of course, for some nodes the calculation may involve recursively
calculating their children. This method is `const` because it
doesn't change the node itself. Since each type of node has to provide its
own implementation of the `Calc` method, we make this function virtual.
However, there is no "default" implementation of `Calc` for an
arbitrary `Node`. A virtual function that a class declares but leaves for its
derived classes to implement is called a *pure virtual function*. That's the
meaning of `= 0` in the declaration of `Calc`:

```cpp
class Node
{
public:
    virtual ~Node () {}
    virtual double Calc () const = 0;
};
```

A class that has one or more pure virtual functions is called an *abstract
class*, and it cannot be instantiated (no object of this class can be
created). Only classes that are derived from it, and that provide their own
implementations of all the pure virtual functions, can be instantiated. Notice
that our sample arithmetic tree has instances of `AddNode`s, `MultNode`s, and
`NumNode`s, but no instances of `Node`s or `BinNode`s.

A common rule is that if a class has a virtual function, it probably needs a
*virtual destructor* as well. Once we have paid the overhead of a vtable
pointer, additional virtual functions don't increase the size of the object. In
typical implementations they only add entries to the virtual table, which all
objects of the class share. So, in such a case, adding a virtual destructor
costs next to nothing.

In our case, we can anticipate that some of the descendant nodes will have to destroy their children in their destructors, so we really need a virtual destructor. A destructor can be declared pure virtual, but unlike other pure virtual functions it must still be given a definition, because the destructor of every derived class calls it. Since `Calc` already makes `Node` abstract, a pure virtual destructor would gain us nothing, so I simply gave the destructor an empty body. (Even though I made it inline, the compiler will still generate an out-of-line body for it, because it needs to put a pointer to it into the virtual table.)

`NumNode` stores a `double` value that is initialized in its
constructor (see Listing 1). `NumNode` also overrides the `Calc`
virtual function. In this case, `Calc` simply returns the value stored in
the node.

#### Listing 1 Class `NumNode` (Numeric Node)

```cpp
#include <iostream>

class NumNode: public Node
{
public:
    NumNode (double num) : _num (num) {}
    double Calc () const;
private:
    const double _num;
};

double NumNode::Calc () const
{
    // trace the call, then return the stored value
    std::cout << "Numeric node " << _num << std::endl;
    return _num;
}
```

`BinNode` has two children that are pointers to (abstract)
`Node`s. They are initialized in the constructor and deleted in the
destructor, and never reassigned in between. This is why I could make them
`const` pointers: the pointers themselves cannot change, although the nodes they
point to are not `const` (see Listing 2). The `Calc` method is still a pure
virtual function, inherited
from `Node`; only the descendants of `BinNode` will know how to
implement it.

#### Listing 2 Class `BinNode` (Binary Node)

```cpp
class BinNode: public Node
{
public:
    BinNode (Node * pLeft, Node * pRight)
        : _pLeft (pLeft), _pRight (pRight)
    {}
    ~BinNode ();
protected:
    Node * const _pLeft;
    Node * const _pRight;
};

BinNode::~BinNode ()
{
    // the children are destroyed together with their parent
    delete _pLeft;
    delete _pRight;
}
```

This is where you first see the advantage of polymorphism. A binary node can have children that are arbitrary nodes. Each of them can be a number node, an addition node, or a multiplication node. That gives nine possible combinations of children for each kind of binary node, and it would be silly to make a separate class for each of them. Consider this class name, for instance:

`AddNodeWithLeftMultNodeAndRightNumberNode`

We have no choice but to accept and store pointers to children as more
general pointers to `Node`s. Yet, when we call destructors through them,
we need to call different functions to destroy different `Node`s. For
instance, destroying an `AddNode` must also delete its two children (that's
what the `BinNode` destructor does), whereas a `NumNode` has nothing to clean
up. This is why we have to make the destructors of `Node`s virtual.

Notice that the two data members of `BinNode` are not
`private`—they are `protected`. This qualification is
slightly weaker than `private`. A private data member or method cannot be
accessed from any code outside the implementation of the given class (or its
friends)—not even from the code of the *derived* class. If we made
`_pLeft` and `_pRight` private, we'd have to provide public
methods to set and get them. That would be tantamount to exposing them to
everybody. By making them `protected`, we let classes *derived* from
`BinNode` manipulate them, but, at the same time, bar anybody else from
doing so. Table 1 presents a short description of all three access specifiers.
Incidentally, the same access specifiers are used in declaring inheritance. By
far the most useful type of inheritance is `public`, and that's what
we've been using so far, but if you'd like to restrict access to the members
inherited from the base class, you're free to use the other two types of inheritance.

#### Table 1 The Meaning of the Three Access Specifiers

| Access Specifier | Who Can Access the Members? |
|---|---|
| `public` | Anybody |
| `protected` | The class itself, its friends, and derived classes |
| `private` | Only the class itself and its friends |

The class `AddNode` is derived from
`BinNode`:

```cpp
class AddNode: public BinNode
{
public:
    AddNode (Node * pLeft, Node * pRight)
        : BinNode (pLeft, pRight)
    {}
    double Calc () const;
};
```

`AddNode` provides its own implementation of `Calc`. This is
where you see the advantages of polymorphism again. We let the child nodes
calculate themselves. Since the `Calc` method is virtual, they will do
the right thing based on their actual class, and not on the class of the pointer
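(`Node *`). The two results of calling `Calc` are added and the
sum is returned. (C++ leaves unspecified which operand of `+` is evaluated
first, so the trace messages of the two children may appear in either order.)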

```cpp
double AddNode::Calc () const
{
    std::cout << "Adding\n";
    // let each child calculate itself, whatever its actual class
    return _pLeft->Calc () + _pRight->Calc ();
}
```

Notice how the method of `AddNode` directly accesses its parent's
protected data members `_pLeft` and `_pRight`. Again, were they
declared private, such access would be flagged as an error by the compiler.

For completeness, Listing 3 shows the implementation of `MultNode` and
a simple test program.

#### Listing 3 Class `MultNode` (Multiplication Node) and a Simple Test Program

```cpp
class MultNode: public BinNode
{
public:
    MultNode (Node * pLeft, Node * pRight)
        : BinNode (pLeft, pRight)
    {}
    double Calc () const;
};

double MultNode::Calc () const
{
    std::cout << "Multiplying\n";
    return _pLeft->Calc () * _pRight->Calc ();
}

int main ()
{
    // ( 20.0 + (-10.0) ) * 0.1
    Node * pNode1 = new NumNode (20.0);
    Node * pNode2 = new NumNode (-10.0);
    Node * pNode3 = new AddNode (pNode1, pNode2);
    Node * pNode4 = new NumNode (0.1);
    Node * pNode5 = new MultNode (pNode3, pNode4);
    std::cout << "Calculating the tree\n";
    // tell the root to calculate itself
    double x = pNode5->Calc ();
    std::cout << "Result: " << x << std::endl;
    delete pNode5; // and all children
}
```

Do you think you can write more efficient code by not using polymorphism? Think twice! If you're still not convinced, go on a little side trip into the alternative universe of C.