Home > Articles > Programming > C/C++

  • Print
  • + Share This

Expanding Stubs

The first stub to be expanded into a full implementation will be that of the Scanner. The scanner converts the input string into a series of tokens. It works like a sequencer. Whenever the Parser needs a token, it asks the Scanner for the current token. When this token is parsed, the Parser accepts it by calling Scanner::Accept. The Accept method scans the string further, trying to recognize the next token.

NOTE

Even if the parser recognizes an error, it still usually accepts the token, so that the parsing doesn't get into an infinite loop.

There is a finite, well-defined set of tokens. It's convenient to put them into the enumeration EToken (for now we'll only use a limited number of tokens, but we'll be adding more as the need arises).

enum EToken
{
  tEnd,
  tError,
  tNumber,
  tPlus,
  tMult
};

We would also like the Scanner to be able to convert the part of the input string recognized as a number to a floating-point value. This is done by the Number method, which can be called only when the current token is tNumber (see the assertion there). Notice the use of the type char const * const—a const pointer to a const string. The pointer is initialized in the constructor and never changes again. Its contents are read-only, too.

class Scanner
{
public:
  Scanner (char const * buf);
  EToken Token () const { return _token; }
  void Accept ();
  double Number ()
  {
   assert (_token == tNumber);
   return _number;
  }
private:
  void EatWhite ();

  char const * const  _buf;
  int         _iLook;
  EToken        _token;
  double        _number;
};

The constructor of Scanner, besides initializing all member variables, calls the method Accept. Accept recognizes the first available token and positions the index _iLook past the recognized part of the buffer.

Scanner::Scanner (char const * buf)
: _buf (buf), _iLook(0)
{
  std::cout << "Scanner with \"" << buf << "\""    << std::endl;
  Accept ();
}

EatWhite is the helper function that skips whitespace characters in the input.

void Scanner::EatWhite ()
{
  while (std::isspace (_buf [_iLook]))
   ++_iLook;
}

Accept is just one big switch statement. Depending on the value of the current character in the buffer, _buf[_iLook], the control passes to the appropriate case label within the switch. For now I have implemented the recognition of the addition and the multiplication operators. When either of them is recognized, the _token variable is initialized to the corresponding enumerated value, and the variable _iLook is incremented by one.

The recognition of digits is done using the fall-through property of the case statements. Unless there's an explicit break in a switch statement, the control passes to the next case. All digit cases fall through, to reach the code after case '9'. The string starting with a digit is converted to an integer (later we'll change this part of the program to recognize floating-point numbers) and _iLook is positioned after the last digit.

The default case is executed when no other case matches the value passed to the switch.

void Scanner::Accept ()
{
  EatWhite ();
  switch (_buf[_iLook])
  {
  case '+':
   _token = tPlus;
   ++_iLook;
   break;
  case '*':
   _token = tMult;
   ++_iLook;
   break;
  case '0': case '1': case '2': case '3': case '4':
  case '5': case '6': case '7': case '8': case '9':
   _token = tNumber;
   _number = std::atoi (&_buf [_iLook]);
   while (std::isdigit (_buf [_iLook]))
     ++_iLook;
   break;
  case '\0': // end of input
   _token = tEnd;
   break;
  default:
   _token = tError;
   break;
  }
}

You might ask, "Can this switch statement be replaced by some clever use of polymorphism?" I really don't know how you could do it. The trouble is that the incoming data stream—characters from the buffer—is amorphous. At the stage where we're trying to determine the meaning of the input—to give it form—the best we can do is to go through a multi-way conditional, in this case a switch statement. Dealing with amorphous input is virtually the only time when using a switch statement is completely legitimate in C++. The parsing of user input, data stored in a file, or external events—all require such treatment.

NOTE

In some cases, we can bypass conditionals by using a table-driven approach. For instance, since there are only 256 possible characters, we could have defined a table of 256 entries and indexed it using the current input character. The entries could be pointers to functions (which I'll talk about later) designed specifically for processing a given type of character.

The dummy parser is, for now, implemented as a for loop that retrieves one token after another and prints the appropriate message.

Status Parser::Eval ()
{
  for (EToken token = _scanner.Token ();
   token != tEnd;
   _scanner.Accept ())
  {
   token = _scanner.Token ()
   switch (token)
   {
   case tMult:
     std::cout << "Times\n";
     break;
   case tPlus:
     std::cout << "Plus\n";
     break;
   case tNumber:
     std::cout << "Number: " << _scanner.Number ()
        << std::endl;
     break;
   case tError:
     std::cout << "Error\n";
     return stQuit;
   default:
     std::cout << "Error: bad token\n";
     return stQuit;
   }
  }
  return stOk;
}

Once more, this partially implemented program is compiled and tested. We can see the flow of control and the correct recognition of a few tokens.

  • + Share This
  • 🔖 Save To Your Account

Related Resources

There are currently no related titles. Please check back later.