
3.5 Input Elements and Tokens

The input characters and line terminators that result from escape processing (§3.3) and then input line recognition (§3.4) are reduced to a sequence of input elements. Those input elements that are not white space (§3.6) or comments (§3.7) are tokens. The tokens are the terminal symbols of the syntactic grammar (§2.3).

This process is specified by the following productions:

    
    Input:
        InputElements_opt Sub_opt

    InputElements:
        InputElement
        InputElements InputElement

    InputElement:
        WhiteSpace
        Comment
        Token

    Token:
        Identifier
        Keyword
        Literal
        Separator
        Operator

    Sub:
        the ASCII SUB character, also known as "control-Z"

White space (§3.6) and comments (§3.7) can serve to separate tokens that, if adjacent, might be tokenized in another manner. For example, the ASCII characters - and = in the input can form the operator token -= (§3.12) only if there is no intervening white space or comment.
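The separating effect of white space and comments can be illustrated with a minimal scanner sketch. This is not the specification's algorithm or any compiler's implementation: the class name TokenSketch, the method tokenize, and the small SYMBOLS table are all hypothetical, and the handling of white space, comments, and literals is deliberately simplified.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of any Java API.
final class TokenSketch {
    // Assumed, tiny subset of operators and separators. Longer symbols come
    // first so that "-=" is tried before "-" and "=".
    private static final String[] SYMBOLS = {
        "-=", "+=", "==", "-", "+", "=", "{", "}", "(", ")", ";"
    };

    // Reduces the input to tokens, discarding white space and comments.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                // Approximation of WhiteSpace: an input element, but not a token.
                i++;
            } else if (input.startsWith("/*", i)) {
                // Traditional comment. Simplification: an unterminated comment
                // just runs to the end of the input instead of being an error.
                int end = input.indexOf("*/", i + 2);
                i = (end < 0) ? input.length() : end + 2;
            } else if (input.startsWith("//", i)) {
                // End-of-line comment. Simplification: only '\n' ends the line.
                int end = input.indexOf('\n', i);
                i = (end < 0) ? input.length() : end;
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < input.length()
                        && Character.isJavaIdentifierPart(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else if (Character.isDigit(c)) {
                // Simplification: only plain decimal digit runs are recognized.
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else {
                String match = null;
                for (String symbol : SYMBOLS) {
                    if (input.startsWith(symbol, i)) {
                        match = symbol;
                        break;
                    }
                }
                if (match == null) {
                    throw new IllegalArgumentException(
                        "Unsupported character at index " + i);
                }
                tokens.add(match);
                i += match.length();
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a -= b"));         // adjacent - and =
        System.out.println(tokenize("a - = b"));        // separated by white space
        System.out.println(tokenize("a -/* c */= b"));  // separated by a comment
    }
}
```

In this sketch, the adjacent characters - and = come out as the single token -=, while the second and third inputs yield - and = as two separate tokens, because an intervening input element ends the first token.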

As a special concession for compatibility with certain operating systems, the ASCII SUB character (\u001a, or control-Z) is ignored if it is the last character in the escaped input stream.
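A minimal sketch of this rule, assuming the input is held in a String after escape processing. The class SubSketch and the method dropTrailingSub are hypothetical names chosen for illustration; the SUB character is written as the numeric value 0x1A.

```java
// Hypothetical illustration class; not part of any Java API.
final class SubSketch {
    // The ASCII SUB character (control-Z), written as its numeric code.
    static final char SUB = (char) 0x1A;

    // Returns the input without a single trailing SUB character, if present.
    // A SUB character anywhere else is left untouched.
    static String dropTrailingSub(String escapedInput) {
        int n = escapedInput.length();
        if (n > 0 && escapedInput.charAt(n - 1) == SUB) {
            return escapedInput.substring(0, n - 1);
        }
        return escapedInput;
    }

    public static void main(String[] args) {
        String source = "class Empty {\n}\n" + SUB;
        System.out.println(dropTrailingSub(source).length() == source.length() - 1);
    }
}
```

Only one trailing SUB is dropped, matching the single optional Sub at the end of the Input production. The sketch makes no attempt to decide how a SUB character in any other position should be treated.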

Consider two tokens x and y in the resulting input stream. If x precedes y, then we say that x is to the left of y and that y is to the right of x.

For example, in this simple piece of code:

    class Empty {
    }

we say that the } token is to the right of the { token, even though it appears, in this two-dimensional representation on paper, downward and to the left of the { token. This convention about the use of the words left and right allows us to speak, for example, of the right-hand operand of a binary operator or of the left-hand side of an assignment.
