Home > Articles > Programming > Java

  • Print
  • + Share This
This chapter is from the book

3.8 Identifiers

An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter. An identifier cannot have the same spelling (Unicode character sequence) as a keyword (§3.9), boolean literal (§3.10.3), or the null literal (§3.10.7).

    
   Identifier:
        
   IdentifierChars 
   but not a 
   Keyword 
   or 
   BooleanLiteral 
   or 
   NullLiteral

    
   IdentifierChars:
        
   JavaLetter
        
   IdentifierChars JavaLetterOrDigit

    
   JavaLetter:
        
   any Unicode character that is a Java letter (see below)

    
   JavaLetterOrDigit:
        
   any Unicode character that is a Java letter-or-digit (see below)

Letters and digits may be drawn from the entire Unicode character set, which supports most writing scripts in use in the world today, including the large sets for Chinese, Japanese, and Korean. This allows programmers to use identifiers in their programs that are written in their native languages.

A "Java letter" is a character for which the method Character.isJavaIdentifierStart(int) returns true. A "Java letter-or-digit" is a character for which the method Character.isJavaIdentifierPart(int) returns true.

The Java letters include uppercase and lowercase ASCII Latin letters AZ (\u0041\u005a), and az (\u0061\u007a), and, for historical reasons, the ASCII underscore (_, or \u005f) and dollar sign ($, or \u0024). The $ character should be used only in mechanically generated source code or, rarely, to access preexisting names on legacy systems.

The "Java digits" include the ASCII digits 0-9 (\u0030\u0039).

Two identifiers are the same only if they are identical, that is, have the same Unicode character for each letter or digit.

Identifiers that have the same external appearance may yet be different. For example, the identifiers consisting of the single letters LATIN CAPITAL LETTER A (A, \u0041), LATIN SMALL LETTER A (a, \u0061), GREEK CAPITAL LETTER ALPHA (A, \u0391), CYRILLIC SMALL LETTER A (a, \u0430) and MATHEMATICAL BOLD ITALIC SMALL A (a, \ud835\udc82) are all different.

Unicode composite characters are different from the decomposed characters. For example, a LATIN CAPITAL LETTER A ACUTE (Á, \u00c1) could be considered to be the same as a LATIN CAPITAL LETTER A (A, \u0041) immediately followed by a NON-SPACING ACUTE (´, \u0301) when sorting, but these are different in identifiers. See The Unicode Standard, Volume 1, pages 412ff for details about decomposition, and see pages 626–627 of that work for details about sorting. Examples of identifiers are:

      String     i3     areth     MAX_VALUE    isLetterOrDigit
  • + Share This
  • 🔖 Save To Your Account