- 5. Notational Conventions
- 5.1 Syntactic and Lexical Grammars
-
This section describes the context-free grammars used in this specification to define the lexical and syntactic
structure of an ECMAScript program.
- 5.1.1 Context-Free Grammars
-
A context-free grammar consists of a number of productions. Each production has an abstract symbol
called a nonterminal as its left-hand side, and a sequence of zero or more nonterminal and terminal
symbols as its right-hand side. For each grammar, the terminal symbols are drawn from a specified alphabet.
Starting from a sentence consisting of a single distinguished nonterminal, called the goal symbol, a given
context-free grammar specifies a language, namely, the (perhaps infinite) set of possible sequences of terminal
symbols that can result from repeatedly replacing any nonterminal in the sequence with a right-hand side of a production
for which the nonterminal is the left-hand side.
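As an informal illustration, the replacement process described above can be sketched in Python. The toy grammar below (a comma-separated list of the terminals a and b) is a hypothetical example, not part of this specification, and the step limit exists only to keep the enumeration finite.

```python
# Hypothetical toy grammar (not from the specification): each nonterminal maps
# to its list of right-hand sides; any symbol that is not a key is a terminal.
GRAMMAR = {
    "List": [["Item"], ["List", ",", "Item"]],
    "Item": [["a"], ["b"]],
}

def derive(symbols, max_steps):
    """Yield terminal-only sequences reachable from `symbols` by replacing
    the leftmost nonterminal at most `max_steps` times in total."""
    for i, sym in enumerate(symbols):
        if sym in GRAMMAR:
            if max_steps == 0:
                return  # budget exhausted; abandon this unfinished sequence
            for rhs in GRAMMAR[sym]:
                yield from derive(symbols[:i] + rhs + symbols[i + 1:], max_steps - 1)
            return
    yield symbols  # no nonterminal left: a member of the language

# Start from the goal symbol "List" and allow up to five replacements.
sentences = sorted({" ".join(s) for s in derive(["List"], 5)})
```

With a budget of five replacements the sketch reaches only the one-item and two-item lists; raising the limit yields longer lists, which reflects the "perhaps infinite" nature of the language.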
- 5.1.2 The Lexical and RegExp Grammars
-
A lexical grammar for ECMAScript is given in clause 7. This grammar has as its terminal
symbols the characters of the Unicode character set. It defines a set of productions, starting from the goal symbol
InputElementDiv or InputElementRegExp, that describe how sequences of Unicode characters are translated into
a sequence of input elements.
Input elements other than white space and comments form the terminal symbols for the syntactic grammar for ECMAScript
and are called ECMAScript tokens. These tokens are the reserved words, identifiers, literals, and punctuators of
the ECMAScript language. Moreover, line terminators, although not considered to be tokens, also become part of the
stream of input elements and guide the process of automatic semicolon insertion (7.9). Simple white
space and single-line comments are discarded and do not appear in the stream of input elements for the syntactic
grammar. A MultiLineComment (that is, a comment of the form "/* --- */" regardless of whether it spans more than
one line) is likewise simply discarded if it contains no line terminator; but if a MultiLineComment contains one
or more line terminators, then it is replaced by a single line terminator, which becomes part of the stream of input
elements for the syntactic grammar.
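The treatment of a MultiLineComment described above can be sketched as follows. This is a simplified illustration rather than a conforming scanner: the function name is hypothetical, the pattern ignores string and regular expression literals, and a line feed is assumed as the single replacement line terminator.

```python
import re

# Line terminator characters: LF, CR, LINE SEPARATOR, PARAGRAPH SEPARATOR.
LINE_TERMINATORS = "\n\r\u2028\u2029"

# Simplified pattern for a MultiLineComment. It does not account for string
# or regular expression literals, so it is only safe on input without them.
MULTI_LINE_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

def replace_multiline_comments(source):
    """Discard comments with no line terminator; replace the others with a
    single line terminator (a line feed is assumed here)."""
    def substitute(match):
        has_terminator = any(ch in LINE_TERMINATORS for ch in match.group(0))
        return "\n" if has_terminator else ""
    return MULTI_LINE_COMMENT.sub(substitute, source)
```

Keeping one line terminator matters because the surviving terminator can still take part in automatic semicolon insertion.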
A RegExp grammar for ECMAScript is given in 15.10. This grammar also has as its
terminal symbols the characters of the Unicode character set. It defines a set of productions, starting from the goal
symbol Pattern, that describe how sequences of Unicode characters are translated into regular expression
patterns.
Productions of the lexical and RegExp grammars are distinguished by having two colons "::" as separating punctuation.
The lexical and RegExp grammars share some productions.
- 5.1.3 The Numeric String Grammar
-
A second grammar is used for translating strings into numeric values. This grammar is similar to the part of the
lexical grammar having to do with numeric literals and has as its terminal symbols the characters of the Unicode
character set. This grammar appears in 9.3.1.
Productions of the numeric string grammar are distinguished by having three colons ":::" as punctuation.
- 5.1.4 The Syntactic Grammar
-
The syntactic grammar for ECMAScript is given in clauses 11, 12, 13 and 14. This grammar has ECMAScript tokens
defined by the lexical grammar as its terminal symbols (5.1.2). It defines a set of productions,
starting from the goal symbol Program, that describe how sequences of tokens can form syntactically correct
ECMAScript programs.
When a stream of Unicode characters is to be parsed as an ECMAScript program, it is first converted to a stream of
input elements by repeated application of the lexical grammar; this stream of input elements is then parsed by a single
application of the syntactic grammar. The program is syntactically in error if the tokens in the stream of input elements
cannot be parsed as a single instance of the goal nonterminal Program, with no tokens left over.
Productions of the syntactic grammar are distinguished by having just one colon ":" as punctuation. The syntactic
grammar as presented in clauses 11, 12, 13 and 14 is actually not a complete account of which token sequences are accepted
as correct ECMAScript programs. Certain additional token sequences are also accepted, namely, those that would be
described by the grammar if only semicolons were added to the sequence in certain places (such as before line terminator
characters). Furthermore, certain token sequences that are described by the grammar are not considered acceptable if a
line terminator character appears in certain "awkward" places.
- 5.1.5 Grammar Notation
-
Terminal symbols of the lexical and string grammars, and some of the terminal symbols of the syntactic grammar, are
shown in fixed width font, both in the productions of the grammars and throughout this specification
whenever the text directly refers to such a terminal symbol. These are to appear in a program exactly as written. All
terminal symbol characters specified in this way are to be understood as the appropriate Unicode character from the ASCII
range, as opposed to any similar-looking characters from other Unicode ranges.
Nonterminal symbols are shown in italic type. The definition of a nonterminal is introduced by the name of the
nonterminal being defined followed by one or more colons. (The number of colons indicates to which grammar the
production belongs.) One or more alternative right-hand sides for the nonterminal then follow on succeeding lines. For
example, the syntactic definition:
- WithStatement :
- with ( Expression ) Statement
states that the nonterminal WithStatement represents the token with, followed by a left
parenthesis token, followed by an Expression, followed by a right parenthesis token, followed by a Statement.
The occurrences of Expression and Statement are themselves nonterminals. As another example, the syntactic
definition:
- ArgumentList :
- AssignmentExpression
- ArgumentList , AssignmentExpression
states that an ArgumentList may represent either a single AssignmentExpression or an ArgumentList,
followed by a comma, followed by an AssignmentExpression. This definition of ArgumentList is recursive,
that is, it is defined in terms of itself. The result is that an ArgumentList may contain any positive number of
arguments, separated by commas, where each argument expression is an AssignmentExpression. Such recursive
definitions of nonterminals are common.
The subscripted suffix "opt", which may appear after a terminal or nonterminal, indicates an optional symbol.
The alternative containing the optional symbol actually specifies two right-hand sides, one that omits the optional
element and one that includes it. This means that:
- VariableDeclaration :
- Identifier Initialiseropt
is a convenient abbreviation for:
- VariableDeclaration :
- Identifier
- Identifier Initialiser
and that:
- IterationStatement :
- for ( ExpressionNoInopt ; Expressionopt ; Expressionopt ) Statement
is a convenient abbreviation for:
- IterationStatement :
- for (; Expressionopt ; Expressionopt ) Statement
- for ( ExpressionNoIn ; Expressionopt ; Expressionopt ) Statement
which in turn is an abbreviation for:
- IterationStatement :
- for (;; Expressionopt ) Statement
- for (; Expression ; Expressionopt ) Statement
- for ( ExpressionNoIn ;; Expressionopt ) Statement
- for ( ExpressionNoIn ; Expression ; Expressionopt ) Statement
which in turn is an abbreviation for:
- IterationStatement :
- for (;;) Statement
- for (;; Expression ) Statement
- for (; Expression ;) Statement
- for (; Expression ; Expression ) Statement
- for ( ExpressionNoIn ;;) Statement
- for ( ExpressionNoIn ;; Expression ) Statement
- for ( ExpressionNoIn ; Expression ;) Statement
- for ( ExpressionNoIn ; Expression ; Expression ) Statement
so the nonterminal IterationStatement actually has eight alternative right-hand sides.
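The expansion of optional symbols can be checked mechanically. The following Python sketch uses a hypothetical representation (a list of symbol and optional-flag pairs) and enumerates every alternative that omits or includes each optional symbol.

```python
from itertools import product

def expand_optionals(rhs):
    """Expand a right-hand side, given as (symbol, is_optional) pairs, into
    all alternatives that omit or include each optional symbol."""
    # An optional symbol contributes two choices: nothing, or the symbol.
    choices = [[(), (sym,)] if optional else [(sym,)] for sym, optional in rhs]
    return [" ".join(sum(combo, ())) for combo in product(*choices)]

# The for-statement alternative with its three optional symbols.
FOR_RHS = [
    ("for", False), ("(", False), ("ExpressionNoIn", True), (";", False),
    ("Expression", True), (";", False), ("Expression", True),
    (")", False), ("Statement", False),
]
alternatives = expand_optionals(FOR_RHS)
```

Three optional symbols give 2 * 2 * 2 = 8 alternatives, which agrees with the count stated above.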
If the phrase "[empty]" appears as the right-hand side of a production, it indicates that the production's right-hand
side contains no terminals or nonterminals.
If the phrase "[lookahead ∉ set]" appears in the right-hand side of a production, it indicates that the
production may not be used if the immediately following input terminal is a member of the given set. The set
can be written as a list of terminals enclosed in curly braces. For convenience, the set can also be written as a
nonterminal, in which case it represents the set of all terminals to which that nonterminal could expand. For example,
given the definitions
- DecimalDigit :: one of
- 0 1 2 3 4 5 6 7 8 9
- DecimalDigits ::
- DecimalDigit
- DecimalDigits DecimalDigit
the definition
- LookaheadExample ::
- n [lookahead ∉ {1, 3, 5, 7, 9}] DecimalDigits
- DecimalDigit [lookahead ∉ DecimalDigit ]
matches either the letter n followed by one or more decimal digits the first of which is even, or a
decimal digit not followed by another decimal digit.
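A hand-written matcher makes the effect of the two lookahead restrictions concrete. The sketch below follows the prose description of LookaheadExample. The function name and the convention of returning an end index are assumptions made for illustration.

```python
DECIMAL_DIGITS = set("0123456789")
ODD_DIGITS = set("13579")

def match_lookahead_example(text, pos=0):
    """Return the end index of a LookaheadExample match starting at `pos`,
    or None if neither alternative applies."""
    first = text[pos:pos + 1]
    following = text[pos + 1:pos + 2]  # "" at end of input
    # Alternative 1: n [lookahead not in {1, 3, 5, 7, 9}] DecimalDigits
    if first == "n" and following not in ODD_DIGITS:
        end = pos + 1
        while text[end:end + 1] in DECIMAL_DIGITS:
            end += 1
        return end if end > pos + 1 else None  # at least one digit required
    # Alternative 2: DecimalDigit [lookahead not in DecimalDigit]
    if first in DECIMAL_DIGITS and following not in DECIMAL_DIGITS:
        return pos + 1
    return None
```

For instance, "n24" matches through its full length, whereas "n14" and "72" do not match at position 0 because the lookahead restriction rules out the only applicable alternative.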
If the phrase "[no LineTerminator here]" appears in the right-hand side of a production of the syntactic
grammar, it indicates that the production is a restricted production: it may not be used if a LineTerminator
occurs in the input stream at the indicated position. For example, the production:
- ReturnStatement :
- return [no LineTerminator here] Expressionopt ;
indicates that the production may not be used if a LineTerminator occurs in the program between the
return token and the Expression.
Unless the presence of a LineTerminator is forbidden by a restricted production, any number of occurrences of
LineTerminator may appear between any two consecutive tokens in the stream of input elements without affecting
the syntactic acceptability of the program.
When the words "one of" follow the colon(s) in a grammar definition, they signify that each of the terminal
symbols on the following line or lines is an alternative definition. For example, the lexical grammar for ECMAScript
contains the production:
- NonZeroDigit :: one of
- 1 2 3 4 5 6 7 8 9
which is merely a convenient abbreviation for:
- NonZeroDigit ::
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
When an alternative in a production of the lexical grammar or the numeric string grammar appears to be a
multi-character token, it represents the sequence of characters that would make up such a token.
The right-hand side of a production may specify that certain expansions are not permitted by using the phrase "but
not" and then indicating the expansions to be excluded. For example, the production:
- Identifier ::
- IdentifierName but not ReservedWord
means that the nonterminal Identifier may be replaced by any sequence of characters that could replace
IdentifierName provided that the same sequence of characters could not replace ReservedWord.
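The "but not" restriction amounts to a set difference between two languages. A minimal Python sketch follows, assuming an ASCII-only approximation of IdentifierName and a small illustrative subset of the reserved words. Neither is the complete definition given by the lexical grammar.

```python
import re

# Illustrative subset only; the lexical grammar defines the full set.
RESERVED_WORDS = {"with", "return", "for", "var", "function"}

# Simplified ASCII-only approximation of IdentifierName (an assumption).
IDENTIFIER_NAME = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

def is_identifier(text):
    """True if `text` could replace IdentifierName but not ReservedWord."""
    is_name = IDENTIFIER_NAME.fullmatch(text) is not None
    return is_name and text not in RESERVED_WORDS
```

Here "width" is accepted while "with" is rejected, even though both could replace IdentifierName.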
Finally, a few nonterminal symbols are described by a descriptive phrase in roman type in cases where it would be
impractical to list all the alternatives:
- SourceCharacter ::
- any Unicode character
- 5.2 Algorithm Conventions
-
The specification often uses a numbered list to specify steps in an algorithm. These algorithms are used to clarify
semantics. In practice, there may be more efficient algorithms available to implement a given feature.
When an algorithm is to produce a value as a result, the directive "return x" is used to indicate that the
result of the algorithm is the value of x and that the algorithm should terminate. The notation Result(n)
is used as shorthand for "the result of step n". Type(x) is used as shorthand for "the type of x".
Mathematical operations such as addition, subtraction, negation, multiplication, division, and the mathematical
functions defined later in this section should always be understood as computing exact mathematical results on
mathematical real numbers, which do not include infinities and do not include a negative zero that is distinguished from
positive zero. Algorithms in this standard that model floating-point arithmetic include explicit steps, where necessary,
to handle infinities and signed zero and to perform rounding. If a mathematical operation or function is applied to a
floating-point number, it should be understood as being applied to the exact mathematical value represented by that
floating-point number; such a floating-point number must be finite, and if it is +0 or -0 then the
corresponding mathematical value is simply 0.
The mathematical function abs(x) yields the absolute value of x, which is -x if x is
negative (less than zero) and otherwise is x itself.
The mathematical function sign(x) yields 1 if x is positive and -1 if x is negative. The sign function
is not used in this standard for cases when x is zero.
The notation "x modulo y" (y must be finite and nonzero) computes a value k of the same sign as
y (or zero) such that abs(k) < abs(y) and x - k = q * y for some integer q.
The mathematical function floor(x) yields the largest integer (closest to positive infinity) that is not
larger than x.
NOTE
floor(x) = x - (x modulo 1).
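The mathematical functions above can be sketched in Python using exact rational arithmetic. The function names are hypothetical, and Fraction covers only rational values, so this illustrates the definitions rather than modelling all mathematical real numbers.

```python
from fractions import Fraction

def spec_abs(x):
    """abs(x): -x if x is negative, otherwise x itself."""
    return -x if x < 0 else x

def spec_sign(x):
    """sign(x): 1 if x is positive, -1 if x is negative."""
    if x == 0:
        raise ValueError("sign(x) is not used when x is zero")
    return 1 if x > 0 else -1

def spec_modulo(x, y):
    """x modulo y: a value k with the sign of y (or zero) such that
    abs(k) < abs(y) and x - k = q * y for some integer q."""
    if y == 0:
        raise ValueError("y must be nonzero")
    # Python's % takes the sign of the divisor, matching the definition.
    return x % y

def spec_floor(x):
    """floor(x), computed through the identity in the NOTE above."""
    return x - spec_modulo(x, 1)
```

As a check of the identity, floor(-7/2) evaluates to -7/2 - 1/2 = -4, and 7 modulo -3 evaluates to -2, since 7 - (-2) = (-3) * (-3).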
If an algorithm is defined to "throw an exception", execution of the algorithm is terminated and no result is
returned. The calling algorithms are also terminated, until an algorithm step is reached that explicitly deals with the
exception, using terminology such as "If an exception was thrown...". Once such an algorithm step has been encountered
the exception is no longer considered to have occurred.