- 7. Lexical Conventions
-
The source text of an ECMAScript program is first converted into a sequence of input elements, which are either
tokens, line terminators, comments, or white space. The source text is scanned from left to right, repeatedly taking the
longest possible sequence of characters as the next input element.
There are two goal symbols for the lexical grammar. The InputElementDiv symbol is used in those syntactic
grammar contexts where a division (/) or division-assignment (/=) operator is permitted. The InputElementRegExp
symbol is used in other syntactic grammar contexts.
Note that contexts exist in the syntactic grammar where both a division and a RegularExpressionLiteral are
permitted by the syntactic grammar; however, since the lexical grammar uses the InputElementDiv goal symbol in
such cases, the opening slash is not recognised as starting a regular expression literal in such a context. As a
workaround, one may enclose the regular expression literal in parentheses.
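The effect of the two goal symbols can be seen in a small sketch. The names `b`, `hi` and `g` are hypothetical, and `g.exec` is an ordinary function invented for the illustration, not a RegExp method. After an identifier the scanner is in a division context, so the slashes on the following line are read as division operators; parenthesising the literal forces the regular-expression reading.

```javascript
// Hypothetical values chosen so that the division reading is observable.
var hi = 2;
var g = { exec: function (s) { return 4; } };
var b = 16;

// After the identifier b the scanner expects a DivPunctuator, so the next
// line continues the expression:  a = b / hi / g.exec("x")
var a = b
/hi/g.exec("x");

// Enclosing the literal in parentheses makes it a RegularExpressionLiteral.
var m = (/hi/g).exec("say hi");

console.log(a);    // 2, that is (16 / 2) / 4
console.log(m[0]); // "hi"
```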
Syntax
- InputElementDiv ::
- WhiteSpace
LineTerminator
Comment
Token
DivPunctuator
- InputElementRegExp ::
- WhiteSpace
LineTerminator
Comment
Token
RegularExpressionLiteral
- 7.1 Unicode Format-Control Characters
-
The Unicode format-control characters (i.e., the characters in category "Cf" in the Unicode Character Database such
as LEFT-TO-RIGHT MARK or RIGHT-TO-LEFT MARK) are control codes used to control the formatting of a range of text in the
absence of higher-level protocols for this (such as mark-up languages). It is useful to allow these in source text to
facilitate editing and display.
The format control characters can occur anywhere in the source text of an ECMAScript program. These characters are
removed from the source text before applying the lexical grammar. Since these characters are removed before processing
string and regular expression literals, one must use a Unicode escape sequence (see 7.6) to include
a Unicode format-control character inside a string or regular expression literal.
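As a minimal sketch of the escape-sequence approach, the following uses U+200E LEFT-TO-RIGHT MARK as the example character. Later editions of the language changed the stripping rule, so an engine may or may not remove raw format-control characters; the escaped form shown here should behave the same either way.

```javascript
// U+200E is a category "Cf" character. Written as an escape sequence it
// becomes part of the string value regardless of any stripping of raw
// format-control characters from the source text.
var s = "a\u200Eb";
console.log(s.length);                      // 3
console.log(s.charCodeAt(1).toString(16));  // "200e"

// The same escape can be used inside a regular expression literal.
var re = /a\u200Eb/;
console.log(re.test(s));                    // true
```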
- 7.2 White Space
-
White space characters are used to improve source text readability and to separate tokens (indivisible lexical units)
from each other, but are otherwise insignificant. White space may occur between any two tokens, and may occur within
strings (where they are considered significant characters forming part of the literal string value), but cannot appear
within any other kind of token.
The following characters are considered to be white space:
| Code Point Value | Name | Formal Name |
|---|---|---|
| \u0009 | Tab | <TAB> |
| \u000B | Vertical Tab | <VT> |
| \u000C | Form Feed | <FF> |
| \u0020 | Space | <SP> |
| \u00A0 | No-break space | <NBSP> |
| Other category "Zs" | Any other Unicode "space separator" | <USP> |
Syntax
- WhiteSpace ::
- <TAB>
<VT>
<FF>
<SP>
<NBSP>
<USP>
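The WhiteSpace production can be sketched as a character test. The function name `isWhiteSpace` is an assumption made for this example. The `\p{Zs}` property escape is a feature of much later engines (ES2018 and up), and it reflects whatever Unicode version the engine ships, so this is an approximation of the rule rather than an implementation of it.

```javascript
// Sketch of a WhiteSpace test for a single-character string.
function isWhiteSpace(ch) {
  switch (ch) {
    case "\u0009": // <TAB>
    case "\u000B": // <VT>
    case "\u000C": // <FF>
    case "\u0020": // <SP>
    case "\u00A0": // <NBSP>
      return true;
    default:
      // <USP>: any other character in Unicode category Zs.
      return /^\p{Zs}$/u.test(ch);
  }
}

console.log(isWhiteSpace("\t"));     // true
console.log(isWhiteSpace("\u3000")); // true: IDEOGRAPHIC SPACE is in Zs
console.log(isWhiteSpace("\n"));     // false: LF is a LineTerminator
console.log(isWhiteSpace("x"));      // false
```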
- 7.3 Line Terminators
-
Like white space characters, line terminator characters are used to improve source text readability and to separate
tokens (indivisible lexical units) from each other. However, unlike white space characters, line terminators have some
influence over the behaviour of the syntactic grammar. In general, line terminators may occur between any two tokens,
but there are a few places where they are forbidden by the syntactic grammar. A line terminator cannot occur within any
token, not even a string. Line terminators also affect the process of automatic semicolon insertion (7.9).
The following characters are considered to be line terminators:
| Code Point Value | Name | Formal Name |
|---|---|---|
| \u000A | Line Feed | <LF> |
| \u000D | Carriage Return | <CR> |
| \u2028 | Line separator | <LS> |
| \u2029 | Paragraph separator | <PS> |
Syntax
- LineTerminator ::
- <LF>
<CR>
<LS>
<PS>
- 7.4 Comments
-
Description
Comments can be either single or multi-line. Multi-line comments cannot nest.
Because a single-line comment can contain any character except a LineTerminator character, and because of the
general rule that a token is always as long as possible, a single-line comment always consists of all characters from
the // marker to the end of the line. However, the LineTerminator at the end of the line is not
considered to be part of the single-line comment; it is recognised separately by the lexical grammar and becomes part of
the stream of input elements for the syntactic grammar. This point is very important, because it implies that the
presence or absence of single-line comments does not affect the process of automatic semicolon insertion (7.9).
Comments behave like white space and are discarded except that, if a MultiLineComment contains a line
terminator character, then the entire comment is considered to be a LineTerminator for purposes of parsing by the
syntactic grammar.
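The difference between a comment that contains a line terminator and one that does not can be observed through the `return` statement, whose expression must start on the same line. This is a small sketch; the function names are made up for the example.

```javascript
function commentOnOneLine() {
  // The comment behaves like white space, so 1 is the returned value.
  return /* no line terminator inside */ 1;
}

function commentSpanningLines() {
  // The comment contains a line terminator, so the whole comment counts as
  // a LineTerminator and a semicolon is inserted after return.
  return /* this comment spans
            two lines */ 1;
}

console.log(commentOnOneLine());     // 1
console.log(commentSpanningLines()); // undefined
```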
Syntax
- Comment ::
- MultiLineComment
SingleLineComment
- MultiLineComment ::
- /* MultiLineCommentChars_opt */
- MultiLineCommentChars ::
- MultiLineNotAsteriskChar MultiLineCommentChars_opt
- * PostAsteriskCommentChars_opt
- PostAsteriskCommentChars ::
- MultiLineNotForwardSlashOrAsteriskChar MultiLineCommentChars_opt
- * PostAsteriskCommentChars_opt
- MultiLineNotAsteriskChar ::
- SourceCharacter but not asterisk *
- MultiLineNotForwardSlashOrAsteriskChar ::
- SourceCharacter but not forward-slash / or asterisk *
- SingleLineComment ::
- // SingleLineCommentChars_opt
- SingleLineCommentChars ::
- SingleLineCommentChar SingleLineCommentChars_opt
- SingleLineCommentChar ::
- SourceCharacter but not LineTerminator
- 7.5 Tokens
-
Syntax
- Token ::
- ReservedWord
Identifier
Punctuator
NumericLiteral
StringLiteral
- 7.5.1 Reserved Words
-
Description
Reserved words cannot be used as identifiers.
Syntax
- ReservedWord ::
- Keyword
FutureReservedWord
NullLiteral
BooleanLiteral
- 7.5.2 Keywords
-
The following tokens are ECMAScript keywords and may not be used as identifiers in ECMAScript programs.
Syntax
- Keyword :: one of
- break else new var case finally return void catch for switch while continue function this with default if
throw delete in try do instanceof typeof
- 7.5.3 Future Reserved Words
-
The following words are used as keywords in proposed extensions and are therefore reserved to allow for the
possibility of future adoption of those extensions.
Syntax
- FutureReservedWord :: one of
- abstract enum int short boolean export interface static byte extends long super char final native
synchronized class float package throws const goto private transient debugger implements protected volatile double
import public
- 7.6 Identifiers
-
Description
Identifiers are interpreted according to the grammar given in Section 5.16 of the upcoming version 3.0 of the Unicode
standard, with some small modifications. This grammar is based on both normative and informative character categories
specified by the Unicode standard. The characters in the specified categories in version 2.1 of the Unicode standard
must be treated as in those categories by all conforming ECMAScript implementations; however, conforming ECMAScript
implementations may allow additional legal identifier characters based on the category assignment from later versions of
Unicode.
This standard specifies one departure from the grammar given in the Unicode standard: The dollar sign ($) and the
underscore (_) are permitted anywhere in an identifier. The dollar sign is intended for use only in mechanically
generated code.
Unicode escape sequences are also permitted in identifiers, where they contribute a single character to the
identifier, as computed by the CV of the UnicodeEscapeSequence (see 7.8.4). The \
preceding the UnicodeEscapeSequence does not contribute a character to the identifier. A UnicodeEscapeSequence
cannot be used to put a character into an identifier that would otherwise be illegal. In other words, if a \
UnicodeEscapeSequence sequence were replaced by its UnicodeEscapeSequence's CV, the result must still be a
valid Identifier that has the exact same sequence of characters as the original Identifier.
Two identifiers that are canonically equivalent according to the Unicode standard are not equal unless they
are represented by the exact same sequence of code points (in other words, conforming ECMAScript implementations are
only required to do bitwise comparison on identifiers). The intent is that the incoming source text has been converted
to normalised form C before it reaches the compiler.
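The identifier rules described above can be illustrated with a short sketch. All variable names are invented for the example. The last two declarations use escapes to spell two canonically equivalent identifiers (U+00E9 versus U+0065 followed by U+0301); because comparison is by code point sequence, they name two different variables.

```javascript
// The escape contributes the character "a", so this declares abc.
var \u0061bc = 1;
console.log(abc); // 1

// $ and _ are permitted anywhere in an identifier.
var $generated = 2;
var _internal$ = 3;
console.log($generated + _internal$); // 5

// Canonically equivalent, but not the same sequence of code points.
var \u00E9 = "precomposed";  // U+00E9
var e\u0301 = "decomposed";  // U+0065 U+0301
console.log(\u00E9);   // "precomposed"
console.log(e\u0301);  // "decomposed"
```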
Syntax
- Identifier ::
- IdentifierName but not ReservedWord
- IdentifierName ::
- IdentifierStart
IdentifierName IdentifierPart
- IdentifierStart ::
- UnicodeLetter
- $
- _
- \ UnicodeEscapeSequence
- IdentifierPart ::
- IdentifierStart
UnicodeCombiningMark
UnicodeDigit
UnicodeConnectorPunctuation
\ UnicodeEscapeSequence
- UnicodeLetter
- any character in the Unicode categories "Uppercase letter (Lu)", "Lowercase letter (Ll)", "Titlecase letter
(Lt)", "Modifier letter (Lm)", "Other letter (Lo)", or "Letter number (Nl)".
- UnicodeCombiningMark
- any character in the Unicode categories "Non-spacing mark (Mn)" or "Combining spacing mark (Mc)"
- UnicodeDigit
- any character in the Unicode category "Decimal number (Nd)"
- UnicodeConnectorPunctuation
- any character in the Unicode category "Connector punctuation (Pc)"
- UnicodeEscapeSequence
- see 7.8.4.
- HexDigit :: one of
- 0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F
- 7.7 Punctuators
-
Syntax
- Punctuator :: one of
- {
}
(
)
[
]
.
;
,
<
>
<=
>=
==
!=
===
!==
+
-
*
%
++
--
<<
>>
>>>
&
|
^
!
~
&&
||
?
:
=
+=
-=
*=
%=
<<=
>>=
>>>=
&=
|=
^=
- DivPunctuator :: one of
- /
/=
- 7.8 Literals
-
Syntax
- Literal ::
- NullLiteral
BooleanLiteral
NumericLiteral
StringLiteral
- 7.8.1 Null Literals
-
Syntax
- NullLiteral ::
- null
Semantics
The value of the null literal null is the sole value of the Null type, namely null.
- 7.8.2 Boolean Literals
-
Syntax
- BooleanLiteral ::
- true
false
Semantics
The value of the Boolean literal true is a value of the Boolean type, namely true.
The value of the Boolean literal false is a value of the Boolean type, namely false.
- 7.8.3 Numeric Literals
-
Syntax
- NumericLiteral ::
- DecimalLiteral
HexIntegerLiteral
- DecimalLiteral ::
- DecimalIntegerLiteral . DecimalDigits_opt ExponentPart_opt
. DecimalDigits ExponentPart_opt
DecimalIntegerLiteral ExponentPart_opt
- DecimalIntegerLiteral ::
- 0
NonZeroDigit DecimalDigits_opt
- DecimalDigits ::
- DecimalDigit
DecimalDigits DecimalDigit
- DecimalDigit :: one of
- 0 1 2 3 4 5 6 7 8 9
- NonZeroDigit :: one of
- 1 2 3 4 5 6 7 8 9
- ExponentPart ::
- ExponentIndicator SignedInteger
- ExponentIndicator :: one of
- e E
- SignedInteger ::
- DecimalDigits
+ DecimalDigits
- DecimalDigits
- HexIntegerLiteral ::
- 0x HexDigit
0X HexDigit
HexIntegerLiteral HexDigit
The source character immediately following a NumericLiteral must not be an IdentifierStart or
DecimalDigit.
NOTE
For example:
3in
is an error and not the two input elements 3 and in.
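One way to check this rule in a running engine is to ask the engine to parse a source string without executing it. The helper `parses` is an assumption of this sketch; it relies on the `Function` constructor throwing a SyntaxError for text that is not a valid function body.

```javascript
// Returns true if the engine accepts the text as a function body.
function parses(source) {
  try {
    new Function(source); // parsed only, never called
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(parses("3in [1]"));  // false: an IdentifierStart follows the literal
console.log(parses("3 in [1]")); // true: white space separates the two tokens
```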
Semantics
A numeric literal stands for a value of the Number type. This value is determined in two steps: first, a mathematical
value (MV) is derived from the literal; second, this mathematical value is rounded as described below.
- The MV of NumericLiteral :: DecimalLiteral is the MV of DecimalLiteral.
- The MV of NumericLiteral :: HexIntegerLiteral is the MV of HexIntegerLiteral.
- The MV of DecimalLiteral :: DecimalIntegerLiteral . is the MV of
DecimalIntegerLiteral.
- The MV of DecimalLiteral :: DecimalIntegerLiteral . DecimalDigits is the MV
of DecimalIntegerLiteral plus (the MV of DecimalDigits times 10^{-n}), where n is the
number of characters in DecimalDigits.
- The MV of DecimalLiteral :: DecimalIntegerLiteral . ExponentPart is the MV
of DecimalIntegerLiteral times 10^e, where e is the MV of ExponentPart.
- The MV of DecimalLiteral :: DecimalIntegerLiteral . DecimalDigits ExponentPart
is (the MV of DecimalIntegerLiteral plus (the MV of DecimalDigits times 10^{-n})) times 10^e,
where n is the number of characters in DecimalDigits and e is the MV of ExponentPart.
- The MV of DecimalLiteral :: . DecimalDigits is the MV of DecimalDigits times
10^{-n}, where n is the number of characters in DecimalDigits.
- The MV of DecimalLiteral :: . DecimalDigits ExponentPart is the MV of DecimalDigits
times 10^{e-n}, where n is the number of characters in DecimalDigits and e is the
MV of ExponentPart.
- The MV of DecimalLiteral :: DecimalIntegerLiteral is the MV of DecimalIntegerLiteral.
- The MV of DecimalLiteral :: DecimalIntegerLiteral ExponentPart is the MV of
DecimalIntegerLiteral times 10^e, where e is the MV of ExponentPart.
- The MV of DecimalIntegerLiteral :: 0 is 0.
- The MV of DecimalIntegerLiteral :: NonZeroDigit DecimalDigits is (the MV of NonZeroDigit
times 10^n) plus the MV of DecimalDigits, where n is the number of characters in
DecimalDigits.
- The MV of DecimalDigits :: DecimalDigit is the MV of DecimalDigit.
- The MV of DecimalDigits :: DecimalDigits DecimalDigit is (the MV of DecimalDigits
times 10) plus the MV of DecimalDigit.
- The MV of ExponentPart :: ExponentIndicator SignedInteger is the MV of SignedInteger.
- The MV of SignedInteger :: DecimalDigits is the MV of DecimalDigits.
- The MV of SignedInteger :: + DecimalDigits is the MV of DecimalDigits.
- The MV of SignedInteger :: - DecimalDigits is the negative of the MV of
DecimalDigits.
- The MV of DecimalDigit :: 0 or of HexDigit :: 0 is 0.
- The MV of DecimalDigit :: 1 or of NonZeroDigit :: 1 or of
HexDigit :: 1 is 1.
- The MV of DecimalDigit :: 2 or of NonZeroDigit :: 2 or of
HexDigit :: 2 is 2.
- The MV of DecimalDigit :: 3 or of NonZeroDigit :: 3 or of
HexDigit :: 3 is 3.
- The MV of DecimalDigit :: 4 or of NonZeroDigit :: 4 or of
HexDigit :: 4 is 4.
- The MV of DecimalDigit :: 5 or of NonZeroDigit :: 5 or of
HexDigit :: 5 is 5.
- The MV of DecimalDigit :: 6 or of NonZeroDigit :: 6 or of
HexDigit :: 6 is 6.
- The MV of DecimalDigit :: 7 or of NonZeroDigit :: 7 or of
HexDigit :: 7 is 7.
- The MV of DecimalDigit :: 8 or of NonZeroDigit :: 8 or of
HexDigit :: 8 is 8.
- The MV of DecimalDigit :: 9 or of NonZeroDigit :: 9 or of
HexDigit :: 9 is 9.
- The MV of HexDigit :: a or of HexDigit :: A is 10.
- The MV of HexDigit :: b or of HexDigit :: B is 11.
- The MV of HexDigit :: c or of HexDigit :: C is 12.
- The MV of HexDigit :: d or of HexDigit :: D is 13.
- The MV of HexDigit :: e or of HexDigit :: E is 14.
- The MV of HexDigit :: f or of HexDigit :: F is 15.
- The MV of HexIntegerLiteral :: 0x HexDigit is the MV of HexDigit.
- The MV of HexIntegerLiteral :: 0X HexDigit is the MV of HexDigit.
- The MV of HexIntegerLiteral :: HexIntegerLiteral HexDigit is (the MV of
HexIntegerLiteral times 16) plus the MV of HexDigit.
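The MV rules above can be sketched as two small functions. The function names and the result shape are assumptions of this example. To keep the decimal result exact, the sketch returns the MV as an integer coefficient together with a power of ten (MV = coefficient times 10^exponent) instead of a floating-point number. Both functions assume their argument already matches the corresponding grammar production and do no validation.

```javascript
// MV of a single HexDigit (0 to 15).
function mvOfHexDigit(ch) {
  return "0123456789abcdef".indexOf(ch.toLowerCase());
}

// MV of HexIntegerLiteral: (MV of the shorter literal times 16) plus the
// MV of the next HexDigit, applied from left to right.
function mvOfHexIntegerLiteral(text) {
  var mv = 0;
  for (var i = 2; i < text.length; i++) { // skip the 0x or 0X prefix
    mv = mv * 16 + mvOfHexDigit(text.charAt(i));
  }
  return mv;
}

// MV of DecimalLiteral as { coefficient, exponent }.
// With n fraction digits and ExponentPart MV e, the power of ten is e - n.
function mvOfDecimalLiteral(text) {
  var m = /^(\d*)(?:\.(\d*))?(?:[eE]([+-]?\d+))?$/.exec(text);
  var fractionDigits = m[2] || "";
  var e = m[3] ? parseInt(m[3], 10) : 0;
  var digits = m[1] + fractionDigits;
  var coefficient = 0;
  for (var i = 0; i < digits.length; i++) {
    coefficient = coefficient * 10 + (digits.charCodeAt(i) - 48);
  }
  return { coefficient: coefficient, exponent: e - fractionDigits.length };
}

console.log(mvOfHexIntegerLiteral("0xFF"));  // 255
console.log(mvOfDecimalLiteral("1.25e3"));   // coefficient 125, exponent 1
console.log(mvOfDecimalLiteral(".5"));       // coefficient 5, exponent -1
```

The coefficient stays exact only while it fits in the integer range of the Number type, so the sketch is suitable for short literals only.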
Once the exact MV for a numeric literal has been determined, it is then rounded to a value of the Number type. If the
MV is 0, then the rounded value is +0; otherwise, the rounded value must be the number value for the MV
(in the sense defined in 8.5), unless the literal is a DecimalLiteral and the literal has
more than 20 significant digits, in which case the number value may be either the number value for the MV of a literal
produced by replacing each significant digit after the 20th with a 0 digit or the number value for the
MV of a literal produced by replacing each significant digit after the 20th with a 0 digit and then
incrementing the literal at the 20th significant digit position. A digit is significant if it is not part of an
ExponentPart and
- it is not 0; or
- there is a nonzero digit to its left and there is a nonzero digit, not in the ExponentPart, to its right.
- 7.8.4 String Literals
-
A string literal is zero or more characters enclosed in single or double quotes. Each character may be represented by
an escape sequence.
Syntax
- StringLiteral ::
- " DoubleStringCharacters_opt "
' SingleStringCharacters_opt '
- DoubleStringCharacters ::
- DoubleStringCharacter DoubleStringCharacters_opt
- SingleStringCharacters ::
- SingleStringCharacter SingleStringCharacters_opt
- DoubleStringCharacter ::
- SourceCharacter but not double-quote " or backslash \ or
LineTerminator
\ EscapeSequence
- SingleStringCharacter ::
- SourceCharacter but not single-quote ' or backslash \ or
LineTerminator
\ EscapeSequence
- EscapeSequence ::
- CharacterEscapeSequence
0 [lookahead ∉ DecimalDigit]
HexEscapeSequence
UnicodeEscapeSequence
- CharacterEscapeSequence ::
- SingleEscapeCharacter
NonEscapeCharacter
- SingleEscapeCharacter :: one of
- ' " \ b f n r t v
- NonEscapeCharacter ::
- SourceCharacter but not EscapeCharacter or LineTerminator
- EscapeCharacter ::
- SingleEscapeCharacter
DecimalDigit
x
u
- HexEscapeSequence ::
- x HexDigit HexDigit
- UnicodeEscapeSequence ::
- u HexDigit HexDigit HexDigit HexDigit
The definition of the nonterminal HexDigit is given in section 7.8.3.
SourceCharacter is described in sections 2 and 6.
A string literal stands for a value of the String type. The string value (SV) of the literal is described in terms of
character values (CV) contributed by the various parts of the string literal. As part of this process, some characters
within the string literal are interpreted as having a mathematical value (MV), as described below or in section
7.8.3.
- The SV of StringLiteral :: "" is the empty character sequence.
- The SV of StringLiteral :: '' is the empty character sequence.
- The SV of StringLiteral :: " DoubleStringCharacters " is the SV of
DoubleStringCharacters.
- The SV of StringLiteral :: ' SingleStringCharacters ' is the SV of
SingleStringCharacters.
- The SV of DoubleStringCharacters :: DoubleStringCharacter is a sequence of one character,
the CV of DoubleStringCharacter.
- The SV of DoubleStringCharacters :: DoubleStringCharacter DoubleStringCharacters is a
sequence of the CV of DoubleStringCharacter followed by all the characters in the SV of
DoubleStringCharacters in order.
- The SV of SingleStringCharacters :: SingleStringCharacter is a sequence of one character,
the CV of SingleStringCharacter.
- The SV of SingleStringCharacters :: SingleStringCharacter SingleStringCharacters is a
sequence of the CV of SingleStringCharacter followed by all the characters in the SV of
SingleStringCharacters in order.
- The CV of DoubleStringCharacter :: SourceCharacter but not double-quote
" or backslash \ or LineTerminator is the SourceCharacter
character itself.
- The CV of DoubleStringCharacter :: \ EscapeSequence is the CV of the
EscapeSequence.
- The CV of SingleStringCharacter :: SourceCharacter but not
single-quote ' or backslash \ or LineTerminator is the
SourceCharacter character itself.
- The CV of SingleStringCharacter :: \ EscapeSequence is the CV of the
EscapeSequence.
- The CV of EscapeSequence :: CharacterEscapeSequence is the CV of the
CharacterEscapeSequence.
- The CV of EscapeSequence :: 0 [lookahead ∉ DecimalDigit] is a <NUL> character
(Unicode value 0000).
- The CV of EscapeSequence :: HexEscapeSequence is the CV of the HexEscapeSequence.
- The CV of EscapeSequence :: UnicodeEscapeSequence is the CV of the UnicodeEscapeSequence.
- The CV of CharacterEscapeSequence :: SingleEscapeCharacter is the character whose code
point value is determined by the SingleEscapeCharacter according to the following table:
| Escape Sequence | Code Point Value | Name | Symbol |
|---|---|---|---|
| \b | \u0008 | backspace | <BS> |
| \t | \u0009 | horizontal tab | <HT> |
| \n | \u000A | line feed (new line) | <LF> |
| \v | \u000B | vertical tab | <VT> |
| \f | \u000C | form feed | <FF> |
| \r | \u000D | carriage return | <CR> |
| \" | \u0022 | double quote | " |
| \' | \u0027 | single quote | ' |
| \\ | \u005C | backslash | \ |
- The CV of CharacterEscapeSequence :: NonEscapeCharacter is the CV of the
NonEscapeCharacter.
- The CV of NonEscapeCharacter :: SourceCharacter but not EscapeCharacter or
LineTerminator is the SourceCharacter character itself.
- The CV of HexEscapeSequence :: x HexDigit HexDigit is the character whose code
point value is (16 times the MV of the first HexDigit) plus the MV of the second HexDigit.
- The CV of UnicodeEscapeSequence :: u HexDigit HexDigit HexDigit HexDigit is the
character whose code point value is (4096 (that is, 16^3) times the MV of the first HexDigit) plus
(256 (that is, 16^2) times the MV of the second HexDigit) plus (16 times the MV of the third
HexDigit) plus the MV of the fourth HexDigit.
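The character-value rules for escape sequences can be checked against an engine with a short sketch. The helper names `mvOfHexDigit`, `cvOfHexEscape` and `cvOfUnicodeEscape` are assumptions of this example; they take the hex digits of the escape as a string and do no validation.

```javascript
// MV of a single HexDigit (0 to 15).
function mvOfHexDigit(ch) {
  return "0123456789abcdef".indexOf(ch.toLowerCase());
}

// CV of HexEscapeSequence :: x HexDigit HexDigit
function cvOfHexEscape(h) {
  return String.fromCharCode(16 * mvOfHexDigit(h.charAt(0)) + mvOfHexDigit(h.charAt(1)));
}

// CV of UnicodeEscapeSequence :: u HexDigit HexDigit HexDigit HexDigit
function cvOfUnicodeEscape(h) {
  return String.fromCharCode(
    4096 * mvOfHexDigit(h.charAt(0)) +
     256 * mvOfHexDigit(h.charAt(1)) +
      16 * mvOfHexDigit(h.charAt(2)) +
           mvOfHexDigit(h.charAt(3)));
}

console.log(cvOfHexEscape("41") === "\x41");         // true
console.log(cvOfUnicodeEscape("00e9") === "\u00e9"); // true
console.log("\v".charCodeAt(0));                     // 11, from the table above
console.log("\0".charCodeAt(0));                     // 0, the <NUL> character
console.log("\q" === "q");                           // true: NonEscapeCharacter
```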
NOTE
A 'LineTerminator' character cannot appear in a string literal, even if preceded by a backslash \. The
correct way to cause a line terminator character to be part of the string value of a string literal is to use an escape
sequence such as \n or \u000A.
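The following sketch checks the unescaped case in a running engine. The helper `parses` is an assumption of the example. Only a raw line feed without a preceding backslash is tested, because later editions of the language added a line-continuation form (backslash followed by a line terminator) that engines may accept.

```javascript
// Returns true if the engine accepts the text as a function body.
function parses(source) {
  try {
    new Function(source); // parsed only, never called
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

var LF = String.fromCharCode(0x000A);

// A raw line feed between the quotes ends the line before the literal closes.
console.log(parses('var s = "a' + LF + 'b";')); // false

// The escape sequence \n puts a line feed into the string value instead.
console.log(parses('var s = "a\\nb";'));        // true
console.log("a\nb".charCodeAt(1));              // 10
```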
- 7.8.5 Regular Expression Literals
-
A regular expression literal is an input element that is converted to a RegExp object (section
15.10) when it is scanned. The object is created before evaluation of the containing program or function begins.
Evaluation of the literal produces a reference to that object; it does not create a new object. Two regular expression
literals in a program evaluate to regular expression objects that never compare as === to each other
even if the two literals' contents are identical. A RegExp object may also be created at runtime by new RegExp
(section 15.10.4) or calling the RegExp constructor as a function (section
15.10.3).
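A short sketch of the identity rule for two literals with identical contents, together with the two runtime ways of creating a RegExp object. The variable names are invented for the example. The sketch deliberately does not test whether evaluating the same literal twice yields the same object, since later editions of the language changed that behaviour.

```javascript
var first = /ab+c/;
var second = /ab+c/;

// Two literals in the program text are two distinct objects.
console.log(first === second);               // false
console.log(first.source === second.source); // true

// Creating RegExp objects at runtime.
var constructed = new RegExp("ab+c"); // constructor called with new
var called = RegExp("ab+c");          // constructor called as a function
console.log(constructed.test("abbc")); // true
console.log(called.test("abbc"));      // true
```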
The productions below describe the syntax for a regular expression literal and are used by the input element scanner
to find the end of the regular expression literal. The strings of characters comprising the RegularExpressionBody
and the RegularExpressionFlags are passed uninterpreted to the regular expression constructor, which interprets
them according to its own, more stringent grammar. An implementation may extend the regular expression constructor's
grammar, but it should not extend the RegularExpressionBody and RegularExpressionFlags productions or the
productions used by these productions.
Syntax
- RegularExpressionLiteral ::
- / RegularExpressionBody / RegularExpressionFlags
- RegularExpressionBody ::
- RegularExpressionFirstChar RegularExpressionChars
- RegularExpressionChars ::
- [empty]
RegularExpressionChars RegularExpressionChar
- RegularExpressionFirstChar ::
- NonTerminator but not * or \ or /
BackslashSequence
- RegularExpressionChar ::
- NonTerminator but not \ or /
BackslashSequence
- BackslashSequence ::
- \ NonTerminator
- NonTerminator ::
- SourceCharacter but not LineTerminator
- RegularExpressionFlags ::
- [empty]
RegularExpressionFlags IdentifierPart
NOTE
Regular expression literals may not be empty; instead of representing an empty regular expression literal, the
characters // start a single-line comment. To specify an empty regular expression, use /(?:)/.
Semantics
A regular expression literal stands for a value of the Object type. This value is determined in two steps: first, the
characters comprising the regular expression's RegularExpressionBody and RegularExpressionFlags production
expansions are collected uninterpreted into two strings Pattern and Flags, respectively. Then the new RegExp
constructor is called with two arguments Pattern and Flags and the result becomes the value of the
RegularExpressionLiteral. If the call to new RegExp generates an error, an implementation may, at
its discretion, either report the error immediately while scanning the program, or it may defer the error until the
regular expression literal is evaluated in the course of program execution.
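The two-step description above suggests that a literal and an explicit constructor call with the same Pattern and Flags strings should produce equivalent objects. A minimal sketch, with variable names chosen for the example:

```javascript
var literal = /ab+c/gi;
var constructed = new RegExp("ab+c", "gi"); // Pattern and Flags as strings

console.log(literal.source === constructed.source);         // true
console.log(literal.global === constructed.global);         // true
console.log(literal.ignoreCase === constructed.ignoreCase); // true

// The empty regular expression must be written as /(?:)/, because //
// would start a single-line comment.
var empty = /(?:)/;
console.log(empty.test(""));         // true
console.log(empty.test("anything")); // true
```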
- 7.9 Automatic Semicolon Insertion
-
Certain ECMAScript statements (empty statement, variable statement, expression statement, do-while
statement, continue statement, break statement, return statement, and
throw statement) must be terminated with semicolons. Such semicolons may always appear explicitly in the
source text. For convenience, however, such semicolons may be omitted from the source text in certain situations. These
situations are described by saying that semicolons are automatically inserted into the source code token stream in those
situations.
- 7.9.1 Rules of Automatic Semicolon Insertion
-
There are three basic rules of semicolon insertion:
- When, as the program is parsed from left to right, a token (called the offending token) is encountered that is not
allowed by any production of the grammar, then a semicolon is automatically inserted before the offending token if the
offending token is separated from the previous token by at least one LineTerminator, or if the offending token is }.
- When, as the program is parsed from left to right, the end of the input stream of tokens is encountered and the
parser is unable to parse the input token stream as a single complete ECMAScript Program, then a semicolon is
automatically inserted at the end of the input stream.
- When, as the program is parsed from left to right, a token is encountered that is allowed by some production of the
grammar, but the production is a restricted production and the token would be the first token for a terminal or
nonterminal immediately following the annotation "[no LineTerminator here]" within the restricted production (and
therefore such a token is called a restricted token), and the restricted token is separated from the previous token by
at least one LineTerminator, then a semicolon is automatically inserted before the restricted token.
However, there is an additional overriding condition on the preceding rules: a semicolon is never inserted
automatically if the semicolon would then be parsed as an empty statement or if that semicolon would become one of the
two semicolons in the header of a for statement (section 12.6.3).
NOTE
These are the only restricted productions in the grammar:
- PostfixExpression :
- LeftHandSideExpression [no LineTerminator here] ++
LeftHandSideExpression [no LineTerminator here] --
- ContinueStatement :
- continue [no LineTerminator here] Identifier_opt ;
- BreakStatement :
- break [no LineTerminator here] Identifier_opt ;
- ReturnStatement :
- return [no LineTerminator here] Expression_opt ;
- ThrowStatement :
- throw [no LineTerminator here] Expression ;
The practical effect of these restricted productions is as follows:
- When a ++ or -- token is encountered where the parser would treat it as a
postfix operator, and at least one LineTerminator occurred between the preceding token and the ++
or -- token, then a semicolon is automatically inserted before the ++ or --
token.
- When a continue, break, return, or throw token is encountered and a
LineTerminator is encountered before the next token, a semicolon is automatically inserted after the
continue, break, return, or throw token.
The resulting practical advice to ECMAScript programmers is:
- A postfix ++ or -- operator should appear on the same line as its operand.
- An Expression in a return or throw statement should start on the same
line as the return or throw token.
- A label in a break or continue statement should be on the same line as the
break or continue token.
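Two of the restricted productions can be observed with a small sketch. The helper `parses` and the counting functions are assumptions of this example. A line terminator after `throw` leaves the statement without its required Expression, and a label placed on the line after `continue` is no longer part of the `continue` statement.

```javascript
// Returns true if the engine accepts the text as a function body.
function parses(source) {
  try {
    new Function(source); // parsed only, never called
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(parses("throw\nnew Error('x');")); // false
console.log(parses("throw new Error('x');"));  // true

function countWithLabelOnSameLine() {
  var n = 0;
  outer: for (var i = 0; i < 2; i++) {
    for (var j = 0; j < 2; j++) {
      n++;
      continue outer; // continues the outer loop
    }
  }
  return n;
}

function countWithLabelOnNextLine() {
  var n = 0;
  outer: for (var i = 0; i < 2; i++) {
    for (var j = 0; j < 2; j++) {
      n++;
      continue
      outer; // a separate, unreachable expression statement
    }
  }
  return n;
}

console.log(countWithLabelOnSameLine()); // 2
console.log(countWithLabelOnNextLine()); // 4
```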
- 7.9.2 Examples of Automatic Semicolon Insertion
-
The source
{ 1 2 } 3
is not a valid sentence in the ECMAScript grammar, even with the automatic semicolon insertion rules. In contrast,
the source
{ 1
2 } 3
is also not a valid ECMAScript sentence, but is transformed by automatic semicolon insertion into the following:
{ 1
;2 ;} 3;
which is a valid ECMAScript sentence.
The source
for (a; b
)
is not a valid ECMAScript sentence and is not altered by automatic semicolon insertion because the semicolon is
needed for the header of a for statement. Automatic semicolon insertion never inserts one of the two
semicolons in the header of a for statement.
The source
return
a + b
is transformed by automatic semicolon insertion into the following:
return;
a + b;
NOTE
The expression a + b is not treated as a value to be returned by the return statement,
because a 'LineTerminator' separates it from the token return.
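The transformation above can be observed at run time. In this sketch the function names are invented; the first function has the same shape as the example source.

```javascript
function sumOnNextLine(a, b) {
  return
  a + b;
}

function sumOnSameLine(a, b) {
  return a + b;
}

console.log(sumOnNextLine(1, 2)); // undefined: parsed as return; a + b;
console.log(sumOnSameLine(1, 2)); // 3
```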
The source
a = b
++c
is transformed by automatic semicolon insertion into the following:
a = b;
++c;
NOTE
The token ++ is not treated as a postfix operator applying to the variable b, because a
'LineTerminator' occurs between b and ++.
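A runnable version of this example, with starting values chosen for illustration, shows that `b` is left unchanged and `c` is incremented:

```javascript
var b = 1, c = 2;
var a;

a = b
++c;

console.log(a); // 1
console.log(b); // 1: b was not incremented
console.log(c); // 3: ++ was applied to c as a prefix operator
```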
The source
if (a > b)
else c = d
is not a valid ECMAScript sentence and is not altered by automatic semicolon insertion before the else
token, even though no production of the grammar applies at that point, because an automatically inserted semicolon would
then be parsed as an empty statement.
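The syntactic claims in the block, `for` and `if` examples can be checked by asking an engine to parse each source. The helper `parses` is an assumption of this sketch, and a body (an empty statement) is appended to the `for` example so that any rejection is due to the header alone.

```javascript
// Returns true if the engine accepts the text as a function body.
function parses(source) {
  try {
    new Function(source); // parsed only, never called
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(parses("{ 1 2 } 3"));              // false
console.log(parses("{ 1\n2 } 3"));             // true
console.log(parses("for (a; b\n) ;"));         // false
console.log(parses("if (a > b)\nelse c = d")); // false
```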
The source
a = b + c
(d + e).print()
is not transformed by automatic semicolon insertion, because the parenthesised expression that begins the
second line can be interpreted as an argument list for a function call:
a = b + c(d + e).print()
In the circumstance that an assignment statement must begin with a left parenthesis, it is a good idea for the
programmer to provide an explicit semicolon at the end of the preceding statement rather than to rely on automatic
semicolon insertion.
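The call interpretation can be made visible with a sketch in which `c` is a function. The definitions of `b`, `c`, `d` and `e` and the `calls` array are invented for the illustration; only the two middle lines come from the example.

```javascript
var calls = [];
var b = 1, d = 2, e = 3;

// Hypothetical function: records its argument and returns an object
// with a print method.
function c(x) {
  calls.push(x);
  return { print: function () { return 10; } };
}

var a;
a = b + c
(d + e).print()

console.log(a);        // 11, that is b + c(d + e).print()
console.log(calls[0]); // 5: c was called with d + e
```

Writing `a = b + c;` with an explicit semicolon would end the first statement and leave the second line as a separate expression statement.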