- A regular function can have duplicate parameters except in strict mode
or if its parameter list is not "simple" (has a default or rest
parameter)
- An arrow function can never have duplicate parameters
Compared to other engines I opted for more useful syntax error messages
than a generic "duplicate parameter name not allowed in this context":
"use strict"; function test(foo, foo) {}
^
Uncaught exception: [SyntaxError]: Duplicate parameter 'foo' not allowed in strict mode (line: 1, column: 34)
function test(foo, foo = 1) {}
^
Uncaught exception: [SyntaxError]: Duplicate parameter 'foo' not allowed in function with default parameter (line: 1, column: 20)
function test(foo, ...foo) {}
^
Uncaught exception: [SyntaxError]: Duplicate parameter 'foo' not allowed in function with rest parameter (line: 1, column: 23)
(foo, foo) => {}
^
Uncaught exception: [SyntaxError]: Duplicate parameter 'foo' not allowed in arrow function (line: 1, column: 7)
https://tc39.es/ecma262/#sec-directive-prologues-and-the-use-strict-directive
A Use Strict Directive is an ExpressionStatement in a Directive Prologue
whose StringLiteral is either of the exact code point sequences
"use strict" or 'use strict'. A Use Strict Directive may not contain an
EscapeSequence or LineContinuation.
https://tc39.es/ecma262/#sec-additional-syntax-string-literals
The syntax and semantics of 11.8.4 is extended as follows except that
this extension is not allowed for strict mode code:
Syntax
EscapeSequence::
CharacterEscapeSequence
LegacyOctalEscapeSequence
NonOctalDecimalEscapeSequence
HexEscapeSequence
UnicodeEscapeSequence
LegacyOctalEscapeSequence::
OctalDigit [lookahead ∉ OctalDigit]
ZeroToThree OctalDigit [lookahead ∉ OctalDigit]
FourToSeven OctalDigit
ZeroToThree OctalDigit OctalDigit
ZeroToThree :: one of
0 1 2 3
FourToSeven :: one of
4 5 6 7
NonOctalDecimalEscapeSequence :: one of
8 9
This definition of EscapeSequence is not used in strict mode or when
parsing TemplateCharacter.
Note
It is possible for string literals to precede a Use Strict Directive
that places the enclosing code in strict mode, and implementations must
take care to not use this extended definition of EscapeSequence with
such literals. For example, attempting to parse the following source
text must fail:
function invalid() { "\7"; "use strict"; }
We're passing a token to this function, so m_current_token is actually
the next token - which leads to incorrect line/column numbers for string
literal syntax errors:
"\u"
^
Uncaught exception: [SyntaxError]: Malformed unicode escape sequence (line: 1, column: 5)
Rather than:
"\u"
^
Uncaught exception: [SyntaxError]: Malformed unicode escape sequence (line: 1, column: 1)
This was a regression introduced by 9ffe45b - a TryStatement without
'catch' clause *is* allowed, if it has a 'finally' clause. It is now
checked properly that at least one of both is present.
This separates matching/parsing of statements and declarations and
fixes a few edge cases where the parser would incorrectly accept a
declaration where only a statement is allowed - for example:
if (foo) const a = 1;
for (var bar;;) function b() {}
while (baz) class c {}
From the spec: https://tc39.es/ecma262/#sec-literals-numeric-literals
The SourceCharacter immediately following a NumericLiteral must not be
an IdentifierStart or DecimalDigit.
For example: 3in is an error and not the two input elements 3 and in.
This allows us to provide better error messages as we can point the
syntax error location to the exact first invalid parameter instead of
always the end of the function within a object literal or class
definition.
Before this change:
const Foo = { set bar() {} }
^
Uncaught exception: [SyntaxError]: Object setter property must have one argument (line: 1, column: 28)
class Foo { set bar() {} }
^
Uncaught exception: [SyntaxError]: Class setter method must have one argument (line: 1, column: 26)
After this change:
const Foo = { set bar() {} }
^
Uncaught exception: [SyntaxError]: Setter function must have one argument (line: 1, column: 23)
class Foo { set bar() {} }
^
Uncaught exception: [SyntaxError]: Setter function must have one argument (line: 1, column: 21)
The only possible downside of this change is that class getters/setters
and functions in objects are not distinguished in the message anymore -
I don't think that's important though, and classes are (mostly) just
syntactic sugar anyway.
I'm about to add even more options and a bunch of unnamed true/false
arguments is really not helpful. Let's make this a single parse options
parameter using bit flags.
If there's a newline between the closing paren and arrow it's not a
valid arrow function, ASI should kick in instead (it'll then fail with
"Unexpected token Arrow")
This simplifies try_parse_arrow_function_expression() and fixes a few
cases that should not produce an arrow function AST but did:
(a,,) => {}
(a b) => {}
(a ...b) => {}
(...b a) => {}
The new parsing logic checks whether parens are expected and uses
parse_function_parameters() if so, rolling back if a new syntax error
occurs during that. Otherwise it's just an identifier in which case we
parse the single parameter ourselves.
'continue' is no longer allowed outside of a loop, and an unlabeled
'break' is not longer allowed outside of a loop or switch statement.
Labeled 'break' statements are still allowed everywhere, even if the
label does not exist.
The check for invalid lhs and assignment to eval/arguments in strict
mode should happen for all kinds of assignment expressions, not just
AssignmentOp::Assignment.
Since blocks can't be strict by themselves, it makes no sense for them
to store whether or not they are strict. Strict-ness is now stored in
the Program and FunctionNode ASTNodes. Fixes issue #3641
The parser considers it a syntax error at the moment, other engines
throw a ReferenceError during runtime for ++foo(), --foo(), foo()++ and
foo()--, so I assume the spec defines this.
literal methods; add EnvrionmentRecord fields and methods to
LexicalEnvironment
Adding EnvrionmentRecord's fields and methods lets us throw an exception
when |this| is not initialized, which occurs when the super constructor
in a derived class has not yet been called, or when |this| has already
been initialized (the super constructor was already called).
This adds regex parsing/lexing, as well as a relatively empty
RegExpObject. The purpose of this patch is to allow the engine to not
get hung up on parsing regexes. This will aid in finding new syntax
errors (say, from google or twitter) without having to replace all of
their regexes first!
This patch adds function declaration hoisting. The mechanism
is similar to var hoisting. Hoisted function declarations are to be put
before the hoisted var declarations, hence they have to be treated
separately.
When parsing JavaScript, we can get pretty much any sequnce of tokens,
and we shouldn't crash if it's not something that we normally expect.
Instead, emit syntax errors.
This rewrite drastically increases the accuracy of object literals.
Additionally, an "assertIsSyntaxError" function has been added to
test-common.js to assist in testing syntax errors.
The parser was chomping on commas present after the arrow function expression. eg. [x=>x,2] would parse as [x=>(x,2)] instead of [(x=>x),2].
This is not the case anymore. I've added a small test to prove this.
Adds the ability for a scope (either a function or the entire program)
to be in strict mode. Scopes default to non-strict mode.
There are two ways to determine the strict-ness of the JS engine:
1. In the parser, this can be accessed with the parser_state variable
m_is_strict_mode boolean. If true, the Parser is currently parsing in
strict mode. This is done so that the Parser can generate syntax
errors at parse time, which is required in some cases.
2. With Interpreter.is_strict_mode(). This allows strict mode checking
at runtime as opposed to compile time.
Additionally, in order to test this, a global isStrictMode() function
has been added to the JS ReplObject under the test-mode flag.
Rather than printing them to stderr directly the parser now keeps a
Vector<Error>, which allows the "owner" of the parser to consume them
individually after parsing.
The Error struct has a message, line number, column number and a
to_string() helper function to format this information into a meaningful
error message.
The Function() constructor will now include an error message when
throwing a SyntaxError.
There are many cases which shouldn't even parse, like
null = ...
true = ...
false = ...
123 = ...
"foo" = ...
However this *is* valid syntax:
foo() = ...
So we still have to keep the current code doing a runtime check if the
LHS value is a resolvable reference. I believe this was declared valid
syntax to *in theory* allow functions returning references - though in
practice that isn't a thing.
Fixes#2204.
This required 2 changes:
1. In the parser, create a new variable scope, so the variable is
declared in it instead of the scope in which the 'for' is found.
2. On execute, push the variable into the newly created block. Existing
code created an empty block (no variables, no arguments) which
allows Interpreter::enter_scope() to skip the creation of a new
environment, therefore when the variable initializer is executed, it
sets the variable to the outer scope. By attaching the variable to
the new block, the block gets a new environment.
This is only needed for 'let' / 'const' declarations, since 'var'
declarations are expected to leak.
Fixes: #2103
"[Function.length is] the number of formal parameters. This number
excludes the rest parameter and only includes parameters before
the first one with a default value." - MDN
To make processing tagged template literals easier, template literals
will now add one empty StringLiteral before and after each template
expression *if* there's no other string - e.g.:
`${foo}` -> "", foo, ""
`test${foo}${bar}test` -> "test", foo, "", bar, "test"
This also matches the behaviour of many other parsers.
A regression was introduced in dc9b4da where the parser would
incorrectly parse the assignment of arrow functions to (non-declaration)
variables. For example, consider:
a = () => {}
Because the parser was aware of default parameters, in
try_parse_arrow_function, the equals sign would be interpreted as a
default argument, leading to incorrect parsing of the overall
expression. Also resulted in some funny behavior
(a = () => {} => {} worked just fine!).
The simple fix is to only look for default parameters if the arrow
function is required to have parenthesis.
Adds fully functioning template literals. Because template literals
contain expressions, most of the work has to be done in the Lexer rather
than the Parser. And because of the complexity of template literals
(expressions, nesting, escapes, etc), the Lexer needs to have some
template-related state.
When entering a new template literal, a TemplateLiteralStart token is
emitted. When inside a literal, all text will be parsed up until a '${'
or '`' (or EOF, but that's a syntax error) is seen, and then a
TemplateLiteralExprStart token is emitted. At this point, the Lexer
proceeds as normal, however it keeps track of the number of opening
and closing curly braces it has seen in order to determine the close
of the expression. Once it finds a matching curly brace for the '${',
a TemplateLiteralExprEnd token is emitted and the state is updated
accordingly.
When the Lexer is inside of a template literal, but not an expression,
and sees a '`', this must be the closing grave: a TemplateLiteralEnd
token is emitted.
The state required to correctly parse template strings consists of a
vector (for nesting) of two pieces of information: whether or not we
are in a template expression (as opposed to a template string); and
the count of the number of unmatched open curly braces we have seen
(only applicable if the Lexer is currently in a template expression).
TODO: Add support for template literal newlines in the JS REPL (this will
cause a syntax error currently):
> `foo
> bar`
'foo
bar'
We already skipped random semicolons in Parser::parse_program(), but now
they are properly matched and parsed as empty statements - and thus
recognized as a valid body of an if / else / while / ... statement.
Adds the ability for function arguments to have default values. This
works for standard functions as well as arrow functions. Default values
are not printed in a <function>.toString() call, as nodes cannot print
their source string representation.
Instead of having fprintf()s all over the place we can now use
syntax_error("message") or syntax_error("message", line, column).
This takes care of a consistent format, appending a newline and getting
the line number and column of the current token if the last two params
are omitted.
Implement the syntax and behavor necessary to support array literals
such as [...[1, 2, 3]]. A type error is thrown if the target of the
spread operator does not evaluate to an array (though it should
eventually just check for an iterable).
Note that the spread token's name is TripleDot, since the '...' token is
used for two features: spread and rest. Calling it anything involving
'spread' or 'rest' would be a bit confusing.
It turns out "delete" is actually a unary op :)
This patch implements deletion of object properties, it doesn't yet
work for casually deleting properties from the global object.
When deleting a property from an object, we switch that object to
having a unique shape, no longer sharing shapes with others.
Once an object has a unique shape, it no longer needs to care about
shape transitions.
This is required for template literals - we're not quite there yet, but at
least the parser can now tell us when this token is encountered -
currently this yields "Unexpected token Invalid". Not really helpful.
The character is a "backtick", but as we already have
TokenType::{StringLiteral,RegexLiteral} this seemed like a fitting name.
This also enables syntax highlighting for template literals in the js
REPL and LibGUI's JSSyntaxHighlighter.
"var" declarations are hoisted to the nearest function scope, while
"let" and "const" are hoisted to the nearest block scope.
This is done by the parser, which keeps two scope stacks, one stack
for the current var scope and one for the current let/const scope.
When the interpreter enters a scope, we walk all of the declarations
and insert them into the variable environment.
We don't support the temporal dead zone for let/const yet.
Many other parsers call it with this name.
Also Type can be confusing in this context since the DeclarationType is
not the type (number, string, etc.) of the variables that are being
declared by the VariableDeclaration.