Explanations are complemented with code snippets, and a link to a GitHub repository is provided for testing a real example.
Here is the plan of this article:
- PART 1: A presentation of PEGJS
- PART 2: Overview of the application
- PART 3: Defining the grammar
- PART 4: Integrating the parser
- PART 5: Parsing a text input
- PART 6: Getting a working example
PART 1: A presentation of PEGJS
PEGJS is a parser generator for JavaScript. It allows developers to build interpreters or compilers with good error reporting and, for instance, to create their own Domain Specific Language (DSL). PEG stands for Parsing Expression Grammar. PEGJS was developed by David Majda.
PART 2: Overview of the application
The application implements a calculator for simple arithmetic operations such as multiplications and additions on integer and float numbers. When initializing, the application loads the grammar file and then displays two panels: one for typing the arithmetic expression and the other for displaying the result.
The arithmetic operations entered by the user support parentheses and blank spaces between operators and operands.
PART 3: Defining the grammar
In the application, the grammar is defined in a file whose extension is '.pegjs', but it could have been defined in a file with any other extension (a '.txt' file for instance). Here is the grammar described in the 'myGrammar.pegjs' file:
start
  = additive

additive
  = left:multiplicative space* "+" space* right:additive { return left + right; }
  / multiplicative

multiplicative
  = left:primary space* "*" space* right:multiplicative { return left * right; }
  / primary

primary
  = number
  / "(" space* additive:additive space* ")" { return additive; }

number
  = float
  / integer

float "a float"
  = digits1:[0-9]+ "." digits2:[0-9]+ { return parseFloat(digits1.join("") + "." + digits2.join("")); }

integer "an integer"
  = digits:[0-9]+ { return parseInt(digits.join(""), 10); }

space
  = [ \t]
The grammar defines rules, which most of the time consist of an identifier, a parsing expression and some JavaScript code that is executed when the parsing expression matches successfully. Parsing starts with the first rule of the grammar, here the rule whose identifier is 'start'.
A rule can reference another rule. For instance, the 'additive' rule references the 'multiplicative' rule. It is also possible to give an expression a label in the rule; the label is then used to refer to the matched value in the JavaScript code. For instance, in the 'additive' rule the label 'left' refers to the 'multiplicative' expression and is used in the JavaScript code as 'return left + right;'. The label is declared before the expression and separated from it by a colon ':'.
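To make the rule references and labels more concrete, here is a hand-written sketch that mirrors the grammar above in plain JavaScript. It is not the code PEGJS generates and the function names are my own; each function simply plays the role of one rule and returns the computed value together with the position where matching stopped.

```javascript
// Hand-written mirror of the grammar, for illustration only.
// Each function returns { value, pos } on success or null on failure.

function skipSpaces(text, pos) {
  // space* : zero or more blanks or tabs
  while (text[pos] === " " || text[pos] === "\t") pos++;
  return pos;
}

function number(text, pos) {
  // number = float / integer
  const m = /^[0-9]+(\.[0-9]+)?/.exec(text.slice(pos));
  if (!m) return null;
  return { value: parseFloat(m[0]), pos: pos + m[0].length };
}

function primary(text, pos) {
  // primary = number / "(" space* additive space* ")"
  const num = number(text, pos);
  if (num) return num;
  if (text[pos] !== "(") return null;
  const inner = additive(text, skipSpaces(text, pos + 1));
  if (!inner) return null;
  const close = skipSpaces(text, inner.pos);
  if (text[close] !== ")") return null;
  return { value: inner.value, pos: close + 1 };
}

function multiplicative(text, pos) {
  // multiplicative = left:primary space* "*" space* right:multiplicative / primary
  const left = primary(text, pos);
  if (!left) return null;
  const op = skipSpaces(text, left.pos);
  if (text[op] === "*") {
    const right = multiplicative(text, skipSpaces(text, op + 1));
    if (right) return { value: left.value * right.value, pos: right.pos };
  }
  return left; // second alternative: a lone primary
}

function additive(text, pos) {
  // additive = left:multiplicative space* "+" space* right:additive / multiplicative
  const left = multiplicative(text, pos);
  if (!left) return null;
  const op = skipSpaces(text, left.pos);
  if (text[op] === "+") {
    const right = additive(text, skipSpaces(text, op + 1));
    if (right) return { value: left.value + right.value, pos: right.pos };
  }
  return left; // second alternative: a lone multiplicative
}

function parse(text) {
  // start = additive, and the whole input must be consumed
  const result = additive(text, 0);
  if (!result || result.pos !== text.length) {
    throw new Error("Cannot parse: " + text);
  }
  return result.value;
}

console.log(parse("1 + 2 * 3"));   // 7
console.log(parse("2 * (3 + 4)")); // 14
```

Because 'additive' calls 'multiplicative', which in turn calls 'primary', multiplication binds more tightly than addition without any explicit precedence table.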
It is also possible to give a rule a human-readable name. This is the case for the 'float' rule, whose name 'a float' will for instance be displayed in the error message when parsing fails (see PART 5 for more details). The human-readable name is declared between the rule identifier and the '=' sign.
The expressions separated by a slash character '/' are interpreted as follows: if the first expression does not match successfully, then the parser tries to match the second expression. If the parser matches none of the expressions, then the match fails.
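The order of the alternatives matters, which is why the 'number' rule tries 'float' before 'integer'. The following is a minimal sketch of this ordered choice in plain JavaScript; the 'choice' helper and the two matchers are hypothetical names written for this illustration, not part of the PEGJS API.

```javascript
// Each matcher takes (text, pos) and returns the matched string or null.
const integer = (text, pos) => {
  const m = /^[0-9]+/.exec(text.slice(pos));
  return m ? m[0] : null;
};
const float = (text, pos) => {
  const m = /^[0-9]+\.[0-9]+/.exec(text.slice(pos));
  return m ? m[0] : null;
};

// Ordered choice: try the alternatives left to right, keep the first success.
function choice(...alternatives) {
  return (text, pos) => {
    for (const alternative of alternatives) {
      const result = alternative(text, pos);
      if (result !== null) return result;
    }
    return null; // no alternative matched, so the choice fails
  };
}

const floatFirst = choice(float, integer);   // same order as the grammar
const integerFirst = choice(integer, float); // reversed order

console.log(floatFirst("3.14", 0));   // "3.14"
console.log(integerFirst("3.14", 0)); // "3"
```

With the reversed order, 'integer' succeeds on "3" and the choice never tries 'float', so the remaining ".14" would make the overall parse fail.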
For more information about how to define rules in a PEGJS grammar, you can have a look here.
PART 4: Integrating the parser
Once the grammar is defined, you need to load the file that describes the grammar and build the parser. You can load the file with an XMLHttpRequest:
var req = new XMLHttpRequest();
req.open("GET", 'grammar/myGrammar.pegjs', true);
req.onload = function(e) {
    var grammarInput = req.responseText;
    if ((grammarInput != null) && (PEG != null)) {
        window.pegjsmain.parser = PEG.buildParser(grammarInput);
    }
};
req.send();
Once the request has been sent and the response loaded, you can get the file text from the 'req.responseText' property.
You can then create a PEGJS parser from the grammar text by calling the 'buildParser' function on the 'PEG' object.
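The name of the build function depends on the PEGJS version: as far as I know, releases from 0.10 onward expose it as 'generate' instead of 'buildParser'. The sketch below is a hypothetical helper (the name 'buildParserFrom' is mine) that uses whichever function the library object provides; treat the version detail as an assumption to check against the release you load.

```javascript
// Hypothetical helper, not part of PEGJS: builds a parser from the grammar
// text with whichever build function the given library object exposes.
function buildParserFrom(pegLibrary, grammarText) {
  if (pegLibrary && typeof pegLibrary.generate === "function") {
    return pegLibrary.generate(grammarText); // assumed name in newer releases
  }
  if (pegLibrary && typeof pegLibrary.buildParser === "function") {
    return pegLibrary.buildParser(grammarText); // name used in this article
  }
  throw new Error("No PEGJS build function found");
}
```

In the 'onload' callback above, the call would then read 'window.pegjsmain.parser = buildParserFrom(PEG, grammarInput);'.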
PART 5: Parsing a text input
The parser object returned by 'PEG.buildParser' provides an API for parsing text input: the 'parse' function. The 'parse' function returns the output of the parsing if the parsing is successful, or throws a structured error object if the parsing fails.
The error object thrown when the parsing fails contains the following properties: 'message', which is the error message; 'name', which is the error name; 'line' and 'column', which are respectively the line and the column where the parser failed to match an expression; and 'expected', which is an array of the expected patterns. Unfortunately, there seems to be no documentation about the error object properties, so if you want more information about them, you will need to run an application with PEGJS and set some breakpoints in your favorite browser tool.
try {
    var output = this.parser.parse(content);
    this.displayResult(output, true);
} catch(error) {
    this.displayResult(error, false);
}

displayResult : function(output, success) {
    var errorDisplay = document.getElementById('errorDisplay');
    var resultDisplay = document.getElementById('resultDisplay');
    if ((resultDisplay != null) && (errorDisplay != null)) {
        if (success) {
            errorDisplay.innerHTML = " ";
            errorDisplay.classList.add("hidden");
            resultDisplay.innerHTML = output;
            resultDisplay.classList.remove("hidden");
        } else {
            resultDisplay.innerHTML = " ";
            resultDisplay.classList.add("hidden");
            errorDisplay.innerHTML = "ERROR: line " + output.line + ", column " + output.column + " : " + output.message;
            errorDisplay.classList.remove("hidden");
        }
    }
}
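Since the error properties are undocumented, the message construction can be isolated in a small function that tolerates a missing position. The sketch below is a hypothetical helper ('formatParseError' is my own name): it reads 'line' and 'column' directly from the error as described above, and falls back to 'error.location.start', which I believe is where later PEGJS versions store the position; treat that fallback as an assumption.

```javascript
// Hypothetical helper: builds the text shown in the error panel.
function formatParseError(error) {
  // Assumed fallback for versions that store the position in error.location.start
  const start = (error.location && error.location.start) || {};
  const line = error.line !== undefined ? error.line : start.line;
  const column = error.column !== undefined ? error.column : start.column;
  const where = line !== undefined
    ? "line " + line + ", column " + column
    : "unknown position";
  return "ERROR: " + where + " : " + error.message;
}

// Plain object standing in for an error thrown by 'parse'
console.log(formatParseError({ line: 1, column: 5, message: "Expected an integer" }));
// ERROR: line 1, column 5 : Expected an integer
```

Keeping this logic out of 'displayResult' also makes it possible to check the message format without a DOM.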
PART 6: Getting a working example
You can find a working example in my GitHub repository (the pegjsapp folder).