Recursive descent parser
A
recursive descent parser is a top-down
parser built from a set of
mutually-recursive procedures (or a non-recursive equivalent) where each such
procedure usually implements one of the production rules of the
grammar. Thus the structure of the resulting program closely mirrors that of the grammar it recognises.
A
predictive parser is a recursive descent parser with no backup. Predictive parsing is possible only for the class of
LL(k) grammars, which are the context-free grammars for which there exists some positive integer k that allows a recursive descent parser to decide which production to use by examining only the next k tokens of input. (The
LL(k) grammars therefore exclude all ambiguous grammars, as well as all grammars that contain left recursion. Any context-free grammar can be transformed into an equivalent grammar that has no left recursion, but removal of left recursion does not always yield an
LL(k) grammar.) A predictive parser runs in
linear time.
Recursive descent with backup is a technique that determines which production to use by trying each production in turn. Recursive descent with backup is not limited to
LL(k) grammars, but is not guaranteed to terminate unless the grammar is
LL(k). Even when they terminate, parsers that use recursive descent with backup may require
exponential time.
A
packrat parser is a modification of recursive descent with backup that avoids nontermination by remembering its choices, so as not to make exactly the same choice twice. A
packrat parser runs in
linear time, but usually requires more space than a predictive parser.
Although predictive parsers are widely used, programmers often prefer to create
LR or
LALR parsers via parser generators without transforming the grammar into
LL(k) form.
Some authors define the recursive descent parsers as the predictive parsers. Other authors use the term more broadly, to include recursive descent with backup and
packrat parsers
The following
EBNF grammar (for
Niklaus Wirth's
PL/0 programming language, from
Algorithms + Data Structures = Programs) is in
LL(1) form (for simplicity,
identand
number are assumed to be terminals):
program = block "." .
block =
["const" ident "=" number {"," ident "=" number} ";"]
["var" ident {"," ident} ";"]
{"procedure" ident ";" block ";"} statement .
statement =
[ident ":=" expression"call" ident | "begin" statement {";" statement} "end" | "if" condition "then" statement | "while" condition "do" statement ] .
condition = "odd" expression | "#"|"<"|"<="|">"|">=") expression .
expression = ["+"|"-"] term {("+"|"-") term} .
term = factor {("*"|"/") factor} .
factor = ident | number | "(" expression ")" .Terminals are expressed in quotes (except for identand number). Each nonterminal is defined by a rule in the grammar.
Notice how closely the predictive parser below mirrors the grammar above. There is a procedure for each nonterminal in the grammar. Parsing descends in a top-down manner, until the final nonterminal has been processed. The program fragment depends on a global variable, sym, which contains the next symbol from the input, and the global function getsym, which updates sym when called.
typedef enum {ident, number, lparen, rparen, times, slash, plus, minus, eql, neq, lss, leq, gtr, geq, callsym, beginsym, semicolon, endsym, ifsym, whilesym, becomes, thensym, dosym, constsym, comma, varsym, procsym, period, oddsym} Symbol;
Symbol sym;void getsym(void);void error(const char msg[]);void expression(void);
int accept(Symbol s) { if (sym == s) { getsym(); return 1; } return 0;}
int expect(Symbol s) { if (accept(s)) return 1; error("expect: unexpected symbol"); return 0; }
void factor(void) { if (accept(ident)) {} else if (accept(number)) { ; } else if (accept(lparen)) { expression(); expect(rparen); } else { error("factor: syntax error"); getsym(); } }
void term(void) { factor(); while (sym slash) { getsym(); factor(); } }
void expression(void) {if (sym minus) getsym(); term(); while (sym minus) { getsym(); term(); } }
void condition(void) { if (accept(oddsym)) { expression(); } else { expression();if (sym neqsym == lss sym gtr | sym == geq) { getsym(); expression(); } else { error("condition: invalid operator"); getsym(); } } }
void statement(void) { if (accept(ident)) { expect(becomes); expression(); } else if (accept(callsym)) { expect(ident); } else if (accept(beginsym)) { do { statement(); } while (accept(semicolon)); expect(endsym); } else if (accept(ifsym)) { condition(); expect(thensym); statement(); } else if (accept(whilesym)) { condition(); expect(dosym); statement(); }}
void block(void) { if (accept(constsym)) { do { expect(ident); expect(eql); expect(number); } while (accept(comma)); expect(semicolon); } if (accept(varsym)) { do { expect(ident); } while (accept(comma)); expect(semicolon); } while (accept(procsym)) { expect(ident); expect(semicolon); block(); expect(semicolon); } statement();}
void program(void) { getsym(); block(); expect(period); }Although context-free grammars are commonly used to formalize the syntax of the language recognized by a recursive descent parser (as in the example above), an alternate and more direct way to formalize recursive descent parsing is via parsing expression grammars, which model the structure and behavior of typical recursive descent parsers directly.Recursive descent parsers are particularly easy to implement in functional languages such as Haskell. See Functional Pearls: Monadic Parsing in Haskell (pdf format).*Coco/R - a recursive descent parser generator *ANTLR - a recursive descent parser generator *CodeCodex: Recursive descent parsing - sample recursive descent parser generator code for C and Java* The Dragon book, Alfred V Aho, Ravi Sethi, and Jeffrey D Ullman, in particular Section 4.4. * Modern Compiler Implementation in Java, Second Edition, Andrew Appel, 2002, ISBN 0-521-82060-X. * Recursive Programming Techniques, W.H. Burge, 1975, ISBN 0-201-14450-6 * Crafting a Compiler with C, Charles N Fischer and Richard J LeBlanc, Jr, 1991, ISBN 0-8053-2166-7. * Parse: A versatile recursive descent Perl module. * pyparsing: A versatile recursive descent Python module. * Compiling with C# and Java, Pat Terry, 2005, ISBN/ISSN: 0-321-26360-X 624 * Algorithms + Data Structures = Programs, Niklaus Wirth, 1975, ISBN 0-13-022418-9 * Compiler Construction, Niklaus Wirth, 1996, ISBN 0-201-40353-6
|
|