0% found this document useful (0 votes)
58 views56 pages

Top-Down Parsing: CS164 Lecture 5-6

The document discusses top-down parsing and recursive descent parsing. It begins by explaining that top-down parsing constructs the parse tree from the top-down and left-to-right as terminals are encountered in the token stream. It then explains recursive descent parsing, where the parser recursively tries all production rules for the starting non-terminal to construct the parse tree. However, recursive descent parsing does not work for grammars with left recursion, as it can get stuck in an infinite loop. The document provides examples and discusses how to eliminate left recursion through grammar transformations.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
58 views56 pages

Top-Down Parsing: CS164 Lecture 5-6

The document discusses top-down parsing and recursive descent parsing. It begins by explaining that top-down parsing constructs the parse tree from the top-down and left-to-right as terminals are encountered in the token stream. It then explains recursive descent parsing, where the parser recursively tries all production rules for the starting non-terminal to construct the parse tree. However, recursive descent parsing does not work for grammars with left recursion, as it can get stuck in an infinite loop. The document provides examples and discusses how to eliminate left recursion through grammar transformations.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 56

Top-Down Parsing

CS164
Lecture 5-6

Prof. Bodik CS 164 Lecture 5-6 1


Review

• A parser consumes a sequence of tokens s and


produces a parse tree
• Issues:
– How do we recognize that s  L(G) ?
– A parse tree of s describes how s  L(G)
– Ambiguity: more than one parse tree
(interpretation) for some string s
– Error: no parse tree for some string s
– How do we construct the parse tree?

Prof. Bodik CS 164 Lecture 5-6 2


Ambiguity

• Grammar
E  E + E | E * E | ( E ) | int

• Strings
int + int + int

int * int + int

Prof. Bodik CS 164 Lecture 5-6 3


Ambiguity. Example

This string has two parse trees


E E

E + E E + E

E + E int int E + E

int int int int

+ is left-associative
Prof. Bodik CS 164 Lecture 5-6 4
Ambiguity. Example

This string has two parse trees


E E

E + E E * E

E * E int int E + E

int int int int

* has higher precedence than +


Prof. Bodik CS 164 Lecture 5-6 5
Ambiguity (Cont.)

• A grammar is ambiguous if it has more than


one parse tree for some string
– Equivalently, there is more than one right-most or
left-most derivation for some string
• Ambiguity is bad
– Leaves meaning of some programs ill-defined
• Ambiguity is common in programming languages
– Arithmetic expressions
– IF-THEN-ELSE

Prof. Bodik CS 164 Lecture 5-6 6


Dealing with Ambiguity

• There are several ways to handle ambiguity

• Most direct method is to rewrite the grammar


unambiguously
EE+T|T
T  T * int | int | ( E )

• Enforces precedence of * over +


• Enforces left-associativity of + and *
Prof. Bodik CS 164 Lecture 5-6 7
Ambiguity. Example

The int * int + int has ony one parse tree now
E E

E + T E * E

T int int E + E

T * int int int

int
Prof. Bodik CS 164 Lecture 5-6 8
Ambiguity: The Dangling Else

• Consider the grammar


S  if E then S
| if E then S else S
| OTHER

• This grammar is also ambiguous

Prof. Bodik CS 164 Lecture 5-6 9


The Dangling Else: Example

• The expression
if E1 then if E2 then S3 else S4
has two parse trees
if if

E1 if S4 E1 if

E2 S3 E2 S3 S4

• Typically we want the second form


Prof. Bodik CS 164 Lecture 5-6 10
The Dangling Else: A Fix

• else matches the closest unmatched then


• We can describe this in the grammar (distinguish
between matched and unmatched “then”)

S  MIF /* all then are matched */


| UIF /* some then are unmatched */
MIF  if E then MIF else MIF
| OTHER
UIF  if E then S
| if E then MIF else UIF
• Describes the same set of strings
Prof. Bodik CS 164 Lecture 5-6 11
The Dangling Else: Example Revisited

• The expression if E1 then if E2 then S3 else S4


if if

E1 if E1 if S4

E2 S3 S4 E2 S3

• A valid parse tree • Not valid because the


(for a UIF) then expression is not
a MIF
Prof. Bodik CS 164 Lecture 5-6 12
Ambiguity

• No general techniques for handling ambiguity

• Impossible to convert automatically an


ambiguous grammar to an unambiguous one

• Used with care, ambiguity can simplify the


grammar
– Sometimes allows more natural definitions
– We need disambiguation mechanisms
Prof. Bodik CS 164 Lecture 5-6 13
Precedence and Associativity Declarations

• Instead of rewriting the grammar


– Use the more natural (ambiguous) grammar
– Along with disambiguating declarations

• Most tools allow precedence and associativity


declarations to disambiguate grammars

• Examples …

Prof. Bodik CS 164 Lecture 5-6 14


Associativity Declarations

• Consider the grammar E  E + E | int


• Ambiguous: two parse trees of int + int + int
E E

E + E E + E

E + E int int E + E

int int int int

• Left-associativity declaration: %left +


Prof. Bodik CS 164 Lecture 5-6 15
Precedence Declarations

• Consider the grammar E  E + E | E * E | int


– And the string int + int * int
E E

E * E E + E

E + E int int E * E

int int int int


• Precedence declarations: %left +
%left *
Prof. Bodik CS 164 Lecture 5-6 16
Review

• We can specify language syntax using CFG


• A parser will answer whether s  L(G)
• … and will build a parse tree
• … and pass on to the rest of the compiler

• Next:
– How do we answer s  L(G) and build a parse tree?

Prof. Bodik CS 164 Lecture 5-6 17


Top-Down Parsing

Prof. Bodik CS 164 Lecture 5-6 18


Intro to Top-Down Parsing

• Terminals are seen in order of A


appearance in the token
stream: t1 B t4
t1 t 2 t 3 t 4 t 5
C D
• The parse tree is constructed
– From the top t2 t3 t4
– From left to right

Prof. Bodik CS 164 Lecture 5-6 19


Recursive Descent Parsing

• Consider the grammar


ET+E|T
T  int | int * T | ( E )
• Token stream is: int5 * int2
• Start with top-level non-terminal E

• Try the rules for E in order

Prof. Bodik CS 164 Lecture 5-6 20


Recursive Descent Parsing. Example (Cont.)

• Try E0  T1 + E2
• Then try a rule for T1  ( E3 )
– But ( does not match input token int5
• Try T1  int . Token matches.
– But + after T1 does not match input token *
• Try T1  int * T2
– This will match but + after T1 will be unmatched
• Have exhausted the choices for T1
– Backtrack to choice for E0
Prof. Bodik CS 164 Lecture 5-6 21
Recursive Descent Parsing. Example (Cont.)

• Try E0  T1
• Follow same steps as before for T1
– And succeed with T1  int * T2 and T2  int
– With the following parse tree

E0

T1

int5 * T2

int2
Prof. Bodik CS 164 Lecture 5-6 22
Recursive Descent Parsing. Notes.

• Easy to implement by hand


– An example implementation is provided as a
supplement “Recursive Descent Parsing”

• But does not always work …

Prof. Bodik CS 164 Lecture 5-6 23


Recursive-Descent Parsing

• Parsing: given a string of tokens t1 t2 ... tn, find


its parse tree
• Recursive-descent parsing: Try all the
productions exhaustively
– At a given moment the fringe of the parse tree is: t1
t2 … tk A …
– Try all the productions for A: if A  BC is a
production, the new fringe is t1 t2 … tk B C …
– Backtrack when the fringe doesn’t match the string
– Stop when there are no more non-terminals
Prof. Bodik CS 164 Lecture 5-6 24
When Recursive Descent Does Not Work

• Consider a production S  S a:
– In the process of parsing S we try the above rule
– What goes wrong?

• A left-recursive grammar has a non-terminal S


S + S for some 

• Recursive descent does not work in such cases


– It goes into an loop
Prof. Bodik CS 164 Lecture 5-6 25
Elimination of Left Recursion

• Consider the left-recursive grammar


SS|

• S generates all strings starting with a  and


followed by a number of 

• Can rewrite using right-recursion


S   S’
S’   S’ | 
Prof. Bodik CS 164 Lecture 5-6 26
Elimination of Left-Recursion. Example

• Consider the grammar


S1|S0 (  = 1 and  = 0 )

can be rewritten as
S  1 S’
S’  0 S’ | 

Prof. Bodik CS 164 Lecture 5-6 27


More Elimination of Left-Recursion

• In general
S  S 1 | … | S n | 1 | … | m
• All strings derived from S start with one of 1,
…,m and continue with several instances of 1,
…,n
• Rewrite as
S  1 S’ | … | m S’
S’  1 S’ | … | n S’ | 

Prof. Bodik CS 164 Lecture 5-6 28


General Left Recursion

• The grammar
SA|
AS
is also left-recursive because
S + S  

• This left-recursion can also be eliminated


• See [ASU], Section 4.3 for general algorithm

Prof. Bodik CS 164 Lecture 5-6 29


Summary of Recursive Descent

• Simple and general parsing strategy


– Left-recursion must be eliminated first
– … but that can be done automatically
• Unpopular because of backtracking
– Thought to be too inefficient

• In practice, backtracking is eliminated by


restricting the grammar

Prof. Bodik CS 164 Lecture 5-6 30


Predictive Parsers

• Like recursive-descent but parser can


“predict” which production to use
– By looking at the next few tokens
– No backtracking
• Predictive parsers accept LL(k) grammars
– L means “left-to-right” scan of input
– L means “leftmost derivation”
– k means “predict based on k tokens of lookahead”
• In practice, LL(1) is used

Prof. Bodik CS 164 Lecture 5-6 31


LL(1) Languages

• In recursive-descent, for each non-terminal and


input token there may be a choice of production
• LL(1) means that for each non-terminal and
token there is only one production that could
lead to success
• Can be specified as a 2D table
– One dimension for current non-terminal to expand
– One dimension for next token
– A table entry contains one production

Prof. Bodik CS 164 Lecture 5-6 32


Predictive Parsing and Left Factoring

• Recall the grammar


ET+E|T
T  int | int * T | ( E )

• Impossible to predict because


– For T two productions start with int
– For E it is not clear how to predict

• A grammar must be left-factored before use for


predictive parsing
Prof. Bodik CS 164 Lecture 5-6 33
Left-Factoring Example

• Recall the grammar


ET+E|T
T  int | int * T | ( E )

• Factor out common prefixes of productions


ETX
X+E|
T  ( E ) | int Y
Y*T|

Prof. Bodik CS 164 Lecture 5-6 34


LL(1) Parsing Table Example

• Left-factored grammar
ETX X+E|
T  ( E ) | int Y Y*T|
• The LL(1) parsing table:
int * + ( ) $
T int Y (E)
E TX TX
X +E  
Y *T   

Prof. Bodik CS 164 Lecture 5-6 35


LL(1) Parsing Table Example (Cont.)

• Consider the [E, int] entry


– “When current non-terminal is E and next input is
int, use production E  T X
– This production can generate an int in the first
place
• Consider the [Y,+] entry
– “When current non-terminal is Y and current token
is +, get rid of Y”
– We’ll see later why this is so

Prof. Bodik CS 164 Lecture 5-6 36


LL(1) Parsing Tables. Errors

• Blank entries indicate error situations


– Consider the [E,*] entry
– “There is no way to derive a string starting with *
from non-terminal E”

Prof. Bodik CS 164 Lecture 5-6 37


Using Parsing Tables

• Method similar to recursive descent, except


– For each non-terminal S
– We look at the next token a
– And choose the production shown at [S,a]
• We use a stack to keep track of pending non-
terminals
• We reject when we encounter an error state
• We accept when we encounter end-of-input

Prof. Bodik CS 164 Lecture 5-6 38


LL(1) Parsing Algorithm

initialize stack = <S $> and next (pointer to tokens)


repeat
case stack of
<X, rest> : if T[X,*next] = Y1…Yn
then stack  <Y1… Yn rest>;
else error ();
<t, rest> : if t == *next ++
then stack  <rest>;
else error ();
until stack == < >

Prof. Bodik CS 164 Lecture 5-6 39


LL(1) Parsing Example

Stack Input Action


E$ int * int $ TX
TX$ int * int $ int Y
int Y X $ int * int $ terminal
YX$ * int $ *T
*TX$ * int $ terminal
TX$ int $ int Y
int Y X $ int $ terminal
YX$ $ 
X$ $ 
$ $ ACCEPT
Prof. Bodik CS 164 Lecture 5-6 40
Constructing Parsing Tables

• LL(1) languages are those defined by a parsing


table for the LL(1) algorithm
• No table entry can be multiply defined

• We want to generate parsing tables from CFG

Prof. Bodik CS 164 Lecture 5-6 41


Top-Down Parsing. Review

• Top-down parsing expands a parse tree from


the start symbol to the leaves
– Always expand the leftmost non-terminal
E

T E
+

int * int + int


Prof. Bodik CS 164 Lecture 5-6 42
Top-Down Parsing. Review

• Top-down parsing expands a parse tree from


the start symbol to the leaves
– Always expand the leftmost non-terminal
E
• The leaves at any point
T E form a string A
+
–  contains only terminals
T – The input string is b
int *
– The prefix  matches
– The next token is b
int * int + int
Prof. Bodik CS 164 Lecture 5-6 43
Top-Down Parsing. Review

• Top-down parsing expands a parse tree from


the start symbol to the leaves
– Always expand the leftmost non-terminal
E
• The leaves at any point
T E form a string A
+
–  contains only terminals
T – The input string is b
int * T
– The prefix  matches
int – The next token is b
int * int + int
Prof. Bodik CS 164 Lecture 5-6 44
Top-Down Parsing. Review

• Top-down parsing expands a parse tree from


the start symbol to the leaves
– Always expand the leftmost non-terminal
E
• The leaves at any point
T E form a string A
+
–  contains only terminals
T – The input string is b
int * T
– The prefix  matches
int int – The next token is b
int * int + int
Prof. Bodik CS 164 Lecture 5-6 45
Predictive Parsing. Review.

• A predictive parser is described by a table


– For each non-terminal A and for each token b we
specify a production A  
– When trying to expand A we use A   if b follows
next

• Once we have the table


– The parsing algorithm is simple and fast
– No backtracking is necessary

Prof. Bodik CS 164 Lecture 5-6 46


Constructing Predictive Parsing Tables

• Consider the state S * A


– With b the next token
– Trying to match b
There are two possibilities:
1. b belongs to an expansion of A
• Any A   can be used if b can start a string derived
from 
In this case we say that b  First()

Or…
Prof. Bodik CS 164 Lecture 5-6 47
Constructing Predictive Parsing Tables (Cont.)

2. b does not belong to an expansion of A


– The expansion of A is empty and b belongs to an
expansion of 
– Means that b can appear after A in a derivation of
the form S * Ab
– We say that b 2 Follow(A) in this case

– What productions can we use in this case?


• Any A   can be used if  can expand to 
• We say that   First(A) in this case
Prof. Bodik CS 164 Lecture 5-6 48
Computing First Sets

Definition First(X) = { b | X * b}  { | X * }


1. First(b) = { b }

2. For all productions X ! A1 … An


• Add First(A1) – {} to First(X). Stop if   First(A1)
• Add First(A2) – {} to First(X). Stop if   First(A2)
• …
• Add First(An) – {} to First(X). Stop if   First(An)
• Add  to First(X)

Prof. Bodik CS 164 Lecture 5-6 49


First Sets. Example

• Recall the grammar


ETX X+E|
T  ( E ) | int Y Y*T|
• First sets
First( ( ) = { ( } First( T ) = {int, ( }
First( ) ) = { ) } First( E ) = {int, ( }
First( int) = { int } First( X ) = {+,  }
First( + ) = { + } First( Y ) = {*,  }
First( * ) = { * }
Prof. Bodik CS 164 Lecture 5-6 50
Computing Follow Sets

Definition Follow(X) = { b | S *  X b  }
1. Compute the First sets for all non-terminals first
2. Add $ to Follow(S) (if S is the start non-terminal)

3. For all productions Y ! … X A1 … An


• Add First(A1) – {} to Follow(X). Stop if   First(A1)
• Add First(A2) – {} to Follow(X). Stop if   First(A2)
• …
• Add First(An) – {} to Follow(X). Stop if   First(An)
• Add Follow(Y) to Follow(X)

Prof. Bodik CS 164 Lecture 5-6 51


Follow Sets. Example

• Recall the grammar


ETX X+E|
T  ( E ) | int Y Y*T|
• Follow sets
Follow( + ) = { int, ( } Follow( * ) = { int, ( }
Follow( ( ) = { int, ( } Follow( E ) = {), $}
Follow( X ) = {$, ) } Follow( T ) = {+, ) , $}
Follow( ) ) = {+, ) , $} Follow( Y ) = {+, ) , $}
Follow( int) = {*, +, ) , $}
Prof. Bodik CS 164 Lecture 5-6 52
Constructing LL(1) Parsing Tables

• Construct a parsing table T for CFG G

• For each production A   in G do:


– For each terminal b  First() do
• T[A, b] = 
– If  * , for each b  Follow(A) do
• T[A, b] = 
– If  *  and $  Follow(A) do
• T[A, $] = 

Prof. Bodik CS 164 Lecture 5-6 53


Constructing LL(1) Tables. Example

• Recall the grammar


ETX X+E|
T  ( E ) | int Y Y*T|
• Where in the line of Y we put Y ! * T ?
– In the lines of First( *T) = { * }

• Where in the line of Y we put Y  ?


– In the lines of Follow(Y) = { $, +, ) }

Prof. Bodik CS 164 Lecture 5-6 54


Notes on LL(1) Parsing Tables

• If any entry is multiply defined then G is not


LL(1)
– If G is ambiguous
– If G is left recursive
– If G is not left-factored
– And in other cases as well
• Most programming language grammars are not
LL(1)
• There are tools that build LL(1) tables

Prof. Bodik CS 164 Lecture 5-6 55


Review

• For some grammars there is a simple parsing


strategy
– Predictive parsing

• Next: a more powerful parsing strategy

Prof. Bodik CS 164 Lecture 5-6 56

You might also like