0% found this document useful (0 votes)

20 views

S2 BottomUpParsing

Uploaded by

xsh7p9fjk8

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

20 views

S2 BottomUpParsing

Uploaded by

xsh7p9fjk8

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 59

LR (Bottom-up) Parsing:

Tokens to AST Tree

Naranker Dulay
[email protected]

https://www.doc.ic.ac.uk/~nd/compilers

n.dulay LR (Bottom-up) Parsing 1

Parsing - main idea
Parsing (syntax analysis) transforms a sequence of tokens into an abstract
syntax tree based on the (typically context free) grammar of the language

Context Free Grammar (CFG)

CFG rules take the form rule ® (rule|token)*

Tokens Parser Abstract Syntax Tree (AST)

IF
LPAREN if
INT(24)
RPAREN
IDENT('A') 24 A B
ELSE
IDENT('B')
n.dulay LR (Bottom-up) Parsing 2
LR Parsing - main idea

LR parsing is also LR Grammar

known as:
(i) bottom-up parsing,
reflecting the direction
the AST is constructed,
LR Parser Generator
(ii) shift-reduce
parsing, reflecting the
two main actions
performed by an LR Parsing Table
parser.

Abstract Syntax
Tokens LR Parser
Tree (AST)

n.dulay LR (Bottom-up) Parsing 3

0 S

DFA to
S’ ● S, $ S’ S ●, $ 1
S ● id, $
S ● V = E, $ id S id ●, $
2

Parsing
V ● id, = V id ●, =
V
= E
V ● = E, $ S V = ● E, $ V = E ●, $
Table
S S 5
E ● int, $ int
3 E ● V, $ E int ●, $ 6
V ● id, $ V
4 E V ●, $ 7
id
V id ●, $ 8

Grammar State ACTION GOTO

to DFA
id int = $ E V S
0 S2 G3 G1
1 A
2 R3 R1
3 S4
r1: S ® id 4 S8 S6 G5 G7
r2: S ® V ‘=’ E 5 R2
r3: V ® id
6 R5
r4: E ® V
7 R4
r5: E ® int
8 R3

n.dulay LR (Bottom-up) Parsing 4

Parsers for Context-Free Grammars
LL and LR grammars are LL(k): Left-to-right scan, Leftmost-derivation,
subsets of CFGs that we k-token look-ahead
can parse in O(n) time LL(0) No such thing.
instead of O(n3) for full
CFGs. LL(1) Most popular LL. Can implement as an
automaton or as a recursive descent parser
k=2 Sometimes useful
k>2 Rarely needed
LL - Top-Down Parsers
LR(k): Left-to-right scan, Rightmost-derivation (in
n Build AST from root reverse), k-token look-ahead
to leaves.
LR(0) Weak. Useful for learning about LR.
SLR(1) Stronger, superseded by LALR(1)
LALR(1) Fast and popular. Close in power to LR(1)
LR - Bottom-Up Parsers
LR(1) Powerful -- needs more memory
n Build AST from leaves k => 2 Possible, but rarely needed/used
to root.
LR(n) is a superset of an LL(n) and hence more powerful.

n.dulay LR (Bottom-up) Parsing 5

Chomsky Hierarchy
In the following
R,S are non-terminals (the name of a rule/production, non-leaf node in AST)
t is a sequence of terminals (tokens from lexical analyser, leaf nodes in AST)
a, b, j are sequences of terminals and non-terminals

type 3 – regular grammars (Deterministic Finite Automata)

R ® t

type 2 – context free grammars (Pushdown Automata)

R ® a

type 1 – context sensitive grammars (Linear Bounded Automata)

a R b ® a j b

type 0 – unrestricted grammars/recursively enumerable (Turing Machines)

a ® b

n.dulay LR (Bottom-up) Parsing 6

LR Parsers

Most programming languages can be described by a Context-Free

Grammar (CFG) and virtually all CFGs for programming languages can
be described by an LR grammar and implemented with an LR parser.

LR Parsers are too complex to build by hand. We need an LR parser-

generator e.g. yacc/bison, SableCC, CUP, Beaver. The principles of LR
operation are “relatively easy” to understand however.

LR Parsers can be implemented efficiently in O(n) time.

n.dulay LR (Bottom-up) Parsing 7

LR Grammar to Parsing Table

There are several methods for generating LR parsing tables. The methods
essentially define parsers/grammars of different expressive power, e.g. LR(0),
SLR(1), LALR(1), LR(1), LR(2), LR(3) etc. Less powerful methods may fail to
produce a parsing table for a given context-free grammar.

The parsers resulting from these methods are similar - in each case their states
consist of so-called LR(k) items of the grammar, but vary in the number of
states and/or in the size of their lookahead sets.

LR(k) Grammar LR(k) Items

NFA DFA ParsingTable

n.dulay LR (Bottom-up) Parsing 8

LR(0) Items

LR(0) parsers are the simplest LR parsers. They don’t use the current token in
order to perform a reduction.

LR(0) items are instances of the rules of the grammar with a dot • in some
position on the right hand side of the rule.

An item indicates how much of a rule has been seen, e.g. the item
E ® E + • int indicates that we have seen an expression and a plus sign
and hope to encounter an integer next. LR(0) items thus indicate
intermediate steps in the recognition of the RHS of a rule.

Example: The rule X ® ABC has 4 LR(0) items

X ® • ABC Initial item

X ® A • BC
X ® AB • C
X ® ABC • Complete (or Reduce) Item

n.dulay LR (Bottom-up) Parsing 9

Example
For LR, we add an auxiliary rule
The Grammar with an end-of-input symbol $. $ is
implied if omitted.
E’® E $
rule r1: E ® E ‘+’ int -- rules labelled for parsing table
rule r2: E ® int

has 8 LR(0) items:

E’® • E Auxiliary rule, Initial item, $ is implied

E’® E • Complete Item or Reduce Item
E ® • E + int Initial item
E ® E • + int
E ® E + • int
E ® E + int • Complete Item
E ® • int
E ® int • Complete Item

n.dulay LR (Bottom-up) Parsing 10

LR(0) to NFA

LR(0) items are used as states of a finite automaton that maintains information
about the progress of a shift-reduce parse.

We build a NFA from the LR(0) items and from this NFA we can build a DFA
using the subset construction.

NFA Transitions: Given a state with item X ® A • BC, add a B transition

and a new state as follows:
B
X → A • BC X → AB • C

In addition, if B is a rule (non-terminal) then foreach initial item B ® • D

add an ε transition and a new state as follows:

X → A • BC
ε B→•D

n.dulay LR (Bottom-up) Parsing 11

Example
Grammar
E’ → • E
E
E’® E
E’ → E • ε
r1: E ® E ‘+’ int
r2: E ® int
E → • E + int ε
E
Items
E → E • + int ε
E’® • E +
E’® E • E → E + • int ε
E ® • E + int int
E ® E • + int E → E + int •
E ® E + • int
E ® E + int • E → • int
E ® • int
int
E ® int •
E → int •

n.dulay LR (Bottom-up) Parsing 12

NFA to DFA
E’ → • E E’ → • E 0
E E → • E + int
E’ → E • ε E → • int
E
1
E → • E + int ε E’ → E •
E E → E • + int
+ int
E → E • + int ε 2
E → E + • int
+
E → E + • int ε int
3
E → E + int •
int
E → E + int •
4
E → int •
E → • int
int
Note: Since it’s easy to determine
E → int • the ε-closures from items, we'll Number states
construct the DFA directly from
now on. from 0
n.dulay LR (Bottom-up) Parsing 13
DFA to LR(0) Parsing Table
T
For each terminal transition X ® Y add P [X, T] = sY (shift Y)
N
For each non-terminal transition X ® Y add P [X, N] = gY (goto Y)

For each state X containing the item R' ® .. • add P [X, $] = a (accept)

For each state X containing an item R ® .. • add P [X, T] = rN (reduce)

for every terminal T where N is R’s rule number

Note: State ACTION GOTO

For LR(0) parsers if there is more
int + $ E
than one action for a table cell
then the grammar is not LR(0), 0 s4 g1
e.g. we may have a shift and a 1 s2 a
reduce (shift-reduce conflict) or 2 s3
two reduces (reduce-reduce
conflict). Blank cells indicate an 3 r1 r1 r1
error. 4 r2 r2 r2

n.dulay LR (Bottom-up) Parsing 14

Model of an LR parser
Tokens T0 T1 ... Tk ... Tn $

Stack
Top State
Parsing Table
Top-1 State LR Parser (shift, reduce,
goto, accept,
error)

Bottom State Note: This a Pushdown Automaton for LR(1) parsing

Operation: We push state 0 onto the stack and then repeatedly perform:
SWITCH ParsingTable[Stack[Top],CurrentToken]
shift Sn: Push state n onto the Stack, Advance Current Token
reduce Rn: Remove L elements from the Stack where L=length of RHS of rule n.
Push ParsingTable[Stack[Top],LHS of R]; i.e. the “Goto” action
reduce also generates an AST node for the rule
accept a: Accept input stream (i.e. parse was successful)
error: Report Error
goto Gn: This case is not selected directly. Looked-up in the reduce case
above.
n.dulay LR (Bottom-up) Parsing 15
Example
For LR, we add an auxiliary rule
Grammar E’ ® E $ with an end-of-input symbol $. $
is implied if omitted
rule r1: E ® E ‘+’ int -- rules labelled for parsing table
rule r2: E ® int

LR(1) Parsing Table

STATE ACTION (Terminals) GOTO (Non-Terminals)

int + $ E
0 s2 g1
1 s3 a
2 r2 r2
3 s4
4 r1 r1
Abbreviations: s (shift), r (reduce), a (accept), g (goto), blank (error)
Note: s and g are followed by a state number, r is followed by a rule number.
n.dulay LR (Bottom-up) Parsing 16
Example Contd LR(1) Parsing Table
State ACTION GOTO

int + $ E
E’® E $ 0 s2 g1
r1: E ® E ‘+’ int 1 s3 a
r2: E ® int 2 r2 r2
3 s4
4 r1 r1

Stack Tokens Action

0 int + int $ T[0,int] = s2
E
0 2 + int $ T[2,+] = r2, pop 1 elem, push T[0,E] r1
0 1 + int $ T[1,+] = s3
E
0 1 3 int $ T[3,int] = s4
r2
0 1 3 4 $ T[4,$] = r1, pop 3 elems, push T[0,E]
0 1 $ T[1,$] = a int + int $
n.dulay LR (Bottom-up) Parsing 17
LR(1) and LALR(1) Parsers

LR parsers essentially differ in how they perform reduction:

LR(0) A reduce item X ® A• always causes a reduction (the current token is

not used in LR(0) parsing)
SLR(1) A reduce item X ® A• causes a reduction only if the current token is
in FOLLOW(X)
LR(1) A reduce item X ® A•, t causes a reduction only if the current token
is equal to the look-ahead token t
LALR(1) Like LR(1) but we combine LR(1) states that differ in the look-ahead
token of the item only.

LR(1) is the most powerful of these methods and sufficient for most programming
languages. Its disadvantage is that it requires large parsing tables. LALR(1) is a
modification of LR(1) that retains virtually all the benefits of LR(1), while producing
much smaller parsing tables. We’ll look at LR(1) first, then LALR(1).

n.dulay LR (Bottom-up) Parsing 18

SLR(1) Optional Material

SLR(1) parsers are similar to LR(0) parsers except that we only add reduce
actions into a SLR(1) parsing table “if appropriate”:

SLR(1): Add a reduce action for an item A ® B • only if the current

token can FOLLOW A somewhere in the grammar.

An SLR(1) parsing table has the same states and gotos as an LR(0)
parsing table, only the actions differ.

This change increases the expressiveness of SLR(1) parsers over LR(0) parsers
considerably and many programming language constructs can be parsed by
SLR(1) that LR(0) cannot parse. We won’t look at SLR parsers in any further
detail but rather consider the more powerful LR(1) and LALR(1) parsers.

n.dulay LR (Bottom-up) Parsing 19

FIRST and FOLLOW Sets

The FIRST set for a sequence of rules (non-terminals) and tokens (terminals)
a, is the set of all tokens that could start a derivation of a, plus ε if a could
derive ε.

FIRST(a) = {t:(a Þ* tb)} È (if a Þ* ε then {ε} else {})

Þ* zero or more derivations

Þ+ one or more derivations
We say that a non-terminal A is NULLABLE if A Þ* ε

The FOLLOW set for a rule (non-terminal) A, is the set of all tokens
(terminals) that could immediately follow A, plus $ if A can end the input.

FOLLOW(A) = {t:(S Þ+ aAtb)} È (if S Þ* aA then {$} else {})

where S is the start symbol

n.dulay LR (Bottom-up) Parsing 20
FIRST Set

The FIRST set for a token T is {T}, for ε is {ε}

The FIRST set for a rule A, is the set of all tokens that could start a derivation
of A, plus ε if A could derive ε
foreach alternative of A of the form A ® b1 b2 ... bn
Include FIRST(b1) - {ε} in FIRST(A)
If ε in FIRST(b1)
Include FIRST (b2) - {ε} in FIRST(A)
If ε in FIRST(b2)
...
Include FIRST (bn) - {ε} in FIRST(A)
If ε in FIRST(bn)
Include ε in FIRST(A)

The FIRST set for a sequence ABC.. is formed as for the rule case above.

n.dulay LR (Bottom-up) Parsing 21

FIRST Set Example
Expr’ ® Expr $
Expr ® Term Expr2
Expr2 ® ‘+’ Term Expr2 | ε
Term ® Factor Term2
Term2 ® ‘*’ Factor Term2 | ε
Factor ® ‘(’ Expr ‘)’ | id

FIRST(Factor) = { (, id }
FIRST(Term2) = { *, ε }
FIRST(Term) = FIRST(Factor) = { (, id }
FIRST(Expr2) = { +, ε }
FIRST(Expr) = FIRST(Term) = { (, id }

n.dulay LR (Bottom-up) Parsing 22

FOLLOW Sets

The FOLLOW set for a rule A, is the set of all tokens that could immediately
follow A. Note: ε is never in the FOLLOW set.

(i) Foreach derivation of the form B ® C A D

Include FIRST(D) - {ε} in FOLLOW(A)
if D Þ* ε include FOLLOW(B) in FOLLOW(A)

This last if can be written

if {ε} in FIRST(D) include FOLLOW(B) in FOLLOW(A)

Note: C & D can be empty. If D is empty then we also include FOLLOW(B)

in FOLLOW(A)

(ii) If a rule A can end the input we include $ in FOLLOW(A)

Recall we use $ as an End-of-Input token.

n.dulay LR (Bottom-up) Parsing 23

FOLLOW Set Example
Expr’ ® Expr $
Expr ® Term Expr2
Expr2 ® ‘+’ Term Expr2 | ε
Term ® Factor Term2
Term2 ® ‘*’ Factor Term2 | ε
Factor ® ‘(’ Expr ‘)’ | id

FOLLOW(Expr) = { ), $ }
FOLLOW(Expr2) = FOLLOW(Expr) + FOLLOW(Expr2) = { ), $ }
FOLLOW(Term) = FIRST(Expr2) + FOLLOW(Expr) + FOLLOW(Expr2)
= { +, ), $ }
FOLLOW(Term2) = FOLLOW(Term) + FOLLOW(Term2) = { +, ), $ }
FOLLOW(Factor) = FIRST(Term2) + FOLLOW(Term) + FOLLOW(Term2)
= { *, +, ), $ }
n.dulay LR (Bottom-up) Parsing 24
LR(1) Items

An LR(1) item is a pair [ LR(0)item, t ]

(i) a LR(0) item as before, plus

(ii) a look-ahead token t

Intuitively the LR(1) item

[ X ® A • B C, t ] [ X ® A • B C, t ]

indicates that A is on top of the stack and that we only want to

recognise B if B is followed by a string derivable from Ct i.e. if the
current token is in FIRST(Ct)

n.dulay LR (Bottom-up) Parsing 25

LR(1) NFA Transitions
Given a state with the following LR(1) item: [X ® A•BC, t] add the
following NFA transition and state
B
[X → A • BC, t] [X → AB • C, t]

Also if B is a rule (non-terminal), then for every rule B ® D add an ε-

transition and a new state for every token u in FIRST(Ct) i.e add a transition
iff the string following B is derivable from Ct.

[X → A • BC, t]
ε [B → • D, u]

We include the initial item for the auxiliary rule i.e. [X’ ® •X, $] to start
the construction of the DFA, which proceeds as before.

n.dulay LR (Bottom-up) Parsing 26

Example
Grammar
0
S’➝ ● S, $
S ® id | V ‘=’ E ε
V ® id
E ® V | int S ➝ ● id, $
S ➝ ● V = E, $

Grammar Expanded
ε

V ➝ ● id, =
S’® S $
S ® id
S ® V ‘=’ E
V ® id 0
S’➝ ● S, $
E ® V S ➝ ● id, $
E ® int S ➝ ● V = E, $
V ➝ ● id, =

n.dulay LR (Bottom-up) Parsing 27

Example contd (1)
Grammar

0 S
S’® S $ S’➝ ● S, $ S’➝ S ●, $ 1
S ® id S ➝ ● id, $
S ➝ ● V = E, $ id S ➝ id ●, $
S ® V ‘=’ E V ➝ ● id, = V ➝ id ●, = 2
V ® id V
=
E ® V S ➝ V ● = E, $ S ➝ V = ● E, $
E ® int ε
3

E ➝ ● int, $
E ➝ ● V, $ 4
ε

V ➝ ● id, $

n.dulay LR (Bottom-up) Parsing 28

Example contd (2)
Grammar

0 S
S’® S $ S’➝ ● S, $ S’➝ S ●, $ 1
S ® id S ➝ ● id, $
S ➝ ● V = E, $ id S ➝ id ●, $
S ® V ‘=’ E V ➝ ● id, = V ➝ id ●, = 2
V ® id V
=
E ® V S ➝ V ● = E, $ S ➝ V = ● E, $
E ® int E ➝ ● int, $
3 E ➝ ● V, $
V ➝ ● id, $
4

n.dulay LR (Bottom-up) Parsing 29

Example Completed

0 S
S’➝ ● S, $ S’➝ S ●, $ 1
S ➝ ● id, $
S ➝ ● V = E, $ id S ➝ id ●, $
V ➝ ● id, = V ➝ id ●, = 2
V
= E
S ➝ V ● = E, $ S ➝ V = ● E, $ S ➝ V = E ●, $ 5
E ➝ ● int, $ int
3 E ➝ ● V, $ E ➝ int ●, $ 6
V ➝ ● id, $ V
4 E ➝ V ●, $
Grammar 7
id
S’® S $ V ➝ id ●, $ 8
S ® id
S ® V ‘=’ E
V ® id
E ® V
E ® int
n.dulay LR (Bottom-up) Parsing 30
Parsing Table for Example
r1: S ® id Remember for LR(1) we only perform a reduction
r2: S ® V ‘=’ E X ® A•, t if the current token is t
r3: V ® id
r4: E ® V
r5: E ® int
State ACTION GOTO
id int = $ E V S
0 s2 g3 g1
1 a
2 r3 r1
3 s4
4 s8 s6 g5 g7
5 r2
6 r5
7 r4
8 r3

n.dulay LR (Bottom-up) Parsing 31

LALR(1) Parsers

LALR(1) parsers are like LR(1) parsers but we perform an optimisation where we
merge LR(1) states that have the same LR(0) items but differ in their look-ahead
token.
An LALR item thus consists of an LR(0) item and a set of look-ahead tokens:

An LALR(1) item is a pair [ LR(0) item, {t0, .., tn} ]

Because of this merging, it is possible for LALR(1) parsing to produce spurious

reductions before an error is detected, whereas LR(1) parsing would detect the error
immediately, for example, for the grammar:
A ® ‘(‘ A ‘)’
A ® ‘+’
The input string +) will be caught immediately after shifting + in LR(1) while
LALR(1) will shift + and then do the reduction A ® + before catching the error if
the look-ahead set for A ® +• is { $, ) }

n.dulay LR (Bottom-up) Parsing 32

LALR(1) Example

In the following example, LR(1) states 2 and 3 have the same LR(0) items. We
can therefore combine them

int X ➝ E ● T, =
2
V ➝ id ●, + int
X ➝ E ● T, =,$
Y 23
V ➝ id ●, +,/
Y X ➝ E ● T, $
V ➝ id ●, / 3

LR(1) States LALR(1) State

Note: It is possible to construct the DFA of LALR(1) items directly from

the DFA of LR(0) items by using a technique known as propagating
lookaheads. For details see Aho p240 (Optional).

n.dulay LR (Bottom-up) Parsing 33

Ambiguity and Conflicts
Ambiguous grammars (i.e. where we can derive more than 1 parse tree
for some input) cannot be LR(k) for any k. An LR(k) parser will reach a
state where given the entire stack contents and the next k input tokens it
cannot decide whether to shift or reduce.
Ambiguity can be removed by rewriting the grammar, or handled by
augmenting the grammar with additional features that allow the parser to
resolve the ambiguity, e.g. by defining the associativity and precedence of
operators.

A ® id
A ® id ‘[‘ Expr ’]’ Do we shift or do we reduce? => Shift-Reduce conflict

Expr ® id Which reduce should we perform?

Var ® id => Reduce-Reduce conflict

n.dulay LR (Bottom-up) Parsing 34

Shift-Reduce Conflicts
Consider the following grammar: S ® if E then S else S
S ® if E then S
S ® id

What is the interpretation for the input: if a then if b then c else d

In a LR parsing table we will have a state with LR(0) items:
[S ® if E then S • ] and
[S ® if E then S • else S ]

If we shift our interpretation will be: if a then if b then c else d

On the other hand if we reduce our interpretation becomes:

if a then if b then c else d

LR parser generators give priority to shifting. This is analogous to matching the

longest sequence in Lexical Analysis. If this is not appropriate or we want to
remove the shift-reduce conflict, then we need to rewrite the grammar.

n.dulay LR (Bottom-up) Parsing 35

Shift-Reduce Conflicts Contd.

We can remove a shift-reduce conflict by rewriting the grammar e.g.:

Consider the grammar:

S ® if E then S else S
S ® if E then S
S ® other
E ® 0 | 1

We can rewrite this grammar as:

S ® MatchedS
MatchedS ® if E then MatchedS else MatchedS
MatchedS ® other
S ® UnMatchedS
UnMatchedS ® if E then S
UnMatchedS ® if E then MatchedS else UnMatchedS

This works by permitting only a matched statement to come before an else in an

if statement, forcing all else-parts to become matched as soon as possible.
n.dulay LR (Bottom-up) Parsing 36
Reduce-Reduce Conflicts

A reduce-reduce conflict occurs if there is a choice of reducing by 2 (or more) rules.

A common disambiguating rule used by LR parser generators for reduce-reduce

conflicts is to give priority to the earliest rule listed in the grammar source file,
alternatively we can rewrite the grammar.

Precedence. The following grammar is ambiguous because it doesn’t capture the

associatively or precedence of + and *

Expr ® Expr ‘+’ Expr | Expr ‘*’ Expr | ‘(‘ Expr ‘)’ | int

We can rewrite this unambiguously as:

Expr ® Expr ‘+’ Term | Term

Term ® Term ‘*’ Factor | Factor
Factor ® ‘(‘ Expr ‘)’ | int

n.dulay LR (Bottom-up) Parsing 37

Error Recovery (Optional Material)

On encountering an error entry, we can either stop and report a single error to the
programmer, or we can attempt to recover from the error and so allow parsing to
continue and further errors to be reported.
Many error recovery schemes exist, some attempt recovery by scanning forward in the
token stream, some modify the parse stack, others do both.

Example. YACC (Yet-Another Compiler-Compiler, Bison in GNU) supports extra

Error-Recovery Rules that tell the parser how to resynchronise on error, e.g.

Expr ® ‘(‘ error ‘)’

Expressions ® error ‘;’ Expression

The above rules specify that if a syntax error is encountered in an expression that the
parser should skip to the next ) or the next ; In YACC, error is a special terminal
symbol but has shift actions entered into the parsing table as if it were a normal
token. See Appel p79 or Aho p264 for more details.

n.dulay LR (Bottom-up) Parsing 38

Parse Trees
In a parse tree, we build leaf nodes for each input token (when we shift) and
non-leaf nodes for each rule (when we reduce).

LPAREN

INT(24) Factor Term Expr

RPAREN If-Statement

IDENT("A") Statement

ELSE
Else-Part
IDENT("B") Statement

n.dulay LR (Bottom-up) Parsing 39

Abstract Syntax Trees (ASTs)
Although parse trees fully represent the syntactic structure of the input, they
hold unnecessary information and are inconvenient to work with. For example
we don’t normally care about keywords and bracketing symbols in the later
stages (semantic analysis, code generation) of the compiler. An Abstract
Syntax Tree (AST) is a refined version of the parse tree that retains only the
information that is useful for later stages of the compiler, or for error reporting
or at runtime. For example the AST for an if statement might be:

INT(24)

IDENT("A") If-Statement

IDENT("B")

n.dulay LR (Bottom-up) Parsing 40

Abstract Syntax Trees Contd.
ASTs can be constructed either with a separate pass, or by attaching AST
construction code directly to grammar rules, e.g.

Statement ® print ‘(‘ Expr:E ‘)’ | return new PrintNode (E)

Statement ® ident:D ‘=‘ Expr:E | return new AssignNode(D,E)
Statement ® if ‘(‘ Expr:E ’)’ Statement:S | return new IfNode(E, S, null)
Statement ® if ‘(‘ Expr:E ‘)’ Statement:S1 else Statement:S2
| return new IfNode (E, S1, S2)
Statement ® Statement:S1 ‘;’ Statement:S2 | return new StatListNode (S1, S2)
Expr ® Expr:E Op:O Term:T | return new ExprNode (E, O, T)
Expr ® Term:T | return T
Term ® ‘(‘ Expr:E ‘)’ | return E
Term ® ident:D | return new IdNode (D.stringvalue)
Term ® int:N | return new IntNode (N.intvalue)
Op ® ‘+’ | return new ArithOpNode (‘+’)
Op ® ‘-’ | return new ArithOpNode (‘-’)

n.dulay LR (Bottom-up) Parsing 41

Summary

Bottom-up parsing is a very powerful technique for recognising the syntax

of programming languages. Although more complicated to understand than
top-down parsers, it is popular since LR grammars are usually simpler and
more natural than LL grammars, and LR parser generators can produce
efficient parsers with good error reporting.

For more details on bottom up parsing and further exercises see

[Cooper – Chapter 3]
[Appel – Chapter 3]
[Aho – Chapter 4]

n.dulay LR (Bottom-up) Parsing 42

Flex-Bison (C++) Example
(OPTIONAL MATERIAL)

n.dulay LR (Bottom-up) Parsing 43

Grammar

program ® statement_sequence

statement_sequence ® statement | statement_sequence ';' statement

statement ® if_statement | while_statement | assign_statement

assign_statement ® identifier ':=' expression

if_statement ® 'if' expression 'then' statement_sequence 'end'

while_statement ® 'while' expression 'do' statement_sequence 'end’

expression ® expression '+' expression

n.dulay LR (Bottom-up) Parsing 44

scanner.l

%{
#include <string>
#include "astnode.h"
#include "parser.hpp"
%}

digit [0-9]
number {digit}+
letter [a-zA-Z]
identifier {letter}+
whitespace [ \t\n]+
comment //[^\n]*\n

• For this example we’ll use C++ code with our bison grammar.
astnode.h will hold C++ class definitions for our AST nodes.
• parser.hpp is the include file generated by bison-c++.

n.dulay LR (Bottom-up) Parsing 45

scanner.l

if return IF;
then return THEN;
else return ELSE;
while return WHILE;
do return DO;
":=" return BECOMES;
"=" return EQUALS;
"<" return LESSTHAN;
"+" return PLUS;
"-" return MINUS;
"(" return LPAREN;
")" return RPAREN;
";" return SEMICOLON;

n.dulay LR (Bottom-up) Parsing 46

scanner.l

{number} yylval.string = new std::string(yytext,yyleng);

return INTEGER;
{identifier} yylval.string = new std::string(yytext,yyleng);
return IDENTIFIER;
{whitespace} { }
. return ERROR;

• To keep our flex code very simple, token attributes (e.g. for numbers) will be
generated in bison. We’ll just pass a C++ string.

n.dulay LR (Bottom-up) Parsing 47

parser.y

#include "astnode.h"
#include <cstdio>
#include <cstdlib>

extern int yylex(); /* Lexical analyser generated by flex */

void yyerror(const char *message) {

std::printf("Error: %s\n", message);
std::exit(1);
}

StatSeq ast; / Pointer to root of Abstract Syntax Tree */

n.dulay LR (Bottom-up) Parsing 48

parser.y

%union {
int token;
std::string *string;
Identifier *id;
Expression *expression;
Statement *statement;
StatSeq *statseq;
ASTnode *node;
}

%token <token> IF THEN ELSE WHILE DO READ WRITE

%token <token> BECOMES EQUALS LESSTHAN PLUS MINUS
%token <token> LPAREN RPAREN SEMICOLON
%token <string> INTEGER ERROR
%token <id> IDENTIFIER

n.dulay LR (Bottom-up) Parsing 49

parser.y

%type <id> identifier

%type <statseq> program statement_sequence
%type <statement> statement begin_statement assign_statement
%type <statement> if_statement while_statement
%type <expression> expression number
%type <token> comparator

/* Associativity of operators */
%left PLUS MINUS

/* Start symbol. Will default to first non-terminal symbol */

%start program

n.dulay LR (Bottom-up) Parsing 50

parser.y

program:
statement_sequence
{ ast = $1; }
;

statement_sequence:
statement
{ $$ = new StatSeq();
$$->statements.push_back($1);
}
| statement_sequence SEMICOLON statement
{ $1->statements.push_back($3);
}
;

statement:
if_statement { $$ = $1; }
| while_statement { $$ = $1; }
| assign_statement { $$ = $1; }
;
n.dulay LR (Bottom-up) Parsing 51
parser.y

if_statement:
IF expression THEN statement_sequence END
{ $$ = new IfStatement(*$2, *$4); }
;

while_statement:
WHILE expression DO statement_sequence END
{ $$ = new WhileStatement(*$2, *$4); }
;

assign_statement:
identifier BECOMES expression
{ $$ = new Assignment(*$1, *$3); }
;

n.dulay LR (Bottom-up) Parsing 52

parser.y

expression:
identifier
{ $$ = $1; }
| number
{ $$ = $1; }
| expression PLUS expression
{ $$ = new Operator(*$1, $2, *$3); }
| expression MINUS expression
{ $$ = new Operator(*$1, $2, *$3); }
| expression comparator expression
{ $$ = new Operator(*$1, $2, *$3); }
| LPAREN expression RPAREN
{ $$ = $2; }
;

n.dulay LR (Bottom-up) Parsing 53

parser.y

identifier:
IDENTIFIER
{ $$ = new Identifier(*$1);
delete $1;
}
;

number:
INTEGER
{ $$ = new Number(atol($1->c_str()));
delete $1;
}
;

comparator:
EQUALS
| LESSTHAN
;

n.dulay LR (Bottom-up) Parsing 54

astnode.h

#include <vector>
#include <iostream>

class ASTnode {
public:
virtual ~ASTnode() {}
virtual void semantic_check () {}
};

class Expression : public ASTnode {};

typedef std::vector<Expression*> ExpressionList;

class Statement : public ASTnode {};

typedef std::vector<Statement*> StatementList;

class StatSeq: public Statement {

public:
StatementList statements;
StatSeq () {}
};
n.dulay LR (Bottom-up) Parsing 55
astnode.h

class IfStatement : public Statement {

public:
Expression& expr;
StatSeq& thenS;

IfStatement(Expression& expr, StatSeq& thenS)

: expr(expr), thenS(thenS) {}
};

class WhileStatement : public Statement {

public:
Expression& expr;
StatSeq& doS;

WhileStatement(Expression& expr, StatSeq& doS)

: expr(expr), doS(doS) {}
};

n.dulay LR (Bottom-up) Parsing 56

astnode.h

class Assignment : public Statement {

public:
Identifier& var;
Expression& expr;
Assignment(Identifier& var, Expression& expr)
: var(var), expr(expr) {}
};

class Number : public Expression {

public:
int value;
Number(int value) : value(value) {}
};

class Identifier : public Expression {

public:
std::string id;
Identifier(const std::string& id) : id(id) {}
};

n.dulay LR (Bottom-up) Parsing 57

main.cpp

#include <iostream>
#include "astnode.h"

using namespace std;

extern int yyparse();

extern StatSeq* ast;

int main(int argc, char **argv) {

yyparse();
return 0;
}

n.dulay LR (Bottom-up) Parsing 58

makefile

all: parser
OBJS = scanner.o parser.o main.o
CPPFLAGS =

parser.cpp: parser.y
bison -d –v -o $@ $^

parser.hpp: parser.cpp

scanner.cpp: scanner.l parser.hpp

flex -o $@ $^

%.o: %.cpp
g++ -c $(CPPFLAGS) -o $@ $<

parser: $(OBJS)
g++ -o $@ $(OBJS)

n.dulay LR (Bottom-up) Parsing 59

Lisp Interpreter in Rust
From Everand
Lisp Interpreter in Rust
Vishal Patil
1/5 (1)
Aveo 2009 Ecotec ECM Connector PDF
No ratings yet
Aveo 2009 Ecotec ECM Connector PDF
4 pages
CS346 Bottom Up Parser
No ratings yet
CS346 Bottom Up Parser
64 pages
LR(0) and SLR(0)
No ratings yet
LR(0) and SLR(0)
102 pages
Compilef Design Unit 2 AKTU As Per 2023-24 Syllabus
No ratings yet
Compilef Design Unit 2 AKTU As Per 2023-24 Syllabus
46 pages
Syntax Analysis 2
No ratings yet
Syntax Analysis 2
70 pages
LALR Parsers
No ratings yet
LALR Parsers
5 pages
LR Parser
No ratings yet
LR Parser
8 pages
Chapter 6-1 note
No ratings yet
Chapter 6-1 note
54 pages
LR (K) Parsing: CPSC 388 Ellen Walker Hiram College
No ratings yet
LR (K) Parsing: CPSC 388 Ellen Walker Hiram College
30 pages
LR(0)
No ratings yet
LR(0)
102 pages
Lecture3 Parser Full
No ratings yet
Lecture3 Parser Full
30 pages
Unit 02 - Part 03
No ratings yet
Unit 02 - Part 03
50 pages
Bottom Up Parsing1
No ratings yet
Bottom Up Parsing1
69 pages
LR Parsers
No ratings yet
LR Parsers
52 pages
LR 0 Notes
No ratings yet
LR 0 Notes
14 pages
Sectlrparse S
No ratings yet
Sectlrparse S
19 pages
LR 1 Notes
No ratings yet
LR 1 Notes
13 pages
Bottom Up Parsing
No ratings yet
Bottom Up Parsing
24 pages
Unit III
No ratings yet
Unit III
211 pages
Bottom Up Parsing
No ratings yet
Bottom Up Parsing
39 pages
Bottomupparser
No ratings yet
Bottomupparser
58 pages
6LR1 Lalr
No ratings yet
6LR1 Lalr
21 pages
SLR Parser
No ratings yet
SLR Parser
18 pages
Syntax Analyzer 2-up to LALR
No ratings yet
Syntax Analyzer 2-up to LALR
74 pages
Syntax Analyzer 2-up to LR(0)
No ratings yet
Syntax Analyzer 2-up to LR(0)
73 pages
Bottom-Up Parsing Including LR (0), SLR
No ratings yet
Bottom-Up Parsing Including LR (0), SLR
55 pages
LR (0) Parser
No ratings yet
LR (0) Parser
8 pages
Bottom Up Parser
No ratings yet
Bottom Up Parser
61 pages
Module 3
No ratings yet
Module 3
29 pages
LR Parsing: Dewan Tanvir Ahmed Assistant Professor, CSE Buet
No ratings yet
LR Parsing: Dewan Tanvir Ahmed Assistant Professor, CSE Buet
60 pages
SLR Parsing
No ratings yet
SLR Parsing
66 pages
UNIT-3-LR(0)-PARSER
No ratings yet
UNIT-3-LR(0)-PARSER
20 pages
Compiler Design(Unit-II)
No ratings yet
Compiler Design(Unit-II)
89 pages
Syntax Analysis: COP5621 Compiler Construction
No ratings yet
Syntax Analysis: COP5621 Compiler Construction
56 pages
LRParsers
No ratings yet
LRParsers
46 pages
Lec 9
No ratings yet
Lec 9
28 pages
M2 - P4 LR Parser
No ratings yet
M2 - P4 LR Parser
38 pages
Bottom Up Parsing
No ratings yet
Bottom Up Parsing
44 pages
Compiler Construction 1 1 Compiler Construction 1 2
No ratings yet
Compiler Construction 1 1 Compiler Construction 1 2
12 pages
CC lec 15
No ratings yet
CC lec 15
14 pages
slr clr lalr
No ratings yet
slr clr lalr
20 pages
LR Parsing Table Costruction
100% (1)
LR Parsing Table Costruction
47 pages
CD_Chap3_III_Bottom Up Parsing (2)
No ratings yet
CD_Chap3_III_Bottom Up Parsing (2)
37 pages
CC LR Parser
No ratings yet
CC LR Parser
37 pages
unit 3 cd
No ratings yet
unit 3 cd
34 pages
CD Unit3 Part1
No ratings yet
CD Unit3 Part1
22 pages
LR (0) Parsing
100% (1)
LR (0) Parsing
14 pages
Compiler Design Unit-2
No ratings yet
Compiler Design Unit-2
29 pages
Syntax Analysis (Part-II)
No ratings yet
Syntax Analysis (Part-II)
69 pages
CD_CHAPTER-3 (1)
No ratings yet
CD_CHAPTER-3 (1)
53 pages
Chapter 6 - Compiler Construction
No ratings yet
Chapter 6 - Compiler Construction
13 pages
CS6109-MODULE-6
No ratings yet
CS6109-MODULE-6
43 pages
CC 5
No ratings yet
CC 5
77 pages
Unit 3 - Sessions 6 - 7 (A)
No ratings yet
Unit 3 - Sessions 6 - 7 (A)
25 pages
Unit3.2 bottomupparsars
No ratings yet
Unit3.2 bottomupparsars
71 pages
mod3
No ratings yet
mod3
29 pages
LALR Parser For A Grammar: Compiler Design
No ratings yet
LALR Parser For A Grammar: Compiler Design
8 pages
LR
No ratings yet
LR
4 pages
Lrparser HaLrparser Handout Ndout
No ratings yet
Lrparser HaLrparser Handout Ndout
16 pages
Collision Practice
100% (1)
Collision Practice
11 pages
Chapter 4 Population Ecology
100% (1)
Chapter 4 Population Ecology
29 pages
Analysis of Algorithms: Issues
50% (2)
Analysis of Algorithms: Issues
37 pages
Pregnant Women Invitation Letter
No ratings yet
Pregnant Women Invitation Letter
3 pages
Multitec RO - 1777.82-10 Instruciones de Operación
No ratings yet
Multitec RO - 1777.82-10 Instruciones de Operación
20 pages
3aua00000145616 Revb
No ratings yet
3aua00000145616 Revb
316 pages
Taper Bushings Technical Data
No ratings yet
Taper Bushings Technical Data
2 pages
SY Series LEDEEN Actuators: Pneumatic and Hydraulic Actuators
100% (1)
SY Series LEDEEN Actuators: Pneumatic and Hydraulic Actuators
8 pages
YMYGMB2 ZAgendafor 620 SEIAAmeeting
No ratings yet
YMYGMB2 ZAgendafor 620 SEIAAmeeting
3 pages
Elec
No ratings yet
Elec
13 pages
Substation Visit Guna report_12.09.2022
No ratings yet
Substation Visit Guna report_12.09.2022
20 pages
ROOTS OF EASTER FROM BABEL TO ROME
No ratings yet
ROOTS OF EASTER FROM BABEL TO ROME
2 pages
Pdfcaie Igcse Computer Science 0478 Theory v1 PDF
No ratings yet
Pdfcaie Igcse Computer Science 0478 Theory v1 PDF
14 pages
SUP031O
No ratings yet
SUP031O
7 pages
Modern Tool Usage
No ratings yet
Modern Tool Usage
3 pages
Council Drainage Easements Information Sheet
No ratings yet
Council Drainage Easements Information Sheet
4 pages
HY USMLE Review Part I
No ratings yet
HY USMLE Review Part I
18 pages
0 Complete Book Electricity and Magnetism
100% (10)
0 Complete Book Electricity and Magnetism
134 pages
PM 100.08 Deferral Process
No ratings yet
PM 100.08 Deferral Process
3 pages
Physical Systems Simulation using SIMSCAPE
No ratings yet
Physical Systems Simulation using SIMSCAPE
138 pages
C5. DC-AC Converter - Update
No ratings yet
C5. DC-AC Converter - Update
50 pages
AFCAT Syllabus 2023
No ratings yet
AFCAT Syllabus 2023
1 page
Rosenthal Road Final Geotechnical Report
No ratings yet
Rosenthal Road Final Geotechnical Report
47 pages
Z706-HDS
No ratings yet
Z706-HDS
1 page
Inquiry Based Lesson Plan: Teachers: Abigail Vogt Subject: Biology - 10 Standard
No ratings yet
Inquiry Based Lesson Plan: Teachers: Abigail Vogt Subject: Biology - 10 Standard
5 pages
Slovak Forests
No ratings yet
Slovak Forests
2 pages
DS - en 1991-1-7 DK Na - 2007 e
No ratings yet
DS - en 1991-1-7 DK Na - 2007 e
6 pages
7 Reasons You're Stuck at Medium: Hard Truths About Getting Big & Strong
No ratings yet
7 Reasons You're Stuck at Medium: Hard Truths About Getting Big & Strong
8 pages
Class 9
100% (1)
Class 9
38 pages