0% found this document useful (0 votes)
28 views

Unit 3 - Sessions 6 - 7 (A)

Compiler design unit-3
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
28 views

Unit 3 - Sessions 6 - 7 (A)

Compiler design unit-3
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 25

18CSC304J

COMPILER DESIGN

UNIT 3
SESSION 6 & 7(a)
Topics that will be covered in this Session

• LR Parsers
• Why LR Parsers?
• Items and LR(0) Automaton
• Closure of Item Sets
• LR Parsing Algorithm
LR Parsers
LR Parsers
• LR parsers use an efficient, bottom-up syntax analysis technique that can be used to parse a
large class of context-free grammars
• Generally, the technique is called LR(k) parsing
• The “L” is for left-to-right scanning of the input
• The “R” is for constructing a rightmost derivation in reverse
• The “k” for the number of input symbols of lookahead that are used in making parsing
decisions
• When (k) is omitted, k is assumed to be 1
Why LR Parsers?
Why LR Parsing is Attractive?

• LR parsers can be constructed to recognize all programming language constructs for which
context-free grammars can be written
• It can be efficiently implemented
• The class of grammars that can be parsed using LR methods is a proper superset of the class of
grammars that can be parsed with predictive parsers
• An LR parser can detect a syntactic error as early as possible
Drawbacks of LR Parsing

• It is too much work to construct an LR parser by hand for a typical programming-language


grammar. One needs a special tool – an LR parser generator
• Fortunately, many such generators are available (eg) Yacc – Yet Another Compiler Compiler
Items
Items
• An LR parser makes shift-reduce decisions by maintaining states to keep track of where we are
in a parse
• States represent sets of “items”
• An LR(0) item (item for short) of a grammar G is a production of G with a dot at some position of
the body
• The production A→XYZ yields the four items
A→•XYZ
A→X•YZ
A→XY•Z
A→XYZ•
• The production A → ε generates only one item, A → •
Items – cont..
• An item indicates how much of a production we have seen at a given point in the parsing
process
• For example, the item A → • X Y Z indicates that we hope to see a string derivable from XYZ
next on the input
• A → X • Y Z indicates that we have just seen on the input a string derivable from X and that
we hope next to see a string derivable from YZ
• A → X Y Z • indicates that we have seen the body XYZ and that it may be time to reduce XYZ
to A
Closure of Itemsets
Augmented Grammar
• One collection of sets of LR(0) items, called the canonical LR(0) collection, provides the basis for
constructing a deterministic finite automaton that is used to make parsing decisions
• To construct the canonical LR(0) collection for a grammar, we define an augmented grammar
and two functions CLOSURE and GOTO
• If G is a grammar with start symbol S, then G’ the augmented grammar is G with a new start
symbol S’ and the production S’ → S
• Its purpose is to indicate the parser when it should stop parsing and announce acceptance of
the input
• Acceptance occurs only when the parser is about to reduce by S’ → S
EXAMPLE
The augmented grammar is
Consider the grammar
E’ → E A new production is added
E→E+T|T
E→E+T|T
T→T*F|F
T→T*F|F
F → ( E ) | id
F → ( E ) | id
The CLOSURE operation
• If I is the set of items for a grammar G, then CLOSURE(I) is the set of items constructed from I
by the two rules:
1. Initially, add every item in I to closure(I)
2. If A → •Вβ is in CLOSURE(I) and B → γ is a production, then add the item B → • γ to I if
it is not already there. Apply this rule until no more new items can be added to
CLOSURE(I)
Consider the If I is the set of one item { [ E’→• E ] }, then
augmented grammar CLOSURE(I) contains the set of items

E’ → E E’ → • E Note:
E→E+T|T E→•E+T All the sets of items of interest is divided into
E→•T two classes:
T→T*F|F
Kernel Items: The initial item, S’ → •S,
F → ( E ) | id T→•T*F and all items whose dots are not at the left
T→•F end
Nonkernel Items: All items with their dots
F→•(E)
at the left end, except for S’ → •S
F → • id
The GOTO Function
• The second useful function is GOTO(I,X) where I is a set of items and X is a grammar symbol
• GOTO(I,X) is defined to be the closure of the set of all items [A →  X • β] such that [ A →  •X β]
is in I
• The GOTO function is used to define the transitions in the LR(0) automaton for a grammar
• The states of the automaton correspond to sets of items, and GOTO*I,X) specifies the transition
from the state for I under input X
Example:
If I is the set of two items { [ E’ → E •], [E → E • + T ] }, then GOTO(I,+) contains the items
E→E+•T
Note: The grammar considered is
T→•T*F
Examine I for items with + immediately to E→E+T|T
T→•F
the right of the dot. We have E → E•+ T T→T*F|F
F→•(E)
Move the dot over the + to get E → E + • T F → ( E ) | id
F → • id
Take the closure of this singleton set
LR(0) Automaton
LR(0) Automaton LR(0) Automaton for the expression grammar

• One collection of sets of LR(0)


items, called the canonical LR(0)
collection, provides the basis for
constructing a deterministic
finite automaton that is used to
make parsing decisions
• Such an automaton is called an
LR(0) automaton
• The states of this automaton are
the sets of items from the LR(0)
collection and the transitions
are given by the GOTO function
The LR(0) Automaton
• The start state of the LR(0) automaton is CLOSURE( { [S’→ • S] } )m where S’ is the start
symbol of the augmented grammar
• All states are accepting states
• We say “state j” to refer to the state corresponding to the set of items I j

How can LR(0) automata help with shift-reduce decisions?


• Suppose that the string γ of grammar symbols takes the LR(0) automaton from the start
state 0 to some state j
• Then, shift on next input symbol a if state j has a transition on a
• Otherwise, we choose to reduce
• The items in state j will tell us which production to use
LR Parsing Algorithm
The LR-Parsing Algorithm

• The LR parser consists of an input, an output, a


stack, a driver program and a parsing table that
has two parts (action and goto)

• The driver program is the same for all LR parsers

• Only the parsing table changes from one parser


to another

• A stack is used to store a string of the form


s0X1s1X2s2….Xmsm where

Xi – a grammar symbol

si – state symbol

• Each state symbol summarizes the information


contained in the stack below it
The LR Parsing Algorithm – cont..
Structure of the LR Parsing table
• The parsing table consists of two parts:
1. A parsing action function “action”
2. A goto function “goto”
• The LR parser behaves as follows:
If sm is the state currently on top of the stack and ai, the current input symbol, then the
parser consults action[sm,ai], which can have one of the four values:
1. Shift s, where s is a state
2. Reduce by a grammar production A→β
3. Accept
4. Error
• The function goto takes a state and a grammar symbol as arguments and produces a state
The LR Parsing Algorithm – cont..
LR Parser Configurations
• To describe the behavior of an LR parser, it helps to have a notation representing the complete
state of the parser: its stack and the remaining input
• The configuration of an LR parser is a pair:

• The first component is the stack contents (top on the right)


• The second component is the remaining input
• This configuration represents the right sentential form

• Instead of grammar symbols, the stack holds states from which grammar symbols can be
recovered
• Xi is the grammar symbol represented by state si
• s0 is the start state of the parser and it does not represent any grammar symbol
• s0 serves as the bottom marker of the stack and it pays an important role in parsing
The LR Parsing
Algorithm – cont..
Example: LR Parsing
Consider the expression grammar Moves of an LR Parser for the string id*id+id
(1) E → E + T (2) E → T Stack Input Action
(3) T → T * F (4) T → F 0 id * id + id $ Shift
(5) F → ( E ) (6) F → id 0 id 5 * id + id $ Reduce by F → id
0F3 * id + id $ Reduce by T → F
The LR Parsing Table
0T2 * id + id $ Shift
0T2*7 id + id $ Shift
0 T 2 * 7 id 5 + id $ Reduce by F → id
0 T 2 * 7 F 10 + id $ Reduce by T → T * F
0T2 + id $ Reduce by E → T
0E1 + id $ Shift
0E1+6 id $ Shift
0 E 1 + 6 id 5 $ Reduce by F → id
0E1+6F3 $ Reduce by T → F
0E1+6T9 $ Reduce by E → E + T
0E1 $ Accept
Techniques for constructing LR parsing table

1. Simple LR (SLR)
• Easiest to implement
• The least powerful of the three
• It may fail to produce a parsing table for certain grammars on which the other methods
succeed
2. Canonical LR (CLR)
• The most powerful
• The most expensive
3. Lookahead LR (LALR)
• Intermediate in power and cost between the other two
LR Grammar

• A grammar for which we can construct an LR parsing table is said to be an LR grammar


• LR grammars can describe more languages than LL grammars
• The class of grammars that can be parsed using LR methods is a proper superset of the class of
grammars that can be parsed with predictive or LL methods

You might also like