CD Unit3 Part1
CD Unit3 Part1
WHY LR PARSING:
LR parsers can be constructed to recognize virtually all programming-language
constructs for which context-free grammars can be written.
The LR parsing method is the most general non-backtracking shift-reduce parsing
method known, yet it can be implemented as efficiently as other shift-reduce
methods.
The class of grammars that can be parsed using LR methods is a proper subset of the
class of grammars that can be parsed with predictive parsers.
An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-
right scan of the input.
The disadvantage is that it takes too much work to constuct an LR parser by hand for a
typical programming-language grammar. But there are lots of LR parser generators available
to make this task easy.
MODELS OF LR PARSERS
The schematic form of an LR parser is shown below.
The program uses a stack to store a string of the form s0X1s1X2...Xmsm where sm is on top.
Each Xi is a grammar symbol and each si is a symbol representing a state. Each state symbol
summarizes the information contained in the stack below it. The combination of the state
symbol on top of the stack and the current input symbol are used to index the parsing table
and determine the shiftreduce parsing decision. The parsing table consists of two parts: a
parsing action function action and a goto function goto. The program driving the LR parser
behaves as follows: It determines sm the state currently on top of the stack and ai the current
input symbol. It then consults action[sm,ai], which can have one of four values:
shift s, where s is a state
reduce by a grammar production A -> b
accept
error
The function goto takes a state and grammar symbol as arguments and produces a state.
For a parsing table constructed for a grammar G, the goto table is the transition function of a
deterministic finite automaton that recognizes the viable prefixes of G. Recall that the viable
prefixes of G are those prefixes of right-sentential forms that can appear on the stack of a
shiftreduce parser because they do not extend past the rightmost handle.
A configuration of an LR parser is a pair whose first component is the stack contents and
whose second component is the unexpended input:
(s0 X1 s1 X2 s2... Xm sm, ai ai+1... an$)
This configuration represents the right-sentential form
X1 X1 ... Xm ai ai+1 ...an
in essentially the same way a shift-reduce parser would; only the presence of the states on the
stack is new. Recall the sample parse we did (see Example 1: Sample bottom-up parse) in
which we assembled the right-sentential form by concatenating the remainder of the input
buffer to the top of the stack. The next move of the parser is determined by reading ai and
sm, and consulting the parsing action table entry action[sm, ai]. Note that we are just looking
at the state here and no symbol below it. We'll see how this actually works later.
The configurations resulting after each of the four types of move are as follows:
If action[sm, ai] = shift s, the parser executes a shift move entering the configuration
(s0 X1 s1 X2 s2... Xm sm ai s, ai+1... an$)
Here the parser has shifted both the current input symbol ai and the next symbol.
If action[sm, ai] = reduce A -> b, then the parser executes a reduce move, entering the
configuration,
(s0 X1 s1 X2 s2... Xm-r sm-r A s, ai ai+1... an$)
where s = goto[sm-r, A] and r is the length of b, the right side of the production. The parser
first popped 2r symbols off the stack (r state symbols and r grammar symbols), exposing state
sm-r. The parser then pushed both A, the left side of the production, and s, the entry for
goto[sm-r, A], onto the stack. The current input symbol is not changed in a reduce move.
The output of an LR parser is generated after a reduce move by executing the semantic action
associated with the reducing production. For example, we might just print out the production
reduced.
If action[sm, ai] = accept, parsing is completed.
OPERATOR PRECEDENCE PARSING
Precedence Relations
Bottom-up parsers for a large class of context-free grammars can be easily developed
using operator grammars.Operator grammars have the property that no production right side
is empty or has two adjacent nonterminals. This property enables the implementation of
efficient operator-precedence parsers. These parser rely on the following three precedence
relations:
Relation Meaning
a <· b a yields precedence to b
a =· b a has the same precedence as b
These operator precedence relations allow to delimit the handles in the right sentential
forms: <· marks the left end, =· appears in the interior of the handle, and ·> marks the right
end.
Repeat: Let X be the top stack symbol, and a the symbol pointed to by
ip if $ is on the top of the stack and ip points to $ then return
else
Let a be the top terminal on the stack, and b the symbol pointed
to by ip
if a <· b or a =· b then
push b onto the stack
repeat
else error()
end
do: place an edge from the group of gb to the group of fa if a <· b, otherwise if a ·> b
Output: sets of LR(1) items that are the set of items valid for one or more viable prefixes of
G'
Method:
closure(I)
begin
repeat
I1:S’->S.,$
I2:S->C.C,$
C->.Cc,$
C->.d,$
I3:C->c.C,c/d
C->.Cc,c/d
C->.d,c/d
I4: C->d.,c/d
I5: S->CC.,$
I6: C->c.C,$
C->.cC,$
C->.d,$
I7:C->d.,$
I8:C->cC.,c/d
I9:C->cC.,$
state c d $ S C
0 S36 S47 1 2
1 Accept
2 S36 S47 5
36 S36 S47 89
47 R3 R3
5 R1
89 R2 R2 R2
HANDLING ERRORS
The LALR parser may continue to do reductions after the LR parser would have spotted an
error, but the LALR parser will never do a shift after the point the LR parser would have
discovered the error and will eventually find the error.
DANGLING ELSE
The dangling else is a problem in computer programming in which an optional else clause in
an If–then(–else) statement results in nested conditionals being ambiguous. Formally,
the context-free grammar of the language is ambiguous, meaning there is more than one
correct parse tree.
In many programming languages one may write conditionally executed code in two forms:
the if-then form, and the if-then-else form – the else clause is optional:
The actions may include insertion or deletion of symbols from the stack or the input
or both, or alteration and transposition of input symbols. We must make our choices
so that the LR parser will not get into an infinite loop. A safe strategy will assure that
at least one input symbol will be removed or shifted eventually, or that the stack will
eventually shrink if the end of the input has been reached. Popping a stack state that
covers a non terminal should be avoided, because this modification eliminates from
the stack a construct that has already been successfully parsed.
*****