CD_Unit3
CD_Unit3
• It is the process of "reducing" a Input string w to the start symbol of the grammar.
• At each reduction step, a specific substring matching the body of a production is replaced by the non
terminal at the head of that production
• A reduction is the reverse of a step in a derivation, therefore bottom up parser is rightmost
derivation in reverse.
Example:
• Consider the grammar: S → aABe
A → Abc | b
B→d
• The sentence/Input string(w) to be recognized is abbcde.
HANDLE
• A "handle" is a substring that matches the right side of a production, and whose reduction to the non-
terminal on the left side of the production represents one step along the reverse of a rightmost derivation.
EXAMPLE OF HANDLE
HANDLE PRUNING
• Shift reduce parsing is a bottom-up parsing that reduces a string w to the start symbol of the grammar.
• It scans and parses the input text in one forward pass without backtracking.
Stack implementation of shift reduce parsing
• Handle pruning must solve the following two problems to perform parsing
• Locating the substring to be reduced in a right sentential form.
• Determining what production to choose in case there is more than one productions with that substring on the
right side.
• The type of data structure to use in a shift reduce parser.
SHIFT REDUCE PARSERS (SR PARSERS)
• Operator grammar
– small, but an important class of grammars (parse only small class of grammars)
– we may have an efficient operator precedence parser (a shift-reduce parser) for an
operator grammar.
• In an operator grammar, no production rule can have:
– ε at the right side
– two adjacent non-terminals at the right side.
• Example :
E→AB E→EOE E→E+E |
A→a B→b E→id O→+|*|/ E*E | E/E | id
not operator grammar Not operator grammar An operator grammar
PRECEDENCE RELATIONS
• Operator precedence relation describes relation between pair of terminals which guides the selection of handles.
• The determination of correct precedence relations between terminals are based on the traditional notions of associativity
and precedence of operators. (Unary minus causes a problem).
IDEA BEHIND OPERATOR PRECEDENCE PARSING
OPERATOR PRECEDENCE ALGORITHM
OPERATOR PRECEDENCE RELATIONS
+ - * / ^ id ( ) $
+ .> .> <. <. <. <. <. .> .>
• Disadvantages:
– It cannot handle the unary minus (the lexical analyzer should handle the unary
minus).
– Small class of grammars.
– Difficult to decide which language is recognized by the grammar.
• Advantages:
– simple
– powerful enough for expressions in programming languages
ERROR RECOVERY IN OPERATOR PRECEDENCE PARSING
• Error Cases:
1. No relation holds between the terminal on the top of stack and the next input
symbol.
2. A handle is found (reduction step), but there is no production with this handle
as a right side
• Error Recovery:
3. Each empty entry is filled with a pointer to an error routine.
4. Decides the popped handle “looks like” which right hand side. And tries to
recover from that situation.
COMPUTATION OF LEADING AND TRAILING
Leading is defined for every non-terminal. Terminals that can be the first terminal in a string derived from that non-terminal.
Trailing is defined for every non-terminal. • Terminals that can be the last terminal in a string derived from that non-terminal.
• Step 2: After computing LEADING and TRAILING, the table is constructed between all the terminals in the grammar including the ‘$’ symbol.
PRECEDENCE RELATION TABLE
Algorithm:
set p to point to the first symbol of w$ ;
repeat forever
if $ is on top of the stack and p points to $ then return else
begin
let a be the topmost terminal symbol on the stack and let b be the symbol pointed to by p;
if ( a <. b or a =· b ) then
push b onto the stack;
advance p to the next input symbol;
end
else if a .> b then repeat pop
stack
until the top of stack terminal is related by <. to the terminal most recently popped
else error();
end
PARSING THE INPUT
Step 4 : parsing the input string (id+id)*id$
PRECEDENCE FUNCTIONS
• Compilers using operator-precedence parsers need not store the table of precedence relations.
• In most cases, the table can be encoded by two precedence functions f and g that map terminal symbols to integers.
We attempt to select f and g so that, for symbols a and b.
ALGORITHM FOR CONSTRUCTING PRECEDENCE RELATIONS
LR PARSERS
• LR parsing is one type of bottom up parsing. It is used to parse the large class of grammars.
• In the LR parsing, "L" stands for left-to-right scanning of the input.
• "R" stands for constructing a right most derivation in reverse.
• "K" is the number of input symbols of the look ahead used to make number of parsing decision.
• LR parsing is divided into four parts: LR (0) parsing, SLR parsing, CLR parsing and LALR parsing.
• LR parsers can usually recognize all programming language construct that can be specified by context-free grammars.
• LR parsers detect errors fast.
• Drawback: it is too much work to construct an LR parser by hand.
• Fortunately, we can use an LR parser generator such as YACC.
• An LR(0) parser is a Shift Reduce parser that uses zero tokens of look ahead to determine what action to take.
• This means that in any configuration of the parser, the parser must have an unambiguous action to choose- either it
shifts a specific symbol or applies a specific reduction.
• If there are ever two or more choices to make, the parser fails and the grammar is not LR(0).
• An LR parser makes shift-reduce decisions by maintaining states to keep track of parsing. States represent a set of
items.
• To find
1. Closure of items ets
2. Construct canonical LR(0) collection
CONSTRUCTION OF THE C ANONIC AL LR(0) COLLECTION
• To create the SLR parsing tables for a grammar G, we will create the canonical LR(0) collection of the grammar G’.
Algorithm
.
C is { closure({S’→ S}) }
repeat the followings until no more set of LR(0) items can be added to C.
for each I in C and each grammar symbol X
if goto(I,X) is not empty and not in C An LR(0) item of a grammar G is a production of G with a • at
some position of the right-hand side
add goto(I,X) to C Thus, a production
A XY Z
• goto function is a DFA on the sets in C. has four items:
[A • X Y Z]
[A X • Y Z]
[A X Y • Z]
[A X Y Z •]
Note that production A has one item [A •]
GOTO FUNCTION
CLOSURE FUNCTION
Steps
• Defining 2 functions: goto[list of non-terminals] and action[list of terminals] in the parsing table. Below is the SLR
parsing table.
no shift/reduce or
no reduce/reduce conflict
⇓
so, it is a SLR grammar.
EXAMPLE OF SLR PARSER
no shift/reduce or
no reduce/reduce conflict
Using the SLR parser, the given input string id*id+id has been fully parsed. ⇓
so, it is a SLR grammar.
SLR PARSER
SLR PARSER EXAMPLE
Construct SLR parsing
table for the given
grammar
S → AA
A → aA | b
1 Find augmented
production
S` → •S
S → •AA
A → •aA
A → •b
2 Constructing DFA
SLR PARSER EXAMPLE
no shift/reduce or
no reduce/reduce conflict
⇓
so, it is a SLR grammar.
SLR PARSER EXAMPLE
ISSUES OF SLR / LR(0) PARSER
EXAMPLE OF SLR CONFLICT (SHIFT –REDUCE)
• To avoid some of invalid reductions, the states need to carry more information.
• Extra information is put into a state by including a terminal symbol as a second component in an item.
. .
states of a LALR parser.
. .
• Assume that we can introduce a shift/reduce conflict. In this case, a state A → α ,a and B → β aγ,b
• This means that a state of the canonical LR(1) parser must have: A → α ,a and B → β aγ,c.
But, this state has also a shift/reduce conflict. i.e. The original canonical LR(1) parser has a conflict. (Reason
for this, the shift operation does not depend on lookaheads)
Reduce / Reduce Conflict
• But, we may introduce a reduce/reduce conflict during the shrink process for the creation of the states of a LALR
parser.
EXAMPLE
LALR PARSER
LALR PARSER EXAMPLE
Unambiguous LR(1) grammar:
SL=R
|R
L*R
| id
RL
Augment with S’ S
LALR items (next slide)
LALR PARSER PARSING TABLE
2. Draw DFA
EXAMPLE
EXAMPLE
ERROR CORRECTION AND RECOVERY IN LR PARSING
• Canonical LR parser uses full LR(1) parse tables and will never make a single reduction before recognizing the error when a
syntax error occurs on the input
• SLR and LALR may still reduce when a syntax error occurs on the input, but will never shift the erroneous input symbol.
ERROR RECOVERY IN LR PARSING
• Panic mode
• Pop until state with a goto on a nonterminal A is found, (where A represents a major
programming construct), push A
• Discard input symbols until one is found in the FOLLOW set of A
• Phrase-level recovery
• Implement error routines for every error entry in table
• Error productions
• Pop until state has error production, then shift on stack
• Discard input until symbol is encountered that allows parsing to continue
LL, SLR, CLR, LALR SUMMARY
%{C declarations%}
yacc declarations
%%
Grammar rules
%%
Additional C code
• Comments enclosed in /* ...*/
LEX AND YACC
ü For example, you use lex to identify keywords, identifiers, strings, comments, whitespace, and so
on. yacc is for parsing your grammar.
WORK WITH LEX AND YACC
[0-9]+
call yylex()
90
DEFINITIONS SECTION
%{
#include <stdio.h>
#include <stdlib.h>
%}
It is a terminal
%token ID NUM
%start expr
EXPR TO PARSE
91
START SYMBOL
92
RULES SECTION
9
3
THE POSITION OF RULES
94
THE POSITION OF RULES
95
POSITION OF RULES
96
STEPS FOR YACC
1. Create a LEX file with ‘.l’ externsion and fill the definition and rule section. (lex_file_name.l)
2. The user sub-routine section is not necessary.
3. Create a YACC file with ‘.y’ extension and save it. (yacc_file_name.y)
4. Compile the LEX file. (lex lex_file_name.l)
5. Compile the YAAC file. (yacc – d yacc_file_name.y)
6. Compile both using C compiler. (cc lex.yy.c y.tab.c)
7. Execute the object file. ./a.out
TO RECOGNIZE A VALID IDENTIFIER
// fn.l
%option noyywrap
digit[0-9]
letter[a-zA-Z]
%{
#include "y.tab.h"
%}
%%
{letter}({letter}|{digit})* {return VALID;}
. {return INVALID;}
%%
// FN.Y
// arith.l
%option noyywrap
%{
#include "y.tab.h"
%}
%%
[a-zA-Z_][a-zA-Z_0-9]* return IDENTIFIER;
[0-9]+(\.[0-9]*)? return NUMBER;
[+/*] return OPERATOR;
[ \t] ;
. return yytext[0];
\n return 0;
%%
%{ F : OPERATOR E
#include<stdio.h> | '-' E
|
int yyerror();
;
int yylex(); %%
int valid = 1;
int yyerror()
%} {
%token NUMBER IDENTIFIER OPERATOR valid = 0;
%% printf("\n The given Arithmetic Expression is Invalid.");
return 0;
START : E }
E : IDENTIFIER F int main()
| NUMBER F {
printf("\nEnter the expression:\n");
| '-' NUMBER F
yyparse();
| '-' IDENTIFIER F if (valid == 1)
| '(' E ')' F printf ("\n The given Arithmetic Expression is Valid.");
}
IMPLEMENTATION OF C ALCULATOR USING LEX AND YACC