UNIT-2(CD)
UNIT-2(CD)
Syntax Analysis, discussion on CFG, LMD, RMD, parse trees, Role of a parser, classification of parsing
techniques, Brute force approach, left recursion, left factoring, Top down parsing, First and Follow,
LL(1) Grammars, Non-Recursive predictive parsing, Error recovery in predictive parsing.
CFG: Context Free Grammar is a formal grammar which is used to generate all possible strings in a
given formal language. CFG can be defined by 4 tuples as:
G={ V,T,P,S}
P is a set of productions
S is a start symbol.
In CFG, the start symbol is used to drive the string. A grammar is said to be the CFG if every production
is in the form of:
A→ (VUT)* where A ε V
Derivation: Derivation beginning with start symbol. Deriving terminals from non-terminal is called
derivation.
Derivation tree/ Parse tree: It is graphical representation of the derivation and it shows the order in
which the productions are applied to derive the string. In parse tree, the root node always representing the
start symbol and all the interior nodes are denoting the variables and all the leaf nodes are denoting the
terminals.
LMD: The Left Most Derivation of a string, the leftmost non-terminal is replaced first at each step.
RMD: The Right Most Derivation of a string, the rightmost non-terminal is replaced first at each step.
AMBIGUITY: A grammar that produces more than one parse tree for a string is said to be ambiguous.
(or) An ambiguous grammar is one that produces more than one leftmost derivation or more than one
rightmost derivation for the same string.
A → β A'
A' → αA' | ε
For any number of A-productions. First, group the productions as
The processing of finding a parse tree for string of tokens is called parsing.Parsing can be of two types.
They are
1) Top-down parsing
2) Bottom-up parsing
Top-down parsing starting from the root node and working up to the leaf nodes. Creating the nodes of
the parse tree in preorder. Top-down parsing used to a leftmost derivation for an input string.
Example: The sequence of parse trees in fig, for the input id+id*id is a top-down parser
W=id+id*id
TOP-DOWN PARSING:
A general form of top-down parsing, called recursive descent parsing, which may require
backtracking to find the correct A-production to be applied. Predictive parsing, a special
case of recursive descent parsing, where no backtracking is required. Predictive parsing chooses the
correct A-production.
Algorithm:
void A ( ) {
Choose an A-production, A → X1 X2 ... Xk ;
for ( i = 1 to k ) {
if ( Xi is a nonterminal )
call procedure Xi ( ) ;
else if ( Xi equals the current input symbol a )
advance the input to the next symbol;
else /* an error has occurred */;
}
}
Recursive decent parsing can also perform using a stack. Such a parser is called a predicative parser.
The predicative parser reads an input string. Then, using a stack symbol and a parsing table, it
produces the output.
The parser is controlled by a program that considers X, the symbol on top of the stack, and ‘a’ is the
current input symbol. If X is a non-terminal, the parser chooses an X-production by consulting
entry M [X, a] of the parsing table M. Otherwise, it checks for a match between the terminal X and
current input symbol a.
Construct the predictive parse table for the following steps:
1. Eliminate left recursion and left factoring in grammar ‘G’.
2. Find first and follows on the symbols in grammar ‘G’.
3. Construct the predicative parse table.
4. Check if the given input string can be accepted by the parser / parsing action.
Algorithm: Predictive parsing
W=id+id * id or a+b*c
id + * ( ) $
E E -> TE’ E -> TE’
E’ E -> +TE’ E’ -> ε E’ -> ε
T T -> FT’ T -> FT’
T’ T’ -> ε T’ -> *FT’ T’ -> ε T’ -> ε
F F -> id F -> (E)
An error is detected during predicative parsing when the terminal on the top of the stack does not
match the current input symbol or the parse table entry M[A,a] is empty.
i) Panic Mode: Panic-mode error recovery is based on the idea of skipping symbols on the input until a
token. If the parser table entry M[A, a] is blank, then the input symbol ‘a’ is skipped. If the entry is
"synch," then the non-terminal on top of the stack is popped. If a terminal on top of the stack does not
match the input symbol, then we pop the top of the stack.