
UNIT-II Compiler Design (R22)

Syntax Analysis, discussion on CFG, LMD, RMD, parse trees, Role of a parser, classification of parsing
techniques, Brute force approach, left recursion, left factoring, Top down parsing, First and Follow,
LL(1) Grammars, Non-Recursive predictive parsing, Error recovery in predictive parsing.

CFG: A Context Free Grammar (CFG) is a formal grammar which is used to generate all possible strings in a given formal language. A CFG is defined by a 4-tuple:

G = (V, T, P, S)

Where V is a finite set of non-terminal symbols

T is a finite set of terminal symbols

P is a set of productions

S is a start symbol.

In a CFG, the start symbol is used to derive the string. A grammar is a CFG if every production
is of the form:

A → (V ∪ T)*, where A ∈ V

i.e., the left-hand side of every production is a single non-terminal and the right-hand side is any string of terminals and non-terminals.
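
As a concrete illustration (not taken from these notes), the 4-tuple for the toy grammar S → aSb | ε, which generates the strings aⁿbⁿ, can be written out directly; the Python encoding below is only one possible representation.

    # A hypothetical encoding of G = (V, T, P, S) for the toy grammar
    #   S -> a S b | ε        (generates a^n b^n, n >= 0)
    V = {"S"}                      # non-terminal symbols
    T = {"a", "b"}                 # terminal symbols
    P = {"S": [["a", "S", "b"],    # productions: one list of symbols per alternative
               []]}                # the empty list stands for the ε-alternative
    S = "S"                        # start symbol
    G = (V, T, P, S)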

Derivation: A derivation begins with the start symbol; the process of repeatedly replacing non-terminals by the right-hand sides of productions until only terminals remain is called a derivation.
Derivation tree / Parse tree: It is a graphical representation of the derivation, showing the order in
which the productions are applied to derive the string. In a parse tree, the root node always represents the
start symbol, the interior nodes denote the variables (non-terminals), and the leaf nodes denote the
terminals.

Examples: Refer Class notes

LMD: In the Left Most Derivation of a string, the leftmost non-terminal is replaced at each step.

RMD: In the Right Most Derivation of a string, the rightmost non-terminal is replaced at each step.

Examples: Refer Class notes
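
For instance (an illustration, not from the class notes), with the grammar E → E+E | id and the string id+id:

Leftmost derivation (LMD):  E ⇒ E+E ⇒ id+E ⇒ id+id
Rightmost derivation (RMD): E ⇒ E+E ⇒ E+id ⇒ id+id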

Role of the parser:


The second phase of the compiler is syntax analysis, also called hierarchical analysis or parsing. The
parser takes the tokens from the lexical analyzer, checks the syntax of the token stream with the help of the
grammar G, and creates a syntax tree or parse tree.
There are three general types of parsers for grammars: universal, top-down, and bottom-up. Universal
parsing methods are too inefficient to be used in production compilers. The methods commonly used in compilers can be classified as
either top-down or bottom-up. Top-down methods build parse trees from the top (root) to the bottom
(leaves), while bottom-up methods start from the leaves and work up to the root. In either case, the
input to the parser is scanned from left to right, one symbol at a time.



The most efficient top-down and bottom-up methods work only for sub-classes of grammars; these
classes are the LL and LR grammars. Top-down parsers support LL grammars, while bottom-up parsers
support LR grammars. The first "L" in LL stands for scanning the input from left to right, and the second
"L" for producing a leftmost derivation. The "L" in LR stands for scanning the input from left to
right, and the "R" for producing a rightmost derivation in reverse order. Top-down parsers
cannot handle left-recursive grammars, whereas bottom-up (LR) parsers handle left-recursive grammars and, in general, a larger class of grammars.

AMBIGUITY: A grammar that produces more than one parse tree for a string is said to be ambiguous.
(or) An ambiguous grammar is one that produces more than one leftmost derivation or more than one
rightmost derivation for the same string.
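
For instance (an illustration not taken from the notes), with the grammar E → E+E | E*E | id, the string id+id*id has two distinct leftmost derivations, so the grammar is ambiguous:

E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id

The first derivation groups the string as id+(id*id); the second groups it as (id+id)*id.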



Elimination of ambiguity:
1. Elimination of Left Recursion:
Top-down parsing methods cannot handle left-recursive grammars, so we need to eliminate left
recursion. If A has productions of the form A → Aα | β, we rewrite them as

A → βA'
A' → αA' | ε

For any number of A-productions, first group the productions as

A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn

where no βi begins with an A. Then, replace the A-productions by

A → β1A' | β2A' | ... | βnA'
A' → α1A' | α2A' | ... | αmA' | ε
Note: Refer class notes for Examples.

Algorithm: Eliminating left recursion.


INPUT: Grammar G with no cycles or ε-productions.
OUTPUT: An equivalent grammar with no left recursion.
METHOD:
1) arrange the nonterminals in some order A1, A2, ..., An.
2) for ( each i from 1 to n ) {
3)    for ( each j from 1 to i - 1 ) {
4)        replace each production of the form Ai → Aj γ by the
          productions Ai → δ1 γ | δ2 γ | ... | δk γ, where
          Aj → δ1 | δ2 | ... | δk are all the current Aj-productions
5)    }
6)    eliminate the immediate left recursion among the Ai-productions
7) }
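
For the common case of immediate left recursion (A → Aα | β), the transformation can be sketched in Python as below. This is only an illustration, not the full algorithm above; the function name and the grammar encoding (each alternative as a list of symbols, ε as the empty list) are assumptions.

    def eliminate_immediate_left_recursion(A, alternatives):
        """Rewrite A -> A a1 | ... | A am | b1 | ... | bn as
           A  -> b1 A' | ... | bn A'
           A' -> a1 A' | ... | am A' | ε   (ε shown as the empty list)."""
        A_prime = A + "'"
        alphas = [alt[1:] for alt in alternatives if alt and alt[0] == A]   # left-recursive tails
        betas  = [alt     for alt in alternatives if not alt or alt[0] != A]
        if not alphas:                        # no immediate left recursion: nothing to do
            return {A: alternatives}
        return {A:       [beta + [A_prime] for beta in betas],
                A_prime: [alpha + [A_prime] for alpha in alphas] + [[]]}

    # Example: E -> E + T | T   becomes   E -> T E',  E' -> + T E' | ε
    print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))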

2. Left Factoring:


When the choice between two alternative A-productions is not clear, we may be able to rewrite the productions.
For example, if we have the two productions
stmt → if expr then stmt else stmt
     | if expr then stmt
then on seeing the input if, we cannot immediately tell which production to choose to expand stmt. In
general, if A → αβ1 | αβ2 | γ are the A-productions and the input begins with a string derived from α,
we do not know whether to expand A to αβ1 or to αβ2. After left factoring, the original productions
become
A → αA' | γ
A' → β1 | β2

Algorithm: Left factoring a grammar.


INPUT: Grammar G.
OUTPUT: An equivalent left-factored grammar.
METHOD: For each nonterminal A, find a longest prefix α common to two or more of its alternatives.
If α ≠ ε, then replace all of the A-productions A → αβ1 | αβ2 | ... | αβn | γ,
where γ represents all alternatives that do not begin with α, by
A → αA' | γ
A' → β1 | β2 | ... | βn
Here A' is a new nonterminal. Repeatedly apply this transformation until no two alternatives for a
nonterminal have a common prefix.
Note: Refer class notes for Examples.
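
One left-factoring step can be sketched as follows (illustrative only: the helper name left_factor_once and the list-of-symbols encoding are assumptions, and the common prefix α is passed in rather than discovered):

    def left_factor_once(A, alternatives, alpha):
        """Given a common prefix alpha, rewrite
           A -> alpha b1 | alpha b2 | ... | gamma
        as A -> alpha A' | gamma   and   A' -> b1 | b2 | ... ."""
        A_prime = A + "'"
        with_prefix    = [alt[len(alpha):] for alt in alternatives if alt[:len(alpha)] == alpha]
        without_prefix = [alt for alt in alternatives if alt[:len(alpha)] != alpha]
        return {A: [alpha + [A_prime]] + without_prefix,
                A_prime: with_prefix}

    # stmt -> if expr then stmt else stmt | if expr then stmt
    alts = [["if", "expr", "then", "stmt", "else", "stmt"],
            ["if", "expr", "then", "stmt"]]
    print(left_factor_once("stmt", alts, ["if", "expr", "then", "stmt"]))
    # result: stmt -> if expr then stmt stmt'   and   stmt' -> else stmt | ε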



CLASSIFICATION OF PARSING TECHNIQUES:

The process of finding a parse tree for a string of tokens is called parsing. Parsing can be of two types.
They are
1) Top-down parsing
2) Bottom-up parsing

Top-down parsing starts from the root node and works down to the leaf nodes, creating the nodes of
the parse tree in preorder. A top-down parse corresponds to a leftmost derivation of the input string.
Example: The sequence of parse trees constructed step by step for the input id+id*id is a top-down parse.

W = id+id*id



A bottom-up parse corresponds to the construction of a parse tree for an input string beginning at the
leaves and working up towards the root.

TOP-DOWN PARSING:

A general form of top-down parsing, called recursive descent parsing, may require
backtracking to find the correct A-production to apply. Predictive parsing is a special
case of recursive descent parsing in which no backtracking is required: the parser can always choose the
correct A-production by looking at the next input symbol.

Top-down parsing is of two types:

i) Backtracking / recursive descent parsing / brute force technique.
ii) Non-backtracking / predictive parsing / non-recursive predictive parsing / table-driven parsing.
Recursive descent parsing: A recursive-descent parsing program consists of a set of procedures, one
for each nonterminal. Execution begins with the procedure for the start symbol, which halts and
announces success if its body scans the entire input string.
General recursive descent may require backtracking; that is, it may require repeated scans over
the input. Backtracking parsers are not very efficient.

For example, consider the grammar


S → cAd
A → ab | a
w=cad



To construct the parse tree for this string top-down, we initially create a tree consisting of a single
node labeled S. An input pointer points to c, the first symbol of w. We use the first production for
S to expand the tree and obtain the tree of fig(a). The leftmost leaf, labeled c, matches the first
symbol of w, so we advance the input pointer to a, the second symbol of w, and consider the
next leaf, labeled A. We then expand A using its first alternative to obtain the tree of
fig(b). We now have a match for the second input symbol, so we advance the input pointer to d, the
third input symbol, and compare it against the next leaf, labeled b. Since b does not match d, we
report failure and go back to A to see whether there is another alternative for A, obtaining the tree of fig(c).
In going back to A, we must reset the input pointer to position 2, which means that the procedure for
A must store the input pointer in a local variable.

Algorithm:
void A() {
    choose an A-production, A → X1 X2 ... Xk;
    for ( i = 1 to k ) {
        if ( Xi is a nonterminal )
            call procedure Xi();
        else if ( Xi equals the current input symbol a )
            advance the input to the next symbol;
        else
            /* an error has occurred */ ;
    }
}

Note: Refer class notes for Examples.
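
The walkthrough above can be turned into a small backtracking recognizer. The sketch below is only an illustration of the brute-force idea for the toy grammar S → cAd, A → ab | a; the encoding and the function name parse are assumptions, and the input position plays the role of the stored pointer that enables backtracking.

    # A minimal backtracking recursive-descent recognizer for S -> c A d, A -> a b | a
    GRAMMAR = {"S": [["c", "A", "d"]],
               "A": [["a", "b"], ["a"]]}

    def parse(symbol, w, pos):
        """Try to derive a prefix of w[pos:] from `symbol`; return the new input
        position on success, or None on failure (which triggers backtracking)."""
        if symbol not in GRAMMAR:                       # terminal: must match the input
            return pos + 1 if pos < len(w) and w[pos] == symbol else None
        for alternative in GRAMMAR[symbol]:             # try each alternative in order
            p = pos                                     # remember where this attempt started
            for x in alternative:
                p = parse(x, w, p)
                if p is None:
                    break                               # this alternative failed
            else:
                return p                                # the whole alternative matched
            # fall through: backtrack to `pos` and try the next alternative
        return None

    w = "cad"
    print(parse("S", w, 0) == len(w))                   # True: A -> ab fails, A -> a succeeds

Because the procedure for A first tries A → ab and only then A → a, the failed attempt is undone simply by restarting from the saved position, mirroring the "reset the input pointer" step described above.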

FIRST and FOLLOW:


The construction of both top-down and bottom-up parsers is supported by two functions, FIRST and
FOLLOW, associated with a grammar G. During top-down parsing, FIRST and FOLLOW allow us to
choose which production to apply, based on the next input symbol.

To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ε can be added to any FIRST set:

1. If X is a terminal, then FIRST(X) = {X}.

2. If X is a nonterminal and X → Y1 Y2 ... Yk is a production, then add everything in FIRST(Y1) except ε to FIRST(X); if ε is in FIRST(Y1), also add FIRST(Y2) except ε, and so on. If ε is in FIRST(Yi) for all i, then add ε to FIRST(X).
3. If X → ε is a production, then add ε to FIRST(X).

To compute FOLLOW(A) for all non-terminals A, apply the following rules until nothing can be added to any FOLLOW set:

1. Place $ in FOLLOW(S), where S is the start symbol.

2. If there is a production A → αBβ, where β is a terminal, then add β to FOLLOW(B).
3. If there is a production A → αB, then add everything in FOLLOW(A) to FOLLOW(B).
4. If there is a production A → αBβ, where β is a non-terminal, then
   i) if FIRST(β) contains ε, add (FIRST(β) − {ε}) ∪ FOLLOW(A) to FOLLOW(B);
   ii) if FIRST(β) does not contain ε, add FIRST(β) to FOLLOW(B).

Note: Refer class notes for Examples.
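
The two rule sets above can be implemented as a single fixed-point computation. The sketch below is illustrative only (the grammar encoding, the EPS marker, and the helper names first_of / compute_first_follow are assumptions); it is run here on the transformed expression grammar used later in this unit.

    GRAMMAR = {"E": [["T", "E'"]], "E'": [["+", "T", "E'"], []],
               "T": [["F", "T'"]], "T'": [["*", "F", "T'"], []],
               "F": [["(", "E", ")"], ["id"]]}
    EPS, END = "ε", "$"

    def first_of(seq, FIRST):
        """FIRST of a string of grammar symbols X1 X2 ... Xk (rule 2 above)."""
        out = set()
        for X in seq:
            f = FIRST[X] if X in GRAMMAR else {X}   # terminal: FIRST(X) = {X}
            out |= f - {EPS}
            if EPS not in f:
                return out
        return out | {EPS}                          # every Xi can derive ε

    def compute_first_follow(grammar, start):
        FIRST  = {A: set() for A in grammar}
        FOLLOW = {A: set() for A in grammar}
        FOLLOW[start].add(END)                      # FOLLOW rule 1
        changed = True
        while changed:                              # repeat until nothing new is added
            changed = False
            for A, alts in grammar.items():
                for alt in alts:
                    new_first = first_of(alt, FIRST)
                    if not new_first <= FIRST[A]:
                        FIRST[A] |= new_first; changed = True
                    for i, B in enumerate(alt):     # FOLLOW rules for A -> x B beta
                        if B not in grammar:
                            continue
                        beta_first = first_of(alt[i + 1:], FIRST)
                        new_follow = beta_first - {EPS}
                        if EPS in beta_first:       # beta can vanish: include FOLLOW(A)
                            new_follow |= FOLLOW[A]
                        if not new_follow <= FOLLOW[B]:
                            FOLLOW[B] |= new_follow; changed = True
        return FIRST, FOLLOW

    FIRST, FOLLOW = compute_first_follow(GRAMMAR, "E")
    print(sorted(FIRST["T'"]), sorted(FOLLOW["E'"]))   # ['*', 'ε'] and ['$', ')']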



LL(1) Grammars:
A recursive-descent parser needing no backtracking can be constructed for a class of grammars
called LL(1). The first "L" in LL(1) stands for scanning the input from left to right, the second "L" for
producing a leftmost derivation, and the "1" for using one input symbol of lookahead at each step to
make parsing decisions.

Algorithm: Construction of a predictive parsing table.


INPUT: Grammar G.
OUTPUT: Parsing table M.
METHOD: For each production A → α of the grammar, do the following:
1. For each terminal a in FIRST(α), add A → α to M[A, a].
2. If ε is in FIRST(α), then for each terminal b in FOLLOW(A), add A → α to M[A, b]. If ε is in
FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $] as well.

Note: Refer class notes for Examples.
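
A sketch of this table-construction method in Python, assuming FIRST and FOLLOW have already been computed (the hard-coded sets below are those of the transformed expression grammar used in the example that follows; the helper names are made up):

    GRAMMAR = {"E": [["T", "E'"]], "E'": [["+", "T", "E'"], []],
               "T": [["F", "T'"]], "T'": [["*", "F", "T'"], []],
               "F": [["(", "E", ")"], ["id"]]}
    EPS, END = "ε", "$"
    FIRST  = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
              "T'": {"*", EPS}, "F": {"(", "id"}}
    FOLLOW = {"E": {END, ")"}, "E'": {END, ")"}, "T": {"+", END, ")"},
              "T'": {"+", END, ")"}, "F": {"*", "+", END, ")"}}

    def first_of(seq):
        """FIRST of a right-hand side alpha = X1 ... Xk."""
        out = set()
        for X in seq:
            f = FIRST[X] if X in GRAMMAR else {X}
            out |= f - {EPS}
            if EPS not in f:
                return out
        return out | {EPS}

    def build_table():
        M = {}
        for A, alts in GRAMMAR.items():
            for alt in alts:                       # for each production A -> alpha
                fa = first_of(alt)
                for a in fa - {EPS}:               # rule 1: a in FIRST(alpha)
                    M[(A, a)] = alt
                if EPS in fa:                      # rule 2: epsilon in FIRST(alpha)
                    for b in FOLLOW[A]:            # FOLLOW(A) already contains $ here
                        M[(A, b)] = alt
        return M

    M = build_table()
    print(M[("E'", "+")])                          # ['+', 'T', "E'"]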

Predictive Parsers/Non-recursive Predictive Parsing:

Recursive descent parsing can also be performed using an explicit stack; such a parser is called a predictive parser.
The predictive parser reads an input string and, using the stack and a parsing table, produces the output.



A predictive parser has an input buffer, a stack, a parsing table, and an output stream. The input buffer
contains the input string followed by $, the end-of-input marker. The stack contains a sequence
of grammar symbols with $ at the bottom; initially, the stack contains the start symbol of the grammar
on top of $. The parsing table is a two-dimensional array M[A, a], where A is a non-terminal and a is a terminal
or $.

The parser is controlled by a program that considers X, the symbol on top of the stack, and a, the
current input symbol. If X is a non-terminal, the parser chooses an X-production by consulting
entry M[X, a] of the parsing table M. Otherwise, it checks for a match between the terminal X and the
current input symbol a.
To construct and use a predictive parser, follow these steps:
1. Eliminate left recursion and apply left factoring to the grammar G.
2. Find FIRST and FOLLOW for the symbols in grammar G.
3. Construct the predictive parse table.
4. Check whether the given input string is accepted by the parser (parsing action).
Algorithm: Predictive parsing
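
The body of the algorithm is the standard stack-based driver loop described above. A Python sketch is given below; it hard-codes the parsing table that Step-3 of the following example produces, and the function name predictive_parse is only illustrative.

    # Parsing table M[(nonterminal, lookahead)] -> right-hand side to push
    M = {("E", "id"): ["T", "E'"],  ("E", "("): ["T", "E'"],
         ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
         ("T", "id"): ["F", "T'"],  ("T", "("): ["F", "T'"],
         ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
         ("F", "id"): ["id"],       ("F", "("): ["(", "E", ")"]}
    NONTERMINALS = {"E", "E'", "T", "T'", "F"}

    def predictive_parse(tokens):
        tokens = tokens + ["$"]
        stack = ["$", "E"]                          # start symbol on top of $
        i = 0
        while stack[-1] != "$":
            X, a = stack[-1], tokens[i]
            if X == a:                              # terminal on top matches the lookahead
                stack.pop(); i += 1
            elif X in NONTERMINALS and (X, a) in M:
                stack.pop()
                stack.extend(reversed(M[(X, a)]))   # push the RHS, leftmost symbol on top
            else:
                return False                        # empty entry or mismatch: syntax error
        return tokens[i] == "$"                     # accept only if all input is consumed

    print(predictive_parse(["id", "+", "id", "*", "id"]))   # True
    print(predictive_parse(["id", "+", "*", "id"]))         # False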



Example: Construct a predictive parser for the following grammar and check whether the given string is
accepted.
E -> E+T|T
T -> T*F|F
F -> (E)|id

W=id+id * id or a+b*c

Step-1: Eliminate Left Recursion


E -> TE’
E’ -> +TE’|ε
T -> FT’
T’ -> *FT’|ε
F -> (E)|id

Step-2: Find the FIRST and FOLLOW functions


Symbol | FIRST | FOLLOW
E      | (, id | $, )
E'     | +, ε  | $, )
T      | (, id | +, $, )
T'     | *, ε  | +, $, )
F      | (, id | *, +, $, )

Step3: Parsing Table

       | id       | +          | *          | (        | )       | $
E      | E → TE'  |            |            | E → TE'  |         |
E'     |          | E' → +TE'  |            |          | E' → ε  | E' → ε
T      | T → FT'  |            |            | T → FT'  |         |
T'     |          | T' → ε     | T' → *FT'  |          | T' → ε  | T' → ε
F      | F → id   |            |            | F → (E)  |         |



Step-4: Parsing Action

MATCHED    | STACK     | INPUT        | ACTION

           | E$        | id+id*id$    |
           | TE'$      | id+id*id$    | E → TE'
           | FT'E'$    | id+id*id$    | T → FT'
           | idT'E'$   | id+id*id$    | F → id
id         | T'E'$     | +id*id$      | Match id
id         | E'$       | +id*id$      | T' → ε
id         | +TE'$     | +id*id$      | E' → +TE'
id+        | TE'$      | id*id$       | Match +
id+        | FT'E'$    | id*id$       | T → FT'
id+        | idT'E'$   | id*id$       | F → id
id+id      | T'E'$     | *id$         | Match id
id+id      | *FT'E'$   | *id$         | T' → *FT'
id+id*     | FT'E'$    | id$          | Match *
id+id*     | idT'E'$   | id$          | F → id
id+id*id   | T'E'$     | $            | Match id
id+id*id   | E'$       | $            | T' → ε
id+id*id   | $         | $            | E' → ε

Error Recovery in Predictive Parsing:

An error is detected during predictive parsing when the terminal on top of the stack does not
match the current input symbol, or when non-terminal A is on top of the stack, a is the next input symbol, and the parsing-table entry M[A, a] is empty.

i) Panic Mode: Panic-mode error recovery is based on the idea of skipping symbols in the input until a
token in a selected set of synchronizing tokens appears; the synchronizing set for a non-terminal A is usually based on FOLLOW(A). If the parsing-table entry M[A, a] is blank, then the input symbol a is skipped. If the entry is
"synch", then the non-terminal on top of the stack is popped. If a terminal on top of the stack does not
match the input symbol, then we pop the terminal from the top of the stack.
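
The three recovery actions just described can be grafted onto the table-driven driver loop. The sketch below reuses the expression-grammar table; taking the synchronizing set of each non-terminal to be exactly its FOLLOW set is an assumption (other synchronizing sets give different recoveries), and the function name parse_with_panic_mode is made up for illustration.

    M = {("E", "id"): ["T", "E'"],  ("E", "("): ["T", "E'"],
         ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
         ("T", "id"): ["F", "T'"],  ("T", "("): ["F", "T'"],
         ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
         ("F", "id"): ["id"],       ("F", "("): ["(", "E", ")"]}
    FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
              "T'": {"+", ")", "$"}, "F": {"*", "+", ")", "$"}}
    NONTERMINALS = set(FOLLOW)

    def parse_with_panic_mode(tokens):
        tokens, stack, i = tokens + ["$"], ["$", "E"], 0
        while stack[-1] != "$":
            X, a = stack[-1], tokens[i]
            if X not in NONTERMINALS:                # terminal on top of the stack
                if X == a:
                    stack.pop(); i += 1
                else:
                    print(f"error: pop unmatched terminal {X}"); stack.pop()
            elif (X, a) in M:                        # normal expansion
                stack.pop(); stack.extend(reversed(M[(X, a)]))
            elif a in FOLLOW[X]:                     # "synch" entry: pop the non-terminal
                print(f"error: synch, pop {X}"); stack.pop()
            else:                                    # blank entry: skip the input symbol
                print(f"error: skip {a}"); i += 1
        return tokens[i] == "$"                      # did we consume the whole input?

    print(parse_with_panic_mode(["id", "*", "+", "id"]))

On the erroneous input id * + id, this sketch reports one error (popping F when + is seen, as in the trace below) and then finishes the parse.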



Example:

STACK      | INPUT      | ACTION

E$         | )id*+id$   |
E$         | id*+id$    | Error: skip )
TE'$       | id*+id$    | E → TE'
FT'E'$     | id*+id$    | T → FT'
idT'E'$    | id*+id$    | F → id
T'E'$      | *+id$      | Match id
*FT'E'$    | *+id$      | T' → *FT'
FT'E'$     | +id$       | Match *
T'E'$      | +id$       | Error: synch entry for F on +, pop F
E'$        | +id$       | T' → ε
+TE'$      | +id$       | E' → +TE'
TE'$       | id$        | Match +
FT'E'$     | id$        | T → FT'
idT'E'$    | id$        | F → id
T'E'$      | $          | Match id
E'$        | $          | T' → ε
$          | $          | E' → ε



ii) Phrase-level recovery: Phrase-level error recovery is implemented by filling in the blank entries in
the predictive parsing table with pointers to error routines. These routines may change, insert, or delete
symbols on the input and also pop from the stack.

Difference between recursive descent parsing and predictive parsing:


Recursive descent parsing:
1. It is a general parsing technique, but it is not widely used because it is not efficient.
2. It involves backtracking.
3. Problems: ambiguity, left recursion, left factoring, backtracking.
Predictive parsing:
1. It is efficient.
2. It does not require backtracking.
3. It needs a special form of grammar, called LL(1).
4. Predictive parsing is a special form of recursive descent parsing without backtracking.
5. A non-recursive (table-driven) predictive parser is also known as an LL(1) parser.

Difficulties in top-down parsing:


1. Backtracking is the major difficulty with top-down parsing.
2. Left-recursive grammars cannot be parsed by top-down parsers.
3. The grammar must be left-factored before it is given to a top-down parser.
4. Top-down parsers cannot parse ambiguous grammars.
5. Top-down parsers are slow, and debugging them is difficult.

