0% found this document useful (0 votes)
29 views

Cd notes

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
29 views

Cd notes

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 194

UNIT - 1

CD
(Compiler Design)
What is Compiler?
Compiler is a software which converts a
program written in high level language called
Source Language to low level language
(Object/Target/Machine Language).

It can be shown with the help of following


diagram:
Phases of a Compiler
There are two major phases of compilation, which
in turn have many parts. Each of them take input
from the output of the previous level and work in
a coordinated way.

• Analysis Phase – An intermediate representation


is created from the give source code :
• Synthesis Phase – Equivalent target program is
created from the intermediate representation.
Machine Code
Analysis Phase includes:
• Lexical Analyzer
• Syntax Analyzer
• Semantic Analyzer
• Intermediate Code Generator
A LANGUAGE FOR SPECIFYING LEXICAL
ANALYZER
There is a wide range of tools for constructing lexical analyzers.
Lex
YACC

Lex is a computer program that generates lexical analyzers. Lex is commonly used
with the yacc parser generator.
Creating a lexical analyzer

• First, a specification of a lexical analyzer is prepared by creating a


program lex.l in the Lex language. Then, lex.l is run through the Lex compiler to
produce a C program lex.yy.c.

• Finally, lex.yy.c is run through the C compiler to produce an object progra


m a.out, which is the lexical analyzer that transforms an input stream into a
sequence of tokens.

Fig1.11 Creating a lexical analyzer with lex

Lex Specification
A Lex program consists of three parts:
{ definitions }
%%
{ rules }
%%
{ user subroutines }
Definitions include declarations of variables, constants, and regular
definitions

Ø Rules are statements of the form


p1 {action1}
p2 {action2}

pn {actionn}
where pi is regular expression and actioni describes what action the lexical analyzer
should take
when pattern pi matches a lexeme. Actions are written in C code.


User subroutinesare auxiliary procedures needed by the actions. These can be
compiledseparately and loaded with the lexical analyzer.
Flex regular expressions

In addition to the usual regular expressions, Flex introduces some new notations.

[abcd]

A bracketed expression describes a set of characters. Expression [abcd] is equivalent to


(a|b|c|d).

[0-9]

In brackets, a dash indicates a range of characters. For example, [a-zA-Z] matches any
single letter. If you want a dash as one of the characters, put it first.

[^abcd]

This indicates any character except a, b, c or d. For example, [^a-zA-Z] matches any
nonletter.
The input character is read from secondary storage. But reading in this
way from secondary storage is costly. Hence buffering technique is used
A block of data is first read into a buffer, and then scanned by lexical
analyzer
There are two methods used in this context
1. One Buffer Scheme
2. Two Buffer Scheme
One Buffer Scheme:
In this scheme, only one buffer is used to store the input string. But the
problem with this scheme is that if lexeme is very long then it crosses the
buffer boundary, to scan rest of the lexeme the buffer has to be refilled,
that makes overwriting the first part of lexeme.
Two Buffer Scheme:
To overcome the problem of one buffer scheme, in this method two buffers
are used to store the input string. The first buffer and second buffer are
scanned alternately. When end of current buffer is reached the other
buffer is filled.
Initially both the bp and fp are pointing to the first character of first
buffer. Then the fp moves towards right in search of end of lexeme. as
soon as blank character is recognized, the string between bp and fp is
identified as corresponding token. To identify, the boundary of first
buffer end of buffer character should be placed at the end first buffer.
Similarly end of second buffer is also recognized by the end of buffer
mark present at the end of second buffer. When fp encounters first eof,
then one can recognize end of first buffer and hence filling up second
buffer is started. in the same way when second eof is obtained then it
indicates of second buffer. Alternatively both the buffers can be filled up
until end of the input program and stream of tokens is identified.
This eof character introduced at the end is calling Sentinel which is used
to identify the end of buffer.
What is Syntax analysis?

Syntax analysis is a second phase of the compiler design process that comes after
lexical analysis. It analyses the syntactical structure of the given input. It checks if the
given input is in the correct syntax of the programming language in which the input
which has been written. It is known as the Parse Tree or Syntax Tree.

The Parse Tree is developed with the help of pre-defined grammar of the language.
The syntax analyzer also checks whether a given program fulfils the rules implied by
a context-free grammar. If it satisfies, the parser then creates the parse tree of that
source program. Otherwise, it will display error messages.

Role of the parser:

The parser obtains a string of tokens from the lexical analyzer and verifies that the
string can be the grammar for the source language. It detects and reports any syntax
errors and produces a parse tree from which intermediate code can be generated.

Functions of the parser:

1. It verifies the structure generated by the tokens based on the grammar.


2. It constructs the parse tree.
3. It reports the errors.

CONTEXT-FREE GRAMMARS

The syntax of a programming language is described by a context free grammar (CFG).


A Context-Free Grammar G = (V, Σ, P, S )is a quadruple that consists of terminals, non-
terminals, start symbol and productions.

 Terminals: These are the basic symbols from which strings are formed.
 Non-Terminals: These are the syntactic variables that denote a set of strings.
These help to define the language generated by the grammar.
 Start Symbol: One non-terminal in the grammar is denoted as the “Start-
symbol” and the set of strings it denotes is the language defined by the
grammar.
 Productions: It specifies the manner in which terminals and non-terminals can
be combined to form strings. Each production consists of a non-terminal,
followed by an arrow, followed by a string of non-terminals and terminals.

Derivation

A derivation is basically a sequence of production rules, in order to get the input


string. During parsing, we take two decisions for some sentential form of input:

 Deciding the non-terminal which is to be replaced.


 Deciding the production rule, by which, the non-terminal will be replaced.
To decide which non-terminal to be replaced with production rule, we can have two
options.
Left-most Derivation
If the sentential form of an input is scanned and replaced from left to right, it is
called left-most derivation. The sentential form derived by the left-most derivation is
called the left-sentential form.
Right-most Derivation
If we scan and replace the input with production rules, from right to left, it is known
as right-most derivation. The sentential form derived from the right-most derivation
is called the right-sentential form.
Example
Production rules:
E→E+E
E→E*E
E → id
Input string: id + id * id
The left-most derivation is:
E→E*E
E→E+E*E
E → id + E * E
E → id + id * E
E → id + id * id
Notice that the left-most side non-terminal is always processed first.
The right-most derivation is:
E→E+E
E→E+E*E
E → E + E * id
E → E + id * id
E → id + id * id

Parse Tree

A parse tree is a graphical depiction of a derivation. It is convenient to see how


strings are derived from the start symbol. The start symbol of the derivation
becomes the root of the parse tree. Let us see this by an example from the last topic.
We take the left-most derivation first:

Step 1:

E→E*E

Step 2:

E→E+E*E
Step 3:

E → id + E * E

Step 4:

E → id + id * E

Step 5:

E → id + id * id
Types of Parsers in Compiler Design
Parser is that phase of compiler which takes token string as input and with the help of
existing grammar, converts it into the corresponding parse tree. Parser is also known as
Syntax Analyzer.

Parser is mainly classified into 2 categories: Top-down Parser, and Bottom-up Parser. These
are explained as following below.
1. Top-down Parser:
Top-down parser is the parser which generates parse for the given input string with
the help of grammar productions by expanding the non-terminals i.e. it starts from
the start symbol and ends on the terminals. It uses left most derivation.

Further Top-down parser is classified into 2 types: Recursive descent parser, and Non-
recursive descent parser.
I. Recursive descent parser:
It is also known as the with backtracking parser. It basically generates the parse tree
by using backtracking.

II. Non-recursive descent parser:


It is also known as LL(1) parser or predictive parser or without backtracking parser
or dynamic parser. It uses parsing table to generate the parse tree instead of
backtracking.
2. Bottom-up Parser:
Bottom-up Parser is the parser which generates the parse tree for the given input
string with the help of grammar productions by compressing the non-terminals i.e. it
starts from non-terminals and ends on the stat symbol. It uses reverse of the right
most derivation.

Further Bottom-up parser is classified into 2 types: LR parser, and Operator


precedence parser.

I. LR parser:
LR parser is the bottom-up parser which generates the parse tree for the given string
by using unambiguous grammar. It follows reverse of right most derivation.
LR parser is of 4 types:

a) LR(0)
b) SLR(1)
c) LALR(1)
d) CLR(1)

II. Operator precedence parser:


It generates the parse tree form given grammar and string but the only condition is
two consecutive non-terminals and epsilon never appear in the right-hand side of any
production.

Recursive Descent Parser


• Recursive descent parser is a top-down parser.
• It requires backtracking to find the correct production to be applied.
• The parsing program consists of a set of procedures, one for each non-terminal.
• Process begins with the procedure for start symbol.
• Start symbol is placed at the root node and on encountering each non-terminal, the
procedure concerned is called to expand the non-terminal with its corresponding
production.
• Procedure is called recursively until all non-terminals are expanded.
• Successful completion occurs when the scan over entire input string is done. i.e., all
terminals in the sentence are derived by parse tree.
To understand this, take the following example of CFG:
S → rXd | rZd
X → oa | ea
Z → ai
For an input string: read, a top-down parser will behave like this:
It will start with S from the production rules and will match its yield to the left-most letter
of the input, i.e. ‘r’. The very production of S (S → rXd) matches with it. So the top-down
parser advances to the next input letter (i.e. ‘e’). The parser tries to expand non-terminal
‘X’ and checks its production from the left (X → oa). It does not match with the next input
symbol. So the top-down parser backtracks to obtain the next production rule of X, (X →
ea).
Now the parser matches all the input letters in an ordered manner. The string is accepted.

Predictive Parser
Predictive parser is a parser which has the capability to predict which production is to be
used to replace the input string. The predictive parser does not suffer from backtracking.
To accomplish its tasks, the predictive parser uses a look-ahead pointer, which points to
the next input symbols.
Non-recursive predictive parsing is also known as LL(1) parser. This parser follows the
leftmost derivation (LMD).

LL(1)means:
Here, first L is for Left to Right scanning of inputs, the second L is for left most derivation
procedure, and 1 = Number of Look Ahead Symbols
Predictive parsing uses a stack and a parsing table to parse the input and generate a parse
tree. Both the stack and the input contains an end symbol $ to denote that the stack is
empty and the input is consumed. The parser refers to the parsing table to take any
decision on the input and stack element combination.

We will see two grammar transformations that improve the chance to get a LL(1) grammar:

 Elimination of left-recursion
 Left-factorization
Scanned by CamScanner
Scanned by CamScanner
Scanned by CamScanner
Scanned by CamScanner
Scanned by CamScanner
Scanned by CamScanner
Scanned by CamScanner
Scanned by CamScanner
Scanned by CamScanner
Scanned by CamScanner
Scanned by CamScanner
Scanned by CamScanner
Scanned by CamScanner
OPredictive parsing uses a stack and
a
parsing table to parse the input Lals
Input
and generate a parse tree.

OBoth the stack and the input


contains an end symbol $to denote
that the stack is empty and the input Parser Output
is consumed.

OThe parser refers to the parsing Stack


table to take any decision on the
Parsing Table
input and stack element
combination. ACTION GOTO
Why using FIRST and FOLLOW: 24

During parsing FIRST and FOLLOW help us to choose


which production to apply, based on the next input signal.

We know that we need of backtracking in syntax analysis, which is


really a complex process to implement. There can be easier way to
sort out this problem by using FIRST AND FOLLOW.

If the compiler would have come to know in advance, that what is the
frst character of the string produced when a production rule is
applied", and comparing it to the current character or token in the input
string it sees, it can wisely take decision on which production rule to
apply
FOLLOW is used only if the current non terminal can derive E.
25

Rules of FIRST

FIRST always find out the terminal symbol from the grammar.
When we check out FIRST for any symbol then if we find any
terminal symbol in first place then we take it. And not to see the
next symbol.

If a grammar is

A a then
If a grammar is
FIRST (A )={a)
A a B then FIRST (A )={ a)
26
Rules of FIRST

If a grammar is

AaBI
ethen FIRST (A)={a,c}
If a grammar is

A BcDIE

BeDI(A)
Here B is non terminal. So, we check the transition of B and
find the FIRST of A.
then
FIRST (A )={ e,(, t}
PIRST ad PoLLDD

fIRST
it(K) et ttose teunuals LSile etict the
ssps cetivabe o tat
Tk XYZ
fiist e) =
thst (XYZ) = x 1X ateunua

fist() Ptst (K),Aixt (x)


=
does
uot coutain E.
fist(K) tontai E,
ean tist(x) fist(*)e UPst Y2).
ETA
A +TAE Hst all Non-Tetnnial ?
-

T F8
B F8le
F (E) |d
Fhst ( E)

tist (TA) - fot (A) - et, ¬f

fist (T)

fist (f&)-> fixt(B) {*e?


de(A) = +/e
Aint (P) (B) */¬

C/id/e

C, d,3
SqBDh
cC
foi fnt =2
C6C
ks) = a
D EF

) 3E
27
Rules of FOLLOW
For doing FOLLOW operation we need FIRST operation mostly. In FOLLOW
we use a $ sign for the start symbol. FOLLOW always check the right
portion of the symbol.
If a grammar is

ABAc; Ais start symbol.


Here firstly check if the selected symbol stays in right side of the grammar.
We see that c is right in A.
then FOLLOW (A) = {c. $)
oLDO3 ROLeS t» find foLDO
elenuct foLoo(c)
hat Su stotu
$uan
Aymbot
b of and dicates t tud ofaat

n o t prseut a foLLOO Ret

A dBB a pzoductim
PEan fotlo(B) = fist(F) it first() alas uot
Coutain E.

fiost (P) fes U follus(A) hn

fist(6) sntan .

ETA
AtTA.e Eollos()-t,)3
TFB folous (A)= folou(E ) =
$, 5

F (E) |d fotlon (T) first(A) -{ t, 6?


¬
gt fist A) Catains

oUas(r) - firt(4)- fe3 U fallo(E)

te-te?
t,,)
ollo (8)- fotlo:(T)- f+,$, )3
f l s (F) -
Fixt (B) =
f*, ¬
fiMt (B) Cautaun E

fottns (f) Fist(8) - fe3 u f6uas(T)

i+,t, *
28

Rules of FOLLOW

1f a grammar is

BA

A "BC
Here we see that there is nothing at the right side of A'. So

FOLLOW(A')= FOLLOW (A)={ $)


Because A' follows the start symbol.
SA
tello8 =
A98Ad fhol Pst ond
bbBC6

anlnUaL i L(4) 6i uot?


To dack stutfi

to be LL(1).
aganuna
beolloed
O Oditan to
AF
Aist (x) fist()- ¢
mtain e and fist(«) olos nat
fist(B)
Coutan E, Aun

fint () fouao(A) ¢ =

S aSAE
A cfe
A CE
S aSA |E Prst (c)0Hist(e)
Cand.
GrhOFint (asA)
N Fixt()

Cmsl.
Cond2 fst (c) &llas (A)
f t (asA )0 follar()

Uot LL

You might also like