Cd notes
CD
(Compiler Design)
What is a Compiler?
A compiler is software that converts a program written in a high-level
language (the source language) into a low-level language (the object,
target, or machine language).
Lex is a computer program that generates lexical analyzers. Lex is commonly used
with the yacc parser generator.
Creating a lexical analyzer
Lex Specification
A Lex program consists of three parts:
{ definitions }
%%
{ rules }
%%
{ user subroutines }
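Definitions include declarations of variables, constants, and regular
definitions.
Rules are statements of the form p { action }, where p is a regular expression
and the action is the code to run when p matches.
User subroutines are auxiliary procedures needed by the actions. These can be
compiled separately and loaded with the lexical analyzer.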
Flex regular expressions
In addition to the usual regular expressions, Flex introduces some new notations.
[abcd]
This matches any single one of the characters a, b, c or d.
[0-9]
In brackets, a dash indicates a range of characters. For example, [a-zA-Z] matches any
single letter. If you want a dash as one of the characters, put it first.
[^abcd]
This indicates any character except a, b, c or d. For example, [^a-zA-Z] matches any
nonletter.
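The bracket notations above can be tried out in a few lines. This is only a sketch in Python, on the assumption that Python's re module treats these simple character classes the same way Flex does; the function name classify is made up for the illustration.

```python
import re

# Character classes written with the same bracket syntax as in the notes.
letter = re.compile(r"[a-zA-Z]")        # any single letter
non_letter = re.compile(r"[^a-zA-Z]")   # any single non-letter
dash_or_digit = re.compile(r"[-0-9]")   # dash listed first, so it is literal


def classify(ch):
    """Report which of the three classes the single character ch matches."""
    return {
        "letter": bool(letter.fullmatch(ch)),
        "non_letter": bool(non_letter.fullmatch(ch)),
        "dash_or_digit": bool(dash_or_digit.fullmatch(ch)),
    }
```

Calling classify("-") shows that the dash is accepted by [-0-9] because it is placed first in the brackets.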
Input Buffering
Input characters are read from secondary storage, but reading them one at a
time in this way is costly. Hence a buffering technique is used: a block of
data is first read into a buffer and then scanned by the lexical analyzer.
There are two methods used in this context:
1. One Buffer Scheme
2. Two Buffer Scheme
One Buffer Scheme:
In this scheme, only one buffer is used to store the input string. The
problem with this scheme is that a very long lexeme may cross the buffer
boundary. To scan the rest of the lexeme the buffer has to be refilled,
which overwrites the first part of the lexeme.
Two Buffer Scheme:
To overcome the problem of the one buffer scheme, this method uses two
buffers to store the input string. The first and second buffers are
scanned alternately. When the end of the current buffer is reached, the
other buffer is filled.
Initially both the begin pointer (bp) and the forward pointer (fp) point to
the first character of the first buffer. Then fp moves to the right in
search of the end of the lexeme. As soon as a blank character is recognized,
the string between bp and fp is identified as the corresponding token.
To mark the boundary of the first buffer, an end-of-buffer character (eof)
is placed at the end of the first buffer. Similarly, the end of the second
buffer is recognized by the end-of-buffer mark placed at its end. When fp
encounters the first eof, the end of the first buffer is recognized and
filling of the second buffer starts. In the same way, when the second eof
is reached, it indicates the end of the second buffer. The two buffers are
filled alternately in this way until the end of the input program, and the
stream of tokens is identified.
This eof character placed at the end of each buffer is called a sentinel;
it is used to identify the end of the buffer.
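The two buffer scheme with sentinels can be sketched as follows. This is a simplified illustration, not a production scanner: the sentinel is assumed to be the NUL character (which must not occur in the input), lexemes are assumed to be separated by blanks and to be no longer than one buffer, and the names lexemes, fill and text are invented for the example.

```python
import io

SENTINEL = "\0"  # assumed end-of-buffer mark; must not occur in the input


def lexemes(stream, n=8):
    """Yield blank-separated lexemes; assumes no lexeme is longer than n."""
    size = 2 * (n + 1)                  # two halves, each n chars + sentinel
    buf = [SENTINEL] * size

    def fill(start):
        # Read up to n characters into one half and pad with sentinels.
        data = stream.read(n)
        buf[start:start + n + 1] = list(data) + [SENTINEL] * (n + 1 - len(data))

    def text(bp, fp):
        # Collect the characters from bp up to fp, wrapping around the
        # array and skipping any sentinel that lies in between.
        out = []
        while bp != fp:
            if buf[bp] != SENTINEL:
                out.append(buf[bp])
            bp = (bp + 1) % size
        return "".join(out)

    fill(0)
    bp = fp = 0
    while True:
        ch = buf[fp]
        if ch == SENTINEL and fp % (n + 1) == n:
            fp = (fp + 1) % size        # end of a half: move to the other
            fill(fp)                    # half and refill it
        elif ch == SENTINEL:            # sentinel inside a half: end of input
            lex = text(bp, fp)
            if lex:
                yield lex
            return
        elif ch.isspace():              # blank: the lexeme bp..fp is complete
            lex = text(bp, fp)
            if lex:
                yield lex
            fp = (fp + 1) % size
            bp = fp
        else:
            fp += 1
```

With n=5, the lexeme "count" in the input "int count = 10 ;" starts in the first buffer and ends in the second, and it is still returned whole because the first buffer is not overwritten while bp points into it.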
What is Syntax Analysis?
Syntax analysis is the second phase of the compiler design process and comes after
lexical analysis. It analyses the syntactical structure of the given input and checks
whether the input follows the syntax of the programming language in which it was
written. Its output is known as the Parse Tree or Syntax Tree.
The Parse Tree is developed with the help of the pre-defined grammar of the language.
The syntax analyzer checks whether a given program satisfies the rules implied by
a context-free grammar. If it does, the parser creates the parse tree of that
source program. Otherwise, it displays error messages.
The parser obtains a string of tokens from the lexical analyzer and verifies that the
string can be generated by the grammar for the source language. It detects and reports
any syntax errors and produces a parse tree from which intermediate code can be
generated.
CONTEXT-FREE GRAMMARS
Terminals: These are the basic symbols from which strings are formed.
Non-Terminals: These are the syntactic variables that denote a set of strings.
These help to define the language generated by the grammar.
Start Symbol: One non-terminal in the grammar is denoted as the "start
symbol", and the set of strings it denotes is the language defined by the
grammar.
Productions: These specify the manner in which terminals and non-terminals can
be combined to form strings. Each production consists of a non-terminal,
followed by an arrow, followed by a string of non-terminals and terminals.
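Derivation and Parse Tree
The string id + id * id is derived from E in the following steps, assuming the
productions E → E + E | E * E | id. Each step replaces the leftmost E, and the
parse tree grows by the same production.
Step 1:
E ⇒ E * E
Step 2:
E ⇒ E + E * E
Step 3:
E ⇒ id + E * E
Step 4:
E ⇒ id + id * E
Step 5:
E ⇒ id + id * id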
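The five steps can be replayed mechanically. The sketch below assumes the grammar E → E + E | E * E | id, which is inferred from the steps rather than stated in the notes, and the helper name derive is hypothetical. Each step replaces the leftmost E with the chosen production body.

```python
# Assumed grammar (inferred from the steps): E -> E + E | E * E | id
def derive(start, steps):
    """Apply each production body to the leftmost non-terminal E in turn."""
    form = [start]
    forms = [" ".join(form)]
    for body in steps:
        i = form.index("E")             # position of the leftmost E
        form[i:i + 1] = body.split()    # replace it with the production body
        forms.append(" ".join(form))
    return forms


# Production bodies chosen in Steps 1 to 5.
steps = ["E * E", "E + E", "id", "id", "id"]
```

derive("E", steps) returns the sentential forms E, E * E, E + E * E, id + E * E, id + id * E and id + id * id, in that order.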
Types of Parsers in Compiler Design
The parser is the phase of the compiler that takes a token string as input and, with the
help of the existing grammar, converts it into the corresponding parse tree. The parser is
also known as the Syntax Analyzer.
Parsers are mainly classified into 2 categories: Top-down Parser and Bottom-up Parser.
These are explained below.
1. Top-down Parser:
A top-down parser generates the parse tree for the given input string with
the help of grammar productions by expanding the non-terminals, i.e. it starts from
the start symbol and ends at the terminals. It uses leftmost derivation.
Top-down parsers are further classified into 2 types: Recursive descent parser and
Non-recursive descent parser.
I. Recursive descent parser:
It is also known as the backtracking parser. It generates the parse tree by trying
the productions one after another and backtracking when a choice fails.
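Backtracking can be seen in a very small recursive descent parser. The grammar S → c A d, A → a b | a is a hypothetical example chosen for the illustration, and the function names parse and accepts are assumptions. On the input "cad" the parser first tries A → a b, fails at b, and then falls back to A → a.

```python
# Hypothetical grammar for illustration: S -> c A d,  A -> a b | a
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}


def parse(symbol, text, pos):
    """Yield every position where `symbol` can end when it starts at pos."""
    if symbol not in GRAMMAR:                   # terminal: match it literally
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for body in GRAMMAR[symbol]:                # try the alternatives in order
        positions = [pos]
        for sym in body:
            positions = [q for p in positions for q in parse(sym, text, p)]
        yield from positions                    # empty list: this choice failed


def accepts(text):
    """True if the whole text can be derived from S."""
    return any(end == len(text) for end in parse("S", text, 0))
```

This style only terminates for grammars without left recursion, which is one reason the transformations for LL(1) grammars matter.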
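2. Bottom-up Parser:
A bottom-up parser builds the parse tree from the input string upward, reducing it
to the start symbol.
I. LR parser:
The LR parser is a bottom-up parser that generates the parse tree for the given string
using an unambiguous grammar. It follows the reverse of the rightmost derivation.
The LR parser is of 4 types: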
a) LR(0)
b) SLR(1)
c) LALR(1)
d) CLR(1)
Predictive Parser
A predictive parser is a parser that can predict which production to use to expand a
non-terminal for the current input. It does not suffer from backtracking.
To accomplish this, the predictive parser uses a look-ahead pointer, which points to
the next input symbol.
Non-recursive predictive parsing is also known as LL(1) parsing. This parser follows the
leftmost derivation (LMD).
LL(1) means:
The first L stands for left-to-right scanning of the input, the second L for leftmost
derivation, and 1 for the number of look-ahead symbols.
Predictive parsing uses a stack and a parsing table to parse the input and generate a parse
tree. Both the stack and the input contain an end symbol $ to denote that the stack is
empty and the input is consumed. The parser consults the parsing table to decide what to
do for each combination of input symbol and stack top.
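The interplay of stack, input and parsing table can be sketched as below. The table is written out by hand for the expression grammar E → T A, A → + T A | ε, T → F B, B → * F B | ε, F → ( E ) | id, which is taken here as an assumed example; the function name ll1_parse is hypothetical, and an empty list stands for an ε production.

```python
# Hand-written parsing table for the assumed grammar
#   E -> T A,  A -> + T A | eps,  T -> F B,  B -> * F B | eps,  F -> ( E ) | id
TABLE = {
    ("E", "id"): ["T", "A"], ("E", "("): ["T", "A"],
    ("A", "+"): ["+", "T", "A"], ("A", ")"): [], ("A", "$"): [],
    ("T", "id"): ["F", "B"], ("T", "("): ["F", "B"],
    ("B", "*"): ["*", "F", "B"], ("B", "+"): [], ("B", ")"): [], ("B", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NON_TERMINALS = {"E", "A", "T", "B", "F"}


def ll1_parse(tokens):
    """Return True if the token list is accepted, False on a syntax error."""
    tokens = tokens + ["$"]             # end marker on the input
    stack = ["$", "E"]                  # end marker below the start symbol
    i = 0
    while stack:
        top = stack.pop()
        look = tokens[i]                # the look-ahead symbol
        if top in NON_TERMINALS:
            body = TABLE.get((top, look))
            if body is None:
                return False            # empty table entry: syntax error
            stack.extend(reversed(body))
        elif top == look:
            i += 1                      # terminal (or $) matched: advance
        else:
            return False
    return i == len(tokens)
```

For example, ll1_parse(["id", "+", "id", "*", "id"]) accepts, while ll1_parse(["id", "+"]) is rejected when T meets the end marker $.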
Two grammar transformations improve the chance of obtaining an LL(1) grammar:
Elimination of left recursion
Left factoring
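As an illustration of the first transformation, the sketch below removes immediate left recursion from a single non-terminal: A → A α | β becomes A → β A', A' → α A' | ε. The function name and the list-of-symbols representation are assumptions for the example, and indirect left recursion is not handled.

```python
def eliminate_left_recursion(nt, bodies):
    """Rewrite A -> A a | b as A -> b A', A' -> a A' | eps (immediate case only)."""
    new = nt + "'"
    # Bodies that start with nt itself, with that leading nt removed.
    recursive = [b[1:] for b in bodies if b and b[0] == nt]
    others = [b for b in bodies if not b or b[0] != nt]
    if not recursive:
        return {nt: bodies}             # nothing to do
    return {
        nt: [b + [new] for b in others],
        new: [a + [new] for a in recursive] + [[]],   # [] stands for epsilon
    }
```

Applied to E → E + T | T, this gives E → T E' and E' → + T E' | ε.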
If the compiler knew in advance the first character of the string produced
when a production rule is applied, then by comparing it with the current
character or token in the input string it could decide which production
rule to apply. This is the purpose of FIRST.
FOLLOW is used only if the current non-terminal can derive ε.
Rules of FIRST
FIRST always finds terminal symbols from the grammar.
When we compute FIRST for a symbol, if the production body begins with a
terminal, we take that terminal and do not look at the symbols after it.
If a grammar is
A → a
then FIRST(A) = {a}.
If a grammar is
A → a B
then FIRST(A) = {a}.
If a grammar is
A → a B | ε
then FIRST(A) = {a, ε}.
If a grammar is
A → B c D | ε
B → e D | ( A )
Here B is a non-terminal, so we look at the productions of B to find the
FIRST of A. Then
FIRST(A) = {e, (, ε}.
FIRST and FOLLOW
FIRST
FIRST(X) is the set of terminals that begin the strings derivable from X.
For a string X Y Z:
FIRST(ε) = {ε}
FIRST(X Y Z) = {X} if X is a terminal
Example:
T → F B
B → * F B | ε
F → ( E ) | id
FIRST(E) = FIRST(T) = { (, id }
Example:
S → a B D h
B → c C
C → b C | ε
D → E F
FIRST(S) = {a}, since the production for S begins with the terminal a.
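The FIRST rules can be turned into a small fixed-point computation. The sketch below is one possible way to do it, not the method of the notes: it assumes the expression grammar E → T A, A → + T A | ε, T → F B, B → * F B | ε, F → ( E ) | id, represents ε bodies as empty lists, and uses the invented name first_sets.

```python
EPS = "ε"
GRAMMAR = {   # assumed expression grammar; [] is an epsilon body
    "E": [["T", "A"]],
    "A": [["+", "T", "A"], []],
    "T": [["F", "B"]],
    "B": [["*", "F", "B"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_sets(grammar):
    """Compute FIRST for every non-terminal by repeating until nothing changes."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                add = set()
                for sym in body:
                    if sym not in grammar:      # terminal: take it and stop
                        add.add(sym)
                        break
                    add |= first[sym] - {EPS}
                    if EPS not in first[sym]:   # sym cannot vanish: stop
                        break
                else:
                    add.add(EPS)    # empty body, or every symbol can vanish
                if not add <= first[nt]:
                    first[nt] |= add
                    changed = True
    return first
```

For this grammar the result is FIRST(E) = FIRST(T) = FIRST(F) = { (, id }, FIRST(A) = { +, ε } and FIRST(B) = { *, ε }.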
Rules of FOLLOW
Computing FOLLOW usually requires FIRST. The end marker $ is placed in
FOLLOW of the start symbol. FOLLOW of a symbol is determined by what
appears to its right in production bodies.
If A → α B β is a production, then
FOLLOW(B) = FIRST(β) if FIRST(β) does not contain ε.
If FIRST(β) contains ε, then FOLLOW(B) = (FIRST(β) − {ε}) ∪ FOLLOW(A).
Example:
E → T A
A → + T A | ε
T → F B
B → * F B | ε
F → ( E ) | id
FOLLOW(E) = { $, ) }
FOLLOW(A) = FOLLOW(E) = { $, ) }
FOLLOW(B) = FOLLOW(T) = { +, $, ) }
FIRST(B) = { *, ε }. Since FIRST(B) contains ε,
FOLLOW(F) = { +, *, $, ) }
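The FOLLOW rules can be checked with a short program. In this sketch the FIRST sets of the expression grammar are entered by hand rather than computed, the grammar itself is the assumed example E → T A, A → + T A | ε, T → F B, B → * F B | ε, F → ( E ) | id, and the names first_of and follow_sets are hypothetical.

```python
EPS = "ε"
GRAMMAR = {   # assumed expression grammar; [] is an epsilon body
    "E": [["T", "A"]],
    "A": [["+", "T", "A"], []],
    "T": [["F", "B"]],
    "B": [["*", "F", "B"], []],
    "F": [["(", "E", ")"], ["id"]],
}
# FIRST sets of the non-terminals, entered by hand for this grammar.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "A": {"+", EPS}, "B": {"*", EPS},
}


def first_of(symbols):
    """FIRST of a string of grammar symbols (ε if the string can vanish)."""
    out = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})       # the FIRST of a terminal is itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}


def follow_sets(grammar, start):
    """Compute FOLLOW for every non-terminal by repeating until stable."""
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")              # $ goes into FOLLOW of the start symbol
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue        # FOLLOW is only defined for non-terminals
                    rest = first_of(body[i + 1:])
                    add = rest - {EPS}
                    if EPS in rest:     # the rest can vanish: inherit FOLLOW(head)
                        add |= follow[head]
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return follow
```

follow_sets(GRAMMAR, "E") gives FOLLOW(E) = FOLLOW(A) = { $, ) }, FOLLOW(T) = FOLLOW(B) = { +, $, ) } and FOLLOW(F) = { +, *, $, ) }.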
Rules of FOLLOW
If a grammar is
A → B A'
A' → * B c
Here we see that there is nothing at the right side of A'. So
FOLLOW(A') = FOLLOW(A).
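Conditions for a grammar to be LL(1)
For every pair of productions A → α | β:
Condition 1: FIRST(α) ∩ FIRST(β) = ∅.
Condition 2: if FIRST(β) contains ε and FIRST(α) does not contain ε, then
FIRST(α) ∩ FOLLOW(A) = ∅.
Example:
S → a S A | ε
A → c | ε
Condition 1: FIRST(a S A) ∩ FIRST(ε) = ∅ and FIRST(c) ∩ FIRST(ε) = ∅, so it holds.
Condition 2: FIRST(a S A) ∩ FOLLOW(S) = ∅, but FIRST(c) ∩ FOLLOW(A) = {c} ≠ ∅.
The grammar is not LL(1).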
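The two LL(1) conditions can be checked mechanically once FIRST and FOLLOW are known. In the sketch below the sets for S → a S A | ε, A → c | ε are worked out by hand and typed in (FOLLOW(S) = FOLLOW(A) = { $, c }); the function name ll1_conflicts and the data layout are assumptions for the illustration.

```python
EPS = "ε"
# FIRST of each alternative and FOLLOW of each head, worked out by hand for
#   S -> a S A | ε      A -> c | ε
ALTERNATIVES = {
    "S": [{"a"}, {EPS}],
    "A": [{"c"}, {EPS}],
}
FOLLOW = {"S": {"$", "c"}, "A": {"$", "c"}}


def ll1_conflicts(alternatives, follow):
    """Return (head, condition number, offending symbols) for each violation."""
    conflicts = []
    for head, firsts in alternatives.items():
        for i in range(len(firsts)):
            for j in range(i + 1, len(firsts)):
                a, b = firsts[i], firsts[j]
                if a & b:               # condition 1: FIRST sets overlap
                    conflicts.append((head, 1, a & b))
                for x, y in ((a, b), (b, a)):
                    clash = (x - {EPS}) & follow[head]
                    if EPS in y and clash:   # condition 2: FIRST meets FOLLOW
                        conflicts.append((head, 2, clash))
    return conflicts
```

For this grammar the only violation reported is condition 2 for A, caused by the symbol c, so the grammar is not LL(1). An empty result would mean both conditions hold.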