Unit I Bks Lexical Analysis V - Re - and - Fsa
Unit I Bks Lexical Analysis V - Re - and - Fsa
Sharma
UNIT I
Unit I: Syllabus
• Introduction to Compiler
• Major Data Structures in Compiler
• Types of Compiler
• Front-End and Back-End of Compiler
• Compiler Structure:
– Analysis-synthesis model of compilation
• Various phases of Compiler
Learn Compiler Design: From B. K. Sharma
Unit I: Syllabus
• Lexical Analysis:
– Input Buffering
– Specification and Recognition of Tokens
– Design of a Lexical Analyzer Generator
• Lex
Learn Compiler Design: From B. K. Sharma
Summary of Lecture 8: Regular Expression and
Lexical Analyzer
1: The regular expression can be used to describe
a sequence of pattern that defines a string.
2: Operations that can be performed on Regular
Expressions are : Union /Or, Concatenation and
Kleene Closure.
Learn Compiler Design: From B. K. Sharma
What is Automata?
Specification of Tokens
There are 3 specifications of tokens
1) Strings
2) Language
3) Regular expression
In the First Step, the Lexical Analyzer splits the code into
units such as if, (, x, ==, y and so on called lexemes.
In the second step, the lexical Analyzer assign each of these
lexemes to a class.
Here is where Regular Expressions come into play:
the act of assigning strings to a class is done with a pattern-
matching operation, where each string is matched to a regular
expression.
Learn Compiler Design: From B. K. Sharma
Chomsky Hierarchy
Learn Compiler Design: From B. K. Sharma
DFA
Non-deterministic (NFA)
In NFA, there exist many paths for specific input from the current
state to the next state.
Learn Compiler Design: From B. K. Sharma
q0 q1
Learn Compiler Design: From B. K. Sharma
Example: Identifier
letter followed by letters or digits
RE and FA
A scanner generator (e.g., lex) bridges the gap between
regular expressions and FAs.
Learn Compiler Design: From B. K. Sharma
Thompson construction
NFA
Subset construction
DFA
Minimized DFA
Scanner Program
Learn Compiler Design: From B. K. Sharma
start
i f L()
start a
i f L(a)
Learn Compiler Design: From B. K. Sharma
N(s)
start
i f L(s) L(t)
N(t)
overlap
Alternative:
start
i N(s) N(t) f
Concatenate
3.(e) a*
Learn Compiler Design: From B. K. Sharma
3.(e) a*
Learn Compiler Design: From B. K. Sharma
3.(e) a*
Learn Compiler Design: From B. K. Sharma
3.(e) a*
Learn Compiler Design: From B. K. Sharma
start
i f
N(s)
where : i is new start state and f is new final state
ε-move i to f (to accept null string)
ε -moves i to old start, old final(s) to f
ε -move old final to old start
Learn Compiler Design: From B. K. Sharma
overlap overlap
Learn Compiler Design: From B. K. Sharma
a*b (a | b)
Learn Compiler Design: From B. K. Sharma
overlap overlap
Learn Compiler Design: From B. K. Sharma
overlap overlap
Learn Compiler Design: From B. K. Sharma
a ( b | c )*
Learn Compiler Design: From B. K. Sharma
1. a, b, & c a b c
S0 S1 S0 S1 S0 S1
b
S1 S2
2. b | c
S0 S5
c
S3 S4
Learn Compiler Design: From B. K. Sharma
b
S2 S3
3. ( b | c )* S0 S1 S6 S7
c
S4 S5
Learn Compiler Design: From B. K. Sharma
4. a ( b | c )*
b
S S5
4
a
S0 S1 S2 S3 S8 S9
c
S6 S7
Concatenate
Learn Compiler Design: From B. K. Sharma
b | c
a
S0 S1
Learn Compiler Design: From B. K. Sharma
(a | b)* abb
Learn Compiler Design: From B. K. Sharma
overlap
Learn Compiler Design: From B. K. Sharma
Summary of Lecture 9: Regular Expression and
Lexical Analyzer
1: Abstract model of a computing entity or digital
computers that decides whether to accept or reject a
string is called
2: In Finite Automaton, the automaton is exactly one of a
finite number of states at any given time.
3: Two main types of automata are DFA and NFA.
4: Three specifications of tokens are : string, language and
regular expressions.
5. Regular language is accepted by Finite Automata and is
described by regular expression.
6: Tokens are specified by regular expressions and recognized
by Finite Automata.
7: Thompson’s Construction is used to convert regular expressions to NFA with
-moves
Learn Compiler Design: From B. K. Sharma
(a | b)* a(a | b)
(a | b)*a (a | b)
Draw the transition diagram for identifiers.