0% found this document useful (0 votes)
27 views

Unit I Bks Lexical Analysis V - Re - and - Fsa

Uploaded by

siyadogra98
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
27 views

Unit I Bks Lexical Analysis V - Re - and - Fsa

Uploaded by

siyadogra98
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 52

Learn Compiler Design: From B. K.

Sharma

UNIT I

Regular Expressions and Finite Automata


Learn Compiler Design: From B. K. Sharma

Unit I: Syllabus
• Introduction to Compiler
• Major Data Structures in Compiler
• Types of Compiler
• Front-End and Back-End of Compiler
• Compiler Structure:
– Analysis-synthesis model of compilation
• Various phases of Compiler
Learn Compiler Design: From B. K. Sharma

Unit I: Syllabus
• Lexical Analysis:
– Input Buffering
– Specification and Recognition of Tokens
– Design of a Lexical Analyzer Generator
• Lex
Learn Compiler Design: From B. K. Sharma
Summary of Lecture 8: Regular Expression and
Lexical Analyzer
1: The regular expression can be used to describe
a sequence of pattern that defines a string.
2: Operations that can be performed on Regular
Expressions are : Union /Or, Concatenation and
Kleene Closure.
Learn Compiler Design: From B. K. Sharma

Mapping of Lesson with Course Outcome


(CO)
Lesson CO
Lesson 9: Regular Understand Lexical
Expression and Finite Analysis and
Automata implement it using lex
tool.
Learn Compiler Design: From B. K. Sharma
Lesson 9: Regular Expression and Finite
Automata : Learning Outcomes
At the end of this lesson, students will be able to

1: List 3 specifications of tokens

2: Explain issues in lexical analysis.

3: Explain role of finite automata in lexical


analysis.

4: Convert Regular Expression into NFA using


Thompson’s Construction
Learn Compiler Design: From B. K. Sharma

Active Learning Activity: Diagnostic


Assessment
One- Minute Paper

What is Automata?

Name different types of automata.


Learn Compiler Design: From B. K. Sharma

Specification of Tokens
There are 3 specifications of tokens

1) Strings

2) Language

3) Regular expression

A string over an alphabet is a finite sequence of symbols


drawn from that alphabet.

A language is any countable set of strings over some fixed


alphabet.

Each Regular Expression denotes a language.


Learn Compiler Design: From B. K. Sharma

Tasks of Lexical Analyzer:


The Lexical Analyzer in a Compiler performs two
tasks:
First Step: Reads the source code from left to
right character by character and divide into pieces
called lexemes according to some criteria such as
space, tab, separators, punctuation symbol, etc.
These pieces are called lexemes.
Second Step: Assign each lexemes to a predefined
class such as Identifier, Keyword, Operator,
Delimiter, etc. These classes are called tokens.
Each Token is a pair: <Lexeme, Class>
Learn Compiler Design: From B. K. Sharma

Tasks of Lexical Analyzer:


The second step within Lexical Analysis is
performed using Regular Expressions, so
that the sequence of characters that forms
each lexeme is matched with a token class.

When the lexeme ”if” is extracted in the


first step, and then in the second step this
lexeme is matched with the regular
expression:
Keyword: if | else | while | for | return |....
Learn Compiler Design: From B. K. Sharma

Tasks of Lexical Analyzer:


Let us consider the code given below:
if (x==y)
z =12;
else
z = 3;
i f ( x = = y ) \n \t z = 1 2 ; \n e l s e \n \t z = 3 ; \n

In the First Step, the Lexical Analyzer splits the code into
units such as if, (, x, ==, y and so on called lexemes.
In the second step, the lexical Analyzer assign each of these
lexemes to a class.
Here is where Regular Expressions come into play:
the act of assigning strings to a class is done with a pattern-
matching operation, where each string is matched to a regular
expression.
Learn Compiler Design: From B. K. Sharma

Tasks of Lexical Analyzer: Important Take-


away
First, and foremost, during the Lexical Analysis everything is
a character (or a sequence of characters called a string).
Secondly, the objective of the Lexical Analysis is to generate
a list of pairs <lexeme, class>.

Formally, each of these pairs is called Token.


A lexeme is a piece of the source code that is matched to
one of the classes defined by the language’s grammar.
Thirdly, lexemes are extracted from the source code
according to rules that are specific to the language.
After that, each lexeme is assigned to a class, using regular
expressions.
Learn Compiler Design: From B. K. Sharma

Two main Issues in Lexical Analysis


How to specify tokens (patterns)?
How to recognize the tokens given a token
specification?

How to specify tokens:

Tokens are specified by regular expressions.


Each regular expression specifies a token.
How to recognize the tokens given a token
specification?

Tokens are recognized by FA.


Learn Compiler Design: From B. K. Sharma

Two other Issues in Lexical Analysis:


Lookahead, and
ambiguities
“Lookahead” may be required to decide where one
token ends and the next token begins.
Even our simple example has lookahead issues:
i vs. if
= vs. ==
A way to resolve ambiguities:
Is “if” two variables ‘i’ and ‘f’?
Is “==“ two equal signs ‘=‘ ‘=‘?
Learn Compiler Design: From B. K. Sharma

Regular Expressions and Lexical Analyzer


Regular expressions are used to represent the
language for Lexical Analyzer.
What is that language?
Regular Language.
What is the grammar for Regular Language?
Regular grammar generates regular languages which
are all languages accepted by Finite Automaton.
The language accepted by finite automata and
described by an expression called regular
expression.
Learn Compiler Design: From B. K. Sharma

Chomsky Hierarchy
Learn Compiler Design: From B. K. Sharma

What is Finite Automata?

Abstract model of a computing entity or digital


computers that

decides whether to accept or reject a string.

A finite-state machine or finite-state automaton,


finite automaton, or simply a state machine, is a
mathematical model of computation.

It is an abstract machine that can be in exactly


one of a finite number of states at any given
time.
Learn Compiler Design: From B. K. Sharma

Finite Automata AKA FSM


Two types of FAs:
Deterministic (DFA) and Non-Deterministic(NFA)
Deterministic (DFA):
In DFA, there is only one path for specific input from the current
state to the next state.

DFA
Non-deterministic (NFA)
In NFA, there exist many paths for specific input from the current
state to the next state.
Learn Compiler Design: From B. K. Sharma

Finite Automata AKA FSM


FA for ‘a’ FA for ‘ab’ FA for (a | b)

FA for a* FA for a*b* FA for (a | b)*


a b


q0 q1
Learn Compiler Design: From B. K. Sharma

Regular Expressions and Lexical Analyzer


A finite state machine (or Finite automata)
recognizes legal strings from a language. „

Example: Identifier
letter followed by letters or digits

Regular Expression: ([a-z]|[A-Z]) ([a-z]|[A-Z]|[0-9])*


Learn Compiler Design: From B. K. Sharma

Active Learning Activity


One- Minute Paper

NFA, in its name has ’non-deterministic’


because of :

a) The result is undetermined

b) The choice of path is non-deterministic

c) The state to be transited next is non-deterministic

d) All of the mentioned


Learn Compiler Design: From B. K. Sharma

Regular Expressions and Lexical Analyzer

Regular Scanner generator


Finite
expression Automaton
String stream

Regular Finite Automaton is Scanner


created by Scanner program
expression
generator such as
lex
Tokens

RE and FA
A scanner generator (e.g., lex) bridges the gap between
regular expressions and FAs.
Learn Compiler Design: From B. K. Sharma

Inside the scanner generator


RE

Thompson construction

NFA

Subset construction

DFA

Minimization Hopcroft’s Algorithm

Minimized DFA

DFA simulation Scanner


generator

Scanner Program
Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction

1. For  in the regular expression, construct NFA

start
i  f L()

2. For a   in the regular expression, construct NFA

start a
i f L(a)
Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction


3.(a) Union or OR operation or Alternative
Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction


3.(b) If s, t are regular expressions, N(s), N(t) their
NFAs, s|t (union) has NFA:

 N(s) 
start
i f L(s)  L(t)

 N(t) 

where i and f are new start / final states, and -moves


are introduced from i to the old start states of N(s) and
N(t) as well as from all of their final states to f.
Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction

3.(c) Concatenation Operation


Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction


3.(d) If s, t are regular expressions, N(s), N(t) their
NFAs s.t (concatenation) has NFA:
start
i N(s) N(t) f L(s) L(t)

overlap
Alternative:

start   
i N(s) N(t) f

Concatenate

where i is the start state of N(s) (or new under the


alternative) and f is the final state of N(t) (or new).
Overlap maps final states of N(s) to start state of N(t).
Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction

3.(e) a*
Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction

3.(e) a*
Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction

3.(e) a*
Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction

3.(e) a*
Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction

3.(f) If s is a regular expressions, N(s) its NFA, s*


(Kleene star) has NFA:


start
i   f
N(s)


where : i is new start state and f is new final state
ε-move i to f (to accept null string)
ε -moves i to old start, old final(s) to f
ε -move old final to old start
Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction


Example: 1
Let’s try aa*b
Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction


Example: 1
Let’s try aa*b

overlap overlap
Learn Compiler Design: From B. K. Sharma

Active Learning Activity


One- Minute Paper

Draw NFA for the following regular


expression:

a*b (a | b)
Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction


Example: 2
Let’s try a*b(a|b)
Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction


Example: 2
Let’s try a*b(a|b)

overlap overlap
Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction


Example: 3
Let’s try ab ( a | b )*
Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction


Example: 3
Let’s try ab ( a | b )*
Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction


Example: 3
Let’s try ab ( a | b )*

overlap overlap
Learn Compiler Design: From B. K. Sharma

Active Learning Activity


One- Minute Paper

Draw NFA for the following regular


expression:

a ( b | c )*
Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction


Example: 4
Let’s try a ( b | c )*

1. a, b, & c a b c
S0 S1 S0 S1 S0 S1

b
 S1 S2 
2. b | c
S0 S5
 c 
S3 S4
Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction


Example: 4
Let’s try a ( b | c )*

b
 S2 S3 
 
3. ( b | c )* S0 S1 S6 S7
 c 
S4 S5

Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction


Example: 4
Let’s try a ( b | c )*


4. a ( b | c )*
b
 S S5 
4
a   
S0 S1 S2 S3 S8 S9
 c 
S6 S7
Concatenate

Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction


Example:4
Let’s try a ( b | c )*
Of course, a human would design something simpler ...

b | c
a
S0 S1
Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction


Example:5
Let’s try (a | b)* abb
Learn Compiler Design: From B. K. Sharma

Active Learning Activity


One- Minute Paper

Draw NFA for the following regular


expression:

(a | b)* abb
Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction


Example:5
Let’s try (a | b)* abb
Learn Compiler Design: From B. K. Sharma

RE To NFA: Thompson Construction


Example:5
Let’s try (a | b)* abb

overlap
Learn Compiler Design: From B. K. Sharma
Summary of Lecture 9: Regular Expression and
Lexical Analyzer
1: Abstract model of a computing entity or digital
computers that decides whether to accept or reject a
string is called
2: In Finite Automaton, the automaton is exactly one of a
finite number of states at any given time.
3: Two main types of automata are DFA and NFA.
4: Three specifications of tokens are : string, language and
regular expressions.
5. Regular language is accepted by Finite Automata and is
described by regular expression.
6: Tokens are specified by regular expressions and recognized
by Finite Automata.
7: Thompson’s Construction is used to convert regular expressions to NFA with
-moves
Learn Compiler Design: From B. K. Sharma

Regular Expressions and FSA: Questions


How is Finite Automata useful for lexical analysis?
Construct the NFA and DFA for the following regular
expression: ( a + b )*abb

Construct minimum state expression DFA for the following


regular expression:

(a | b)* a(a | b)

Show the construction of NFA for following regular


expression.

(a | b)*a (a | b)
Draw the transition diagram for identifiers.

You might also like