Reg Exp 2 DFA

Uploaded by

Mr Super

Regular Expressions and Deterministic Finite Automata

Given an alphabet Σ, a finite set of symbols, a language over the alphabet Σ is any set of strings made up
of the symbols from Σ. For example, if Σ = {a, b} , then the following are some examples of languages
over Σ:

L1 = { aab, aba, bab, aa }

L2 = { w | w has equal number of a’s and b’s } = { abab, aaabbb, abba, λ , … }, here λ is the empty string.

L3 = { w | w is made up only of a’s and has a length which is a prime number } = { aa, aaa, aaaaa, … }

We define 3 operations on Languages. Let L, L1, and L2 be languages. Then,

1. L1.L2 = { w1.w2 | w1 ∈ L1 and w2 ∈ L2 }, where w1.w2 is the string concatenation of w1 and w2.
2. L1 ∪ L2 = { w | w ∈ L1 or w ∈ L2 }, called the union
3. L* = { λ } ∪ L ∪ L.L ∪ L.L.L ∪ …
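
The three operations can be tried out on small finite languages. The following is a minimal Python sketch, not part of the original notes: the function names are illustrative, and because L* is infinite whenever L contains a non-empty string, the hypothetical helper star_upto only enumerates the strings of L* up to a chosen length.

```python
from itertools import product

def concat(l1, l2):
    """L1.L2: every string of L1 followed by every string of L2."""
    return {w1 + w2 for w1, w2 in product(l1, l2)}

def union(l1, l2):
    """L1 ∪ L2: strings that are in L1 or in L2."""
    return l1 | l2

def star_upto(l, max_len):
    """The strings of L* whose length is at most max_len."""
    result = {""}        # λ (the empty string) is always in L*
    frontier = {""}
    while frontier:
        # Append one more string of L to each newly discovered string.
        frontier = {w for w in concat(frontier, l) if len(w) <= max_len} - result
        result |= frontier
    return result

print(sorted(concat({"a", "ab"}, {"b", ""})))   # ['a', 'ab', 'abb']
print(sorted(union({"aa"}, {"bb"})))            # ['aa', 'bb']
print(sorted(star_upto({"a", "bb"}, 3)))        # ['', 'a', 'aa', 'aaa', 'abb', 'bb', 'bba']
```

Python's empty string "" plays the role of λ here.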

I. Regular Expressions

Regular expressions are a mathematical mechanism to define a class of languages called regular
languages. Given an alphabet of symbols, Σ, a regular expression is defined as follows:

1. Every symbol in Σ is a regular expression.

2. ϵ is a regular expression
3. if r and s are regular expressions, then so are the following
(rs)
(r + s)
(r)∗

(rs) is called the concatenation of r and s , (r + s) is called the union of r and s , and (r)∗ is called the
Kleene closure of r . The parentheses may be left out with the understanding that the ∗ operator has
highest precedence, the concatenation operator has the next level of precedence, and the + operator the
lowest precedence. Some examples of regular expressions over the alphabet {a, b} are:

r1 = ab(a+b)*ab
r2 = (a+b)*
r3 = aa+bb

Each regular expression r represents a language L(r), which is defined as follows:

1. L(a) = { a }, for any a in Σ

2. L(ϵ ) = { λ }
3. L(rs ) = L( r ).L( s )
4. L(r + s ) = L( r ) ∪ L( s )
5. L(r∗ ) = L(r )*

Applying this definition to the three earlier examples of regular expressions, we get the following:

L(ab(a+b)*ab) = { w | w starts with ab, ends with ab, and has length at least 4 }

L((a+b)*) = set of all strings made up of any number of a’s and b’s in any order including the empty string.

L(aa+bb) = { aa, bb }
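
These languages can be spot-checked by brute force. The sketch below is an illustration that leans on Python's re module rather than on anything defined in these notes: it rewrites the union operator + as Python's | and lists every string over the alphabet, up to a chosen length, that the expression matches in full. The helper name language_upto is hypothetical.

```python
import re
from itertools import product

def language_upto(regex, alphabet, max_len):
    """Strings over `alphabet` of length <= max_len that `regex` matches entirely."""
    # The notes write union as '+'; Python's re module writes it as '|'.
    pattern = re.compile(regex.replace("+", "|"))
    words = ("".join(p)
             for n in range(max_len + 1)
             for p in product(alphabet, repeat=n))
    return [w for w in words if pattern.fullmatch(w)]

print(language_upto("aa+bb", "ab", 4))         # ['aa', 'bb']
print(language_upto("ab(a+b)*ab", "ab", 5))    # ['abab', 'abaab', 'abbab']
print(len(language_upto("(a+b)*", "ab", 3)))   # 15
```

The second call shows that the shortest string in L(ab(a+b)*ab) is abab; the string ab on its own is not matched.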

II. Deterministic Finite Automata

A deterministic finite automaton (DFA) is a mathematical model of a simple computational device that
reads a string of symbols over the input alphabet Σ, and either accepts or rejects the input. The set of
strings accepted by the DFA is referred to as the language of the DFA.

A deterministic finite automaton (DFA) over the alphabet Σ is defined as a 4-tuple (Q,T,S,F), where

Q is a finite set of states

S ∈ Q is designated as a start state
F ⊆ Q is a designated set of final states
T is a transition function from Q x Σ → Q

A DFA can be pictured as a graph with the states as nodes and the transitions as directed edges from one
node to another. Each transition/edge is labeled by an alphabet symbol. The start state is
designated by an arrow mark and final states are designated by double circles. Here are DFAs for the
three regular expressions discussed before:

[Figure: state diagrams of the DFAs for ab(a+b)*ab, (a+b)*, and aa+bb]

The DFA transition functions can also be represented in tabular form as follows:

ab(a+b)*ab

Start State = 1

Final States = 6

FROM  SYMBOL  TO
1     a       2
1     b       5
2     a       5
2     b       3
3     a       4
3     b       3
4     a       4
4     b       6
5     a       5
5     b       5
6     a       4
6     b       3

(a+b)*

Start State = 1

Final States = 1

FROM  SYMBOL  TO
1     a       1
1     b       1

aa+bb

Start State = 1

Final States = 4, 5

FROM  SYMBOL  TO
1     a       2
1     b       3
2     a       4
2     b       6
3     a       6
3     b       5
4     a       6
4     b       6
5     a       6
5     b       6
6     a       6
6     b       6
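
A table like these maps naturally onto a dictionary keyed by (state, symbol). The following Python sketch encodes the aa+bb table under that assumed representation and checks two things: that T really is a total function from Q x Σ to Q, and which non-final states can never be left. The helper names is_total and trap_states are illustrative, not from the notes.

```python
SIGMA = {"a", "b"}
Q = {1, 2, 3, 4, 5, 6}
S = 1
F = {4, 5}

# Transition function for aa+bb, copied row by row from the table.
T = {
    (1, "a"): 2, (1, "b"): 3,
    (2, "a"): 4, (2, "b"): 6,
    (3, "a"): 6, (3, "b"): 5,
    (4, "a"): 6, (4, "b"): 6,
    (5, "a"): 6, (5, "b"): 6,
    (6, "a"): 6, (6, "b"): 6,
}

def is_total(states, sigma, trans):
    """True if trans gives exactly one target in `states` for every (state, symbol)."""
    return (set(trans) == {(q, a) for q in states for a in sigma}
            and all(t in states for t in trans.values()))

def trap_states(states, sigma, trans, finals):
    """Non-final states in which every symbol loops back to the same state."""
    return {q for q in states - finals
            if all(trans[(q, a)] == q for a in sigma)}

print(is_total(Q, SIGMA, T))         # True
print(trap_states(Q, SIGMA, T, F))   # {6}
```

State 6 is the "dead" state of this DFA: once entered, no continuation of the input can lead to acceptance.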

How does a DFA work?

A DFA can be used to verify whether a string belongs to a language. All strings that are "accepted" by a
DFA belong to its language and those that are "rejected" do not belong to its language. How do we
determine "acceptance" and "rejection"?

A configuration for a DFA is a pair, (q, s), where q is a state and s is a string made up of symbols from the
alphabet. Given an input string, w, and a DFA with start state q0, the initial configuration is (q0,w). The DFA
moves from one configuration to the next as follows:

(q, ax) => (T(q,a), x)

until it reaches the following configuration

(p,λ )

We say that a string w is accepted by a DFA if (q0,w) =>* (f,λ ) and f is a final state; otherwise it is rejected.

Let us see if the input string abaaab is accepted or rejected by the DFA for ab(a+b)*ab shown earlier.

(1,abaaab) => (2,baaab) => (3,aaab) => (4,aab) => (4,ab) => (4,b) => (6,λ )

Since 6 is a final state, the DFA accepts the string abaaab.

The input string abaaba is rejected because (1,abaaba) => (2,baaba) => (3,aaba) => (4,aba) => (4,ba) =>
(6,a) => (4,λ ) and 4 is not a final state.

Language of DFA, D, L(D) = set of all strings accepted by D
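
The configuration-by-configuration behaviour described above can be sketched in a few lines of Python. This is an illustration only: the dictionary encoding of T and the function name run are assumptions, while the transitions themselves are those of the ab(a+b)*ab table given earlier.

```python
# Transition table for ab(a+b)*ab: (state, symbol) -> next state.
T = {
    (1, "a"): 2, (1, "b"): 5,
    (2, "a"): 5, (2, "b"): 3,
    (3, "a"): 4, (3, "b"): 3,
    (4, "a"): 4, (4, "b"): 6,
    (5, "a"): 5, (5, "b"): 5,
    (6, "a"): 4, (6, "b"): 3,
}
START, FINALS = 1, {6}

def run(trans, start, finals, w):
    """Return (accepted, trace); trace lists the configurations (q, remaining input)."""
    q, trace = start, [(start, w)]
    while w:
        # One move: (q, ax) => (T(q,a), x)
        q, w = trans[(q, w[0])], w[1:]
        trace.append((q, w))
    return q in finals, trace

for word in ["abaaab", "abaaba"]:
    accepted, trace = run(T, START, FINALS, word)
    steps = " => ".join(f"({q},{rest or 'λ'})" for q, rest in trace)
    print(steps, "accepted" if accepted else "rejected")
```

The first input ends in state 6 and is accepted; the second ends in state 4 and is rejected, matching the traces worked out by hand.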

III. Regular Expression to DFA (Direct Algorithm)

It turns out that for every regular expression there is an equivalent DFA (i.e. the language defined by the
regular expression equals the language accepted by the equivalent DFA).

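A DFA of this kind is what Lex-style scanner generators use to extract the tokens from the input
string. PLY and similar compiler-compiler systems perform the same tokenizing task, although PLY itself
delegates the matching to Python's re module.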

ALGORITHM: Convert Regular Expression to DFA

INPUT: regular expression, r

OUTPUT: DFA, D, such that Language(D) = L( r )

METHOD: (To illustrate each step of the algorithm, we will use the regular expression (a+b)*abb as an
example; however, the method is general and works for any regular expression.)

Step 1: Expression Tree

Augment r with a special end symbol # to get r#, e.g. (a+b)*abb#

Using the following grammar, construct an expression tree for r#

re : term | re PLUS term

term : factor | term factor
factor : niggle | factor STAR
niggle : LETTER | EPSILON | LPAREN re RPAREN
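
One way to turn this grammar into a tree builder is a small hand-written recursive-descent parser, sketched below in Python. Everything about the encoding is an assumption made for illustration: the nested-tuple node shapes ("or", "cat", "star", "leaf", "eps"), the choice of letters, digits, and # as LETTER tokens, and the replacement of the grammar's left recursion by loops (which still yields left-associative trees).

```python
EPSILON = "ϵ"   # the character the notes use for the empty-string expression

def parse(text):
    """Parse a regular expression in the notes' syntax into a nested-tuple tree."""
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def advance():
        nonlocal pos
        pos += 1

    def re_():                        # re : term | re PLUS term
        node = term()
        while peek() == "+":
            advance()
            node = ("or", node, term())
        return node

    def term():                       # term : factor | term factor
        node = factor()
        while peek() is not None and peek() not in "+)":
            node = ("cat", node, factor())
        return node

    def factor():                     # factor : niggle | factor STAR
        node = niggle()
        while peek() == "*":
            advance()
            node = ("star", node)
        return node

    def niggle():                     # niggle : LETTER | EPSILON | LPAREN re RPAREN
        ch = peek()
        if ch == "(":
            advance()
            node = re_()
            if peek() != ")":
                raise SyntaxError(f"expected ')' at offset {pos}")
            advance()
            return node
        if ch == EPSILON:
            advance()
            return ("eps",)
        if ch is not None and (ch.isalnum() or ch == "#"):
            advance()
            return ("leaf", ch)
        raise SyntaxError(f"unexpected {ch!r} at offset {pos}")

    tree = re_()
    if pos != len(text):
        raise SyntaxError(f"unexpected {text[pos]!r} at offset {pos}")
    return tree

print(parse("(a+b)*abb#"))
```

When augmenting an arbitrary r in text form, it is safer to parse "(" + r + ")#" so that the end symbol is concatenated to the whole expression rather than to its last term.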

Step 2: Unique Number for Leaf Nodes

Assign a unique integer to each leaf node (except for the ϵ leaf) of the expression tree.

Step 3: nullable(n), firstpos(n), lastpos(n)

Traverse the tree to compute nullable(n), firstpos(n), and lastpos(n) for each node, n in the tree using
the following definitions:

Node n       nullable(n)                    firstpos(n)                       lastpos(n)

Leaf ϵ       true                           {}                                {}

Leaf i       false                          {i}                               {i}

(c1 + c2)    nullable(c1) or nullable(c2)   firstpos(c1) ∪ firstpos(c2)       lastpos(c1) ∪ lastpos(c2)

(c1 . c2)    nullable(c1) and nullable(c2)  if nullable(c1) then              if nullable(c2) then
                                              firstpos(c1) ∪ firstpos(c2)       lastpos(c1) ∪ lastpos(c2)
                                            else                              else
                                              firstpos(c1)                      lastpos(c2)

(c1)*        true                           firstpos(c1)                      lastpos(c1)

The intuition behind these functions is as follows. Let L(n) be the language generated by the subtree
rooted at node n.

nullable(n) = true if and only if L(n) contains the empty string λ

firstpos(n) = set of positions under n that can match the first symbol of a string in L(n)
lastpos(n) = set of positions under n that can match the last symbol of a string in L(n)

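For the example regular expression, with the leaves numbered left to right (a = 1, b = 2, a = 3, b = 4,
b = 5, # = 6), the following shows the values of these functions:

Node n             nullable(n)   firstpos(n)   lastpos(n)
a (position 1)     false         {1}           {1}
b (position 2)     false         {2}           {2}
(a+b)              false         {1,2}         {1,2}
(a+b)*             true          {1,2}         {1,2}
a (position 3)     false         {3}           {3}
(a+b)*a            false         {1,2,3}       {3}
b (position 4)     false         {4}           {4}
(a+b)*ab           false         {1,2,3}       {4}
b (position 5)     false         {5}           {5}
(a+b)*abb          false         {1,2,3}       {5}
# (position 6)     false         {6}           {6}
(a+b)*abb#         false         {1,2,3}       {6}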
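
The Step 3 definitions translate almost line for line into a recursive function. The Python sketch below is one possible rendering, assuming a nested-tuple tree in which leaves carry their position number, ("leaf", symbol, position); that encoding and the names leaf and analyze are illustrative choices, not something fixed by the notes.

```python
def leaf(symbol, position):
    return ("leaf", symbol, position)

# Expression tree for (a+b)*abb#, leaves numbered 1..6 from left to right.
star = ("star", ("or", leaf("a", 1), leaf("b", 2)))
tree = star
for symbol, position in [("a", 3), ("b", 4), ("b", 5), ("#", 6)]:
    tree = ("cat", tree, leaf(symbol, position))

def analyze(n):
    """Return (nullable, firstpos, lastpos) for the subtree rooted at n."""
    kind = n[0]
    if kind == "eps":
        return True, frozenset(), frozenset()
    if kind == "leaf":
        return False, frozenset({n[2]}), frozenset({n[2]})
    if kind == "star":
        _, first, last = analyze(n[1])
        return True, first, last
    null1, first1, last1 = analyze(n[1])
    null2, first2, last2 = analyze(n[2])
    if kind == "or":
        return null1 or null2, first1 | first2, last1 | last2
    if kind == "cat":
        first = first1 | first2 if null1 else first1
        last = last1 | last2 if null2 else last2
        return null1 and null2, first, last
    raise ValueError(f"unknown node kind: {kind}")

for name, node in [("(a+b)*", star), ("(a+b)*abb#", tree)]:
    nullable, first, last = analyze(node)
    print(name, nullable, sorted(first), sorted(last))
# (a+b)* True [1, 2] [1, 2]
# (a+b)*abb# False [1, 2, 3] [6]
```

This version recomputes the values of a subtree each time it is asked; an implementation that needs them repeatedly would more likely store them on the nodes.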

Step 4: followpos(n)

Compute followpos(n) for leaf nodes/positions.

followpos(i) = set of positions that can follow position i in any generated string.

followpos(n) can be computed using the following algorithm:

for each node n in the tree do
    if n is a concat node with left child c1 and right child c2 then
        for each i in lastpos(c1) do
            followpos(i) = followpos(i) U firstpos(c2)
    else if n is a Kleene star node then
        for each i in lastpos(n) do
            followpos(i) = followpos(i) U firstpos(n)
    else
        pass

Applying the algorithm to our example, we get the following values of followpos(n):
Node n   followpos(n)
1        { 1, 2, 3 }
2        { 1, 2, 3 }
3        { 4 }
4        { 5 }
5        { 6 }
6        { }
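
The followpos computation can be folded into the same post-order walk that produces firstpos and lastpos. The sketch below does this in Python under the same assumed tuple encoding of the tree, with leaves written as ("leaf", symbol, position); the function name build_followpos is hypothetical.

```python
from collections import defaultdict

def build_followpos(tree):
    """Return {position: followpos set} for every leaf position in the tree."""
    follow = defaultdict(set)

    def visit(n):
        # Post-order walk returning (nullable, firstpos, lastpos) of n.
        kind = n[0]
        if kind == "eps":
            return True, set(), set()
        if kind == "leaf":
            follow[n[2]]                  # make sure every position gets an entry
            return False, {n[2]}, {n[2]}
        if kind == "star":
            _, first, last = visit(n[1])
            for i in last:                # star rule: lastpos(n) loops back to firstpos(n)
                follow[i] |= first
            return True, first, last
        null1, first1, last1 = visit(n[1])
        null2, first2, last2 = visit(n[2])
        if kind == "or":
            return null1 or null2, first1 | first2, last1 | last2
        for i in last1:                   # concat rule: lastpos(c1) is followed by firstpos(c2)
            follow[i] |= first2
        return (null1 and null2,
                first1 | first2 if null1 else first1,
                last1 | last2 if null2 else last2)

    visit(tree)
    return dict(follow)

# Expression tree for (a+b)*abb#, leaves numbered 1..6 from left to right.
tree = ("star", ("or", ("leaf", "a", 1), ("leaf", "b", 2)))
for symbol, position in [("a", 3), ("b", 4), ("b", 5), ("#", 6)]:
    tree = ("cat", tree, ("leaf", symbol, position))

followpos = build_followpos(tree)
for position in sorted(followpos):
    print(position, sorted(followpos[position]))
```

For the running example this prints the same six rows as the table above.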

Step 5: Generate DFA

s0 = firstpos(root-node); designate it the start state

states = { s0 }, with s0 unmarked
while (there is an unmarked state T in states) do
    mark T
    for each input symbol 'a' in the alphabet do
        let U be the union of followpos(p) for all positions p in T such that
            the symbol at position p is 'a'
        if U is not empty and not in states then
            add U as an unmarked state in states
        trans[T,a] = U
Designate any state containing the #-position as a final state

Applying this algorithm to our example, we get:

Initially

s0 = {1,2,3}

states = { {1,2,3} }

Iteration 1 of the while loop

T = {1,2,3}

Of the elements of T, 1,3 correspond to a and 2 corresponds to b

{1,2,3} on a transitions to followpos(1) U followpos(3) = {1,2,3,4}

{1,2,3} on b transitions to followpos(2) = {1,2,3}

i.e.

trans[{1,2,3},a] = {1,2,3,4}

trans[{1,2,3},b] = {1,2,3}

Iteration 2 of the while loop

T = {1,2,3,4}

Of the elements of T, 1,3 correspond to a and 2,4 correspond to b

{1,2,3,4} on a transitions to followpos(1) U followpos(3) = {1,2,3,4}

{1,2,3,4} on b transitions to followpos(2) U followpos(4) = {1,2,3,5}

i.e.

trans[{1,2,3,4},a] = {1,2,3,4}

trans[{1,2,3,4},b] = {1,2,3,5}

Iteration 3 of the while loop

T = {1,2,3,5}

Of the elements of T, 1,3 correspond to a and 2,5 correspond to b

{1,2,3,5} on a transitions to followpos(1) U followpos(3) = {1,2,3,4}

{1,2,3,5} on b transitions to followpos(2) U followpos(5) = {1,2,3,6}

i.e.

trans[{1,2,3,5},a] = {1,2,3,4}

trans[{1,2,3,5},b] = {1,2,3,6}

Iteration 4 of the while loop

T = {1,2,3,6}

Of the elements of T, 1,3 correspond to a, 2 corresponds to b, and 6 corresponds to #

{1,2,3,6} on a transitions to followpos(1) U followpos(3) = {1,2,3,4}

{1,2,3,6} on b transitions to followpos(2) = {1,2,3}

i.e.

trans[{1,2,3,6},a] = {1,2,3,4}

trans[{1,2,3,6},b] = {1,2,3}

We designate {1,2,3,6} as a final state since it contains the position of #

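Note: The "marking" of states is not shown above, but we can worry about this in the implementation!

Taking all the values of T and the values of trans, we obtain the following DFA, given here in tabular form:

(a+b)*abb

Start State = {1,2,3}

Final States = {1,2,3,6}

FROM        SYMBOL  TO
{1,2,3}     a       {1,2,3,4}
{1,2,3}     b       {1,2,3}
{1,2,3,4}   a       {1,2,3,4}
{1,2,3,4}   b       {1,2,3,5}
{1,2,3,5}   a       {1,2,3,4}
{1,2,3,5}   b       {1,2,3,6}
{1,2,3,6}   a       {1,2,3,4}
{1,2,3,6}   b       {1,2,3}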
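
As for the implementation, Step 5 can be sketched in Python as a worklist algorithm, where "marking" a state simply means taking it off the queue of states still to be processed. The inputs below (followpos values, the symbol at each position, and firstpos of the root) are hard-coded from the running example; the function name build_dfa and the decision to leave a transition undefined when U is empty are assumptions of this sketch.

```python
from collections import deque

def build_dfa(start, followpos, symbol_at, alphabet, end_pos):
    """Build (states, trans, finals); each DFA state is a frozenset of positions."""
    start = frozenset(start)
    states, trans = {start}, {}
    unmarked = deque([start])
    while unmarked:
        T = unmarked.popleft()            # taking T off the queue marks it
        for a in alphabet:
            # Union of followpos(p) over the positions p in T labeled with a.
            U = frozenset().union(*(followpos[p] for p in T if symbol_at[p] == a))
            if not U:
                continue                  # assumption: no transition is recorded for an empty U
            if U not in states:
                states.add(U)
                unmarked.append(U)
            trans[(T, a)] = U
    finals = {s for s in states if end_pos in s}
    return states, trans, finals

# Values for (a+b)*abb# taken from the worked example.
followpos = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
symbol_at = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
states, trans, finals = build_dfa({1, 2, 3}, followpos, symbol_at, "ab", end_pos=6)

for (src, a), dst in sorted(trans.items(), key=lambda kv: (sorted(kv[0][0]), kv[0][1])):
    print(sorted(src), a, "->", sorted(dst))
print("final:", [sorted(s) for s in finals])
```

The end symbol # is deliberately left out of the alphabet passed in, so position 6 only ever serves to identify final states.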
