Formal Language & Automata Theory
Alphabet
An alphabet is a finite, non-empty set of symbols. It is denoted by the Greek letter Σ (Sigma).
Example: Σ = {0, 1} (the binary alphabet) or Σ = {a, b}.
The alphabet forms the basic building blocks of strings and languages.
String
A string is a finite sequence of symbols drawn from an alphabet Σ; the empty string is denoted ε.
Language
A language is a set of strings over an alphabet, i.e., a subset of Σ*.
Grammar
A grammar G = (V, Σ, P, S) consists of non-terminals V, terminals Σ, productions P, and a start symbol S.
V = {S}
Σ = {a, b}
P = {S → aSb, S → ab}
Start symbol: S
Productions
Denoted as A → α, where:
o A is a non-terminal
o α is a string of terminals and/or non-terminals
Example:
In a grammar:
S → aSb
S → ab
These are production rules.
Derivation
S → aSb | ab
Derive "aabb":
1. S → aSb
2. aSb → aabb (applying S → ab)
A tree representation showing how the start symbol leads to a string in the language.
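The derivation process can be mechanized: repeatedly rewrite the leftmost S until only terminals remain. A small sketch that enumerates every string the grammar S → aSb | ab derives up to a length bound (the search strategy and the length cap are implementation choices, not part of the theory):

```python
def derive_all(max_len):
    """Enumerate terminal strings of the grammar S -> aSb | ab up to max_len."""
    results, queue = set(), ['S']
    while queue:
        form = queue.pop()
        if len(form) > max_len:
            continue                      # productions never shrink a form: safe to prune
        if 'S' not in form:
            results.add(form)             # purely terminal: a string of the language
            continue
        for body in ('aSb', 'ab'):        # apply each production to the leftmost S
            queue.append(form.replace('S', body, 1))
    return sorted(results)
```

For example, derive_all(6) yields ab, aabb, and aaabbb, matching {aⁿbⁿ | n ≥ 1} up to length 6.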
MODULE 2:
1. Regular Expressions and Languages
Basic Symbols
∅ (empty language), ε (empty string), and a for each symbol a ∈ Σ.
Operations
Union (r + s), concatenation (rs), and Kleene star (r*).
Examples
(a + b)* denotes all strings over {a, b}; a*b denotes any number of a's followed by a single b.
Regular Languages
A language is regular if it can be described by a regular expression or recognized by a finite
automaton.
A DFA is a finite state machine that accepts or rejects strings from a language based on a
unique computation path.
Definition
A DFA is a 5-tuple:
M = (Q, Σ, δ, q₀, F) where:
Q: Finite set of states
Σ: Input alphabet
δ: Transition function (Q × Σ → Q)
q₀: Start state (q₀ ∈ Q)
F: Set of accepting states (F ⊆ Q)
Working
For each input symbol, the machine follows one unique transition.
Example
Start: q0
Accepting: F = {q1}
Transition:
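The example above leaves the transition table unspecified. Assuming a two-state DFA over {a, b} that accepts strings ending in a (the transitions δ(q0,a)=q1, δ(q0,b)=q0, δ(q1,a)=q1, δ(q1,b)=q0 are assumptions for illustration), the unique-computation-path behaviour is easy to simulate:

```python
def dfa_accepts(delta, start, finals, w):
    """Run a DFA given as a dict (state, symbol) -> state."""
    state = start
    for ch in w:
        state = delta[(state, ch)]    # exactly one transition per (state, symbol)
    return state in finals

# Hypothetical example DFA: accepts strings over {a, b} that end in 'a'
DELTA = {('q0', 'a'): 'q1', ('q0', 'b'): 'q0',
         ('q1', 'a'): 'q1', ('q1', 'b'): 'q0'}
```

Here dfa_accepts(DELTA, 'q0', {'q1'}, "aba") accepts, while "ab" is rejected.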
Every regular expression can be converted into an equivalent DFA and vice versa.
Conversions
RE to NFA: Use Thompson's construction.
NFA to DFA: Use the subset (powerset) construction.
DFA to RE: Use state elimination or Arden's theorem.
This equivalence proves that regular expressions and DFAs recognize the same class of
languages — the regular languages.
An NFA is similar to a DFA but allows multiple transitions for the same input symbol or ε-
transitions.
Definition
An NFA is a 5-tuple:
M = (Q, Σ, δ, q₀, F) where:
Properties
δ maps a state and a symbol (or ε) to a set of states: Q × (Σ ∪ {ε}) → 2^Q; a string is accepted if at least one computation path reaches an accepting state.
Example
Start: q0
Accepting: F = {q2}
Transitions:
δ(q1, b) = {q2}
Conclusion
NFA and DFA recognize the same class of languages — regular languages.
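Nondeterminism can be simulated by tracking the set of all states the NFA could currently be in, which is also the idea behind the subset construction. The transition δ(q1, b) = {q2} comes from the example above; the remaining transitions here, giving an NFA for strings ending in "ab", are assumptions:

```python
def nfa_accepts(delta, start, finals, w):
    """delta maps (state, symbol) to a set of successor states (ε-moves omitted)."""
    current = {start}
    for ch in w:
        # union of all successors of all currently possible states
        current = set().union(*(delta.get((q, ch), set()) for q in current))
    return bool(current & finals)     # accept if any path ends in an accepting state

# Assumed NFA: accepts strings over {a, b} ending in "ab"
DELTA = {('q0', 'a'): {'q0', 'q1'},
         ('q0', 'b'): {'q0'},
         ('q1', 'b'): {'q2'}}
```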
Regular Grammar
A → aB or A → a (right-linear grammar)
or A → Ba or A → a (left-linear)
Equivalence
Every regular grammar can be converted to a finite automaton, and vice versa.
Example
Grammar:
S → aA
A → bB
B → ε
Recognizes string "ab"
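The grammar above maps directly onto a finite automaton: each non-terminal becomes a state, a rule A → aB becomes a transition on a, and B → ε marks B as accepting. A sketch for S → aA, A → bB, B → ε:

```python
def rl_accepts(w):
    """Recognize the language of S -> aA, A -> bB, B -> ε (i.e., exactly "ab")."""
    trans = {('S', 'a'): 'A', ('A', 'b'): 'B'}   # non-terminals act as states
    state = 'S'
    for ch in w:
        if (state, ch) not in trans:
            return False                          # no production applies: reject
        state = trans[(state, ch)]
    return state == 'B'                           # B -> ε makes B an accepting state
```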
1. Union: L₁ ∪ L₂ is regular.
2. Intersection: L₁ ∩ L₂ is regular.
3. Complement: ¬L is regular.
4. Concatenation: L₁L₂ is regular.
5. Kleene star: L₁* is regular.
6. Difference: L₁ - L₂ is regular.
These properties are useful for proving language regularity or constructing complex
languages from simple ones.
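Closure under intersection follows from the product construction: run two DFAs in lockstep and accept only when both do. A sketch with two assumed example machines (one accepts an even number of a's, the other accepts strings ending in b):

```python
def intersect_accepts(dfa1, dfa2, w):
    """Each DFA is (transitions, start, finals), transitions: (state, sym) -> state."""
    (t1, s1, f1), (t2, s2, f2) = dfa1, dfa2
    q1, q2 = s1, s2
    for ch in w:
        q1, q2 = t1[(q1, ch)], t2[(q2, ch)]   # the pair (q1, q2) is a product state
    return q1 in f1 and q2 in f2

# Assumed machines: EVEN_A accepts an even number of a's; ENDS_B accepts strings ending in b
EVEN_A = ({('e', 'a'): 'o', ('o', 'a'): 'e', ('e', 'b'): 'e', ('o', 'b'): 'o'}, 'e', {'e'})
ENDS_B = ({('n', 'b'): 'y', ('y', 'b'): 'y', ('n', 'a'): 'n', ('y', 'a'): 'n'}, 'n', {'y'})
```

"aab" has two a's and ends in b, so the product accepts it; "ab" fails the first machine.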
Statement
If L is regular, there exists a constant p (the pumping length) such that every string w ∈ L with |w| ≥ p can be written as w = xyz, where:
1. |y| > 0
2. |xy| ≤ p
3. xyⁱz ∈ L for all i ≥ 0
Use
Assume L is regular, choose a suitable string w ∈ L with |w| ≥ p, apply the lemma, and derive a contradiction.
Example
L = {aⁿbⁿ | n ≥ 0} is not regular: take w = aᵖbᵖ. Since |xy| ≤ p, y consists only of a's, so pumping with i = 2 produces more a's than b's, a contradiction.
Minimization is the process of reducing the number of states in a DFA while preserving its
language.
Steps:
1. Remove unreachable states.
2. Merge equivalent states (states that cannot be distinguished by any input string).
Hopcroft’s Algorithm
A partition-refinement algorithm that minimizes a DFA in O(n log n) time.
Example
If two states q₁ and q₂ lead to the same behavior (accept/reject) for all input strings, they are merged.
Why minimize?
Simplifies implementation.
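The merge step can be sketched with Moore-style partition refinement, a simpler relative of Hopcroft's algorithm: start from the accepting/non-accepting split and keep splitting blocks whose states disagree on which block some input symbol leads to (the example DFA below is an assumption):

```python
def minimize_blocks(states, alphabet, delta, finals):
    """Return the equivalence classes of states (delta: state -> symbol -> state)."""
    partition = [b for b in (set(finals), set(states) - set(finals)) if b]
    changed = True
    while changed:
        changed = False
        refined = []
        for block in partition:
            groups = {}
            for s in block:
                # signature: which current block each input symbol sends s to
                key = tuple(next(i for i, b in enumerate(partition) if delta[s][a] in b)
                            for a in alphabet)
                groups.setdefault(key, set()).add(s)
            refined.extend(groups.values())
            if len(groups) > 1:
                changed = True
        partition = refined
    return partition

# Assumed DFA in which q1 and q2 behave identically on every input
DELTA = {'q0': {'a': 'q1', 'b': 'q2'},
         'q1': {'a': 'q1', 'b': 'q2'},
         'q2': {'a': 'q1', 'b': 'q2'}}
```

With finals {q1, q2}, the refinement stabilizes at two blocks, merging q1 and q2.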
MODULE 3:
1. Context-Free Grammars (CFG) and Context-Free Languages (CFL)
A CFG is a 4-tuple G = (V, Σ, P, S) where:
V: Set of non-terminals
Σ: Set of terminals
P: Productions of the form A → α, with A ∈ V and α ∈ (V ∪ Σ)*
S: Start symbol (S ∈ V)
Example:
V = {S}
Σ = {(, )}
P = {S → SS | (S) | ε}
Start symbol: S
A language L is context-free if there exists a CFG G such that L = L(G), i.e., all strings derivable
from G.
Applications of CFLs
Programming-language syntax specification
Parsing in compilers
Language translation
Every context-free grammar can be converted to an equivalent grammar in CNF, where all
productions are:
A → BC (B, C ∈ V, A ∈ V)
A → a (a ∈ Σ)
S → ε (only if ε ∈ L)
Why CNF?
Every production is binary (A → BC) or a single terminal (A → a), which bounds the shape of parse trees and makes algorithms such as CYK membership testing, and inductive proofs over derivations, straightforward.
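A CNF grammar's binary productions are exactly what the CYK membership algorithm exploits: build up, bottom-up, the set of non-terminals that derive each substring. A sketch, using an assumed CNF grammar for {aⁿbⁿ | n ≥ 1}:

```python
from collections import defaultdict

def cyk(grammar, start, w):
    """grammar: nonterminal -> list of CNF bodies, each (terminal,) or (B, C)."""
    n = len(w)
    table = defaultdict(set)              # table[(i, j)] = nonterminals deriving w[i:j]
    for i, ch in enumerate(w):
        for A, bodies in grammar.items():
            if (ch,) in bodies:
                table[(i, i + 1)].add(A)
    for length in range(2, n + 1):        # widen substrings, trying every split point k
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for A, bodies in grammar.items():
                    for body in bodies:
                        if (len(body) == 2 and body[0] in table[(i, k)]
                                and body[1] in table[(k, j)]):
                            table[(i, j)].add(A)
    return start in table[(0, n)]

# Assumed CNF grammar for {a^n b^n | n >= 1}: S -> AT | AB, T -> SB, A -> a, B -> b
G = {'S': [('A', 'T'), ('A', 'B')], 'T': [('S', 'B')],
     'A': [('a',)], 'B': [('b',)]}
```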
Conversion Process
1. Eliminate ε-productions (retaining S → ε if ε ∈ L).
2. Eliminate unit productions (A → B).
3. Remove useless (non-generating or unreachable) symbols.
4. Replace terminals in bodies of length ≥ 2 with new non-terminals, then break bodies longer than two into binary productions.
A PDA is a 7-tuple:
M = (Q, Σ, Γ, δ, q₀, Z₀, F) where:
Q: Set of states
Σ: Input alphabet
Γ: Stack alphabet
δ: Transition function (Q × (Σ ∪ {ε}) × Γ → finite subsets of Q × Γ*)
q₀: Start state
Z₀: Initial stack symbol
F: Set of accepting states
Working
The PDA reads the input left to right, pushing and popping stack symbols as δ dictates; acceptance is by final state or by empty stack.
Every CFG has an equivalent PDA that accepts the same language.
Conclusion:
CFLs = Languages accepted by PDAs
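The stack discipline behind a PDA shows up in a hand-rolled recognizer for the balanced-parentheses language generated by S → SS | (S) | ε (a sketch of the idea, not a general PDA simulator):

```python
def balanced(w):
    """Accept strings of balanced parentheses, PDA-style: push '(' and pop on ')'."""
    stack = []
    for ch in w:
        if ch == '(':
            stack.append(ch)      # like a PDA pushing onto its stack
        elif ch == ')':
            if not stack:
                return False      # pop attempted on an empty stack: reject
            stack.pop()
        else:
            return False          # symbol outside the alphabet
    return not stack              # accept by empty stack
```

Note that ε is accepted, matching S → ε in the grammar.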
4. Parse Trees
A parse tree (derivation tree) shows how a string is derived from a CFG.
Properties
The root is the start symbol, the leaves spell out the derived string (or ε), and each internal node together with its children corresponds to one production.
Example
For grammar:
S → aSb | ε
Parse tree for aabb:
      S
    / | \
   a  S  b
    / | \
   a  S  b
      |
      ε
Parse trees help in understanding syntactic structure, essential in compiler syntax analysis.
5. Ambiguity in CFG
A CFG is ambiguous if there exists a string that can be generated by more than one parse
tree or leftmost derivation.
Example:
Grammar:
E → E + E | E * E | id
String: id + id * id
Can be parsed as:
(id + id) * id or
id + (id * id)
Inherent Ambiguity
Some CFLs are inherently ambiguous — no unambiguous grammar exists for them.
Why it matters?
Ambiguity complicates parsing and semantic interpretation; compilers therefore require unambiguous (or explicitly disambiguated) grammars.
Statement
If L is an infinite CFL, then ∃ a constant p (pumping length), such that any string z ∈ L, |z| ≥
p, can be split into z = uvwxy such that:
1. |vwx| ≤ p
2. vx ≠ ε
3. ∀i ≥ 0: u vⁱ w xⁱ y ∈ L
Application:
To prove a language is not context-free, assume it is, apply the lemma, and derive
contradiction.
Example
L = {aⁿbⁿcⁿ | n ≥ 1}
This is not a CFL. Proof uses pumping lemma and shows imbalance after pumping.
Definition
For every (state, input symbol, top of stack), there is at most one possible transition
(i.e., δ is deterministic).
Language Acceptance
DPDAs can recognize only a subset of CFLs, called Deterministic CFLs (DCFLs).
Applications
Substitution
Difference
Example
MODULE 4:
1. Context-Sensitive Grammars (CSG) and Languages
🔹 Context-Sensitive Grammar (CSG)
Definition:
Productions have the form αAβ → αγβ, where:
A ∈ V
α, β ∈ (V ∪ Σ)*
γ ∈ (V ∪ Σ)+ (γ ≠ ε)
Length of RHS ≥ length of LHS
This restriction means no contraction (the output cannot be shorter than the input). The only
exception is the rule:
S → ε is allowed only if ε ∈ L(G) and S doesn’t appear on the right-hand side of any
rule.
Key Property:
Derivations are non-contracting: sentential forms never shrink, which makes membership in a CSL decidable.
Example:
S → aBC | aSBC
CB → BC
aB → ab
bB → bb
bC → bc
cC → cc
This grammar ensures that for each a, there’s a corresponding b and c. These rules simulate
counting.
Applications of CSGs:
Modeling natural-language phenomena such as agreement, and describing languages like {aⁿbⁿcⁿ} that lie beyond the CFLs.
Definition:
An LBA is a non-deterministic Turing machine whose tape head never leaves the portion of the tape holding the input.
That means:
If the input is of length n, the LBA can use only O(n) cells on the tape.
Formal Structure:
An LBA is a 7-tuple:
M = (Q, Σ, Γ, δ, q₀, q_accept, q_reject) where:
Γ: Tape alphabet (Σ ⊂ Γ)
Σ: Input alphabet
δ: Transition function (Q × Γ → Q × Γ × {L, R})
q₀: Start state
q_accept: Accepting state
q_reject: Rejecting state
With a restriction:
The machine never moves outside the segment of the tape where the input is written.
Example:
L = {aⁿbⁿcⁿ | n ≥ 1} is accepted by an LBA that repeatedly marks one a, one b, and one c until every symbol is marked.
Unlike finite automata and PDA, an LBA can remember how many of each symbol exists
and ensure strict matching — hence suitable for CSLs.
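The marking idea can be sketched in code: after a shape check, repeatedly cross off one a, one b, and one c, working only inside the input itself just as an LBA must (the helper below is a hypothetical illustration, not a formal LBA):

```python
import re

def lba_style_anbncn(w):
    """Accept {a^n b^n c^n | n >= 1} by in-place marking, LBA-fashion."""
    if not re.fullmatch(r'a+b+c+', w):
        return False               # a real LBA checks the shape in one left-to-right scan
    tape = list(w)                 # list mutation stands in for writing on the tape
    while 'a' in tape:
        for sym in ('a', 'b', 'c'):
            for i, cell in enumerate(tape):
                if cell == sym:
                    tape[i] = 'X'  # mark one occurrence of each symbol per pass
                    break
            else:
                return False       # ran out of b's or c's: counts differ
    return all(cell == 'X' for cell in tape)
```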
Theorem:
A language is context-sensitive if and only if it is accepted by a linear bounded
automaton.
This means:
The CSLs are exactly the languages LBAs recognize.
Why Important?
It pins CSLs to a concrete machine model, placing them between the CFLs and the recursive languages in the Chomsky hierarchy.
A Turing Machine (TM) is the most powerful computational model in automata theory,
capable of simulating any algorithm. It is an abstract machine introduced by Alan Turing
in 1936 to formalize the concept of computation.
Formal Model
A TM is a 7-tuple M = (Q, Σ, Γ, δ, q₀, q_accept, q_reject) with transition function δ: Q × Γ → Q × Γ × {L, R} and an unbounded tape.
Working
At each step the machine reads the cell under the head, writes a symbol, moves left or right, and changes state.
Power
TMs recognize exactly the recursively enumerable languages and can simulate every known computational model.
Key Points:
A language is decidable (recursive) if some TM halts on every input and answers membership correctly.
A language is recursively enumerable (RE) if some TM accepts exactly its strings, possibly looping on non-members.
Difference:
Every decidable language is RE, but not conversely; the halting language is RE yet undecidable.
🔁 Closure Properties
Operation Decidable RE
Union ✅ ✅
Intersection ✅ ✅
Complement ✅ ❌
Concatenation ✅ ✅
Kleene Star ✅ ✅
Difference ✅ ❌
Homomorphism ✅ ✅
Inverse Homomorphism ✅ ✅
🔹 Variants:
1. Multi-tape TM: Multiple tapes and heads, but equivalent in power to single-tape
TMs (just faster).
2. Multi-track TM: A single tape with multiple tracks (parallel memory cells).
3. TM with stay option: Instead of moving L or R, head can stay (S).
4. Offline TM: Input tape is read-only, useful for compiler simulations.
5. Two-way infinite tape: Both sides of the tape are infinite (same power).
6. Non-deterministic TM: Multiple choices for a move — see next topic.
Theoretical Equivalence
All variants recognize exactly the recursively enumerable languages — only the efficiency
changes, not power.
NTM allows multiple possible transitions for a given input and state.
Think of NTM as "trying all possible computations in parallel."
Acceptance
If at least one computation path accepts, the NTM accepts the string.
If all paths reject or loop, the string is not accepted.
🔁 Equivalence
Every NTM can be simulated by a deterministic TM, e.g., by breadth-first search of the computation tree, at the cost of an exponential slowdown.
✅ Conclusion:
NTMs and deterministic TMs recognize the same class of languages.
α → β, where:
o α ∈ (V ∪ Σ)+
o β ∈ (V ∪ Σ)*
No restriction on the form of rules (can shrink, grow, or stay the same length).
Generative Power
Unrestricted (Type-0) grammars generate exactly the recursively enumerable languages.
🔁 Equivalence Theorem
For every unrestricted grammar, there is a Turing Machine that accepts the same
language.
For every TM, there is an unrestricted grammar generating the same language.
Example
S → aSBC | aBC
CB → BC
aB → ab, bB → bb
bC → bc, cC → cc
🔹 What is an Enumerator?
It generates strings of a language one by one, possibly with repetitions or in infinite time.
Formal Definition:
An enumerator is a TM with an attached output tape on which it prints strings one after another.
🔁 Theorem:
A language is recursively enumerable if and only if some enumerator prints exactly its strings.
Why It Matters
It explains the name "recursively enumerable" and gives an alternative characterization of the RE languages.
MODULE 6:
1. Church-Turing Thesis
The Church-Turing Thesis is a hypothesis (not a formal theorem) which proposes that:
“Any function that can be effectively computed (intuitively, by any algorithm) can be
computed by a Turing machine.”
🔹 Why Important?
It links the informal notion of "algorithm" to the formal TM model, which is what gives undecidability results their force.
🔹 Limitations:
It’s a thesis, not a provable theorem.
It doesn’t imply everything can be computed — only that what can be computed
can be done by a TM.
A Universal Turing Machine is a Turing machine that can simulate any other Turing
machine.
It takes as input a description of another TM and an input string, and simulates the
computation.
Formal Input:
⟨M, w⟩, an encoding of a Turing machine M together with an input string w.
🔁 Importance:
One fixed machine suffices for all computation; this underlies the undecidability of the universal language L_U.
🔹 Real-life Analogy:
A general-purpose computer running stored programs: the program itself is just data.
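The core loop of a universal machine, fetch the transition for (state, symbol), write, move, repeat, can be sketched with a dictionary-encoded TM. The example machine, which flips every bit and accepts at the first blank, is an assumption for illustration:

```python
def run_tm(delta, tape_str, state='q0', accept='q_acc', blank='_', max_steps=10_000):
    """delta: (state, symbol) -> (new_state, write_symbol, 'L' or 'R')."""
    tape = dict(enumerate(tape_str))   # sparse tape, unbounded in both directions
    head = 0
    for _ in range(max_steps):         # guard: a real TM may never halt
        if state == accept:
            out = ''.join(tape[i] for i in sorted(tape)).strip(blank)
            return True, out
        if (state, tape.get(head, blank)) not in delta:
            return False, ''           # no applicable transition: halt and reject
        state, write, move = delta[(state, tape.get(head, blank))]
        tape[head] = write
        head += 1 if move == 'R' else -1
    raise RuntimeError('step limit exceeded (machine may loop forever)')

# Assumed example machine: flip 0s and 1s, accept when the blank is reached
FLIP = {('q0', '0'): ('q0', '1', 'R'),
        ('q0', '1'): ('q0', '0', 'R'),
        ('q0', '_'): ('q_acc', '_', 'R')}
```

The step limit stands in for the fact that a simulator cannot, in general, decide whether the simulated machine halts.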
Defined as:
LD = { wᵢ | TM Mᵢ does not accept wᵢ }
Where:
All TMs and strings are enumerated as M₁, M₂, ..., w₁, w₂, ...
It is based on Cantor's diagonalization idea.
💥 Key Result:
LD is not Turing-recognizable.
It shows that there exist languages which are not even recursively enumerable, i.e.,
no TM can accept all strings in LD.
🔁 Why Important?
It exhibits a concrete language beyond the reach of any TM and introduces diagonalization, the engine behind many undecidability proofs.
Reductions are done using a computable function f such that:
x ∈ A ⇔ f(x) ∈ B
🔹 Use of Reductions
If A reduces to B and A is undecidable, then B is undecidable: reductions transfer hardness from known problems to new ones.
🔹 Rice’s Theorem
Formal Statement:
Every non-trivial property of the language recognized by a Turing machine is undecidable.
🔹 Non-trivial Property
A property of L(M) that holds for some recognizable languages and fails for others. Note that Rice’s theorem concerns the language, not the machine:
Does M = M? (true of every machine: a trivial property)
Is q₀ the start state? (a property of the machine’s description, so Rice’s theorem does not apply)
Is L(M) = ∅? (a non-trivial property of L(M): undecidable)
Problem Question Decidable?
Halting Problem Does M halt on input w? ❌
Emptiness Problem Is L(M) = ∅? ❌
Universality Problem Is L(M) = Σ*? ❌
Equivalence Problem Is L(M₁) = L(M₂)? ❌
Finiteness Problem Is L(M) finite? ❌
🔁 Why Undecidable?
These problems can’t be solved by any Turing machine for all possible inputs.
Their solution would require solving the Halting Problem, which is impossible.
📌 Summary Table
Concept Key Point
Church-Turing Thesis Every computable function = TM-computable
Universal TM TM that simulates other TMs
LU & LD LU is RE but not decidable; LD is not RE
Reduction Technique to transfer (un)decidability
Rice’s Theorem Any non-trivial property of L(M) is undecidable
Language Problems Halting, emptiness, universality, etc. are undecidable
1. Define and differentiate between an alphabet, a string, and a language.
2. Explain the structure and components of a formal grammar.
3. Describe the process of derivation in the context of formal grammars.
4. Illustrate the Chomsky hierarchy of languages with examples for each type.
5. Compare and contrast regular, context-free, context-sensitive, and recursively enumerable languages.
🔹 Practice Questions
Given a specific grammar, identify its type within the Chomsky hierarchy.
Construct derivation trees for given strings using a specified grammar.
Determine whether a particular language belongs to a specific class in the Chomsky hierarchy.
🔹 Practice Questions
🔹 Practice Questions
Design a CSG for a language where the number of a's equals the number of b's and c's combined.
Construct an LBA that accepts strings of the form aⁿbⁿcⁿ.
Determine whether a given language is context-sensitive and provide justification.
🔹 Practice Questions
🔹 Practice Questions
🧠 Context-Sensitive Languages