Syntax Analysis I 2024
Syntax Analysis I 2024
Dr. R Guru
Associate Professor
Dept. of Computer Science and Engineering,
SJCE, JSSSTU, Mysuru
Introduction
• Every programming language has rules that prescribe the
syntactic structure of programs.
• The syntax of programming language can be described by
context-free grammars.
• A grammar gives precise, easy-to-understand, syntactic
specification of a programming language.
• A grammar gives structure of a programming language which
is useful in translation of source program to object code and
for the detection of errors.
The role of the parser
• The input for the parser is a stream of tokens from lexical
analysis and output is a parse tree, in which tokens are
grouped hierarchically with collective meaning.
• It should report: any syntax errors, recover from commonly
occurring errors, collecting information about various tokens,
performing type checking, generating intermediate code, etc.
• The most efficient top-down and bottom-up methods work
only for subclasses of grammars, but several of these classes,
particularly, LL and LR grammars, are expressive enough to
describe most of the syntactic constructs in modern
programming languages.
• Three general types of parsers for grammar are: Universal
parser ,Top-down parser, Bottom-up parser.
• Universal parser :Universal parsing methods such as the
Cocke-Younger-Kasami algorithm and Earley's algorithm can
parse any grammar. These general methods are too inefficient
to use in production compilers.
• Top-down : Build parse tree from root to the leaves.
• Bottom-up: Build parse tree from leaves to the root.
• Input to the parser is scanned from left to right, one symbol at
a time.
Reading Assignment
S S S
c A d c A d c A d
a b a
First and Follow
The construction of both top down and bottom parser is aided
with two function
1. FIRST
2. FOLLOW
During top-down parsing , FIRST and FOLLOW allow us to
choose which production to apply, based on the next input
symbol.
FIRST(α)
• FIRST(α), where α is any string of grammar symbols, to be
the set of terminals that begin strings derived from α.
• If α *ɛ then ɛ is also in FIRST(α )
• In predictive parsing when we have A α │β, where
FIRST(α) and FIRST(β) are disjoint sets .
• Select appropriate A-production by looking at the next input
symbol a, since a can be in at most one of FIRST (α ) and
FIRST(β ), not both.
Computing FIRST(X)
To compute FIRST(X) for all grammar symbols X, apply the
following rules until no more terminals or can be added to any
FIRST set.
1. If X is a terminal, then FIRST(X) = {X}.
2. If X is a nonterminal and X Y1Y2 … Yk is a production for
some k ≥ 1, then place a in FIRST(X) if for some i , a is in
FIRST(Yi), and Є is in all of FIRST(Y1),... , FIRST(Yi-1); that
is, Y1 …Yi-1 * Є.
If Є is in FIRST( Yj) for all j = 1,2,... , k, then add Є to
FIRST(X).
3. If X Є is a production, then add Є to FIRST(X).
Example :1
ETE’
E’ + TE’│Є
T FT’
T’ * FT’│ Є
F id│(E)
Follow(A)
• Follow(A), for any nonterminal A, is set of terminals a that
can appear immediately to the right of A in some sentential
form
• That is, the set of terminal a such that there exists a derivation
of the form S * αAaβ for some α and β, then a is in
Follow(A)
• If A can be the rightmost symbol in some sentential form, then
$ is in Follow(A)
Computing FOLLOW(A)
To compute FOLLOW(A) for all nonterminals A, apply the
following rules until nothing can be added to any FOLLOW
set.
1. Place $ in FOLLOW(S), where S is the start symbol, and $ is
the input right endmarker.
2. If there is a production A αBβ, then everything in FIRST(β)
except Є is in FOLLOW(B).
3. If there is a production A αB, or a production A αBβ,
where FIRST(β) contains Є, then everything in FOLLOW(A)
is in FOLLOW( B ) .
Example :1
ETE’
E’ + TE’│Є
T FT’
T’ * FT’│ Є
F id│(E)
First and follow
Nonterminal FIRST FOLLOW
E { id, (} { $, )}
E’ {+, Є} {$, )}
T { id, ( } {+, $, )}
T’ { *, Є} {+, $, )}
F { id, ( } {+, *,$, )}
Parsing table
LL(1) Grammar
• Predictive Parsers needing no backtracking, can be
constructed for a class of grammars called LL(1).
• The first "L" in LL(1) stands for scanning the input
from left to right,
• The second "L" for producing a leftmost derivation,
• " 1 " for using one input symbol of lookahead at each
step to make parsing action decisions.
Definition of LL(1) Grammar
A grammar G is LL(1) if and only if whenever A α │ β are
two distinct productions of G, the following conditions hold:
1. For no terminal a do both α and β derive strings beginning
with a.
2. At most one of α and β can derive the empty string.
3. If β * Є, then α does not derive any string beginning with a
terminal in FOLLOW (A). Likewise, if α * Є, then β does
not derive any string beginning with a terminal in
FOLLOW(A).
Construction of a predictive parsing table
Input : Grammar G.
Output : Parsing table M.
Method: For each production A α a of the grammar, do the
following:
1. For each terminal a in FIRST(A), add A α to M[A, a].
2. If Є is in FIRST(α), then for each terminal b in FOLLOW(A),
add A α to M[ A, b]. If Є is in FIRST (α ) and $ is in
FOLLOW(A), add A α to M[A, $] as well.
After performing the above, there is no production at all in
M[A, a], then set M[A, a] to error
Nonrecursive Predictive Parsing
• A nonrecursive predictive parser can be built by maintaining a
stack explicitly, rather than implicitly via recursive calls.
• The parser mimics a leftmost derivation.
• If w is the input that has been matched so far, then the stack
holds a sequence of grammar symbols α such that
S* w α
• The table-driven parser has an input buffer, a stack containing
a sequence of grammar symbols, a parsing table and an output
stream.
• The input buffer contains the string to be parsed, followed by
the endmarker $.
Model of Table-driven predictive parser
• We reuse the symbol $ to mark the bottom of the stack, which
initially contains the start symbol of the grammar on top of $.
• The parser is controlled by a program that considers X, the
symbol on top of the stack, and a, the current input symbol.
• If X is a nonterminal, the parser chooses an X-production by
consulting entry M[X, a] of the parsing table M.
• Otherwise, it checks for a match between the terminal X and
current input symbol a.
Predictive parsing algorithm
Input : A string W and a parsing table M for grammar G
Output: if w is in L(G), a leftmost derivation of w; otherwise , an error
indication
Method: Initially, the parser is in a configuration with w$ in the
input buffer and the start symbol S of G on top of the stack, above $.
Set ip point to the first symbol of w;
Set X to the top stack symbol;
While (X ≠ $) { /* stack is not empty */
if (X is a) pop the stack and advance ip;
else if (X is a terminal) error();
else if (M[X, a] is an error entry) error();
else if (M[X, a] = X Y1Y2..Yk) {
output the production X Y1Y2..Yk ;
pop the stack;
push Yk ,…, Y2 Y1on to the stack with Y1 on top;
}
set X to the top stack symbol;
}
Grammar
ETE’
E’ + TE’│Є
T FT’
T’ * FT’│ Є
F id│(E)
Parsing Table
Non Input symbol
terminal id + * ( ) $
E ETE’ ETE’
E’ E’ + TE’ E’ Є E’ Є
T T FT’ T FT’
T’ T’ Є T’ * FT’ T’ Є T’ Є
F F id F (E)
Moves made by a predictive parser on input id + id * id
id+id*id $ $
Example-2
Consider the grammar
S → iEtSS` | a
S` → eS | ɛ
E→b
i)Find FIRST and FOLLOW
ii)Construct the predictive parsing table
Example-3
6. SA
A BC│DBC
B Bb│ Є
C c │Є
D a│ d
6. S ACB│CbB│Ba
A da│BC
B g│ Є
C h │Є