u2 (2)
u2 (2)
A parser performs syntactic and semantic analysis of source code, converting it into an
intermediate representation while detecting and handling errors.
1. Context-free syntax analysis: The parser checks if the structure of the code follows
the basic rules of the programming language (like grammar rules). It looks at how
words and symbols are arranged.
2. Guides context-sensitive analysis: It helps with deeper checks that depend on the
meaning of the code, like making sure variables are used correctly. For example, it
ensures that a variable used in a mathematical operation, like x + 2, is a number and
not text.
3. Constructs an intermediate representation: The parser creates a simpler version of
your code that’s easier for the computer to understand and work with.
4. Produces meaningful error messages: If there’s something wrong in your code, the
parser tries to explain the problem clearly so you can fix it.
5. Attempts error correction: Sometimes, the parser tries to fix small mistakes in your
code so it can keep working without breaking completely.
Types of Parsing
The parsing is divided into two types, which are as follows:
Top-down Parsing
Bottom-up Parsing
Top-Down Parsing
Top-down parsing is a method of building a parse tree from the start symbol (root)
down to the leaves (end symbols). The parser begins with the highest-level rule and
works its way down, trying to match the input string step by step.
Process: The parser starts with the start symbol and looks for rules that can help it
rewrite this symbol. It keeps breaking down the symbols (non-terminals) into
smaller parts until it matches the input string.
Leftmost Derivation: In top-down parsing, the parser always chooses the leftmost
non-terminal to expand first, following what is called leftmost derivation. This means
the parser works on the left side of the string before moving to the right.
Other Names: Top-down parsing is sometimes called recursive parsing or predictive
parsing. It is called recursive because it often uses recursive functions to process the
symbols.
Top-down parsing is useful for simple languages and is often easier to implement.
However, it can have trouble with more complex or ambiguous grammars.
Top-down parsers can be classified into two types based on whether they use
backtracking or not:
1. Top-down Parsing with Backtracking
In this approach, the parser tries different possibilities when it encounters a choice, If
one possibility doesn’t work (i.e., it doesn’t match the input string), the parser
backtracks to the previous decision point and tries another possibility.
Example: If the parser chooses a rule to expand a non-terminal, and it doesn’t work, it
will go back, undo the choice, and try a different rule.
Advantage: It can handle grammars where there are multiple possible ways to expand a
non-terminal.
Disadvantage: Backtracking can be slow and inefficient because the parser might have
to try many possibilities before finding the correct one.
Recursive descent parser is a top-down parser that processes input based on a set of
recursive functions, where each function corresponds to a grammar rule. It parses the
input from left to right, constructing a parse tree by matching the grammar’s production
rules. This parser is simple to implement and is suitable for LL(1) grammars, where
decisions can be made based on a single lookahead token. While straightforward,
recursive descent parsers struggle with left-recursive grammars and may require
grammar transformations to handle such cases effectively.
A Predictive Parser is a special case of Recursive Descent Parser, where no Back Tracking
is required.
By carefully writing a grammar, means eliminating left recursion and left factoring from
it, the resulting grammar will be a grammar that can be parsed by a recursive descent
parser.
By carefully writing a grammar means eliminating left recursion and left factoring from
it, the resulting grammar will be a grammar that can be parsed by a recursive descent
parser.
Example:
Before removing left recursion After removing left recursion
E –> T E’
E –> E + T | T E’ –> + T E’ | e
T –> T * F | F T –> F T’
F –> ( E ) | id T’ –> * F T’ | e
F –> ( E ) | id
Here the 1st L represents that the scanning of the Input will be done from the Left to
Right manner and the second L shows that in this parsing technique, we are going to use
the Left most Derivation Tree. And finally, the 1 represents the number of look-ahead,
which means how many symbols you will see when you want to make a decision.
Conditions for an LL(1) Grammar
To construct a working LL(1) parsing table, a grammar must satisfy these conditions:
No Left Recursion: Avoid recursive definitions like A -> A + b.
Unambiguous Grammar: Ensure each string can be derived in only one way.
Left Factoring: Make the grammar deterministic, so the parser can proceed without
guessing.
Step 1: First check all the essential conditions mentioned above and go to step 2.
*ε denotes epsilon
Step 1: The grammar satisfies all properties in step 1.
Step 2: Calculate first() and follow().
Find their First and Follow sets:
First Follow
E’ –> +TE’/ { +, ε } { $, ) }
First Follow
T’ –> *FT’/
{ *, ε } { +, $, ) }
ε
id + * ( ) $
E’ –> E’ –>
E’ E’ –> +TE’
ε ε
T’ –> T’ –>
T’ T’ –> ε T’ –> *FT’
ε ε
As you can see that all the null productions are put under the Follow set of that symbol
and all the remaining productions lie under the First of that symbol.
Bottom-Up Parsing:
Bottom-up parsing is a method of building a parse tree starting from the leaf nodes (the
input symbols) and working towards the root node (the start symbol). The goal is to
reduce the input string step by step until we reach the start symbol, which represents
the entire language.
Process: The parser begins with the input symbols and looks for patterns that can be
reduced to non-terminals based on the grammar rules. It keeps reducing parts of the
string until it forms the start symbol.
Rightmost Derivation in Reverse: In bottom-up parsing, the parser traces the
rightmost derivation of the string but works backwards, starting from the input
string and moving towards the start symbol.
Shift-Reduce Parsing: Bottom-up parsers are often called shift-reduce parsers
because they shift (move symbols) and reduce (apply rules to replace symbols) to
build the parse tree.
Bottom-up parsing is efficient for handling more complex grammars and is commonly
used in compilers. However, it can be more challenging to implement compared to top-
down parsing.
Example
Recursive descent, LL parser. Shift-reduce, LR parser.
Parsers
Bottom-up parsing- is a type of syntax analysis method where the parser starts
from the input symbols (tokens) and attempts to reduce them to the start symbol of the
grammar (usually denoted as S). The process involves applying production rules in
reverse, starting from the leaves of the parse tree and working upwards toward the root.
Here’s how it works:
Start with tokens: The parser begins with the terminal symbols (the input tokens),
which are the leaves of the parse tree.
Example:
1. E → T
2. T → T * F
3. T → id
4. F → T
5. F → id
Example:
Rule-02:
An identifier is always given the higher precedence than any other symbol.
$ symbol is always given the lowest precedence.
Rule-03:
Step-02:
Start scanning the string from LHS in the forward direction until > symbol is
encountered.
Keep a pointer on that location.
Step-03:
Start scanning the string from RHS in the backward direction until < symbol is
encountered.
Keep a pointer on that location.
Step-04:
Everything that lies in the middle of < and > forms the handle.
Replace the handle with the head of the respective production.
Step-05:
Keep repeating the cycle from Step-02 to Step-04 until the start symbol is reached.
Advantages-
Disadvantages-
The handling of tokens known to have two different precedence becomes difficult.
Only small class of grammars can be parsed using this parser.
A table defining precedence relationships among operators is required for the parser to
function. Example:
Operator + * ( ) $
LR Parsing-
LR Parser is a type of bottom-up parsing. It is used to parse context-free grammar,
primarily used by compilers. The LR parser reads the input from left to right and produces
the rightmost derivation.
The LR parsing can be further divided into four types-
The LR parser is denoted as LR(k), where the letter “L” refers to the left to right scanning of
the input, the letter “R” refers to the rightmost derivation in reverse, and “k” refers to the
number of unconsumed “look ahead” input symbols that are used in making parser
decisions. The letter k is usually one and is frequently omitted. A context-free grammar is
called LR(k) if the LR(k) parser exists.
LR Algorithm
The LR (Left-to-Right) algorithm, commonly referred to in the context of parsing, is a
technique used to analyze a sequence of input symbols to determine its grammatical
structure with respect to a given formal grammar. It is widely used in the construction of
compilers and interpreters.
1. Initialization: Start with an empty stack and place a special end-of-input marker ($)
at the bottom. Place the start symbol of the grammar on top of the stack.
2. Read Input: Read the input string from left to right, symbol by symbol.
3. Shift and Reduce Operations: Shift, move the current input symbol to the stack and
advance to the next input symbol. Reduce, if the stack matches the right-hand side of
a production rule, replace it with the left-hand side non-terminal.
4. Parser Actions: Use a parsing table to decide whether to shift, reduce, accept, or
error. Accept, if the stack contains the start symbol and the input is fully consumed,
the string is successfully parsed. Error, if no valid action exists for the current stack
and input symbol, report a syntax error.
5. Termination: The algorithm terminates when the input is successfully parsed (accept
state) or a syntax error is encountered.
Block Diagram
Below described is the block diagram of the LR parser.
The input buffer takes one symbol at a time from left to right a1, a2,..., am,$.
The stack contains the combination of the state symbol and the current input symbol.
The parsing table is a two-dimensional array containing the Action and GoTo tables.
SLR Parser
LR parser is also called as SLR parser
it is weakest of the three methods but easier to implement
a grammar for which SLR parser can be constructed is called SLR grammar
Steps for constructing the SLR parsing table
1. Writing augmented grammar
2. LR(0) collection of items to be found
3. Find FOLLOW of LHS of production
4. Defining 2 functions: goto[list of terminals] and action[list of non-terminals] in the
parsing table
Solution:
STEP1: Find augmented grammar
The augmented grammar of the given grammar is:-
S’–>.S [0th production]
S–>.AA [1st production]
A–>.aA [2nd production]
A–>.b [3rd production]
FOLLOW(S)=$
FOLLOW(A)=a,b,$
Youtube Links-
https://youtu.be/WTxdKQmsfho?si=pB0qV4s-htUIwlVq
https://youtu.be/4NWgdNquqEo?si=-D9gizQgi6l-o1rU
https://youtu.be/qTJR9Jw_i2I?si=Uy1vOEiWMJZYDhHG