Parser

The document discusses context-free grammars and their role in programming language syntax, detailing various types of parsers including top-down and bottom-up methods. It addresses common programming errors, error recovery strategies, and the differences between context-free grammars and regular expressions. Additionally, it covers concepts like FIRST and FOLLOW sets, LL(1) grammars, and parsing techniques such as recursive-descent and shift-reduce parsing.

Uploaded by

paled27319
Context-Free Grammars

• Precise syntactic specification of a programming language
• For some classes of grammars, we can automatically construct
an efficient parser
• Allows a language to evolve
The Parser

Three general types of parsers

Universal parsing methods:
• Can parse any grammar
• Too inefficient to use in production compilers

Top-down methods:
• Parse trees built from root to leaves.
• Input to parser scanned from left to right, one symbol at a time

Bottom-up methods:
• Start from leaves and work their way up to the root.
• Input to parser scanned from left to right, one symbol at a time
Dealing With Errors
If a compiler had to process only correct programs, its
design and implementation would be greatly simplified!

• Few languages have been designed with error handling in mind.
• Error handling is left to the compiler designer.
• Bugs account for about 50% of the total cost,
the same share as 50 years ago!
Common Programming Errors
• Lexical errors: misspellings of
identifiers, keywords, or operators
• Syntactic errors: misplaced semicolons,
extra or missing braces, case without
switch, ...
• Semantic errors: type mismatches
between operators and operands
• Logical errors: anything else!
Wish List
• Report the presence of errors clearly
and accurately
• Recover from each error quickly enough
to detect subsequent errors
• Add minimal overhead to the processing
of correct programs

Easier said than done!


Error-Recovery Strategies
• Simplest: quit with an informative error
message on detecting the first error
• Panic-mode Recovery: discard input
symbols one at a time until one of a designated
set of synchronizing tokens is found.
• Phrase-level Recovery: perform a local
correction on the remaining input. The
choice of local correction is left to the
compiler designer.
• Error Productions: augment the grammar with
production rules for common errors.
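Panic-mode recovery can be sketched as follows. This is only an illustration under stated assumptions: the toy statement form "id = num ;", the choice of ";" as the only synchronizing token, and the function name parse_statements are all hypothetical and not taken from the slides.

```python
# Hypothetical toy language: a program is a sequence of statements "id = num ;".
SYNC_TOKENS = {";"}                 # assumed synchronizing set
EXPECTED = ["id", "=", "num", ";"]  # token kinds of one well-formed statement

def parse_statements(tokens):
    """Return (number of well-formed statements, list of error messages)."""
    pos, ok, errors = 0, 0, []
    while pos < len(tokens):
        for want in EXPECTED:
            if pos < len(tokens) and tokens[pos] == want:
                pos += 1
                continue
            found = tokens[pos] if pos < len(tokens) else "end of input"
            errors.append(f"token {pos}: expected {want!r}, found {found!r}")
            # Panic mode: discard tokens one at a time until a synchronizing token.
            while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
                pos += 1
            pos += 1  # consume the synchronizing token and resume with the next statement
            break
        else:
            ok += 1   # all four tokens matched
    return ok, errors

tokens = ["id", "=", "num", ";", "id", "num", ";", "id", "=", "num", ";"]
ok, errors = parse_statements(tokens)
```

The second statement is missing "=", so the parser reports one error, skips to the next ";", and still checks the third statement. That is the point of the wish list item about detecting subsequent errors.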
Context-Free Grammar

A context-free grammar consists of:
• Terminals (token names)
• Nonterminals
• A start symbol
• Productions
Derivations
• Start with the start symbol
• At each step, a nonterminal is replaced
with the body of one of its productions

Example:

Deriving: -(id + id)
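The derivation steps can be replayed mechanically. The sketch below assumes the ambiguous expression grammar E -> E + E | E * E | - E | ( E ) | id, which is a guess at the grammar intended here; the production names ("neg", "paren", ...) and the function name are made up for the illustration.

```python
# Assumed grammar: E -> E + E | E * E | - E | ( E ) | id
PRODUCTIONS = {
    "neg":   ["-", "E"],
    "paren": ["(", "E", ")"],
    "plus":  ["E", "+", "E"],
    "times": ["E", "*", "E"],
    "id":    ["id"],
}

def leftmost_derivation(choices, start="E"):
    """Apply the chosen productions, always to the leftmost nonterminal."""
    form = [start]
    steps = [" ".join(form)]
    for name in choices:
        i = form.index("E")               # position of the leftmost nonterminal
        form[i:i + 1] = PRODUCTIONS[name]  # replace it with the production body
        steps.append(" ".join(form))
    return steps

steps = leftmost_derivation(["neg", "paren", "plus", "id", "id"])
print(" => ".join(steps))
```

Each element of steps is one sentential form; the last one is the derived sentence.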


More on Derivations
• ⇒ means derives in one step
• ⇒* means derives in zero or more steps
• ⇒+ means derives in one or more steps
• Leftmost derivations: the leftmost nonterminal in each sentential form is always chosen.
• Rightmost derivations: the rightmost nonterminal in each sentential form is always chosen.
Example
For the context-free grammar:
Parse Trees
• What is the relationship between a
parse-tree and derivations?
– Parse tree is the graphical representation
of derivations
– Filters out order of nonterminal
replacement
– many-to-one relationship between
derivations and parse-tree
Context-Free Grammar Vs
Regular Expressions
• Grammars are a more powerful notation than
regular expressions
– Every construct that can be described by a regular
expression can be described by a grammar, but not
vice versa

Construction: regular expression -> NFA -> grammar

Example: (a|b)*abb
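One common way to turn an NFA into a grammar is to create a nonterminal Ai for each state i, a production Ai -> a Aj for each transition from i to j on a, and Ai -> ε for each accepting state. The sketch below applies this to a four-state NFA for (a|b)*abb; the state numbering and the function name are assumptions made for the illustration.

```python
# Assumed NFA for (a|b)*abb: state 0 is the start state, state 3 is accepting.
NFA_TRANSITIONS = [(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2), (2, "b", 3)]
ACCEPTING = {3}

def nfa_to_grammar(transitions, accepting):
    """Map each NFA state i to a nonterminal Ai and each move to a production."""
    grammar = {}
    for src, symbol, dst in transitions:
        grammar.setdefault(f"A{src}", []).append(f"{symbol} A{dst}")
    for state in accepting:
        grammar.setdefault(f"A{state}", []).append("ε")
    return grammar

grammar = nfa_to_grammar(NFA_TRANSITIONS, ACCEPTING)
for head, bodies in grammar.items():
    print(head, "->", " | ".join(bodies))
```

The start symbol is A0. The NFA used here has no ε-moves; a move from i to j on ε would become Ai -> Aj.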
Question Worth Asking
If grammars are so much more powerful than regular
expressions, why not use them in lexical
analysis too?
• Lexical rules are quite simple and do not
need notation as powerful as grammars
• Regular expressions are more concise and
easier to understand for tokens
• More efficient lexical analyzers can be
generated from regular expressions than
from grammars
How Can We Enhance Our
Grammar?
• Eliminating ambiguity
• Eliminating left-recursion
• Left factoring
Eliminating Ambiguity
Sometimes we can rewrite a grammar to
eliminate ambiguity
Eliminating Left-Recursion
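For immediate left recursion, the usual rewrite replaces A -> A α | β with A -> β A' and A' -> α A' | ε. The sketch below is one minimal way to code that rewrite, assuming production bodies are lists of symbols and an empty list stands for ε; the function name is hypothetical, and indirect left recursion is not handled.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon.

    `bodies` is a list of symbol lists; the empty list [] stands for epsilon.
    """
    alphas = [b[1:] for b in bodies if b and b[0] == head]   # left-recursive tails
    betas = [b for b in bodies if not b or b[0] != head]     # all other bodies
    if not alphas:
        return {head: bodies}                                # nothing to rewrite
    new = head + "'"
    return {
        head: [beta + [new] for beta in betas],
        new: [alpha + [new] for alpha in alphas] + [[]],
    }

result = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
```

For E -> E + T | T this gives E -> T E' and E' -> + T E' | ε.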

How about something like:


Left-Factoring
• A way of delaying the decision until
more info is available

Example:

stmt -> EXP else stmt | EXP


EXP -> if expr then stmt
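Left factoring can also be written in the more usual form A -> α A', A' -> β1 | β2, where α is the common prefix. The sketch below is a simplified illustration that factors only a prefix shared by all alternatives of one nonterminal; the function names are hypothetical and an empty list stands for ε.

```python
def common_prefix(bodies):
    """Longest run of leading symbols shared by every body."""
    prefix = []
    for symbols in zip(*bodies):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix

def left_factor(head, bodies):
    """Rewrite A -> alpha b1 | alpha b2 as A -> alpha A', A' -> b1 | b2."""
    prefix = common_prefix(bodies)
    if len(bodies) < 2 or not prefix:
        return {head: bodies}
    new = head + "'"
    return {
        head: [prefix + [new]],
        new: [body[len(prefix):] for body in bodies],  # [] stands for epsilon
    }

stmt = [
    ["if", "expr", "then", "stmt", "else", "stmt"],
    ["if", "expr", "then", "stmt"],
]
factored = left_factor("stmt", stmt)
```

Here the decision between the two forms of the if statement is postponed until the parser has seen whether an else follows.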
Top-Down Parsing
• Constructing a parse tree for an input
string starting from root
• Parse tree built in preorder (depth-first)
• Finding left-most derivation
• At each step of a top-down parse:
– determine the production to be applied
– matching terminal symbols in production
body with input string
Given: and:
Recursive-Descent Parsing
How?
Example of Backtracking
and input
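A recursive-descent parser with backtracking can be sketched with generators. The grammar S -> c A d, A -> a b | a and the input cad used below are assumptions chosen for the illustration, not necessarily the example the slides had in mind.

```python
# Assumed grammar: S -> c A d,  A -> a b | a
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}

def parse(symbol, text, pos):
    """Yield every position where `symbol` can end when it starts at `pos`."""
    if symbol not in GRAMMAR:                 # terminal: must match the input
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for body in GRAMMAR[symbol]:              # try the alternatives in order
        yield from parse_sequence(body, text, pos)

def parse_sequence(body, text, pos):
    """Yield every end position for the symbol sequence `body`."""
    if not body:
        yield pos
        return
    for mid in parse(body[0], text, pos):
        # If the rest of the body fails from `mid`, the loop moves on to the
        # next way of matching body[0]; that is the backtracking step.
        yield from parse_sequence(body[1:], text, mid)

def accepts(text):
    return any(end == len(text) for end in parse("S", text, 0))
```

On cad, the first alternative A -> a b matches a and then fails on b, so the parser backs up and succeeds with A -> a. This style loops forever on a left-recursive grammar.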
Important Concepts:
FIRST and FOLLOW
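Example (row labels assume the expression grammar
E -> T E', E' -> + T E' | ε, T -> F T', T' -> * F T' | ε, F -> ( E ) | id)

      FIRST    FOLLOW
E     ( id     ) $
E'    + ε      ) $
T     ( id     + ) $
T'    * ε      + ) $
F     ( id     * + ) $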
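FIRST and FOLLOW can be computed by repeating the defining rules until nothing changes. The sketch below assumes the expression grammar E -> T E', E' -> + T E' | ε, T -> F T', T' -> * F T' | ε, F -> ( E ) | id, whose sets agree with the values listed here; the data layout and function names are hypothetical.

```python
EPS = "ε"
# Assumed grammar; the empty list [] stands for an ε-production.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of_sequence(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of each nonterminal."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)          # every symbol can vanish (or the string is empty)
    return result

def compute_first_follow(start="E"):
    first = {nt: set() for nt in GRAMMAR}
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")
    changed = True
    while changed:           # iterate to a fixed point
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                before = len(first[head])
                first[head] |= first_of_sequence(body, first)
                changed |= len(first[head]) != before
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue
                    rest = first_of_sequence(body[i + 1:], first)
                    before = len(follow[sym])
                    follow[sym] |= rest - {EPS}
                    if EPS in rest:      # what follows `head` also follows `sym`
                        follow[sym] |= follow[head]
                    changed |= len(follow[sym]) != before
    return first, follow

first, follow = compute_first_follow()
```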
LL(1) Grammars
• For recursive-descent parsers with no
backtracking
• L = scan from left to right
• L = left-most derivation
• 1 symbol lookahead
• Cannot be left-recursive or ambiguous
• If A-> F | T
– FIRST(F) and FIRST(T) are disjoint
– if ε is in FIRST(T) then FIRST(F) and FOLLOW(A)
are disjoint … and likewise when ε is in FIRST(F)
Parsing Table
• Two dimensional array
– Rows: nonterminals Columns: input symbols
• M[A,a] where A is nonterminal and a is terminal
or $
• Gives the production rule to use.
      FIRST    FOLLOW
E     ( id     ) $
E'    + ε      ) $
T     ( id     + ) $
T'    * ε      + ) $
F     ( id     * + ) $
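A table-driven predictive parser keeps a stack of grammar symbols and consults M[A,a] whenever a nonterminal is on top. The sketch below hard-codes a table for the assumed expression grammar E -> T E', E' -> + T E' | ε, T -> F T', T' -> * F T' | ε, F -> ( E ) | id; the table layout and the function name are choices made for the illustration.

```python
# M[A, a] for the assumed expression grammar; [] stands for an ε-production.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens, start="E"):
    """Return the list of productions applied, or raise SyntaxError."""
    tokens = tokens + ["$"]
    stack = ["$", start]            # top of the stack is the end of the list
    pos, applied = 0, []
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:        # blank table entry means a syntax error
                raise SyntaxError(f"no entry M[{top}, {look}]")
            applied.append((top, body))
            stack.extend(reversed(body))   # leftmost body symbol ends up on top
        elif top == look:
            pos += 1                # terminal (or $) matches the lookahead
        else:
            raise SyntaxError(f"expected {top!r}, found {look!r}")
    return applied

applied = ll1_parse(["id", "+", "id", "*", "id"])
```

The productions in applied, read in order, form a leftmost derivation of the input.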
Exercise
For the following productions:

S -> + S S | * S S | a

• Write a predictive parser
• Write the parsing table
• Show how to parse: +*aaa
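As a starting point for the first part, one possible predictive parser is sketched below. It assumes each token is a single character; the function names are hypothetical. Because the three alternatives begin with different terminals, one symbol of lookahead picks the production and no backtracking is needed.

```python
def parse_S(text, pos=0):
    """Parse one S starting at `pos`; return the position just after it."""
    if pos >= len(text):
        raise SyntaxError("unexpected end of input")
    ch = text[pos]
    if ch in "+*":                      # S -> + S S   or   S -> * S S
        pos = parse_S(text, pos + 1)    # first operand
        return parse_S(text, pos)       # second operand
    if ch == "a":                       # S -> a
        return pos + 1
    raise SyntaxError(f"unexpected {ch!r} at position {pos}")

def accepts(text):
    """True if the whole of `text` is derived from S."""
    try:
        return parse_S(text) == len(text)
    except SyntaxError:
        return False
```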
Bottom-Up Parsing
• Given a string of terminals
• Build parse tree starting from leaves
and working up toward the root
• Traces a right-most derivation in reverse
• Used for a class of grammars called LR
• LR parsers are difficult to build by hand
• We use automatic parser generators for
LR grammars
Given: and the string:
Shift-Reduce Parsing
• Form of bottom-up parsing
• Consists of:
– Stack: holds grammar symbols
– input buffer: holds the rest of the string to be
parsed
• Handle always appears on the top of the stack

Initial position: stack holds $, input holds w$
Final position (success): stack holds $S, input holds $

Actions: shift, reduce, accept, error


Exercise
Let’s apply shift-reduce to the following
input: 00S11
and the following productions:
S-> 0S1 | 01
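The shift and reduce actions for this grammar can be sketched as below. This is a simplification under a stated assumption: it reduces whenever the top of the stack matches a production body, which happens to be safe for S -> 0S1 | 01 but is not how a general LR parser chooses its actions (those consult a parsing table). The function name and trace format are made up for the illustration.

```python
# Productions of the exercise grammar, as (body, head) pairs.
RULES = [(["0", "S", "1"], "S"), (["0", "1"], "S")]

def shift_reduce(tokens):
    """Return (accepted, trace) for the grammar S -> 0 S 1 | 0 1."""
    stack, trace = [], []
    remaining = list(tokens)
    while True:
        for body, head in RULES:
            if stack[-len(body):] == body:      # handle is on top of the stack
                stack[-len(body):] = [head]
                trace.append(f"reduce {head} -> {' '.join(body)}")
                break
        else:
            if not remaining:                   # nothing to reduce or shift
                break
            stack.append(remaining.pop(0))
            trace.append(f"shift {stack[-1]}")
    accepted = stack == ["S"]
    trace.append("accept" if accepted else "error")
    return accepted, trace

accepted, trace = shift_reduce("000111")
```

The same function also handles the right-sentential form 00S11 if S is supplied as an input symbol.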
