CD Module 1 Cambridge

The document outlines the objectives and phases of compiler design, emphasizing the importance of language processors in translating high-level programming languages into machine-executable code. It details the various components of a compiler, including lexical analysis, syntax analysis, semantic analysis, intermediate code generation, optimization, and code generation, along with the role of tools in compiler construction. Additionally, it discusses the evolution of programming languages and the qualities and applications of effective compilers.

COMPILER DESIGN-BCS613C

Department of Computer Science & Engineering

www.cambridge.edu.in
Course objectives

● Understand the working of language processors
● Apply different phases of designing a compiler
● Illustrate lexical analysis
● Explain the need of real time operating system for embedded system applications.
INTRODUCTION

Importance of Programming Languages


• Used to describe computations for both humans and machines.
• Essential for all software running on computers.
Need for Translation
• Programs must be converted into an executable form.
• Compilers handle this translation process.
What is a Compiler?

• A software system that converts high-level code into machine-executable code.
• Enables program execution on a computer.
If the target program is an executable
machine-language program, it can then be
called by the user to process inputs and
produce outputs.
• What it does: A compiler takes your entire program (written in a
high-level language like C++ or Java) and translates it into machine
code (a low-level language that the computer understands) all at once.
How it works:
• You write your code (source code).
• The compiler analyzes the entire code for errors. If there are any
errors, it reports them all to you.
• If there are no errors, the compiler generates an executable file
(machine code).
• You can then run this executable file directly on your computer.
What it does:

An interpreter translates and executes your program code line by line.


It reads a line of code, translates it into machine code, and then
executes it immediately.

How it works:
1. You write your code (source code).
2. The interpreter reads a line of code.
3. It translates that line into machine code.
4. It executes that line of code.
5. It repeats steps 2-4 for every line of code in your program.
Compilation (Compiler - javac)
•Java source code (.java file) is compiled by the Java Compiler
(javac) into an intermediate code called Bytecode (.class file).
•This Bytecode is platform-independent and not directly executed by
the CPU.

Interpretation (Interpreter - JVM)


• The Java Virtual Machine (JVM) executes the Bytecode on the specific operating system, interpreting it and compiling frequently executed parts at runtime with the Just-In-Time (JIT) compiler.
• This runtime translation of Bytecode into machine code makes Java adaptable to different platforms.
Example

public class Example {
    public static void main(String[] args) {
        System.out.println("Java is both compiled and interpreted!");
    }
}
Execution Process:
1.Compilation (javac Example.java)
•Converts Example.java into Example.class (Bytecode).
2.Interpretation (java Example)
•The JVM reads Example.class, interprets it, and executes the
program.
This combination of compilation and interpretation makes Java
platform-independent (Write Once, Run Anywhere - WORA).
A HYBRID COMPILER
Java's toolchain is thus a hybrid: compilation by javac followed by interpretation and JIT compilation in the JVM. Development environments that bundle this hybrid toolchain include:
• Eclipse
• IntelliJ IDEA
• NetBeans
• VS Code (with Java extension)
The language-processing system
•Compiler – Translates high-level source code into assembly
or machine code.
•Assembler – Converts assembly language code into object
code (machine-readable).
•Linker – Combines multiple object files and libraries into a
single executable.
•Loader – Loads the executable into memory for execution.
1.Preprocessor
•The preprocessor is responsible for handling directives before actual compilation
begins.
•It collects source code from multiple files and expands macros (shorthand notations)
into full source language statements.
•Example (C language):
#define PI 3.14
int main() { float area = PI * 10 * 10; }
• The preprocessor replaces PI with 3.14 before passing the code to the
compiler.
2.Compiler
• The compiler translates the modified source code into an
intermediate language, often assembly language.
• It performs syntax checking, optimization, and translation.
• Example: GCC (GNU Compiler Collection) for C/C++.
3.Assembler
•The assembler converts the assembly language code
into relocatable machine code (object files, .o or
.obj).
•Example: NASM (Netwide Assembler) for x86
assembly code.
4. Linker
• The linker takes multiple object files (.o or .obj) and resolves references between them.
• It also links in the necessary library code (for example, the C standard library that implements the functions declared in stdio.h).
5. Loader
• The loader loads the final executable program into memory for
execution.
• It prepares the program for running by resolving addresses and
initializing the environment.
• Example: Operating system’s program loader (like Windows PE
Loader, Linux ELF Loader).
THE STRUCTURE OF A COMPILER

• Compiler as a Two-Part System


• A compiler is not just a single entity; it consists of analysis and
synthesis phases.

[Diagram: COMPILER divides into an analysis part and a synthesis part]
Analysis (Front End)

• Breaks down the source code into its components.


• Imposes grammatical structure on the code.
• Generates an intermediate representation (IR) of the source
program.
• Detects errors (syntactic or semantic) and provides messages for
correction.
• Creates a symbol table that stores information about variables,
functions, etc.
Synthesis (Back End)

• Takes the IR and symbol table from the analysis phase.
• Generates the final target program (machine code or assembly).
Front End (Analysis Part): lexical analysis, syntax analysis, semantic analysis.
Back End (Synthesis Part): optimization, code generation, and code assembly.
PHASES OF COMPILER
1.2.1 Lexical Analysis

• The first phase of a compiler is called lexical analysis or scanning.
• The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes.
• For each lexeme, the lexical analyzer produces as output a token of
the form
• <token-name,attribute-value>
•Token-name: An abstract symbol representing the type of the token
(e.g., IDENTIFIER, KEYWORD, NUMBER, OPERATOR). Used in
syntax analysis (parsing).
•Attribute-value: A reference (or pointer) to an entry in the symbol
table, which holds additional information about the token (e.g., variable
names, data types, memory locations).
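As an illustrative sketch (not taken from the slides), the lexer's output for an assignment such as position = initial + rate * 60 (the running example used later in this module) can be modeled in Java. The Token and TokenType names and the symbol-table indices are assumptions made for this example:

enum TokenType { ID, ASSIGN, PLUS, TIMES, NUMBER }

class Token {
    final TokenType name;   // the abstract token-name used by the parser
    final int attribute;    // e.g., an index into the symbol table (-1 if unused)
    Token(TokenType name, int attribute) { this.name = name; this.attribute = attribute; }
    @Override public String toString() {
        return "<" + name + (attribute >= 0 ? "," + attribute : "") + ">";
    }
}

class LexDemo {
    public static void main(String[] args) {
        Token[] tokens = {
            new Token(TokenType.ID, 1),       // position -> symbol-table entry 1
            new Token(TokenType.ASSIGN, -1),  // =
            new Token(TokenType.ID, 2),       // initial  -> entry 2
            new Token(TokenType.PLUS, -1),    // +
            new Token(TokenType.ID, 3),       // rate     -> entry 3
            new Token(TokenType.TIMES, -1),   // *
            new Token(TokenType.NUMBER, 60)   // 60
        };
        for (Token t : tokens) System.out.print(t + " ");  // <ID,1> <ASSIGN> <ID,2> ...
    }
}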
1.2.2 Syntax Analysis/Parsing:

• The second phase of the compiler.


• Utilizes the first components of the tokens produced by the lexical
analyzer.
• Creates a tree-like intermediate representation, typically a syntax
tree.
• The syntax tree depicts the grammatical structure of the token
stream.
Operation Order:
•First, multiply the value of rate by 60.
•Then, add the result to the value of initial.
•Finally, store the result into position.
1.2.3 Semantic Analysis

•Uses the syntax tree and symbol table to check for semantic
consistency with the language definition.
•Gathers and saves type information in the syntax tree or
symbol table.
•Type checking ensures each operator has matching operands.
For instance, an array index must be an integer, not a
floating-point number.
• Type Conversions (Coercions):
• Some languages permit type conversions, like converting integers to
floating-point numbers when needed.
• For example, a binary arithmetic operator might be applied to either
two integers or two floating-point numbers. If mixed, the compiler
may convert the integer to a floating-point number.
•Variables position, initial, and rate are floating-point numbers.
•The lexeme 60 is an integer.
•The type checker discovers rate (a floating-point number) and 60 (an integer)
are operands of a multiplication operator.
•The integer 60 may be converted to a floating-point number.
•The output of the semantic analyzer includes an extra node for the inttofloat
operator, explicitly converting the integer to a floating-point number.
1.2.4 Intermediate Code Generation
• Intermediate Representations (IR):
• Compilers may construct one or more intermediate representations
while translating a source program into target code.
• Syntax trees are a common form of IR, used during syntax and
semantic analysis.
• Post Syntax and Semantic Analysis:
• Many compilers generate a low-level, machine-like intermediate
representation.
• This representation can be thought of as a program for an abstract
machine.
Two desirable properties of an intermediate representation:
• Ease of production: it should be straightforward for the compiler to generate.
• Ease of translation: it should be easy to translate into the target machine code.

•t1 = inttofloat(60)
•t2 = id3 * t1
•t3 = id2 + t2
•id1 = t3
Intermediate code generation
• Three-address instructions:
- Each three-address assignment instruction has at most
one operator on the right side.
- The compiler must generate a temporary name to hold
the value computed by a three-address instruction.
- Some "three-address instructions" have fewer than
three operands.
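A small Java sketch of how a compiler might emit the three-address sequence above, generating a fresh temporary name for each instruction (class and method names are illustrative assumptions):

class TempGen {
    private int count = 0;
    String newTemp() { return "t" + (++count); }  // fresh temporary: t1, t2, ...
}

class ThreeAddressDemo {
    public static void main(String[] args) {
        TempGen gen = new TempGen();
        String t1 = gen.newTemp();
        System.out.println(t1 + " = inttofloat(60)");  // a conversion, one operator
        String t2 = gen.newTemp();
        System.out.println(t2 + " = id3 * " + t1);     // at most one operator per line
        String t3 = gen.newTemp();
        System.out.println(t3 + " = id2 + " + t2);
        System.out.println("id1 = " + t3);             // a copy: fewer than three addresses
    }
}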



1.2.5. Code Optimisation
• The goal is to improve the intermediate code and, thus, the
effectiveness of code generation and the performance of the
target code.
- The result is faster, shorter code, and target code that consumes less power.
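For the running example, the four three-address instructions shown earlier can be shortened to two: the inttofloat conversion of 60 can be done once, at compile time, and the temporary t3 can be eliminated:

t1 = id3 * 60.0
id1 = id2 + t1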



1.2.6.Code Generation Phase
• The code generator takes as input an intermediate
representation of the source program and maps it into the
target language.
• If the target language is machine code, registers or memory
locations are selected for each of the variables used by the
program.
• Then, the intermediate instructions are translated into
sequences of machine instructions that perform the same
task.
• A crucial aspect of code generation is the judicious assignment of registers to hold variables.
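Continuing the running example, the optimized intermediate code above might become target code along the following lines; this is a sketch in the style of a typical load/store machine, and the register names and instruction mnemonics are illustrative:

LDF  R2, id3        // load rate into register R2
MULF R2, R2, #60.0  // R2 = rate * 60.0
LDF  R1, id2        // load initial into R1
ADDF R1, R1, R2     // R1 = initial + rate * 60.0
STF  id1, R1        // store the result into position

The F in each mnemonic marks a floating-point operation; note how the variables id1, id2, and id3 are mapped to registers and memory locations.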

Phases of a Compiler

[Figure: the phases of a compiler, from lexical analysis through code generation]
Mapping of Lexemes into Tokens

• position – lexeme, token <id,1>
• = – lexeme, token <=>
• initial – lexeme, token <id,2>
• + – lexeme, token <+>
• rate – lexeme, token <id,3>
• * – lexeme, token <*>
• 60 – lexeme, token <60>
Translation of an Assignment Statement

[Figure: the translation of position = initial + rate * 60 through the compiler phases]

Symbol Table Management
• An essential function of a compiler is to record the
variable names used in the source program and collect
information about various attributes of each name.
• These attributes may provide information about
- the storage allocated for a name,
- its type,
- its scope (where in the program its value may be used), and
- in the case of procedure names, such things as the number and
types of its arguments, the method of passing each argument
(for example, by value or by reference), and the type returned.


1.2.7. Symbol Table Management
• The symbol table is a data structure containing a record
for each variable name, with fields for the attributes of
the name.
• The data structure should be designed to allow the
compiler to find the record for each name quickly and to
store or retrieve data from that record quickly.
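As a minimal sketch of one reasonable design (an assumption, not the textbook's code), a hash table gives the fast store/retrieve behavior described above; in Java:

import java.util.HashMap;
import java.util.Map;

// Illustrative symbol-table sketch; the SymbolInfo fields are assumptions.
class SymbolInfo {
    String type;     // e.g., "float"
    int location;    // storage allocated for the name
    SymbolInfo(String type, int location) { this.type = type; this.location = location; }
}

class SymbolTable {
    private final Map<String, SymbolInfo> table = new HashMap<>();
    void put(String name, SymbolInfo info) { table.put(name, info); }  // store quickly
    SymbolInfo get(String name) { return table.get(name); }            // find quickly
}

class SymbolTableDemo {
    public static void main(String[] args) {
        SymbolTable st = new SymbolTable();
        st.put("position", new SymbolInfo("float", 0));
        System.out.println(st.get("position").type);   // prints: float
    }
}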




1.2.8. Grouping of Phases into Passes

In an implementation, activities from several phases may be grouped into a pass that reads an input file and writes an output file. For example, the front-end phases of lexical analysis, syntax analysis, semantic analysis, and intermediate code generation might be grouped into one pass, and code optimization and code generation into one or more separate back-end passes.


1.2.9 Compiler-Construction Tools
Compiler-Construction tools include
• Parser generators that automatically produce syntax analyzers from a grammatical description of a
programming language.
• Scanner generators that produce lexical analyzers from a regular-expression description of the tokens of a language.
• Syntax-directed translation engines that produce collections of routines for walking a parse tree and
generating intermediate code.
• Code-generator generators that produce a code generator from a collection of rules for translating each
operation of the intermediate language into the machine language for a target machine.
• Data-flow analysis engines that facilitate the gathering of information about how values are transmitted
from one part of a program to each other part. Data-flow analysis is a key part of code optimization.
• Compiler-construction toolkits that provide an integrated set of routines for constructing various phases of
a compiler.
1.3 The Evolution of Programming Languages
Emphasis of compiler construction research:
• 1945-1960: code generation
- need to “prove” that high-level programming can produce efficient
code (“automatic programming”).
• 1960-1975: parsing
- proliferation of programming languages
- study of formal languages reveals powerful techniques.
• 1975-...: code generation and code optimisation
Knuth (1962) observed that “in this field there has been an unusual amount
of parallel discovery of the same technique by people working independently”


Historical Notes: the Move to
Higher-Level Programming Languages

• Machine Languages (1st generation)


• Assembly Languages (2nd generation) – early 1950s
• High-Level Languages (3rd generation) – later 1950s
• 4th generation higher level languages (SQL, Postscript)
• 5th generation languages (logic based, eg, Prolog)
• Other classifications:
- Imperative (how); declarative (what)
- Object-oriented languages
- Scripting languages


Examples
• C is typically compiled
• Lisp is typically interpreted
• Java is compiled to bytecodes, which are then interpreted
Also:
• C++ to Intel Core 2/.../Assembly
• C++ to C
• High Performance Fortran (HPF) to Fortran (parallelising
compiler)
• C to C (or any language to itself)



Examples
In the general sense:
• What is LaTeX?
• What is ghostview?
(PostScript is a language for describing images)



1.4 The Science of Building a Compiler
• 1.4.1 Modeling in Compiler Design and Implementation
• 1.4.2 The Science of Code Optimization
Compiler optimizations must meet the following design objectives:
• The optimization must be correct, that is, preserve the meaning of the compiled program,
• The optimization must improve the performance of many programs,
• The compilation time must be kept reasonable, and
• The engineering effort required must be manageable.
Qualities of a Good Compiler
What qualities would you want in a compiler?
- generates correct code (first and foremost!)
- generates fast code
- conforms to the specifications of the input language
- copes with essentially arbitrary input size, variables, etc.
- compilation time (linearly) proportional to size of source
- good diagnostics
- consistent optimisations
- works well with the debugger



1.5 Applications of Compiler Technology

• 1.5.1 Implementation of High-Level Programming Languages


The key ideas behind object orientation are:
1. Data abstraction, and
2. Inheritance of properties.
1.5.2 Optimizations for Computer Architectures
• Parallelism
• Memory Hierarchies
1.5.3 Design of New Computer Architectures
• RISC
• Specialized Architectures
1.5.4 Program Translations
• Binary Translation
• Hardware Synthesis
• Database Query Interpreters
• Compiled Simulation
1.5.5 Software Productivity Tools
• Type Checking
• Bounds Checking
1.6 Programming Language Basics
• 1.6.1 The Static/Dynamic Distinction
• 1.6.2 Environments and States

1. The environment is a mapping from names to locations in the store. Since variables refer to locations ("l-values" in the terminology of C), we could alternatively define an environment as a mapping from names to variables.
2. The state is a mapping from locations in store to their values. That is, the state maps l-values to their corresponding r-values, in the terminology of C.
The environment and state mappings in Fig. 1.8 are dynamic, but there
are a few exceptions:
1. Static versus dynamic binding of names to locations
2. Static versus dynamic binding of locations to values.
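For example, after the C declaration int i = 0;, the environment maps the name i to some storage location l, and the state maps l to the value 0. The assignment i = 42 changes the state (l now holds 42) but leaves the environment unchanged, whereas a fresh declaration of i in an inner block changes the environment by binding the name to a new location.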
• 1.6.3 Static Scope and Block Structure
• 1.6.4 Explicit Access Control
• 1.6.5 Dynamic Scope
• 1.6.6 Parameter Passing Mechanisms: Call-by-Value, Call-by-Reference, Call-by-Name
1.6.7 Aliasing

• Example 1.9: Suppose a is an array belonging to a procedure p, and p calls another procedure q(x, y) with a call q(a, a).
• Suppose also that parameters are passed by value, but that array names are really references to the location where the array is stored, as in C or similar languages.
• Now, x and y have become aliases of each other. The important point
is that if within q there is an assignment x[10] = 2, then the value of
y[10] also becomes 2.
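The same effect can be sketched in Java, where an array variable holds a reference that is passed by value (the names mirror the example; the sketch is illustrative):

class AliasDemo {
    static void q(int[] x, int[] y) {
        x[10] = 2;                      // assign through x ...
        System.out.println(y[10]);      // ... and observe it through y: prints 2
    }
    public static void main(String[] args) {
        int[] a = new int[20];          // the array a belonging to "p"
        q(a, a);                        // inside q, x and y are aliases
    }
}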
Principles of Compilation

The compiler must:


• preserve the meaning of the program being compiled.
• “improve” the source code in some way.
Other issues (depending on the setting):
• Speed (of compiled code)
• Space (size of compiled code)
• Feedback (information provided to the user)
• Debugging (transformations obscure the relationship source
code vs target)
• Compilation time efficiency (fast or slow compiler?)



Why study Compilation Technology?

• Success stories (one of the earliest branches in CS)
  - Applying theory to practice (scanning, parsing, static analysis)
  - Many practical applications have embedded languages (e.g., tags)
• Practical algorithmic & engineering issues:
  - Approximating really hard (and interesting!) problems
  - Emphasis on efficiency and scalability
  - Small issues can be important!



Why study Compilation Technology?
• Ideas from different parts of computer science are involved:
  - AI: heuristic search techniques; greedy algorithms
  - Algorithms: graph algorithms
  - Theory: pattern matching
  - Also: systems, architecture
• Compiler construction can be challenging and fun: new architectures always create new challenges; success requires mastery of complex interactions; results are useful; and there is the opportunity to achieve performance.



Uses of Compiler Technology
• Most common use: translate a high-level program to object code
- Program Translation: binary translation, hardware synthesis, …
• Optimizations for computer architectures:
- Improve program performance, take into account hardware parallelism,
etc…
• Automatic parallelisation or vectorisation
• Performance instrumentation: e.g., -pg option of cc or gcc
• Interpreters: e.g., Python, Ruby, Perl, Matlab, sh, …



Uses of Compiler Technology

• Software productivity tools


- Debugging aids: e.g., Purify
• Security: Java VM uses compiler analysis to prove “safety” of Java
code.
• Text formatters, just-in-time compilation for Java, power
management, global distributed computing, …

Key: Ability to extract properties of a source program (analysis) and


transform it to construct a target program (synthesis)



Summary
• A compiler is a program that converts some input text in a
source language to output in a target language.
• Compiler construction poses some of the most challenging
problems in computer science.



Chapter 2
2.1 Introduction
• The analysis phase of a compiler breaks up a source program into
constituent pieces and produces an internal representation for it,
called intermediate code.
• The synthesis phase translates the intermediate code into the target
program.
• Analysis is organized around the "syntax" of the language to be compiled.
• The syntax of a programming language describes the proper form of its programs, while the semantics of the language defines what its programs mean; that is, what each program does when it executes.
For specifying syntax, we present a widely used notation, called context-free grammars or BNF (for Backus-Naur Form).
❑ Two forms of intermediate code are illustrated in Fig. 2.4.
❑ One form, called abstract syntax trees or simply syntax trees, represents the hierarchical syntactic structure of
the source program.
❑ The left child of the root represents the body of the loop, which consists of only the assignment i=i+1;.
❑ The right child of the root represents the condition a[i] < v.
❑ The other form, three-address code, takes its name from instructions of the form x = y op z, where op is a binary operator, y and z are the addresses for the operands, and x is the address for the result of the operation.
❑ A three-address instruction carries out at most one operation, typically a computation, a comparison, or a
branch.
2.2 SYNTAX DEFINITION
• A grammar naturally describes the hierarchical structure of most programming
language constructs. For example, an if-else statement in Java can have the
form.
• if ( expression ) statement else statement
• stmt → if ( expr ) stmt else stmt
• The arrow may be read as "can have the form." Such a rule is called a production.
• In a production, lexical elements like the keyword if and the parentheses are
called terminals.
• Variables like expr and stmt represent sequences of terminals and are called
nonterminals.
2.2.1 Definition of Grammars
A context-free grammar has four components:
• 1. A set of terminal symbols, sometimes referred to as "tokens." The terminals are the elementary symbols of the language defined by the grammar.
• 2. A set of nonterminals, sometimes called "syntactic variables." Each nonterminal represents a set of strings of terminals, in a manner we shall describe.
• 3. A set of productions, where each production consists of a
nonterminal, called the head or left side of the production, an arrow,
and a sequence of terminals and/or nonterminals, called the body or
right side of the production.
• 4. A designation of one of the non-terminals as the start symbol.

i. The digits, signs such as < and <=, and boldface strings such as
while are terminals.
ii. An italicized name is a nonterminal, and any nonitalicized name or
symbol may be assumed to be a terminal.
iii. For notational convenience, productions with the same nonterminal as the head can have their bodies grouped, with the alternative bodies separated by the symbol |, which we read as "or."
• Example 2.1: Several examples in this chapter use expressions consisting of digits and plus and minus signs; e.g., strings such as 9-5+2, 3-1, or 7. Since a plus or minus sign must appear between two digits, we refer to such expressions as "lists of digits separated by plus or minus signs." The following grammar describes the syntax of these expressions. The productions are:
• list → list + digit
• list → list - digit
• list → digit
• digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• The bodies of the three productions with nonterminal list as head can equivalently be grouped:
• list → list + digit | list - digit | digit
• According to our conventions, the terminals of the grammar are the
symbols
•+ - 0 1 2 3 4 5 6 7 8 9
• The non-terminals are the italicized names list and digit, with list
being the start symbol because its productions are given first.
• We say a production is for a nonterminal if the nonterminal is the
head of the production. A string of terminals is a sequence of zero or
more terminals.
• The string of zero terminals, written as ε, is called the empty string.
2.2.2 Derivations

• A grammar generates strings in a language by starting with a start


symbol and applying production rules to replace non-terminals until
only terminals remain.
• The set of terminal strings that can be derived from the start symbol
forms the language defined by the grammar.
• For example, we can deduce that 9-5+2 is a list as follows.
• a) 9 is a list by production (2.3), since 9 is a digit.
• b) 9-5 is a list by production (2.2), since 9 is a list and 5 is a digit.
• c) 9-5+2 is a list by production (2.1), since 9-5 is a list and 2 is a digit.
Productions:
• Start Symbol: Params
• Production Rules:
  Params → ( ParamList )
  ParamList → Param
  ParamList → Param , ParamList
  Param → Identifier
  ParamList → ε (for an empty parameter list)
Explanation:
•The function call starts with ( and ends with ).
•Inside, there can be:
∙A single parameter (Param).
∙Multiple parameters, separated by commas (Param , ParamList).
∙An empty list (ε), meaning no parameters.
•Each Param is an identifier (variable name, like x, y, etc.).
Examples of Derived Strings:
1.( ) → Empty parameter list
2.( x ) → Single parameter
3.( x, y ) → Two parameters
4.( a, b, c ) → Three parameters
This grammar successfully models function calls with or without parameters in Java.
2.2.3 Parse Trees

• A parse tree pictorially shows how the start symbol of a grammar


derives a string in the language. If nonterminal A has a production A → X Y Z, then a parse tree may have an interior node labeled A with three children labeled X, Y, and Z, from left to right.
• Formally, given a context-free grammar, a parse tree according to the grammar is a tree with the following properties:
• 1. The root is labeled by the start symbol.
• 2. Each leaf is labeled by a terminal or by ε.
• 3. Each interior node is labeled by a nonterminal.
• 4. If A is the nonterminal labeling some interior node and X1, X2, …, Xn are the labels of the children of that node from left to right, then there must be a production A → X1 X2 … Xn. Here, X1, X2, …, Xn each stand for a symbol that is either a terminal or a nonterminal. As a special case, if A → ε is a production, then a node labeled A may have a single child labeled ε.
2.2.4 Ambiguity
• A grammar is ambiguous if there exists a string that has more than
one parse tree (i.e., it can be derived in multiple ways).
• This leads to multiple possible meanings, which is problematic for
applications like compilers.
• Leftmost derivation and rightmost derivation.
Example: S → S + S | S - S | a | b | c
Input: a - b + c
• A CFG is said to be ambiguous if there exists more than one derivation tree for a given input string.
• There is NO standard method to check ambiguity (the problem is undecidable in general).
A grammar is declared ambiguous when, for a single input string, we find:
1. two leftmost derivations (LMDs), or
2. two rightmost derivations (RMDs), or
3. two parse trees.
Example: E → E + E | E * E | id, with input id + id * id

Leftmost derivation 1: E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
Leftmost derivation 2: E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id

Two distinct leftmost derivations (and hence two parse trees) exist for the same string, so the grammar is ambiguous.
2.2.5 Associativity of Operators
• Associativity is an important concept in arithmetic and programming. It dictates how operators of
the same precedence are grouped in the absence of parentheses.
Left-Associative Operators: Most arithmetic operators like addition (+), subtraction (-), multiplication (*), and division (/) are left-associative. This means that in an expression with multiple instances of these operators, the operations are performed from left to right.

Right-Associative Operators: Some operators, such as exponentiation (^ or ** in many programming languages) and assignment (= in C and its descendants), are right-associative. This means that in an expression with multiple instances of these operators, the operations are performed from right to left.
Operator Precedence
• In arithmetic, operators with higher precedence are evaluated first. In
most programming languages, multiplication (*) and division (/) have
higher precedence than addition (+) and subtraction (-).
• This means that in expressions involving both types of operators, the
multiplication and division operations are performed before addition and
subtraction.
• For the expression 9 + 5 * 2:
•Here, * has higher precedence than +, so 5 * 2 is calculated first.
•This expression is equivalent to 9 + (5 * 2), resulting in 9 + 10, which
equals 19.
Precedence Table
The operators are categorized by their precedence and associativity:
•Left-Associative Operators:
•Low Precedence: +, -
•High Precedence: *, /
The grammar is constructed using three nonterminal symbols: expr, term, and factor.
•Factor: Represents basic units in expressions (digits or parenthesized expressions).
factor -> digit | ( expr )
•Term: Represents elements of higher precedence (* and /). It can consist of factors separated by these
operators.
term -> term * factor
| term / factor
| factor
•Expression (Expr): Represents elements of lower precedence (+ and -). It can consist of terms separated
by these operators.
expr -> expr + term
| expr - term
| term
Complete Grammar
expr -> expr + term
| expr - term
| term

term -> term * factor


| term / factor
| factor

factor -> digit | ( expr )
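As a worked example of how this grammar enforces precedence, here is a derivation of 9 + 5 * 2 (compressing the factor ⇒ digit steps):

expr ⇒ expr + term ⇒ term + term ⇒ factor + term ⇒ 9 + term
     ⇒ 9 + term * factor ⇒ 9 + factor * factor ⇒ 9 + 5 * factor ⇒ 9 + 5 * 2

Because * is introduced lower in the derivation than +, the subexpression 5 * 2 forms its own subtree, matching 9 + (5 * 2) = 19.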


What language is generated by the
following grammars? In each case justify
your answer.
• S → + S S | - S S | a
1. S ⇒ a
2. S ⇒ + S S ⇒ + a a
3. S ⇒ - S S ⇒ - a a
4. S ⇒ + S S ⇒ + - S S a ⇒ + - a a a
5. S ⇒ - S S ⇒ - + S S a ⇒ - + a a a
L = {prefix expressions built from the operand a with plus and minus signs}
2.3 Syntax-Directed Translation

• Syntax-directed translation is done by attaching rules or program fragments to


productions in a grammar. For example, consider an expression expr generated by
the production
• expr → expr1 + term
• Here, expr is the sum of the two subexpressions expr1 and term. We can
translate expr by exploiting its structure, as in the following pseudo-code:
translate expr1;
translate term;
handle +;
•Attributes – These are properties associated with programming constructs,
such as data types, instruction counts, or memory locations. Since grammar
symbols (terminals and nonterminals) represent these constructs, attributes
are assigned to these symbols in syntax analysis.
•Syntax-Directed Translation (SDT) Schemes – These define how program
fragments (semantic actions) are attached to grammar productions. When a
production is used during syntax analysis, the attached program fragments
execute. This execution process guides the translation of the input program.
2.3.1 Postfix Notation
The examples in this section deal with translation into postfix notation.
The postfix notation for an expression E can be defined inductively as
follows:
1. If E is a variable or constant, then the postfix notation for E is E itself.
2. If E is an expression of the form E1 op E2, where op is any binary operator, then the postfix notation for E is E1' E2' op, where E1' and E2' are the postfix notations for E1 and E2, respectively.
3. If E is a parenthesized expression of the form (E1), then the postfix notation for E is the same as the postfix notation for E1.
• The postfix notation for (9-5)+2 is 95-2+. That is, the translations of 9,
5, and 2 are the constants themselves, by rule (1).
• Then, the translation of 9-5 is 95- by rule (2).
• The translation of (9-5) is the same by rule (3).
• Having translated the parenthesized subexpression, we may apply rule
(2) to the entire expression, with (9-5) in the role of E1 and 2 in the
role of E2, to get the result 95-2+.
• (1+9)+3+(6-2) translates to 1 9 + 3 + 6 2 - +
• No parentheses are needed in postfix notation, because the position
and parity (number of arguments) of the operators permits only one
decoding of a postfix expression.
The "trick" is to repeatedly scan the postfix string from the left, until you find an operator.
• Then, look to the left for the proper number of operands, and group
this operator with its operands.
• Evaluate the operator on the operands, and replace them by the
result.
• Then repeat the process, continuing to the right and searching for
another operator.
Example 2.9

• Consider the postfix expression 952+-3*. Scanning from the left, we


first encounter the plus sign.
• Looking to its left we find operands 5 and 2.
• Their sum, 7, replaces 52+, and we have the string 97-3*.
• Now, the leftmost operator is the minus sign, and its operands are 9
and 7.
• Replacing these by the result of the subtraction leaves 23*. Last, the
multiplication sign applies to 2 and 3, giving the result 6.
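This repeated-scan procedure is equivalent to the standard single-pass, stack-based evaluation. A minimal Java sketch for single-digit operands (class and method names are illustrative assumptions):

import java.util.ArrayDeque;
import java.util.Deque;

class PostfixEval {
    // Evaluates a postfix string of single-digit operands and + - * operators.
    static int eval(String postfix) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (char c : postfix.toCharArray()) {
            if (Character.isDigit(c)) {
                stack.push(c - '0');       // operand: push its value
            } else {
                int right = stack.pop();   // group the operator with its operands
                int left = stack.pop();
                switch (c) {
                    case '+': stack.push(left + right); break;
                    case '-': stack.push(left - right); break;
                    case '*': stack.push(left * right); break;
                    default: throw new IllegalArgumentException("bad operator: " + c);
                }
            }
        }
        return stack.pop();
    }
    public static void main(String[] args) {
        System.out.println(eval("952+-3*"));   // prints 6, as in Example 2.9
    }
}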
2.3.2 Synthesized Attributes
1. Attributes are associated with grammar symbols (both non-terminals and
terminals).
2. Attribute rules are attached to productions in the grammar.
3. These rules specify how attributes are computed in the parse tree.

A syntax-directed definition associates


1. With each grammar symbol, a set of attributes, and
2. With each production, a set of semantic rules for computing the values of the
attributes associated with the symbols appearing in the production.
• What is a Synthesized Attribute?
• A synthesized attribute is an attribute whose value is computed from
the values of its children nodes in the parse tree.
• It moves from bottom to top in the parse tree (computed in a
postorder traversal).
• It is commonly used in bottom-up parsing techniques like LR parsing.
The val attribute is used to store and propagate values during syntax-directed translation.
It helps in computing results for expressions, which is essential for evaluation, code
generation, and optimization in compilers.

• E → E1 + T   { E.val = E1.val + T.val }
• E → T        { E.val = T.val }
• T → num      { T.val = num.lexval }

An annotated parse tree for 3 + 5, with val computed bottom-up:

E.val = 8
├─ E.val = 3
│  └─ T.val = 3
│     └─ 3 (num.lexval = 3)
├─ +
└─ T.val = 5
   └─ 5 (num.lexval = 5)
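A small sketch of this bottom-up evaluation in Java, where each node computes val from its children in postorder (the node classes are illustrative assumptions):

// Each node computes val from its children: a synthesized attribute.
abstract class Node { abstract int val(); }

class Num extends Node {
    final int lexval;
    Num(int lexval) { this.lexval = lexval; }
    int val() { return lexval; }                     // T -> num { T.val = num.lexval }
}

class Add extends Node {
    final Node left, right;
    Add(Node left, Node right) { this.left = left; this.right = right; }
    int val() { return left.val() + right.val(); }   // E -> E1 + T { E.val = E1.val + T.val }
}

class SddDemo {
    public static void main(String[] args) {
        Node e = new Add(new Num(3), new Num(5));
        System.out.println(e.val());                 // prints 8
    }
}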
Syntax-Directed Definition (SDD) for Infix to Postfix Translation
Infix expressions are written with operators between operands (e.g., A + B), whereas postfix
expressions place operators after operands (e.g., A B +). We can define a syntax-directed
definition (SDD) to translate infix expressions to postfix using synthesized attributes.

• E → E1 + T { E.postfix = E1.postfix || T.postfix || "+" }


• E → E1 - T { E.postfix = E1.postfix || T.postfix || "-" }
•E→T { E.postfix = T.postfix }
• T → T1 * F { T.postfix = T1.postfix || F.postfix || "*" }
• T → T1 / F { T.postfix = T1.postfix || F.postfix || "/" }
•T→F { T.postfix = F.postfix }
• F → ( E ) { F.postfix = E.postfix }
• F → id { F.postfix = id.lexeme }
What is a Simple Syntax-Directed Definition (SSDD)?
A syntax-directed definition (SDD) specifies how to associate attributes
and semantic rules with grammar productions. If an SDD has the
following properties, it is called simple:
1. The translation of the left-hand side (LHS) nonterminal is the
concatenation of the translations of the right-hand side (RHS)
nonterminals.
2. The order of concatenation matches the order in the production rule.
3. There may be some additional symbols interleaved within the
translation.
2.3.4 Tree Traversals

Tree Traversal: The process of visiting each node in a tree in a specific


order.
Depth-First Traversal (DFS): Starts from the root and explores as deep
as possible before backtracking.
• The order of visiting child nodes can vary (not necessarily left to
right).
• It is called "depth-first" because it prioritizes depth over breadth.
2.3.5 Translation Schemes

• Pre-order
• In-order
• Post-order
2.4 Parsing
• Parsing Definition:
• Parsing determines whether a given string of terminals (tokens) can
be generated by a grammar.
• It checks if the input conforms to the syntax rules defined by the
grammar.
•A parse tree represents the syntactic structure of the input
according to the grammar.
•Even if a compiler does not explicitly build a parse tree, it must
still be able to do so in principle to ensure correctness.
Expand the Nonterminal
•At node N, which corresponds to a nonterminal A, select a production rule A → α
from the grammar.
•Create child nodes for each symbol in the production body α.
•This step follows the grammar rules to build a parse tree from the root downward.
Select the Next Node for Expansion
•Typically, the next node to expand is chosen based on a depth-first, left-to-right
traversal.
•The leftmost nonterminal in the tree is expanded next, ensuring that the parse tree is
built in a structured order.
•For some grammars, the above steps can be implemented during a single
•left-to-right scan of the input string.
•The current terminal being scanned in the input is frequently referred to as the
lookahead symbol.
•Initially, the lookahead symbol is the first, i.e., leftmost, terminal of the input string.
Backtracking is not needed.
2.4.2 Predictive Parsing

• Recursive-descent parsing is a top-down method of syntax analysis in


which a set of recursive procedures is used to process the input.
• A simple form of recursive-descent parsing is predictive parsing, in which the lookahead symbol unambiguously determines the flow of control through the procedure body for each nonterminal.
• Procedure match(t) compares its argument t with the lookahead
symbol and advances to the next input terminal if they match.
• Thus match changes the value of variable lookahead, a global
variable that holds the currently scanned input terminal.
• stmt → for ( optexpr ; optexpr ; optexpr ) stmt
FIRST Sets:
•Definition:
•For a string of grammar symbols (terminals and/or nonterminals) α, FIRST(α) is the set of
terminals that can appear at the beginning of any string derived from α.
•If α can derive the empty string (ε), then ε is also included in FIRST(α).
•Purpose:
•FIRST sets are essential for predictive parsing because they help determine which production to
apply based on the lookahead symbol.
•They allow the parser to "predict" which production to use without backtracking.
Connecting to the Pseudocode (Figure 2.19):
•stmt() Function:
•This function represents the nonterminal stmt.
•It uses a switch statement based on the lookahead symbol to choose which production to apply.
•The case labels (e.g., expr, if, for, other) correspond to the elements in FIRST(stmt) = {expr, if,
for, other}.
•The code within each case implements the corresponding production rule.
•How FIRST Sets are Used:
•The switch statement in stmt() directly uses the elements of FIRST(stmt) to
guide the parsing process.
•If the lookahead is expr, the parser knows to apply the production for stmt that
begins with expr.
•The if condition in optexpr() uses the first set of expr.
Example Breakdown:
•FIRST(stmt) = {expr, if, for, other}:
•This means that any statement can start with an expression (expr), an if
statement (if), a for loop (for), or some other statement (other).
•The switch statement in stmt() reflects this by having cases for each of these
possibilities.
•FIRST(expr) = {expr}:
•This means that an expression can only begin with an expr terminal.
•optexpr() Function:
•This function represents the optional expression Non-Terminal.
•It checks if the lookahead is in the FIRST(expr), and if it is, it calls match(expr).
otherwise, it does nothing, thus handling the optional aspect of the non-terminal.
•match(terminal t) Function:
•This function checks if the lookahead symbol matches the expected terminal t.
•If they match, it advances the lookahead to the next terminal.
•If they don't match, it reports a syntax error.
FIRST sets are a fundamental tool in predictive parsing.
They provide the information needed to make deterministic parsing decisions based on the lookahead symbol, enabling
efficient and error-free parsing for LL(1) grammars.
2.4.3 When to Use ɛ-Productions
• Our predictive parser uses an ɛ-production as a default when no other
production can be used.
• Handling optexpr

optexpr → expr
optexpr → ε
•If the lookahead symbol matches something in FIRST(expr), the parser applies optexpr → expr.
•If the lookahead symbol does not match FIRST(expr), the parser applies optexpr → ε, effectively doing
nothing.
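The procedures discussed above can be made concrete. The following is a minimal Java sketch of a predictive parser for stmt → for ( optexpr ; optexpr ; optexpr ) stmt | other with optexpr → expr | ε; the string-based token representation and the error handling are illustrative assumptions, not the textbook's exact code:

class PredictiveParser {
    private final String[] tokens;   // the input token stream
    private int pos = 0;             // index of the lookahead symbol

    PredictiveParser(String[] tokens) { this.tokens = tokens; }

    private String lookahead() { return pos < tokens.length ? tokens[pos] : "$"; }

    void stmt() {
        switch (lookahead()) {
            case "for":                   // lookahead is in FIRST of the for-production
                match("for"); match("(");
                optexpr(); match(";");
                optexpr(); match(";");
                optexpr(); match(")");
                stmt();
                break;
            case "other":
                match("other");
                break;
            default:
                throw new RuntimeException("syntax error at " + lookahead());
        }
    }

    void optexpr() {
        if (lookahead().equals("expr")) match("expr");   // optexpr -> expr
        // otherwise apply optexpr -> epsilon: do nothing
    }

    void match(String t) {
        if (lookahead().equals(t)) pos++;                // advance the lookahead
        else throw new RuntimeException("expected " + t + " but saw " + lookahead());
    }

    public static void main(String[] args) {
        new PredictiveParser(new String[] {
            "for", "(", ";", "expr", ";", "expr", ")", "other"
        }).stmt();
        System.out.println("parsed successfully");
    }
}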
2.4.4 Designing a Predictive Parser

•A predictive parser consists of procedures corresponding to each nonterminal in the


grammar.
Choosing the Right Production
•The parser selects a production based on the lookahead symbol.
•If a production body β exists where the lookahead symbol is in FIRST(β), that
production is chosen.
•If multiple productions share a FIRST set, the grammar is not suitable for predictive
parsing.
•If none of the FIRST(β) sets match the lookahead symbol, and an ε-production
exists, it is used.
Continued ……..

• The parser processes the symbols of the chosen production from left
to right: If a nonterminal appears, its corresponding procedure is
called.
• If a terminal appears, it must match the lookahead symbol,
otherwise a syntax error occurs.
• If the lookahead symbol does not match any expected terminal, a
syntax error is reported.
Continued ………………

Handling Translation Schemes


• If the grammar has embedded actions (used in syntax-directed
translation), these actions can be executed as part of the parsing
procedure.
• This allows predictive parsers to not only check syntax but also build
syntax trees or generate code.
Continued …………

When Predictive Parsing Fails


•If two productions for a nonterminal have overlapping
FIRST sets, the parser cannot decide which one to use.
•Such a grammar must be modified (factored or
transformed) to remove ambiguity before using predictive
parsing.
2.4.5 Left Recursion

• A production is said to have left recursion if the leftmost symbol of its right-hand side is the same as the nonterminal on its left-hand side,
• i.e., A → Aα | β
• Direct: B → Ba
• Indirect: S → Aa, A → Sm (S reaches itself through A)
• WHY REMOVE IT?
• Top-down parsers cannot accept a grammar with left recursion (the leftmost expansion would recurse forever), so we remove it; the language generated must not change. The standard transformation is shown below.
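The standard elimination of immediate left recursion is a well-known construction (stated here for completeness; it does not appear on the slide above). A production A → Aα | β is rewritten as:

A  → β A'
A' → α A' | ε

This generates the same language, with the recursion moved to the right. For example, list → list + digit | digit becomes list → digit rest with rest → + digit rest | ε.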
