CD Module 1 Cambridge
CD Module 1 Cambridge
www.cambridge.edu.in
Course objectives
How it works:
1. You write your code (source code).
2. The interpreter reads a line of code.
3. It translates that line into machine code.
4. It executes that line of code.
5. It repeats steps 2-4 for every line of code in your program.
Compilation (Compiler - javac)
•Java source code (.java file) is compiled by the Java Compiler
(javac) into an intermediate code called Bytecode (.class file).
•This Bytecode is platform-independent and not directly executed by
the CPU.
COMPILER
analysis synthesis
Analysis (Front End)
•Uses the syntax tree and symbol table to check for semantic
consistency with the language definition.
•Gathers and saves type information in the syntax tree or
symbol table.
•Type checking ensures each operator has matching operands.
For instance, an array index must be an integer, not a
floating-point number.
• Type Conversions (Coercions):
• Some languages permit type conversions, like converting integers to
floating-point numbers when needed.
• For example, a binary arithmetic operator might be applied to either
two integers or two floating-point numbers. If mixed, the compiler
may convert the integer to a floating-point number.
•Variables position, initial, and rate are floating-point numbers.
•The lexeme 60 is an integer.
•The type checker discovers rate (a floating-point number) and 60 (an integer)
are operands of a multiplication operator.
•The integer 60 may be converted to a floating-point number.
•The output of the semantic analyzer includes an extra node for the inttofloat
operator, explicitly converting the integer to a floating-point number.
1.2.4 Intermediate Code Generation
• Intermediate Representations (IR):
• Compilers may construct one or more intermediate representations
while translating a source program into target code.
• Syntax trees are a common form of IR, used during syntax and
semantic analysis.
• Post Syntax and Semantic Analysis:
• Many compilers generate a low-level, machine-like intermediate
representation.
• This representation can be thought of as a program for an abstract
machine.
•Ease of Production: It should be straightforward for the compiler to generate.
•Ease of Translation: It should be easy to translate into the target machine code.
•t1 = inttofloat(60)
•t2 = id3 * t1
•t3 = id2 + t2
•id1 = t3
Intermediate code generation
• Three-address instructions:
- Each three-address assignment instruction has at most
one operator on the right side.
- The compiler must generate a temporary name to hold
the value computed by a three-address instruction.
- Some "three-address instructions" have fewer than
three operands.
11-Mar-25 39
11-Mar-25 40
11-Mar-25 42
11-Mar-25 48
11-Mar-25 49
11-Mar-25 50
1. The environment is a mapping from names to locations in the store. Since variables refer to locations
(\l-values" in the terminology of C), we could alternatively dene an environment as a mapping from names to
variables.
2. The state is a mapping from locations in store to their values. That is, the state maps l-values to their
corresponding r-values, in the terminology of C.
The environment and state mappings in Fig. 1.8 are dynamic, but there
are a few exceptions:
1. Static versus dynamic binding of names to locations
2. Static versus dynamic binding of locations to values.
•1.6.3 Static Scope and Block
Structure
•1.6.4 Explicit Access Control
•1.6.5 Dynamic Scope
•1.6.6 Parameter Passing
Mechanisms-Call-by-Value,
Call-by-Reference, Call-by-Name
1.6.7 Aliasing
i. The digits, signs such as < and <=, and boldface strings such as
while are terminals.
ii. An italicized name is a nonterminal, and any nonitalicized name or
symbol may be assumed to be a terminal.
iii. For notational convenience, productions with the same
nonterminal as the head can have their bodies grouped, with the
alternative bodies separated by the symbol , which we read as “or."
• Example 2.1 : Several examples in this chapter use expressions
consisting of digits and plus and minus signs; e.g., strings such as
9-5+2, 3-1, or 7. Since a plus or minus sign must appear between two
digits, we refer to such expressions as “lists of digits separated by plus
or minus signs." The following grammar describes the syntax of these
expressions. The productions are:
• list 🡪list + digit
• list 🡪list - digit
• list 🡪digit
• digit 🡪 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• The bodies of the three productions with nonterminal list as head
equiva-lently can be grouped:
• list 🡪list + digit | list - digit | digit
• According to our conventions, the terminals of the grammar are the
symbols
•+ - 0 1 2 3 4 5 6 7 8 9
• The non-terminals are the italicized names list and digit, with list
being the start symbol because its productions are given first.
• We say a production is for a nonterminal if the nonterminal is the
head of the production. A string of terminals is a sequence of zero or
more terminals.
• The string of zero terminals, written as €, is called the empty string.
2.2.2 Derivations
• Lets solve this using LMD • Lets solve this using LMD
E🡪 E+E E🡪 E*E
🡪id+E 🡪E+E*E
🡪id+E* E 🡪id*E*E
🡪id+id*id 🡪id+id*E
🡪id+id*id
2.2.5 Associativity of Operators
• Associativity is an important concept in arithmetic and programming. It dictates how operators of
the same precedence are grouped in the absence of parentheses.
Left-Associative Operators Right-Associative Operators
Most arithmetic operators like addition (+), subtraction (-), Some operators, such as exponentiation (^ or ** in
multiplication (*), and division (/) are left-associative. This many programming languages) and assignment (= in
means that in an expression with multiple instances of these C and its descendants), are right-associative. This
operators, the operations are performed from left to right. means that in an expression with multiple instances
of these operators, the operations are performed
from right to left.
Operator Precedence
• In arithmetic, operators with higher precedence are evaluated first. In
most programming languages, multiplication (*) and division (/) have
higher precedence than addition (+) and subtraction (-).
• This means that in expressions involving both types of operators, the
multiplication and division operations are performed before addition and
subtraction.
• For the expression 9 + 5 * 2:
•Here, * has higher precedence than +, so 5 * 2 is calculated first.
•This expression is equivalent to 9 + (5 * 2), resulting in 9 + 10, which
equals 19.
Precedence Table
The operators are categorized by their precedence and associativity:
•Left-Associative Operators:
•Low Precedence: +, -
•High Precedence: *, /
The grammar is constructed using three nonterminal symbols: expr, term, and factor.
•Factor: Represents basic units in expressions (digits or parenthesized expressions).
factor -> digit | ( expr )
•Term: Represents elements of higher precedence (* and /). It can consist of factors separated by these
operators.
term -> term * factor
| term / factor
| factor
•Expression (Expr): Represents elements of lower precedence (+ and -). It can consist of terms separated
by these operators.
expr -> expr + term
| expr - term
| term
Complete Grammar
expr -> expr + term
| expr - term
| term
• Pre-order
• In-order
• Post-order
2.4 Parsing
2.4 Parsing
• Parsing Definition:
• Parsing determines whether a given string of terminals (tokens) can
be generated by a grammar.
• It checks if the input conforms to the syntax rules defined by the
grammar.
•A parse tree represents the syntactic structure of the input
according to the grammar.
•Even if a compiler does not explicitly build a parse tree, it must
still be able to do so in principle to ensure correctness.
Expand the Nonterminal
•At node N, which corresponds to a nonterminal A, select a production rule A → α
from the grammar.
•Create child nodes for each symbol in the production body α.
•This step follows the grammar rules to build a parse tree from the root downward.
Select the Next Node for Expansion
•Typically, the next node to expand is chosen based on a depth-first, left-to-right
traversal.
•The leftmost nonterminal in the tree is expanded next, ensuring that the parse tree is
built in a structured order.
•For some grammars, the above steps can be implemented during a single
•left-to-right scan of the input string.
•The current terminal being scanned in the input is frequently referred to as the
lookahead symbol.
•Initially, the lookahead symbol is the first, i.e., leftmost, terminal of the input string.
Backtracking is not
needed
2.4.2 Predictive Parsing
optexpr → expr
optexpr → ε
•If the lookahead symbol matches something in FIRST(expr), the parser applies optexpr → expr.
•If the lookahead symbol does not match FIRST(expr), the parser applies optexpr → ε, effectively doing
nothing.
2.4.4 Designing a Predictive Parser
• The parser processes the symbols of the chosen production from left
to right: If a nonterminal appears, its corresponding procedure is
called.
• If a terminal appears, it must match the lookahead symbol,
otherwise a syntax error occurs.
• If the lookahead symbol does not match any expected terminal, a
syntax error is reported.
Continued ………………