
UNIT – 4

Symbol Table

Symbol table is an important data structure used in a compiler.

The symbol table stores information about the occurrence of various entities such as objects,
classes, variable names, interfaces, and function names. It is used by both the analysis and synthesis phases.

The symbol table is used for the following purposes:

 It is used to store the names of all entities in a structured form in one place.
 It is used to verify if a variable has been declared.
 It is used to determine the scope of a name.
 It is used to implement type checking by verifying that assignments and expressions in the source code
are semantically correct.

A symbol table can be either a linear list or a hash table. It maintains an entry for each name in the
following format:

<symbol name, type, attribute>

For example, suppose the table stores information about the following variable declaration:

static int salary

then it stores an entry in the following format:

<salary, int, static>

The attribute field contains the entries related to the name.

Implementation

The symbol table can be implemented as an unordered list if the compiler only has to handle a small
amount of data.

A symbol table can be implemented using one of the following techniques:

 Linear (sorted or unsorted) list
 Hash table
 Binary search tree

Symbol tables are mostly implemented as hash tables.

Operations

The symbol table provides the following operations:

insert()

o The insert() operation is used mostly in the analysis phase, when tokens are identified and
names are stored in the table.

o It inserts information into the symbol table, such as a unique name
occurring in the source code.

o The attribute for a symbol is the information associated with that symbol, such as its
state, value, type, and scope.

o The insert() function takes the symbol and its attributes as arguments.

For example:

int x;

should be processed by the compiler as:

insert(x, int)

lookup()

In the symbol table, the lookup() operation is used to search for a name. It is used to determine:

 the existence of the symbol in the table;
 whether the symbol is declared before it is used;
 whether the name is used in the current scope;
 whether the symbol is initialized;
 whether the symbol is declared multiple times.

The basic format of the lookup() function is as follows:

lookup(symbol)

This format varies according to the programming language.
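
To make insert() and lookup() concrete, here is a minimal hash-table symbol table in C with chained collision handling. It is only a sketch: the bucket count, fixed-size string fields, and entry layout are illustrative assumptions, not a prescribed design.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 211            /* prime bucket count, chosen arbitrarily */

typedef struct Symbol {
    char name[32];
    char type[16];
    char attribute[16];
    struct Symbol *next;          /* chaining handles hash collisions */
} Symbol;

static Symbol *table[TABLE_SIZE];

static unsigned hash(const char *s) {
    unsigned h = 5381;            /* simple djb2-style string hash */
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

void insert(const char *name, const char *type, const char *attr) {
    unsigned h = hash(name);
    Symbol *sym = malloc(sizeof(Symbol));
    strcpy(sym->name, name);
    strcpy(sym->type, type);
    strcpy(sym->attribute, attr);
    sym->next = table[h];         /* prepend to the bucket's chain */
    table[h] = sym;
}

Symbol *lookup(const char *name) {
    for (Symbol *s = table[hash(name)]; s; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;                  /* name was never declared */
}

int main(void) {
    insert("salary", "int", "static");          /* <salary, int, static> */
    Symbol *s = lookup("salary");
    if (s) printf("%s : %s %s\n", s->name, s->attribute, s->type);
    return 0;
}

A production compiler would also track scope (for example, a stack of such tables) and check for redeclaration before inserting.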


Error detection and Recovery in Compiler

In this phase of compilation, all possible errors made by the user are detected and reported in the
form of error messages. This process of locating errors and reporting them to the user is called the error
handling process.

Functions of an error handler:


 Detection
 Reporting
 Recovery

Classification of Errors

Compile-time errors

Compile-time errors are of three types:

Lexical phase errors


These errors are detected during the lexical analysis phase. Typical lexical errors are:

 Exceeding the maximum length of identifiers or numeric constants
 The appearance of illegal characters
 Unmatched strings

Example 1 : printf("Geeksforgeeks");$
This is a lexical error since the illegal character $ appears at the end of the statement.
Example 2 : This is a comment */
This is a lexical error since the end-of-comment marker is present but the beginning is missing.

Error recovery for lexical errors:

Panic Mode Recovery

 In this method, successive characters from the input are removed one at a time until a designated set of
synchronizing tokens is found. Synchronizing tokens are delimiters such as ; or }
 The advantage is that it is easy to implement and guarantees not to go into an infinite loop
 The disadvantage is that a considerable amount of input is skipped without checking it for additional
errors
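
As a concrete illustration of panic-mode skipping, here is a minimal C sketch; the character-buffer interface and the synchronizing set consisting of ; and } are assumptions that follow the description above.

#include <string.h>

/* Skip input characters one at a time until a synchronizing token
   is found. Returns a pointer to the synchronizing character, or to
   the end of the input if none is found. */
const char *panic_mode_skip(const char *input) {
    const char *sync = ";}";      /* designated synchronizing set */
    while (*input && strchr(sync, *input) == NULL)
        input++;                  /* discard one character */
    return input;
}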

Syntactic phase errors:


These errors are detected during the syntax analysis phase. Typical syntax errors are:

 Errors in structure
 Missing operator
 Misspelled keywords
 Unbalanced parenthesis

Example : swich(ch)
{
.......
.......
}

The keyword switch is incorrectly written as swich. Hence, an “Unidentified keyword/identifier” error
occurs.

Error recovery for syntactic phase error:

1. Panic Mode Recovery

 In this method, successive characters from the input are removed one at a time until a designated set of
synchronizing tokens is found. Synchronizing tokens are delimiters such as ; or }
 The advantage is that it’s easy to implement and guarantees not to go into an infinite loop
 The disadvantage is that a considerable amount of input is skipped without checking it for additional
errors

2. Statement Mode recovery

 In this method, when a parser encounters an error, it performs the necessary correction on the remaining
input so that the rest of the input statement allows the parser to parse ahead.
 The correction can be deletion of extra semicolons, replacing the comma with semicolons, or inserting a
missing semicolon.
 While performing correction, utmost care should be taken for not going in an infinite loop.
 A disadvantage is that it is difficult to handle situations where the actual error occurred before the
point of detection.

3. Error production

 If a user has knowledge of common errors that can be encountered then, these errors can be incorporated
by augmenting the grammar with error productions that generate erroneous constructs.
 If this is used then, during parsing appropriate error messages can be generated and parsing can be
continued.
 The disadvantage is that it’s difficult to maintain.

4. Global Correction

 The parser examines the whole program and tries to find the closest error-free match for it.
 The closest match is the program requiring the fewest insertions, deletions, and changes of tokens to recover
from the erroneous input.
 Due to high time and space complexity, this method is not implemented practically.

Semantic errors
These errors are detected during the semantic analysis phase. Typical semantic errors are

 Incompatible type of operands


 Undeclared variables
 Not matching of actual arguments with a formal one

Example : int a[10], b;


.......
.......
a = b;

It generates a semantic error because of the incompatible types of a and b.

Error recovery for Semantic errors

 If the error “Undeclared Identifier” is encountered, then to recover from it, a symbol table entry for
the corresponding identifier is made.
 If the data types of two operands are incompatible, automatic type conversion is done by the compiler.
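
For instance, in C the compiler repairs an int/double mismatch by inserting an implicit conversion; a small self-contained illustration:

#include <stdio.h>

int main(void) {
    int i = 3;
    double d = 2.5;
    double r = i + d;    /* the compiler implicitly converts i to 3.0 before adding */
    printf("%f\n", r);   /* prints 5.500000 */
    return 0;
}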

Advantages:

Improved code quality: Error detection and recovery in a compiler can improve the overall quality of the
code produced. This is because errors can be identified early in the compilation process and addressed before
they become bigger issues.

Increased productivity: Error recovery can also increase productivity by allowing the compiler to continue
processing the code after an error is detected. This means that developers do not have to stop and fix every
error manually, saving time and effort.

Better user experience: Error recovery can also improve the user experience of software applications.
When errors are handled gracefully, users are less likely to become frustrated and are more likely to continue
using the application.

Better debugging: Error recovery in a compiler can help developers to identify and debug errors more
efficiently. By providing detailed error messages, the compiler can assist developers in pinpointing the
source of the error, saving time and effort.

Consistent error handling: Error recovery ensures that all errors are handled in a consistent manner, which
can help to maintain the quality and reliability of the software being developed.

Reduced maintenance costs: By detecting and addressing errors early in the development process, error
recovery can help to reduce maintenance costs associated with fixing errors in later stages of the software
development lifecycle.

Improved software performance: Error recovery can help to identify and address code that may cause
performance issues, such as memory leaks or inefficient algorithms. By improving the performance of the
code, the overall performance of the software can be improved as well.
Disadvantages:

Slower compilation time: Error detection and recovery can slow down the compilation process, especially
if the recovery mechanism is complex. This can be an issue in large software projects where the compilation
time can be a bottleneck.

Increased complexity: Error recovery can also increase the complexity of the compiler, making it harder to
maintain and debug. This can lead to additional development costs and longer development times.

Risk of silent errors: Error recovery can sometimes mask errors in the code, leading to silent errors that go
unnoticed. This can be particularly problematic if the error affects the behavior of the software application in
subtle ways.

Potential for incorrect recovery: If the error recovery mechanism is not implemented correctly, it can
potentially introduce new errors or cause the code to behave unexpectedly.

Dependency on the recovery mechanism: If developers rely too heavily on the error recovery mechanism,
they may become complacent and not thoroughly check their code for errors. This can lead to errors being
missed or not addressed properly.

Difficulty in diagnosing errors: Error recovery can make it more difficult to diagnose and debug errors
since the error message may not accurately reflect the root cause of the issue. This can make it harder to fix
errors and may lead to longer development times.

Compatibility issues: Error recovery mechanisms may not be compatible with certain programming
languages or platforms, leading to issues with portability and cross-platform development.

Code Optimization
Code optimization in compiler design refers to the process of improving the efficiency and performance of a
program by making modifications to its code. It involves analyzing the code and applying various techniques
to eliminate redundancies, reduce computational overhead, minimize memory usage, and optimize the
generated machine code. The goal is to produce optimized code that performs the same functionality as the
original code but with improved execution speed and resource utilization. Code optimization plays a crucial
role in enhancing the overall quality and performance of compiled programs.

Code optimization occurs during the compilation phase, where the compiler analyzes the code and applies
various transformations to optimize it.
Advantages of Code Optimization
Optimizing code offers several advantages, including:

 Faster Execution Speed: Optimized code executes more quickly, leading to improved program
responsiveness and user experience.
 Efficient Memory Utilization: Code optimization ensures efficient utilization of memory, reducing
resource consumption and enabling programs to run smoothly on various hardware platforms.
 Enhanced Performance: Optimized code performs better in terms of speed and efficiency, resulting in
overall improved program performance.

Code Optimization Techniques:


Code optimization techniques in compiler design are used to improve the efficiency, performance, and
quality of the generated machine code.

Here are some commonly used code optimization techniques:

1. Constant Folding
2. Dead Code Elimination
3. Common Subexpression Elimination
4. Loop Optimization
5. Register Allocation
6. Inline Expansion
7. Strength Reduction
8. Control Flow Optimization

1. Constant Folding
Constant folding is a powerful optimization technique that involves evaluating constant expressions at
compile time and replacing them with their computed results. By performing computations during
compilation rather than runtime, constant folding eliminates the need for repetitive calculations, resulting in
faster and more efficient code execution.

The technique focuses on simplifying and optimizing code by replacing the original expressions with their
resolved values.

Example:

int result = 2 + 3 * 4;

During constant folding, the expression "2 + 3 * 4" is evaluated at compile time, resulting in the value 14.
The optimized code would then replace the original expression with the computed result:
int result = 14;

By eliminating the need for runtime computations, constant folding improves code efficiency and enhances
the overall performance of the program.

2. Dead Code Elimination


Dead code elimination is a crucial optimization technique that involves identifying and removing code
segments that are unreachable or have no impact on the program's output. By eliminating dead code, the
compiler reduces the size of the executable and enhances code clarity and maintainability.

Example:

int x = 5;
if (x > 10) {
int y = x * 2;
}

In this case, the code inside the if statement is dead code since the condition is never true. During the dead
code elimination process, the compiler detects that the code block inside the if statement will never be
executed and removes it from the optimized code:

int x = 5;

By eliminating dead code, the compiler streamlines the program, reducing unnecessary computations and
improving execution efficiency.

3. Common Subexpression Elimination


Common subexpression elimination is an optimization technique that aims to identify and eliminate
redundant computations in a program. By recognizing expressions that have already been computed and
storing their results for reuse, common subexpression elimination reduces redundant calculations and
enhances code efficiency.

Example:

int a = b + c;
int d = (b + c) * 2;

In this case, the expression "b + c" is a common subexpression occurring twice. During common
subexpression elimination, the compiler detects this redundancy and assigns the result of "b + c" to a
temporary variable, which can then be reused in subsequent computations:

int temp = b + c;
int a = temp;
int d = temp * 2;

By eliminating redundant computations, common subexpression elimination reduces computational


overhead, leading to improved code efficiency and faster execution.

4. Loop Optimization
Loop optimization techniques focus on enhancing the efficiency of loops. Loop optimization aims to
optimize the execution of loops by minimizing the number of iterations, reducing unnecessary computations,
and improving memory access patterns. This optimization technique plays a crucial role in improving the
overall performance of programs that heavily rely on loop constructs.

Example:

int sum = 0;
for (int i = 0; i < n; i++) {
sum += i;
}

Loop optimization techniques, such as loop unrolling, can be applied to optimize this loop. Loop unrolling
involves duplicating the loop body to reduce loop-control overhead and improve instruction-level
parallelism. After unrolling by a factor of two (assuming n is even, so the iteration count divides evenly), the
code would look like:

int sum = 0;
for (int i = 0; i < n; i += 2) {
sum += i;
sum += i + 1;
}
By reducing the number of iterations and improving instruction-level parallelism, loop optimization
techniques enhance the efficiency of loop execution and contribute to overall code optimization.

5. Register Allocation
Register allocation is the process of efficiently assigning variables to the limited number of CPU registers
available. By minimizing memory accesses and maximizing register usage, execution speed and resource
utilization can be improved.

Example:

int a = 5;
int b = 10;
int c = a + b;

During register allocation, the values of variables 'a' and 'b' can be stored in registers, and the computation
can be performed directly on the registers instead of accessing memory repeatedly.

6. Inline Expansion
Inline expansion involves replacing function calls with their actual code to eliminate the overhead of
function call and return. This technique is beneficial for small, frequently called functions.

Example: Consider a function that calculates the square of a number

int square(int x) {
return x * x;
}

By applying inline expansion, the function call square(5) can be replaced with the actual code 5 * 5,
eliminating the need for a function call.

7. Strength Reduction
Strength reduction is an optimization technique that aims to replace expensive or complex operations with
simpler and more efficient alternatives. By identifying opportunities to replace computationally expensive
operations with less costly ones, strength reduction improves code efficiency and execution speed.

Example:

int result = a * 8;

In this case, the multiplication operation with the constant 8 is computationally expensive. However, through
strength reduction, the compiler can replace the multiplication with a more efficient operation:
int result = a << 3;
In this optimized version, the shift-left operation by 3 bits achieves the same result as multiplying by 8 but
with fewer computational steps.

By replacing expensive operations with simpler alternatives, strength reduction reduces computational
overhead and improves code execution efficiency.

8. Control Flow Optimization


Control flow optimization is a technique in compiler design that improves the efficiency of control flow
structures in a program. It includes optimizations such as branch prediction, loop optimization, dead code
elimination, simplification of control flow graphs, and tail recursion elimination.

The main objective is to minimize the impact of conditional branches and loops on program performance.
By predicting branch outcomes, optimizing loops, removing dead code, simplifying control flow graphs, and
transforming tail recursion, the compiler enhances execution speed and resource utilization.

Example:

int x = 10;
int y = 20;
int z;

if (x > y) {
z = x + y;
} else {
z = x - y;
}
In this code, there is a conditional statement that checks if x is greater than y. Based on the condition, either
the addition or subtraction operation is performed, and the result is stored in variable z.

Through control flow optimization, the compiler can evaluate the condition at compile time (x and y are
known constants) and determine that x > y is always false. It then knows that the code inside the if block will
never be executed.

As a result, the compiler can optimize the code by eliminating the unused code block, resulting in the
following optimized code:

int x = 10;
int y = 20;
int z = x - y;
By removing the unnecessary conditional branch, the optimized code becomes simpler and more efficient.
This improves the program's execution speed and reduces any overhead associated with evaluating the
condition.

Loop optimization is a machine-independent optimization technique that increases a program's
execution speed by decreasing the overhead of loops. Since most of a program's execution time is spent inside
loops, reducing the number of instructions in a loop will improve the program's running time and help
execute the code faster.
Loop optimization can be performed by using the following techniques-

Loop Optimization Techniques


 Code Motion
 Induction Variable Elimination
 Strength Reduction
 Loop Fusion
 Loop Unrolling

We will discuss these techniques one by one.

Code Motion
Also known as Frequency Reduction or Loop-Invariant Code Motion, this technique reduces the number of
statements executed inside a loop by moving some statements outside the loop.
Only statements or expressions whose results do not change across loop iterations, and whose movement does
not affect the program's semantics, can be placed just before the loop.
Let’s take an example to understand this technique.

Example:
Consider the following piece of code.

#include <stdio.h>

void main() {
int a = 50;
int x, y = 10, z = 10;
while(a>0) {
x = y + z;
if(a%x==0)
printf("%d",a);
a--;
}
}

In the above code, a variable a is initialized to 50. Then there is a loop that runs while a > 0, i.e., 50 times, in
which the variable x is assigned y + z and, if a divided by x leaves remainder 0, the value of a is printed.
In the loop, the only variable whose value gets updated is a. The values of x, y, and z do not
change. So, we can optimize this code by placing the statement x = y + z outside the loop.
Hence, the optimized code will be-

#include <stdio.h>

void main() {
int a = 50;
int x, y = 10, z = 10;
x = y + z;
while(a>0) {
if(a%x==0)
printf("%d",a);
a--;
}
}

Induction Variable Elimination


In the induction variable elimination technique, there is an induction variable. A variable x is called an
induction variable of a loop L if its value changes by a fixed amount on every iteration of the loop, i.e., it is
either incremented or decremented.
The aim of the induction variable elimination method is to remove redundant induction variables from the loop.
This technique improves the execution time of a program.
Let’s take an example to understand this technique.

Example:
Consider the following piece of code.

#include <stdio.h>
void main() {
int a[20], b[20];
int i, j, k;
for (i = 0, j = 0, k = 0; i < 20; i++)
a[j++] = b[k++];
}

In the above code, there are three induction variables i, j, and k. The variable i is a counter variable that is
running a loop to copy the values from array b to array a while the variables j and k are just pointing to
the current indices of the two arrays. Thus, we can eliminate the induction variables j and k to optimize our
code.
Hence, the optimized code will be-

#include <stdio.h>

void main() {
int a[20], b[20];
int i;
for (i = 0; i < 20; i++)
a[i] = b[i];
}

Strength Reduction
Strength reduction, or reduction in strength, is a technique in which we replace expensive operations with
cheaper operations. For example, the strength of the * operator is higher than that of the + operator, so the
machine takes more time to execute a multiplication than an addition. To optimize
the code, we simply try to replace the high-strength, i.e., expensive, operator * with the low-strength, i.e.,
cheap, operator +.
Let’s take an example to understand this technique.

Example:
Consider the following piece of code.

#include <stdio.h>
void main() {
int a = 1, b;
while (a<10) {
b = a * 2;
a++;
}
}
In the above code, there are two variables, a and b. Variable a is a loop variable. Inside the loop, a
multiplication is performed in which the variable a and the constant 2 are multiplied, and the result
is stored in variable b. Since * is a high-strength operator, we replace it with the + operator.
Hence, the optimized code will be-

#include <stdio.h>

void main() {
int a = 1, b;
while (a<10) {
b = a + a;
a++;
}
}

Loop Fusion
Also known as Loop Jamming, this technique combines two or more loops that have the same index
variable and number of iterations.
It reduces loop overhead by eliminating the duplicated loop-control and loop-test instructions.
Let’s take an example to understand this technique.

Example:
Consider the following piece of code.

#include <stdio.h>

void main() {
int a[10], b[10], i;
for(i = 0; i < 10; i++)
a[i] = 1;
for(i = 0; i < 10; i++)
b[i] = 2;
}

In the above code, there are two loops. The first loop running ten times assigns the value 1 at each index
of the array a while the second loop also runs ten times, assigning the value 2 at each index of the array
b. We can observe that the work of both these loops is almost the same. Both are running ten times and
assigning a value to an array. Thus, we can combine both these loops to optimize our code.
Hence, the optimized code will be-

#include <stdio.h>

void main() {
int a[10], b[10], i;
for(i = 0; i < 10; i++) {
a[i] = 1;
b[i] = 2;
}
}

Loop Unrolling
The loop unrolling technique transforms the loop. In this technique, the number of jumps and tests is
reduced by writing the loop body two (or more) times without changing the meaning of the code. It reduces
the number of iterations and increases the program’s speed by eliminating loop-control and loop-test
instructions.
Let’s take an example to understand this technique.

Example:
Consider the following piece of code.

#include <stdio.h>

void main() {
int i = 0;
int a[100], b[100];
while(i<100) {
a[i] = b[i];
i++;
}
}

In the above code, the loop runs 100 times. We can optimize this code by repeating the body statements so
that the loop runs only 50 times.
Hence, the optimized code will be-

#include <stdio.h>

void main() {
int i = 0;
int a[100], b[100];
while(i<100) {
a[i] = b[i];
i++;
a[i] = b[i];
i++;
}
}

Basic Block
In compiler design, a basic block is a straight-line piece of code that has only one entry point and one exit
point. Basic block construction is the process of dividing a program's intermediate code into the basic blocks
that form the nodes of its control flow graph.
The task is to partition a sequence of three-address code into basic blocks. A new basic block always
begins with a leader instruction and keeps adding instructions until a jump or a label is reached. If no
jumps or labels are found, control flows sequentially from one instruction to the next.

Task: Partition a sequence of three-address codes into basic blocks.

Input: Sequence of three address statements.

Output: A sequence of basic blocks.


Algorithm of Basic Block in Compiler Design

1. First, find the set of leaders from intermediate code, the first statements of basic blocks. The following
are the steps for finding leaders:
1. The first instruction of the three-address code is a leader.
2. Instructions that are the target of conditional/unconditional goto are leaders.
3. Instructions that immediately follow any conditional/unconditional goto/jump statements are
leaders.
2. For each leader found, its basic block contains itself and all instructions up to the next leader.

Hence following the above algorithm, you can partition a sequence of three-address code into basic blocks.
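
The following C sketch applies these leader rules to the 17-statement example in the next subsection. The jump_target array is an invented stand-in for a real intermediate-code structure: entry i holds the target of the goto at statement i, or 0 if statement i is not a goto.

#include <stdbool.h>
#include <stdio.h>

#define N 17   /* number of three-address statements in the example */

static int jump_target[N + 1] = {
    [9] = 3, [11] = 2, [17] = 13   /* the conditional gotos of the example */
};

int main(void) {
    bool leader[N + 2] = { false };
    leader[1] = true;                       /* rule 1: first statement */
    for (int i = 1; i <= N; i++) {
        if (jump_target[i]) {
            leader[jump_target[i]] = true;  /* rule 2: target of a goto */
            if (i + 1 <= N)
                leader[i + 1] = true;       /* rule 3: statement after a goto */
        }
    }
    for (int i = 1; i <= N; i++)            /* each block runs from a leader */
        if (leader[i])                      /* up to the next leader */
            printf("basic block starts at statement %d\n", i);
    return 0;
}

Run on the example below, this prints leaders 1, 2, 3, 10, 12, and 13, matching blocks B1-B6.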

Example of Basic Block in Compiler Design


Consider the source code for converting a 10 x 10 matrix to an identity matrix.

for r from 1 to 10 do
for c from 1 to 10 do
a [ r, c ] = 0.0;

for r from 1 to 10 do
a [ r, r ] = 1.0;

The following are the three address codes for the above source code:

1) r = 1
2) c = 1
3) t1 = 10 * r
4) t2 = t1 + c
5) t3 = 8 * t2
6) t4 = t3 - 88
7) a[t4] = 0.0
8) c = c + 1
9) if c <= 10 goto (3)
10) r = r + 1
11) if r <= 10 goto (2)
12) r = 1
13) t5 = r - 1
14) t6 = 88 * t5
15) a[t6] = 1.0
16) r = r + 1
17) if r <= 10 goto (13)

There are six basic blocks for the above-given code, which are:

 B1 for statement 1
 B2 for statement 2
 B3 for statements 3-9
 B4 for statements 10-11
 B5 for statement 12
 B6 for statements 13-17.

Explanation
According to the definition of leaders given in the above algorithm,

 Instruction 1 is a leader, as the first instruction of a three-address code is always a leader.
 Instructions 2, 3, and 13 are leaders because they are the targets of the goto statements at
Instructions 11, 9, and 17, respectively.
 Instructions 10 and 12 are leaders because they immediately follow the conditional goto statements at
Instructions 9 and 11, respectively.

Flow Graphs
Flow graphs are directed graphs. The nodes/blocks of the control flow graph are the basic blocks of the
program. There are two designated blocks in a control flow graph:

1. Entry Block: The entry block allows control to enter the control flow graph.
2. Exit Block: Control flow leaves through the exit block.

An edge can flow from one block A to another block B if:

1. there is a conditional or unconditional jump from the end of A to the start of B; or
2. B immediately follows A in the original order of the three-address code, and A does not end in an
unconditional jump.
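
A small C sketch of these two edge rules, applied to the six blocks of the example that follows; the block-boundary arrays and the jump encoding are invented for illustration.

#include <stdio.h>

#define NBLOCKS 6

/* First and last statement of each block B1..B6 (index 0 unused). */
static int block_start[NBLOCKS + 1] = {0, 1, 2, 3, 10, 12, 13};
static int block_end[NBLOCKS + 1]   = {0, 1, 2, 9, 11, 12, 17};
static int jump_target[18] = { [9] = 3, [11] = 2, [17] = 13 };
static int is_unconditional[18];   /* all zero: only conditional jumps here */

static int block_of(int stmt) {
    for (int b = 1; b <= NBLOCKS; b++)
        if (block_start[b] <= stmt && stmt <= block_end[b]) return b;
    return 0;
}

int main(void) {
    for (int b = 1; b <= NBLOCKS; b++) {
        int last = block_end[b];
        if (jump_target[last])                      /* rule 1: jump edge */
            printf("B%d -> B%d\n", b, block_of(jump_target[last]));
        if (b < NBLOCKS && !is_unconditional[last]) /* rule 2: fall-through */
            printf("B%d -> B%d\n", b, b + 1);
    }
    return 0;
}

This prints the edges B1->B2, B2->B3, B3->B3, B3->B4, B4->B2, B4->B5, B5->B6, and B6->B6, matching the explanation below.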
Example

Consider again the source code for converting a 10 x 10 matrix to an identity matrix, its seventeen
three-address statements, and the six basic blocks B1-B6 identified in the previous section.
The control flow graph of the above-given basic blocks is:

Explanation:

 B1 is the start point for the control flow graph because B1 contains the starting instruction.
 Because B1 does not end with an unconditional jump, and block B2 immediately follows B1, B2 is the
only successor of B1.
 There are two successors of block B3. The conditional jump in the last instruction of B3 targets the first
instruction of B3, so one successor is B3 itself. The other is B4, reached by falling through the
conditional jump at the end of B3.
 The last block, B6, is the exit point of the control flow graph.

DAG Representation of basic blocks

In the compilation process, high-level code must be transformed into low-level code. To perform this
transformation, the object code generated must retain the exact meaning of the source code. Hence a DAG is
used to depict the structure of a basic block; it helps to see the flow of values among the blocks and
offers some degree of optimization too.

A DAG is used in compiler design to optimize the basic block. It is constructed from the three-address code.
After construction, multiple transformations are applied, such as dead code elimination and common
subexpression elimination. DAGs are useful in compilers because a topological ordering can be defined in the
case of DAGs, which is crucial for the construction of object-level code. Transitive reduction as well as closure
are uniquely defined for DAGs.

Directed Acyclic Graph Characteristics


The following are some characteristics of DAG.

 DAG is a type of Data Structure used to represent the structure of basic blocks.
 Its main aim is to perform the transformation on basic blocks.
 The leaf nodes of the directed acyclic graph represent a unique identifier that can be a variable or a
constant.
 The non-leaf nodes represent an operator symbol.
 Moreover, the nodes are also given a string of identifiers to use as labels for the computed value.
 Transitive closure and transitive reduction are defined differently in DAG.
 DAG has defined topological ordering.
Algorithm for construction of DAG
For constructing a DAG, the input and output are as follows.

Input- The input will contain a basic block.


Output- The output will contain the following information-

 Each node of the graph has a label.
o If the node is a leaf node, the label is an identifier.
o If the node is a non-leaf node, the label is an operator.
 Each node also contains a list of attached identifiers to hold the computed value.

There are three possible scenarios on which we can construct a DAG.

1. Case 1: x = y op z
where x, y, and z are operands and op is an operator.

2. Case 2: x = op y
where x and y are operands and op is an operator.

3. Case 3: x = y
where x and y are operands.

To draw a DAG, follow these three steps.

1. Step 1:
According to step 1,
1. If, in any of the three cases, the y operand is not defined, then create a node(y).
2. If, in case 1, the z operand is not defined, then create a node(z).

2. Step 2:
According to step 2,
1. For case 1, create a node(op) with node(y) as its left child and node(z) as its right child. Let the
name of this node be n.
2. For case 2, check whether there is a node(op) whose child is node(y). If there is no such
node, create one. Let the name of this node be n.
3. For case 3, node n will be node(y).

3. Step 3:
If x appears in the attached-identifier list of some node, delete x from that list. Then add x to the
list of attached identifiers of the node n found in step 2. Finally, set node(x) to n.
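
A C sketch of the node-reuse logic behind case 1; the fixed-size node pool and field layout are illustrative assumptions.

#include <stdlib.h>
#include <string.h>

typedef struct Node {
    char op;                  /* operator, or 0 for a leaf */
    const char *leaf;         /* identifier name, for leaf nodes only */
    struct Node *left, *right;
} Node;

#define MAX_NODES 100
static Node *nodes[MAX_NODES];
static int node_count = 0;

/* Return the leaf node for an identifier, creating it if needed. */
Node *get_leaf(const char *name) {
    for (int i = 0; i < node_count; i++)
        if (nodes[i]->op == 0 && strcmp(nodes[i]->leaf, name) == 0)
            return nodes[i];
    Node *n = calloc(1, sizeof(Node));
    n->leaf = name;
    return nodes[node_count++] = n;
}

/* Case 1 (x = y op z): reuse an existing node(op, y, z) if one exists. */
Node *get_node(char op, Node *y, Node *z) {
    for (int i = 0; i < node_count; i++)
        if (nodes[i]->op == op && nodes[i]->left == y && nodes[i]->right == z)
            return nodes[i];              /* common subexpression found */
    Node *n = calloc(1, sizeof(Node));
    n->op = op; n->left = y; n->right = z;
    return nodes[node_count++] = n;
}

With these helpers, processing t1 = a + b and a later occurrence of a + b yields the same node, which is exactly the common-subexpression collapse the DAG provides; attaching identifier lists to each node completes steps 2 and 3.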

Example 1
Consider the following statements with their three address code-
a=b*c
d=b
e=d*c
b=e
f=b+c
g=d+f

We will construct a DAG for these six statements.

Solution- Since we have six different statements, we will start drawing a graph from the first one.

Step 1: The first statement is a = b * c. This statement lies in the first case from the three cases defined
above. So, we will draw a node(*) with its left child node(b) and right child node(c).

Step 2: The second statement is d = b. This statement lies in the third case from the three cases defined
above. Since we already have a node defining operand b, i.e., node(b), we will append d to node(b).
Step 3: The third statement is e = d * c. This statement lies in the first case from the three cases defined
above. We already have a node for the * (multiplication) operation, i.e., node(*), whose operands are the nodes
for d and c (since d is attached to node(b)). Thus, we will append the result of d * c, i.e., e, to the existing
node(*) already labeled a.

Step 4: The fourth statement is b = e. This statement lies in the third case from the three cases defined
above. Since we have both b and e defined in our graph, we will simply append b to e.

Step 5: The fifth statement is f = b + c. This statement lies in the first case from the three cases defined
above. Since we already have a node defining operands b and c, i.e., node(b) and node(c), respectively but
no node representing + (addition) operation. We will draw a node(+) with its left child node(b) and right
child node(c).
Step 6: The sixth statement is g = d + f. This statement lies in the first case from the three cases defined
above. Since we already have a node defining operands d and f. We will draw a node(+) with its left child
node(d) and right child node(f).

Example 2
Consider the following expression: (a + b) * (a + b +c)
We will construct a DAG for this expression.

Solution- First, we will find the statements with the three address code for the given expression.

The above expression can be divided into three statements.


t1 = a + b
t2 = t1 + c
t3 = t1 * t2

We will start from the first statement.

Step 1: The first statement is t1 = a + b. This statement lies in the first case from the three cases defined
above. So, we will draw a node(+) with its left child node(a) and right child node(b).

Step 2: The second statement is t2 = t1 + c. This statement lies in the first case from the three cases defined
above. So, we will draw a node(+) with its left child node(t1) and right child node(c).

Step 3: The third statement is t3 = t1 * t2. This statement lies in the first case from the three cases defined
above. So, we will draw a node(*) with its left child node(t1) and right child node(t2).
Application of Dag in Compiler Design

The applications of Directed Acyclic Graph (DAG) in Compiler Design are:

1. Expression Optimization: DAGs are used to optimize expressions by identifying common
subexpressions and representing them efficiently, reducing redundant computations.
2. Code Generation: DAGs assist in generating efficient code by representing intermediate code and
optimizing it before translating it into machine code.
3. Register Allocation: DAGs aid in register allocation by identifying variables that can share the same
register, minimizing the number of memory accesses.
4. Control Flow Analysis: DAGs help in analyzing control flow structures and optimizing the flow of
control within a program.

Advantages of Dag in Compiler Design

The advantages of DAG in Compiler Design are:

 Space Efficiency: DAGs minimize memory usage by representing common subexpressions and
eliminating redundancy in intermediate representations.
 Time Efficiency: DAG-based optimizations reduce the time required for expression evaluation and
code generation, leading to faster compilation times.
 Improved Code Quality: By eliminating redundant computations and optimizing expressions, DAGs
produce more efficient and optimized code, enhancing overall code quality and performance.
 Simplifies Optimization Passes: DAG-based optimizations simplify the implementation of
optimization passes in compilers by providing a structured representation of intermediate code.
 Facilitates Register Allocation: DAGs aid in register allocation by providing insights into the usage of
variables and their lifetimes, enabling efficient allocation of hardware resources.
Code Generator

Code generator is used to produce the target code for three-address statements. It uses registers to store the
operands of the three address statement.

Example:

Consider the three-address statement x := y + z. It can be translated into the following sequence of code:

MOV y, R0
ADD z, R0
MOV R0, x

Register and Address Descriptors:


 A register descriptor keeps track of what is currently in each register. Initially, the register descriptors
show that all registers are empty.
 An address descriptor stores the location where the current value of a name can be found at
run time.

A code-generation algorithm:

The algorithm takes a sequence of three-address statements as input. For each three-address statement of the
form x := y op z, it performs the following actions:

1. Invoke a function getreg to find a location L where the result of the computation y op z should be
stored.
2. Consult the address descriptor for y to determine y', the current location of y. If the value of y is
currently in both memory and a register, prefer the register for y'. If the value of y is not already in L,
generate the instruction MOV y', L to place a copy of y in L.
3. Generate the instruction OP z', L, where z' is the current location of z. If z is in both a register and
memory, prefer the register. Update the address descriptor of x to indicate that x is in
location L. If L is a register, update its register descriptor and remove x from all other descriptors.

4. If the current values of y or z have no next uses, are not live on exit from the block, and are in registers,
alter the register descriptors to indicate that, after the execution of x := y op z, those registers will no
longer contain y or z.

Generating Code for Assignment Statements:

The assignment statement d := (a-b) + (a-c) + (a-c) can be translated into the following sequence of three-
address code:

t := a - b
u := a - c
v := t + u
d := v + u

Code sequence for the example is as follows:

Statement     Code Generated     Register descriptor           Address descriptor
                                 Registers empty
t := a - b    MOV a, R0          R0 contains t                 t in R0
              SUB b, R0
u := a - c    MOV a, R1          R0 contains t                 t in R0
              SUB c, R1          R1 contains u                 u in R1
v := t + u    ADD R1, R0         R0 contains v                 u in R1
                                 R1 contains u                 v in R0
d := v + u    ADD R1, R0         R0 contains d                 d in R0
              MOV R0, d                                        d in R0 and memory

Design Issues

In the code generation phase, various issues can arise:

1. Input to the code generator


2. Target program
3. Memory management
4. Instruction selection
5. Register allocation
6. Evaluation order

1. Input to the code generator


 The input to the code generator contains the intermediate representation of the source program and
the information in the symbol table. The intermediate representation is produced by the front end.
 The intermediate representation has several choices:
a) Postfix notation
b) Syntax tree
c) Three-address code
 We assume the front end produces a low-level intermediate representation, i.e., values of names in it can
be directly manipulated by machine instructions.
 The code generation phase requires complete, error-free intermediate code as input.

2. Target program:

The target program is the output of the code generator. The output can be:

a) Assembly language: It allows subprogram to be separately compiled.

b) Relocatable machine language: It makes the process of code generation easier.

c) Absolute machine language: It can be placed in a fixed location in memory and can be executed
immediately.

3. Memory management

 During the code generation process, symbol table entries have to be mapped to actual memory addresses,
and labels have to be mapped to instruction addresses.

 Mapping names in the source program to addresses of data is done cooperatively by the front end and the
code generator.
 Local variables are allocated on the stack in the activation record, while global variables are kept in the
static area.

4. Instruction selection:
 The instruction set of the target machine should be complete and uniform.
 When considering the efficiency of the target machine, instruction speed and machine idioms
are important factors.

 The quality of the generated code can be determined by its speed and size.

Example:

The three-address code is:

a := b + c
d := a + e

Inefficient assembly code is:

MOV b, R0    ; R0 = b
ADD c, R0    ; R0 = b + c
MOV R0, a    ; a = R0
MOV a, R0    ; R0 = a   (redundant: a is already in R0)
ADD e, R0    ; R0 = a + e
MOV R0, d    ; d = R0

5. Register allocation

Registers can be accessed faster than memory. Instructions involving operands in registers are shorter and
faster than those involving memory operands.

The following sub problems arise when we use registers:

Register allocation: In register allocation, we select the set of variables that will reside in registers.

Register assignment: In register assignment, we pick the specific register in which each variable will reside.

Certain machines require even-odd pairs of registers for some operands and results.

For example:

Consider the following division instruction of the form:

D x, y

where

x is the dividend, held in the even register of an even/odd register pair,

y is the divisor.

The even register is used to hold the remainder.

The odd register is used to hold the quotient.

6. Evaluation order

The efficiency of the target code can be affected by the order in which computations are performed.
Some computation orders need fewer registers to hold intermediate results than others.
Peephole Optimization
Peephole optimization is an optimization technique performed on a small set of compiler-generated
instructions; the small set is known as the peephole or window.

Some important aspects regarding peephole optimization:


1. It is applied to the target code after it has been generated from the source code.

2. Peephole optimization comes under machine-dependent optimization. Machine-dependent optimization


occurs after the target code has been generated and transformed to fit the target machine architecture. It
makes use of CPU registers and may make use of absolute memory references rather than relative
memory references.

3. It is applied to a small piece of code, repeatedly.

The objectives of peephole optimization are as follows:


1. Improve performance
2. Reduce memory footprint
3. Reduce code size.

Objectives of Peephole Optimization in Compiler Design


The objectives of Peephole optimization in compiler design are:

 It makes the generated machine code smaller, improving cache usage and saving memory.
 Improve the performance of instructions arranged to make the program run faster.
 Get rid of operations that are not needed or are repeated to make things work better and smoother.
 It calculates fixed values in advance and substitutes variables with already known numbers.
 Improve how choices are made to control the program's flow more effectively.
 Replace slower instructions with quicker options for better performance.
 Optimize memory operations for better data handling.
 Use specific hardware features for better performance on the target platform.

Working of Peephole Optimization in Compiler design


There are mainly four steps in Peephole Optimization in Compiler Design. The steps are as follows:

Identification
 The first step says that you must identify the code section where you need the Peephole Optimization.
 The peephole is a small, fixed-size window of instructions; the window size depends on the specific
optimization being performed.

 The compiler helps to define the instructions within the window.

Optimization
 In the next step, you must apply the rules of optimizations pre-defined in the Peephole.

 The compiler will search for the specific pattern of instructions in the window.

 There can be many types of patterns, such as inefficient code, series of loads and stores, or complex
patterns like branches.

Analysis
 After the pattern is identified, the compiler will make the changes in the instructions.

 Now the compiler will cross-check the codes to determine whether the changes improved the code.

 It will check the improvement based on size, speed and memory usage.

Iteration
 The above steps run in a loop, finding peepholes repeatedly until no more optimization is left
in the code.

 The compiler will go to each instruction one at a time and make the changes and reanalyse it for the
best result.

Peephole Optimization Techniques


There are various peephole optimization techniques.

Redundant Load and Store


In this optimization, the redundant operations are removed. For example, loading and storing values on
registers can be optimized.

For example,
a = b + c
d = a + e
is implemented using the register R0 as
MOV b, R0 ; instruction to copy b to the register
ADD c, R0 ; instruction to add c to the register, the register now holds b+c
MOV R0, a ; instruction to copy the register (b+c) to a
MOV a, R0 ; instruction to copy a to the register
ADD e, R0 ; instruction to add e to the register, the register now holds a+e, i.e., (b+c)+e
MOV R0, d ; instruction to copy the register to d

This can be optimized by removing a redundant load and store: in the third instruction, the value in register
R0 is copied to a, and a is then loaded back into R0 in the next step for further operations. The optimized
implementation will be:

MOV b, R0; instruction to Copy b to the register


ADD c, R0; instruction to Add c to the register, which is now b+c (a)
MOV R0, a; instruction to Copy the register to a
ADD e, R0; instruction to Add e to the register, which is now b+c+e [(a)+e]
MOV R0, d; instruction to Copy the register to d

Strength Reduction
In strength reduction optimization, operators that consume more execution time are replaced by operators
that consume less. For example, multiplication and division can be replaced by shift
operators.

Initial code:
b = a * 2;
Optimized code:
b = a << 1;
// left shifting the bit by one gives the same result
Initial code:
b = a / 2;
Optimized code:
b = a >> 1;
// right shifting the bit by one will give the same result

Simplify Algebraic Expressions


The algebraic expressions that are useless or written inefficiently are transformed.
For example:
a=a+0
a=a*1
a=a/1
a=a-0
// All the above expressions cause calculation overhead.
// They can be removed for optimization.

Replace Slower Instructions With Faster


Slower instructions can be replaced with faster ones, and registers play an important role. For example, a
register supporting unit increment operation will perform better than adding one to the register. The same
can be done with many other operations, like multiplication.

ADD #1
SUB #1
// The above instructions can be replaced with
// INC R
// DEC R
// if the register supports increment and decrement

Let’s see another example of Java bytecode:

Here X is loaded onto the stack twice and then multiplied. We can use the dup instruction, which copies the
value on the top of the stack (so X need not be loaded again), and then we can perform our operation. It works
faster and can be preferred over slower operations.

aload X
aload X
mul
// The above instructions can be replaced with
aload X
dup
mul

Dead code Elimination


The dead code can be eliminated to improve the system's performance; resources will be free and less
memory will be needed.

int dead(void)
{
int a=1;
int b=5;
int c=a+b;
return c;
// c will be returned
// The remaining part of code is dead code, never reachable
int k=1;
k=k*2;
k=k+b;
return k;
// This dead code can be removed for optimization
}

Moreover, null sequences and useless operations can be deleted too.

Target Machine

 The target computer is a byte-addressable machine with 4 bytes to a word.

 The target machine has n general-purpose registers, R0, R1, ...., Rn-1. It also has two-address
instructions of the form:

op source, destination

where op is an op-code, and source and destination are data fields.

 It has the following op-codes:


ADD (add source to destination)
SUB (subtract source from destination)
MOV (move source to destination)

 The source and destination of an instruction can be specified by the combination of registers and
memory location with address modes.

MODE               FORM    ADDRESS                      EXAMPLE             ADDED COST

absolute           M       M                            ADD temp, R1        1

register           R       R                            ADD R0, R1          0

indexed            c(R)    c + contents(R)              ADD 100(R2), R1     1

indirect register  *R      contents(R)                  ADD *R2, R1         0

indirect indexed   *c(R)   contents(c + contents(R))    ADD *100(R2), R1    1

literal            #c      c                            ADD #3, R1          1

 Here, an added cost of 1 means that the address mode occupies one extra word of memory.
 Each instruction has a cost of 1 plus the added costs for its source and destination.
 Instruction cost = 1 + added cost of the source mode + added cost of the destination mode.

Example:

1. Move register to memory (R0 → M): MOV R0, M
   Instruction cost = 1 + 0 (register source) + 1 (absolute destination) = 2

2. Indirect indexed mode: MOV *4(R0), M
   Instruction cost = 1 + 1 (indirect indexed source) + 1 (absolute destination) = 3

3. Literal mode: MOV #1, R3
   Instruction cost = 1 + 1 (literal source) + 0 (register destination) = 2

Register Allocation and Assignment

Local register allocation


Register allocation is done only within a basic block. It follows a top-down approach.

Assign registers to the most heavily used variables


 Traverse the block

 Count uses

 Use count as a priority function

 Assign registers to higher priority variables first

Advantage
 Heavily used values reside in registers.

Disadvantage
 Does not consider the non-uniform distribution of uses.
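
A toy C sketch of this use-count priority scheme; the variable names, counts, and two-register budget are invented for illustration.

#include <stdio.h>

#define VARS 4
#define REGS 2

int main(void) {
    const char *name[VARS] = {"a", "b", "c", "d"};
    int uses[VARS] = {5, 1, 3, 2};     /* counts gathered by traversing the block */
    int assigned[VARS] = {0};

    /* Greedily hand the REGS registers to the most heavily used variables. */
    for (int r = 0; r < REGS; r++) {
        int best = -1;
        for (int v = 0; v < VARS; v++)
            if (!assigned[v] && (best < 0 || uses[v] > uses[best]))
                best = v;
        assigned[best] = 1;
        printf("%s (used %d times) -> R%d\n", name[best], uses[best], r);
    }
    return 0;
}

With these counts, a and c get the two registers; b and d stay in memory.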

Need of global register allocation

Local allocation does not take into account that some instructions (e.g. those in loops) execute more
frequently. It forces us to store/load at basic block endpoints since each block has no knowledge of the
context of others.

Global allocation is needed to find the live range(s) of each variable and the area(s) where the variable is
used/defined. The cost of spilling will depend on the frequencies and locations of uses.

Register allocation depends on:


Size of live range
Number of uses/definitions
Frequency of execution
Number of loads/stores needed.
Cost of loads/stores needed.

Register allocation by graph coloring

Global register allocation can be seen as a graph coloring problem.


Basic idea:
1. Identify the live range of each variable
2. Build an interference graph that represents conflicts between live ranges (two nodes are connected if
the variables they represent are live at the same moment)
3. Try to color the nodes of the graph with at most as many colors as there are registers, so that any two
neighbors receive different colors
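
A toy greedy-coloring sketch in C. The four live ranges, their interference matrix, and the two-register budget are invented for illustration; real allocators use more sophisticated orderings (for example, Chaitin-style simplification) rather than this naive left-to-right pass.

#include <stdio.h>

#define VARS 4   /* live ranges a, b, c, d */
#define REGS 2   /* available registers (colors) */

int main(void) {
    /* interference[i][j] = 1 if live ranges i and j overlap */
    int interference[VARS][VARS] = {
        {0, 1, 1, 0},   /* a conflicts with b, c */
        {1, 0, 1, 0},   /* b conflicts with a, c */
        {1, 1, 0, 0},   /* c conflicts with a, b */
        {0, 0, 0, 0},   /* d conflicts with nothing */
    };
    int color[VARS];
    for (int v = 0; v < VARS; v++) {
        int used[REGS] = {0};
        for (int u = 0; u < v; u++)       /* colors taken by earlier neighbors */
            if (interference[v][u] && color[u] >= 0)
                used[color[u]] = 1;
        color[v] = -1;                    /* -1 means the value must be spilled */
        for (int r = 0; r < REGS; r++)
            if (!used[r]) { color[v] = r; break; }
        if (color[v] >= 0)
            printf("live range %c -> R%d\n", 'a' + v, color[v]);
        else
            printf("live range %c -> spill to memory\n", 'a' + v);
    }
    return 0;
}

Here a and b receive R0 and R1, c cannot be colored and is spilled, and d reuses R0 because its live range conflicts with nothing.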

Fig 4.3 Flow graph of an inner loop
