Intermediate Code Generation in Compiler Design
In the analysis-synthesis model of a compiler, the front end translates a source
program into a machine-independent intermediate code, and the back end then uses this
intermediate code to generate the target code (which the machine can understand). The
benefits of using machine-independent intermediate code are:
Portability is enhanced. For example, if a compiler translates the source language
directly to its target machine language without the option of generating intermediate
code, then a full native compiler is required for each new machine, because the compiler
itself must be modified to match each machine's specifications.
Retargeting is facilitated.
It is easier to apply source code modification to improve the performance of source code by
optimizing the intermediate code.
If we generate machine code directly from source code, then for n target machines we will
need n optimizers and n code generators, but with a machine-independent intermediate code we
need only one optimizer. Intermediate code can be either language-specific (e.g., bytecode for
Java) or language-independent (three-address code).
Example :
T1 = a + b
T2 = T1 + c
T3 = T1 * T2
Example :
T1 = a + b
T2 = a – b
T3 = T1 * T2
T4 = T1 – T3
T5 = T4 + T3
Example :
a = b * c
d = b
e = d * c
b = e
f = b + c
g = f + d
Final Directed acyclic graph
Example :
T1 := 4 * i
T2 := a[T1]
T3 := 4 * i
T4 := b[T3]
T5 := T2 * T4
T6 := prod + T5
prod := T6
T7 := i + 1
i := T7
if i <= 20 goto 1
General Representation
a = b op c
Where a, b, or c represents an operand such as a name, a constant, or a compiler-generated
temporary, and op represents the operator.
1. Quadruple – It is a structure consisting of 4 fields, namely op, arg1, arg2, and result. op
denotes the operator, arg1 and arg2 denote the two operands, and result stores the
result of the expression.
Advantage –
Easy to rearrange code for global optimization.
One can quickly access the value of temporary variables using the symbol table.
Disadvantage –
Contains many temporaries.
Temporary variable creation increases time and space complexity.
Example – Consider expression a = b * – c + b * – c. The three address code is:
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5
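As an illustration, the quadruples for the code above can be stored as four-field records. A minimal Python sketch (the tuple layout and helper name are assumptions, not a fixed format):

```python
# Quadruples for a = b * -c + b * -c:
# each record is (op, arg1, arg2, result).
quads = [
    ("uminus", "c",  None, "t1"),
    ("*",      "b",  "t1", "t2"),
    ("uminus", "c",  None, "t3"),
    ("*",      "b",  "t3", "t4"),
    ("+",      "t2", "t4", "t5"),
    ("=",      "t5", None, "a"),
]

# Every temporary is named explicitly, which is what makes quadruples
# easy to rearrange but temporary-heavy.
def results(qs):
    return [q[3] for q in qs]
```

Because each instruction names its own result, any quadruple can be moved without renumbering the others.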
2. Triples – This representation doesn't use an extra temporary variable to represent a single
operation; instead, when a reference to another triple's value is needed, a pointer to that triple is
used. So, it consists of only three fields, namely op, arg1, and arg2.
Disadvantage –
Temporaries are implicit, and it is difficult to rearrange code.
It is difficult to optimize because optimization involves moving intermediate code. When a triple
is moved, any other triple referring to it must be updated as well. With the help of a pointer,
one can directly access a symbol table entry.
Example – Consider expression a = b * – c + b * – c
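The triples for this expression can be sketched as follows (the encoding of a triple reference as a plain row index is an assumption for illustration):

```python
# Triples for a = b * -c + b * -c: no result field; an operand may be
# the index of an earlier triple (written here as an int).
triples = [
    ("uminus", "c", None),   # (0)
    ("*",      "b", 0),      # (1) refers to triple (0)
    ("uminus", "c", None),   # (2)
    ("*",      "b", 2),      # (3)
    ("+",      1,   3),      # (4) adds results of (1) and (3)
    ("=",      "a", 4),      # (5)
]
```

Moving any triple would force renumbering every later triple that refers to it, which is exactly the stated disadvantage.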
3. Indirect Triples – This representation uses pointers to a separately stored listing of all
references to computations. It is similar in utility to the quadruple representation but requires
less space. Temporaries are implicit, and it is easier to rearrange code.
Example – Consider expression a = b * – c + b * – c
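A sketch of indirect triples for the same expression: the triples stay fixed, and a separate statement listing records execution order (names are illustrative):

```python
# Indirect triples: the triples themselves plus a separate statement
# list of indices; reordering code edits only the listing.
triples = [
    ("uminus", "c", None),
    ("*",      "b", 0),
    ("uminus", "c", None),
    ("*",      "b", 2),
    ("+",      1,   3),
    ("=",      "a", 4),
]
listing = [0, 1, 2, 3, 4, 5]   # execution order; reorder here, not in triples

# Swapping two independent computations touches only the listing:
listing[0], listing[2] = listing[2], listing[0]
```

This is why rearranging code is easier here than with plain triples: no triple-to-triple reference needs updating.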
Question – Write quadruple, triples and indirect triples for following expression : (x + y) * (y + z) +
(x + y + z)
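One possible answer sketch for this question, reusing the common subexpression x + y (the choice of temporaries is one valid option, not the only one):

```python
# Three-address code (one valid choice) for (x+y)*(y+z) + (x+y+z):
#   t1 = x + y
#   t2 = y + z
#   t3 = t1 * t2
#   t4 = t1 + z        # x + y + z reuses t1
#   t5 = t3 + t4
quads = [
    ("+", "x",  "y",  "t1"),
    ("+", "y",  "z",  "t2"),
    ("*", "t1", "t2", "t3"),
    ("+", "t1", "z",  "t4"),
    ("+", "t3", "t4", "t5"),
]

# The corresponding triples drop the result column and replace each
# temporary name by the index of the row that computes it:
index_of = {q[3]: i for i, q in enumerate(quads)}
triples = [(op, index_of.get(a, a), index_of.get(b, b))
           for (op, a, b, _) in quads]
```

For indirect triples, one would additionally keep a listing [0, 1, 2, 3, 4] of statement indices, as described above.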
Another example – consider the three address code for a simple counting loop:
(1) i = 1
(2) if i <= 10 goto (4)
(3) goto next
(4) t1 = x + 2
(5) x = t1
(6) t2 = i + 1
(7) i = t2
(8) goto (2)
next: // next line after (8)th statement
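This three-address code corresponds to source along the following lines (a sketch; the initial value of x is an assumption, since the original does not show it):

```python
# Reconstructed source for the loop above:
i = 1
x = 0          # assumed initial value; not given in the original
while i <= 10:
    x = x + 2  # t1 = x + 2; x = t1
    i = i + 1  # t2 = i + 1; i = t2
```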
Types and Declarations
Introduction.
Applications of types fall under checking and translation. First, type checking uses logical
rules to reason about the behavior of a program at runtime.
In applications of translation, given the type of a name, the compiler can determine the storage
needed for that name at runtime.
In this article, we learn about types and storage layouts for names declared within classes or
procedures. The storage for a procedure call or an object is allocated at runtime, when the
procedure is called or when the object is created.
Type expressions
Types have a structure that we will represent using type expressions. A type expression is
either a basic type or is formed by applying a type constructor to type expressions.
Basic types are determined by the language being checked.
An example:
We have an array of type int[2][3], read as an array of 2 arrays of 3 integers each. It is written as a
type expression as follows: array(2, array(3, integer))
We represent it as a tree as shown below (1);
A type expression is one of the following:
A basic type expression. These include boolean, char, integer, float, and void.
An array type, formed by applying the array type constructor to a number and a type expression.
A record type. A record is a data structure with named fields; its type expression is formed by
applying the record type constructor to the fields' names and their types.
A function type, formed by using the type constructor →. That is, we write s → t for
'function from type s to type t'.
A cartesian product: if s and t are type expressions, then s x t is a type expression.
A type expression containing variables whose values are type expressions.
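As a minimal sketch, type expressions can be encoded as nested structures; here a basic type is a string and array(n, t) is a tuple (the encoding and helper names are assumptions):

```python
# Type expressions as nested tuples: a basic type is a string,
# array(n, t) is ("array", n, t).
def array(n, t):
    return ("array", n, t)

# int[2][3] reads as "array of 2 arrays of 3 integers":
t = array(2, array(3, "integer"))

def to_str(t):
    """Render a type expression in the article's notation."""
    if isinstance(t, str):
        return t
    _, n, elem = t
    return f"array({n}, {to_str(elem)})"
```

Rendering t reproduces the expression from the text: array(2, array(3, integer)).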
Type equivalence
Type-checking rules are of the form 'if two type expressions are equal, then return a certain type,
else return an error'.
Ambiguities arise when names are used both for type expressions and within other type
expressions.
The problem is whether a name in a type expression stands for itself or is an abbreviation
for another type expression.
When representing type expressions using graphs, we say that two types are structurally equivalent if
and only if one of the following conditions holds: they are the same basic type; they are formed by
applying the same constructor to structurally equivalent types; or one is a type name that denotes the
other.
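A sketch of a structural-equivalence check, using a nested-tuple encoding where a basic type is a string and an array type is ("array", n, t) (the encoding is an assumption for illustration; type names are omitted):

```python
# Structural equivalence: same basic type, or the same constructor
# applied to structurally equivalent arguments.
def equiv(s, t):
    if isinstance(s, str) or isinstance(t, str):
        return s == t                      # same basic type
    if s[0] != t[0]:
        return False                       # different constructors
    if s[0] == "array":
        # same size and structurally equivalent element types
        return s[1] == t[1] and equiv(s[2], t[2])
    return False
```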
Declarations
We learn about types and declarations using a simplified grammar that declares a single name at a
time.
We have the following grammar; (2)
The type and relative address are stored in the symbol table entry for the name. Varying-length data
such as strings, or data whose size cannot be determined until runtime such as dynamic arrays, are
handled by reserving a fixed amount of storage for a pointer to the data.
Assuming that storage is in blocks of contiguous bytes whereby a byte is the smallest unit of
addressable memory. Multibyte objects are stored in consecutive bytes and given the address of the
first byte.
We have the following SDT(Syntax Directed Translation) that computes types and their widths for basic
and array types. (3)
The above SDT uses the synthesized attributes type and width for each non-terminal and two
variables t and w to pass type and width information from a B node in a parse tree to the node of the
production C → ℇ.
If B → int, B.type is integer and B.width is 4, the width of an integer. Similarly, if B → float,
B.type is float and B.width is 8, which is the width of a float.
Otherwise, C specifies an array component. The action for C → [num] C1 forms C.type by applying the
type constructor array to the operands num.value and C1.type. For example, the resulting tree structure
for applying an array can be seen from the first image.
To obtain the width of an array we multiply the width of an element by the number of elements in an
array. If addresses of consecutive integers differ by 4, then the address calculations for an array of
integers include multiplications by 4.
These multiplications give opportunities for optimization and therefore the front end needs to make
them explicit.
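The width computation the SDT performs can be sketched directly; the widths 4 for an integer and 8 for a float follow the text, and the tuple encoding of array types is an assumption:

```python
# width(t): bytes needed for a type expression, mirroring the SDT
# action C.width = num.value * C1.width.
BASIC = {"integer": 4, "float": 8}

def width(t):
    if isinstance(t, str):
        return BASIC[t]
    _, n, elem = t            # ("array", n, elem)
    return n * width(elem)    # element count times element width

# int[2][3]: 2 * (3 * 4) = 24 bytes
w = width(("array", 2, ("array", 3, "integer")))
```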
Sequences and declarations
In programming languages such as C and Java, declarations in a single group are processed as a
group. These declarations can be distributed within a Java procedure but can still be processed as a
group when the procedure is analyzed.
We can use a variable to track the next available relative address.
The following translation scheme (4) deals with a sequence of declarations of the form T id,
where T generates a type as shown in image (3).
Before the first declaration is considered, a variable offset that tracks the next available relative
address is set to 0.
The value of offset is incremented by the width of the type of x, where x is a new name entered into
the symbol table with its relative address set to the current value of offset.
The semantic action within the production D → T id ; D1 creates a symbol table entry by
executing top.put(id.lexeme, T.type, offset). Here top denotes the current symbol table, and
top.put creates a symbol table entry for id.lexeme with type T.type and relative address offset in its
data area.
The initialization of offset in image (4) is more evident in the first production that appears as;
P → {offset = 0;} D
Non-terminals generating ℇ referred to as marker non-terminals are used to rewrite productions so that
all actions appear at the right ends.
By using a marker non-terminal M, the above production is restated as:
P → M D
M → ℇ { offset = 0; }
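The offset-tracking scheme can be sketched as a small symbol table; the function names and widths here are assumptions for illustration, not the book's code:

```python
# Each declaration enters (type, relative address) into the table
# and bumps offset by the declared type's width.
widths = {"integer": 4, "float": 8}

table = {}
offset = 0                       # set to 0 before the first declaration

def declare(name, typ):
    """Mimics top.put(id.lexeme, T.type, offset)."""
    global offset
    table[name] = (typ, offset)
    offset += widths[typ]

declare("i", "integer")   # i at relative address 0
declare("f", "float")     # f at relative address 4
declare("j", "integer")   # j at relative address 12
```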
Records and classes fields
The translation of declarations in image (4) carries over to fields in records and classes. Record types
are added to the grammar in image (3) by adding the following production:
T → record '{' D '}'
The offset or relative address for a field name is relative to the data area for that record.
Summary.
Graphs are useful for representing type expressions.
Applications of types can be under checking or translation. Checking reasons about the behavior of a
program at runtime by using logical rules.
For translation, given the type of a name, the compiler can determine the storage needed for the name
during runtime.
Translation of Expressions
1. Operations Within Expressions
The rest of this chapter explores issues that arise during the translation of expressions and statements.
We begin in this section with the translation of expressions into three-address code. An expression
with more than one operator, like a + b * c, will translate into instructions with at most one operator
per instruction. An array reference A[i][j] will expand into a sequence of three-address instructions
that calculate an address for the reference. We shall consider type checking of expressions in Section
6.5 and the use of boolean expressions to direct the flow of control through a program in Section 6.6.
The syntax-directed definition in Fig. 6.19 builds up the three-address code for an assignment statement S using
attribute code for S and attributes addr and code for an expression E. Attributes S.code and E.code denote the
three-address code for S and E, respectively. Attribute E.addr denotes the address that will hold the value of E.
Recall from Section 6.2.1 that an address can be a name, a constant, or a compiler-generated temporary.
Consider the last production, E → id, in the syntax-directed definition in Fig. 6.19. When an expression
is a single identifier, say x, then x itself holds the value of the expression. The semantic rules for this
production define E.addr to point to the symbol-table entry for this instance of id. Let top denote the
current symbol table. The function top.get retrieves the entry when it is applied to the string
representation id.lexeme of this instance of id. E.code is set to the empty string.
When E → (E1), the translation of E is the same as that of the subexpression E1. Hence, E.addr equals
E1.addr, and E.code equals E1.code.
The operators + and unary - in Fig. 6.19 are representative of the operators in a typical language. The
semantic rules for E → E1 + E2 generate code to compute the value of E from the values of E1 and E2.
Values are computed into newly generated temporary names. If E1 is computed into E1.addr and E2
into E2.addr, then E1 + E2 translates into t = E1.addr + E2.addr, where t is a new temporary name.
E.addr is set to t. A sequence of distinct temporary names t1, t2, ... is created by successively
executing newTemp().
For convenience, we use the notation gen(x '=' y '+' z) to represent the three-address instruction
x = y + z. Expressions appearing in place of variables like x, y, and z are evaluated when passed to
gen, and quoted strings like '=' are taken literally. Other three-address instructions will be built up
similarly by applying gen to a combination of expressions and strings. In syntax-directed definitions,
gen builds an instruction and returns it. In translation schemes, gen builds an instruction and
incrementally emits it by putting it into the stream of generated instructions.
When we translate the production E → E1 + E2, the semantic rules in Fig. 6.19 build up E.code by
concatenating E1.code, E2.code, and an instruction that adds the values of E1 and E2. The instruction
puts the result of the addition into a new temporary name for E, denoted by E.addr.
The translation of E → - E1 is similar. The rules create a new temporary for E and generate an
instruction to perform the unary minus operation.
Finally, the production S → id = E; generates instructions that assign the value of expression E to the
identifier id. The semantic rule for this production uses function top.get to determine the address of
the identifier represented by id, as in the rules for E → id. S.code consists of the instructions to
compute the value of E into an address given by E.addr, followed by an assignment to the address
top.get(id.lexeme) for this instance of id.
Example 6.11: The syntax-directed definition in Fig. 6.19 translates the assignment statement
a = b + - c; into the three-address code sequence
t1 = minus c
t2 = b + t1
a = t2
2. Incremental Translation
Code attributes can be long strings, so they are usually generated incrementally, as discussed in
Section 5.5.2. Thus, instead of building up E.code as in Fig. 6.19, we can arrange to generate only the
new three-address instructions, as in the translation scheme of Fig. 6.20. In the incremental approach,
gen not only constructs a three-address instruction, it appends the instruction to the sequence of
instructions generated so far. The sequence may either be retained in memory for further processing,
or it may be output incrementally.
The translation scheme in Fig. 6.20 generates the same code as the syntax-directed definition in
Fig. 6.19. With the incremental approach, the code attribute is not used, since there is a single
sequence of instructions that is created by successive calls to gen. For example, the semantic rule for
E → E1 + E2 in Fig. 6.20 simply calls gen to generate an add instruction; the instructions to compute
E1 into E1.addr and E2 into E2.addr have already been generated.
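The incremental approach can be sketched as follows: gen appends each instruction to one growing sequence, and a helper supplies the temporaries t1, t2, ... (names and string format are assumptions for illustration):

```python
code = []     # the single sequence of generated instructions
_count = 0

def new_temp():
    """Supply distinct temporary names t1, t2, ..."""
    global _count
    _count += 1
    return f"t{_count}"

def gen(instr):
    code.append(instr)   # construct and immediately emit

# Translating a = b + -c incrementally:
t1 = new_temp(); gen(f"{t1} = minus c")
t2 = new_temp(); gen(f"{t2} = b + {t1}")
gen(f"a = {t2}")
```

No code attribute is needed: concatenation happens implicitly, in emission order.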
The approach of Fig. 6.20 can also be used to build a syntax tree. The new semantic action for
E → E1 + E2 creates a node by using a constructor instead of appending an instruction to the sequence
of generated instructions. Here, attribute addr represents the address of a node rather than a variable
or constant.
3. Addressing Array Elements
Array elements can be accessed quickly if they are stored in a block of consecutive locations. In C and
Java, array elements are numbered 0, 1, ..., n - 1, for an array with n elements. If the width of each
array element is w, then the ith element of array A begins in location
base + i x w (6.2)
where base is the relative address of the storage allocated for the array. That is, base is the relative
address of A[0].
The formula (6.2) generalizes to two or more dimensions. In two dimensions, we write A[i1][i2] in C
and Java for element i2 in row i1. Let w1 be the width of a row and let w2 be the width of an element
in a row. The relative address of A[i1][i2] can then be calculated by the formula
base + i1 x w1 + i2 x w2 (6.4)
Alternatively, the relative address of an array reference can be calculated in terms of the numbers of
elements nj along dimension j of the array and the width w = wk of a single element of the array. In
two dimensions (i.e., k = 2 and w = w2), the location for A[i1][i2] is given by
base + (i1 x n2 + i2) x w (6.6)
More generally, array elements need not be numbered starting at 0. In a one-dimensional array, the
array elements are numbered low, low + 1, ..., high and base is the relative address of A[low].
Formula (6.2) for the address of A[i] is replaced by:
base + (i - low) x w (6.7)
The expressions (6.2) and (6.7) can both be rewritten as i x w + c, where the subexpression
c = base - low x w can be precalculated at compile time.
Note that c = base when low is 0. We assume that c is saved in the symbol table entry for A, so the
relative address of A[i] is obtained by simply adding i x w to c.
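The precalculation can be checked numerically; base, low, and w below are made-up values for illustration:

```python
# For A[i] with elements numbered low..high:
# address = base + (i - low) * w, rewritten as i * w + c.
base, low, w = 100, 1, 4        # assumed values for illustration

c = base - low * w              # precomputed at compile time
def address(i):
    return i * w + c            # only this runs at runtime

# A[low] is the first element, so it sits at base itself:
a_first = address(low)
```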
Compile-time precalculation can also be applied to address calculations for elements of multidimensional
arrays; see Exercise 6.4.5. However, there is one situation where we cannot use compile-time precalculation:
when the array's size is dynamic. If we do not know the values of low and high (or their gen-eralizations in
many dimensions) at compile time, then we cannot compute constants such as c. Then, formulas like (6.7) must
be evaluated as they are written, when the program executes.
The above address calculations are based on row-major layout for arrays, which is used in C and Java. A two-
dimensional array is normally stored in one of two forms, either row-major (row-by-row) or column-
major (column-by-column). Figure 6.21 shows the layout of a 2 x 3 array A in (a) row-major form and (b)
column-major form. Column-major form is used in the Fortran family of languages.
We can generalize row- or column-major form to many dimensions. The generalization of row-major form is to
store the elements in such a way that, as we scan down a block of storage, the rightmost subscripts appear to
vary fastest, like the numbers on an odometer. Column-major form generalizes to the opposite arrangement,
with the leftmost subscripts varying fastest.
The chief problem in generating code for array references is to relate the address-calculation formulas in Section
6.4.3 to a grammar for array references. Let nonterminal L generate an array name followed by a sequence of
index expressions:
Let us calculate addresses based on widths, using the formula (6.4), rather than on numbers of elements, as in
(6.6). The translation scheme in Fig. 6.22 generates three-address code for expressions with array references. It
consists of the productions and semantic actions from Fig. 6.20, together with productions involving
nonterminal L.
L.array is a pointer to the symbol-table entry for the array name. The base address of the array,
say, L.array.base, is used to determine the actual
l-value of an array reference after all the index expressions are analyzed.
L.type is the type of the subarray generated by L. For any type t, we assume that its width is given
by t.width. We use types as attributes, rather than widths, since types are needed anyway for type
checking. For any array type t, suppose that t.elem gives the element type.
The production S → id = E; represents an assignment to a nonarray variable, which is handled as
usual. The semantic action for S → L = E; generates an indexed copy instruction to assign the value
denoted by expression E to the location denoted by the array reference L. Recall that attribute L.array
gives the symbol-table entry for the array. The array's base address (the address of its 0th element) is
given by L.array.base. Attribute L.addr denotes the temporary that holds the offset for the array
reference generated by L. The location for the array reference is therefore L.array.base[L.addr]. The
generated instruction copies the r-value from address E.addr into the location for L.
Productions E → E1 + E2 and E → id are the same as before. The semantic action for the new
production E → L generates code to copy the value from the location denoted by L into a new
temporary. This location is L.array.base[L.addr], as discussed above for the production S → L = E;.
Again, attribute L.array gives the array name, and L.array.base gives its base address. Attribute
L.addr denotes the temporary that holds the offset. The code for the array reference places the r-value
at the location designated by the base and offset.
Example 6.12: Let a denote a 2 x 3 array of integers, and let c, i, and j all denote integers. Then, the
type of a is array(2, array(3, integer)). Its width w is 24, assuming that the width of an integer is 4.
The type of a[i] is array(3, integer), of width w1 = 12. The type of a[i][j] is integer.
The expression c + a[i][j] is translated into the sequence of three-address instructions in Fig. 6.24. As
usual, we have used the name of each identifier to refer to its symbol-table entry.
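The address arithmetic of Example 6.12 can be checked numerically: for a 2 x 3 row-major array of 4-byte integers, a[i][j] sits at offset i x 12 + j x 4 from the base (the base itself is omitted here):

```python
# a: array(2, array(3, integer)); integer width 4.
w1, w2 = 12, 4                  # row width, element width

def offset(i, j):
    return i * w1 + j * w2      # mirrors t1 = i * 12; t2 = j * 4; t1 + t2

# Row-major layout: consecutive j values are adjacent in memory.
offsets = [offset(i, j) for i in range(2) for j in range(3)]
```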
Type Checking in Compiler Design
Type checking is the process of verifying and enforcing constraints on the types of values. A compiler
must check that the source program follows both the syntactic and semantic conventions of the source
language, and it must also check the type rules of the language. Type checking allows the programmer
to limit what types may be used in certain circumstances and assigns types to values. The type checker
determines whether these values are used appropriately or not.
It checks the types of objects and reports a type error in the case of a violation; where the language's
rules allow, incorrect types are corrected. Whatever compiler we use, while it is compiling the
program it has to follow the type rules of the language, and every language has its own set of type
rules.
Information about data types like INTEGER, FLOAT, CHARACTER, and all the other data types is
maintained and computed by the compiler. The type checker is a module of the compiler, and its task
is type checking.
Conversion
Conversion from one type to another is said to be implicit if it is done automatically by the compiler.
Implicit type conversions are also called coercions, and coercion is limited in many languages.
Example: An integer may be converted to a real, but a real may not be converted to an integer.
Conversion is said to be explicit if the programmer writes something to perform the conversion.
Tasks of the type checker:
1. It has to ensure that indexing is applied only to arrays.
2. It has to check the ranges of the data types used; for example:
3. INTEGER (int) has a range of -32,768 to +32,767 (for a 16-bit int).
4. FLOAT has a range of about 1.2E-38 to 3.4E+38 (for a 32-bit float).
Static Type Checking:
Static type checking is type checking performed at compile time. It checks type variables at compile
time, which means the type of each variable is known at compile time. It generally examines the
program text during the translation of the program. Using the type rules of a system, a compiler can
infer from the source text that a function (fun) will be applied to an operand (a) of the right type each
time the expression fun(a) is evaluated.
Examples of Static checks include:
Type checks: A compiler should report an error if an operator is applied to an incompatible
operand, for example, if an array variable and a function variable are added together.
Flow-of-control checks: Statements that cause the flow of control to leave a construct must have
someplace to which to transfer the flow of control. For example, a break statement in C causes
control to leave the smallest enclosing while, for, or switch statement; an error occurs if no such
enclosing statement exists.
Uniqueness checks: There are situations in which an object must be defined exactly once. For
example, in Pascal an identifier must be declared uniquely, labels in a case statement must be
distinct, and elements of a scalar type must not be repeated.
Name-related checks: Sometimes the same name must appear two or more times. For example, in
Ada a loop may have a name that appears at the beginning and end of the construct. The compiler
must check that the same name is used at both places.
The Benefits of Static Type Checking:
1. Runtime Error Protection.
2. It catches syntactic errors like spurious words or extra punctuation.
3. It catches wrong names like Math and Predefined Naming.
4. Detects incorrect argument types.
5. It catches the wrong number of arguments.
6. It catches wrong return types, like return “70”, from a function that’s declared to return an int.
Dynamic Type Checking:
Dynamic type checking is type checking done at run time. In dynamic type checking, types are
associated with values, not variables. In implementations of dynamically type-checked languages,
runtime objects generally carry a type tag, which is a reference to a type containing the object's type
information. Dynamic typing is more flexible: a static type system always restricts what can be
conveniently expressed. Dynamic typing also results in more compact programs, since it does not
require types to be spelled out. Programming with a static type system often requires more design and
implementation effort.
Languages like Pascal and C have static type checking. Type checking is used to check the correctness
of the program before its execution. The main purpose of type checking is to verify data type
assignments and type casts of the data types, whether they are correct or not, before execution.
Static type checking is also used to determine the amount of memory needed to store each variable.
The design of the type-checker depends on:
1. Syntactic Structure of language constructs.
2. The Expressions of languages.
3. The rules for assigning types to constructs (semantic rules).
The token stream from the lexical analyzer is passed to the PARSER, which generates a syntax tree.
When a program (source code) is converted into a syntax tree, the type checker plays a crucial role:
by examining the syntax tree, it can tell whether each data type is applied to the correct variables or
not. The type checker checks the tree and, if any modifications are needed, it applies them. It produces
a checked syntax tree, after which INTERMEDIATE CODE generation is done.
Overloading:
An Overloading symbol is one that has different operations depending on its context.
Overloading is of two types:
1. Operator Overloading
2. Function Overloading
Operator Overloading: In mathematics, the arithmetic expression "x + y" has the addition operator '+'
overloaded because '+' in "x + y" means different operations when x and y are integers, complex
numbers, reals, or matrices.
Example: In Ada, the parentheses '()' are overloaded: the expression A(i) can mean the ith element of
an array A, a call to a function A with argument i, or an explicit conversion of expression i to type A.
In most languages the arithmetic operators are overloaded.
Function Overloading: The type checker resolves function overloading based on the types and
number of arguments.
Example:
E → E1 (E2)
{
E.type := if E2.type = s and E1.type = s → t then t
else type_error
}
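The rule above can be sketched as a lookup: the checker picks the candidate function type whose parameter type matches the argument's type (the pair encoding of s → t and the example overload set are assumptions):

```python
# A function type s -> t is encoded as a pair (s, t); an overloaded
# name maps to several candidate function types.
def resolve(candidates, arg_type):
    for s, t in candidates:
        if s == arg_type:       # E2.type = s and E1.type = s -> t
            return t            # the result type is t
    return "type_error"

# Hypothetical overloads of abs: integer -> integer, float -> float.
abs_overloads = [("integer", "integer"), ("float", "float")]
r1 = resolve(abs_overloads, "float")     # picks float -> float
r2 = resolve(abs_overloads, "char")      # no candidate matches
```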
1. Representation of Boolean Expressions
Boolean expressions in programming languages involve logical operations (e.g., AND, OR, NOT) and
comparisons (e.g., equal to, greater than, less than). These expressions evaluate to either true or false. During
translation to intermediate code, boolean expressions need to be represented using appropriate data
structures and operations.
Representation Techniques:
Using Temporary Variables: Translate boolean expressions into intermediate code using temporary
variables to store intermediate results of sub-expressions.
For example, consider the expression a && (b || c). This can be translated into intermediate code as:
t1 = b || c
t2 = a && t1
Using Conditional Jumps: Represent boolean expressions using conditional jumps (if, goto) for control
flow based on the evaluated boolean condition.
For example, in an if-statement like if (a && b), the translation might involve generating code that checks the
truth value of a, then b, and jumps accordingly based on both conditions.
2. Short-Circuit Evaluation
Short-circuit evaluation is a technique used to optimize boolean expressions by evaluating only as much of the
expression as necessary to determine the final value. In many programming languages, the && (AND) and ||
(OR) operators employ short-circuit evaluation.
Using Conditional Jumps: When generating intermediate code for boolean expressions with short-
circuit evaluation, the compiler inserts conditional jumps to skip unnecessary evaluations. For
example, a && b might translate to:
if a goto L1
t1 = false
goto L2
L1: if b goto L3
t1 = false
goto L2
L3: t1 = true
L2:
Control flow statements (like if-else, loops, switch-case) direct the execution of a program based on certain
conditions or values. Intermediate code generation for these statements involves translating high-level
constructs into intermediate representations that can be efficiently executed.
Translation Techniques:
if-else Statements: Translate if-else statements by generating conditional jumps (if, goto) based on
the condition's evaluation.
Example:
if (condition) goto true_label
goto false_label
true_label: // Code for true block
goto exit_label
false_label: // Code for false block
exit_label:
Loops (e.g., while, for): Translate loop constructs by using labels and conditional jumps to control the
loop's execution.
Example:
start_label: if (condition) goto loop_body_label
goto exit_label
loop_body_label: // Loop body code
goto start_label
exit_label:
switch-case Statements: Implement switch-case statements using jump tables or series of conditional
branches (if-else) to determine which block of code to execute based on the value of an expression.
Example:
switch (variable) {
case 1: goto case_1_label
case 2: goto case_2_label
default: goto default_label
}
case_1_label: // Code for case 1
goto exit_label
case_2_label: // Code for case 2
goto exit_label
default_label: // Default case code
exit_label:
Back patching is a technique used during intermediate code generation to efficiently handle unresolved jumps
or placeholders in generated code. It involves delaying the assignment of target addresses (or labels) until
more information is available, such as when the target location is known.
Using Lists or Data Structures: During code generation, maintain lists or data structures (e.g., linked
lists, arrays) to store information about unresolved jumps and their target locations.
Delayed Assignment: Instead of immediately assigning target addresses to jumps, delay this
assignment until the target location is determined or generated.
1. Identify Placeholder Locations: During code generation, identify locations in the intermediate code
where target addresses are not yet known (e.g., goto statements without resolved targets).
2. Generate Unique Placeholder Labels: Create unique placeholder labels or identifiers (e.g., numbers,
strings) to represent unresolved targets in the intermediate code.
3. Store Placeholder Information: Store information about these unresolved jumps in a list or data
structure, associating each placeholder with its corresponding jump instruction.
4. Resolve Targets: Once the target locations are determined (e.g., end of a loop, start of a function),
update the placeholder labels with the actual target addresses.
If-Else Statements: Use back patching to handle if-else statements where the target addresses of
goto instructions depend on the evaluation of boolean conditions.
Example:
if (condition) goto L1
// Code for false block
goto L2
L1: // Code for true block
L2:
Loops (e.g., while, for): Apply back patching to resolve loop exits (goto statements) once the loop
body is generated.
Switch-Case Statements: Use back patching to resolve jump targets based on the matched case value.
Example:
switch (variable) {
case 1: goto case_1_label
case 2: goto case_2_label
default: goto default_label
}
// Generate code for case_1, case_2, and default
Backpatching
The most elementary programming language construct for changing the flow of control in a program
is a label and goto. When a compiler encounters a statement like goto L, it must check that there is
exactly one statement with label L in the scope of this goto statement. If the label has already
appeared, then the symbol table will have an entry giving the compiler-generated label for the first
three-address instruction associated with the source statement labeled L. For the translation, we
generate a goto three-address statement with that compiler-generated label as a target.
When a label L is encountered for the first time in the source program, either in a declaration or as the target of a forward goto, we enter L into the symbol table and generate a symbolic label for L.
One-pass code generation using backpatching:
In a single pass, backpatching may be used to generate code for boolean expressions as well as flow-of-control statements. The synthesized attributes truelist and falselist of non-terminal B are used to manage labels in jumping code for boolean expressions. B.truelist is a list of jump or conditional-jump instructions into which we must insert the label to which control goes if B is true. B.falselist is the list of instructions that eventually get the label to which control goes when B is false. When code is generated for B, the jumps to the true and false exits are left incomplete, with the label field unfilled. These incomplete jumps are placed on the lists B.truelist and B.falselist, respectively.
Similarly, a statement S has a synthesized attribute S.nextlist, which denotes a list of jumps to the instruction immediately following the code for S. Instructions are generated into an instruction array, and labels serve as indexes into this array. Three functions are used to manipulate lists of jumps:
Makelist(i): creates a new list containing only i, an index into the array of instructions, and returns a pointer to the newly created list.
Merge(p1, p2): concatenates the lists pointed to by p1 and p2 and returns a pointer to the concatenated list.
Backpatch(p, i): inserts i as the target label for each of the instructions on the list pointed to by p.
Using a translation scheme, code for boolean expressions can be generated during bottom-up parsing. A non-terminal marker M in the grammar introduces a semantic action that picks up, at the proper time, the index of the next instruction to be generated.
For Example, Backpatching using boolean expressions production rules table:
Step 1: Generation of the production table
Production Table for Backpatching
Step 2: Find the TAC (three-address code) for the given expression using backpatching:
A < B OR C < D AND P < Q
Step 3: Now we will make the parse tree for the expression:
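The bottom-up actions driven by the parse tree can be reproduced in this self-contained Python sketch, which redefines the makelist/merge/backpatch primitives; instruction numbering from 100 and the quadruple text format are assumptions for illustration:

```python
# Backpatching demo for the expression  A < B OR C < D AND P < Q
# (AND binds tighter than OR, so the grouping is A<B OR (C<D AND P<Q)).

NEXT = 100         # index of the next instruction to be generated
instructions = {}  # instruction index -> instruction text

def emit(text):
    global NEXT
    instructions[NEXT] = text
    NEXT += 1
    return NEXT - 1

def makelist(i):   return [i]
def merge(p1, p2): return p1 + p2
def backpatch(p, i):
    for idx in p:
        instructions[idx] = instructions[idx].replace("_", str(i))

def relop(a, op, b):
    # B -> a relop b: truelist/falselist each hold one incomplete jump.
    t = makelist(emit(f"if {a} {op} {b} goto _"))
    f = makelist(emit("goto _"))
    return t, f

t1, f1 = relop("A", "<", "B")
m1 = NEXT                      # marker M before the right operand of OR
t2, f2 = relop("C", "<", "D")
m2 = NEXT                      # marker M before the right operand of AND
t3, f3 = relop("P", "<", "Q")

# B -> B1 AND M B2: if B1 is true, continue with B2's code.
backpatch(t2, m2)
t_and, f_and = t3, merge(f2, f3)

# B -> B1 OR M B2: if B1 is false, try B2.
backpatch(f1, m1)
truelist, falselist = merge(t1, t_and), f_and
```

The remaining blanks on truelist (100, 104) and falselist (103, 105) are patched later by the enclosing statement, once its true and false exits are known.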
Control statements are those that alter the order in which statements are executed; if, if-else, switch-case, and while-do statements are examples. Boolean expressions are commonly used in programming languages to:
Alter the flow of control: boolean expressions serve as conditions that change the flow of control in a statement. The value of such an expression is implicit in the program's position; for example, in if (A) B, the expression A must be true whenever statement B is reached.
Compute logical values: a boolean expression can also be evaluated for its truth value, in the same way an arithmetic expression is evaluated for a number.
Intermediate Code Generation for Procedures
Intermediate code generation for procedures involves translating function and procedure calls, managing
parameter passing mechanisms, handling function return values, and managing activation records. Let's
explore each aspect in detail:
When translating function and procedure calls in intermediate code generation, the compiler needs to
perform the following tasks:
Identifying Function or Procedure Calls: Recognize function or procedure calls in the source code,
along with their parameters (if any).
Generating Code for Call Instructions: Generate intermediate code to invoke the function or
procedure. This typically involves pushing arguments onto the stack or into registers, setting up the
calling environment, and transferring control to the called function.
Managing Return Addresses: Save the return address (i.e., the instruction following the call) to
facilitate returning to the correct point in the calling function after the callee finishes execution.
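As a sketch, the call-generation step above can be written as follows; the PARAM/CALL mnemonics follow a common three-address-code convention, and the helper name is illustrative:

```python
def gen_call(func, args, result):
    """Emit three-address code for: result = func(args...)."""
    code = [f"PARAM {a}" for a in args]                  # push arguments
    code.append(f"{result} = CALL {func}, {len(args)}")  # transfer control
    return code

print(gen_call("add", ["x", "y"], "t1"))
# ['PARAM x', 'PARAM y', 't1 = CALL add, 2']
```

The return address itself is usually saved implicitly by the CALL instruction when it is lowered to machine code.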
Parameter passing mechanisms determine how arguments are transferred from the caller to the callee.
Common mechanisms include:
Pass by Value: Copy the value of the argument into a parameter variable within the callee's activation
record. Changes to the parameter variable do not affect the original argument.
Pass by Reference (or Address): Pass the address (or reference) of the argument instead of its value.
This allows the callee to directly access and modify the original argument.
Pass by Name: Delay the evaluation of arguments until they are actually used within the callee. This is
less common but allows for lazy evaluation.
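The observable difference between the first two mechanisms can be simulated in Python (which itself passes object references); the one-element list here is an illustrative stand-in for a caller-supplied address:

```python
def inc_by_value(n):
    # Callee works on a copy; the caller's variable is untouched.
    n = n + 1
    return n

def inc_by_reference(cell):
    # Callee receives the caller's storage location and modifies it.
    cell[0] = cell[0] + 1

x = 5
inc_by_value(x)
print(x)       # still 5

box = [5]
inc_by_reference(box)
print(box[0])  # now 6
```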
Function return values are handled as follows during intermediate code generation:
Returning Values: Generate code within the function body to compute and return the result. This
may involve assigning the result to a designated return variable or register.
Setting Return Address: Before returning from a function, restore the return address saved during
the function call to resume execution at the correct location in the calling function.
4. Activation Records
An activation record (also known as a stack frame) is a data structure used to manage information for a single
invocation of a function or procedure. It typically includes:
Local Variables: Storage for variables declared within the function or procedure.
Control Link: Link to the activation record of the calling function (for nested function calls).
Saved Registers: Storage for saved register values (if needed for register-based architectures).
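One way to model an activation record and a control-linked call stack, assuming the field names listed above, is:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ActivationRecord:
    function: str
    return_address: int  # where to resume in the caller
    control_link: Optional["ActivationRecord"]  # caller's record
    local_vars: dict = field(default_factory=dict)

stack = []  # the run-time stack of activation records

def call(function, return_address):
    # Push a frame whose control link points at the caller's frame.
    frame = ActivationRecord(function, return_address,
                             stack[-1] if stack else None)
    stack.append(frame)
    return frame

def ret():
    # Pop the frame and hand back the saved return address.
    frame = stack.pop()
    return frame.return_address

call("main", return_address=0)
add = call("add", return_address=42)
add.local_vars["a"] = 10
assert ret() == 42  # execution resumes in main
```

Saved registers could be added as one more field; they are omitted here to keep the sketch small.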
Consider the following C function and its translation into intermediate code:
int add(int a, int b) {
    return a + b;
}

int main() {
    int x = 10;
    int y = 20;
    int z = add(x, y);
    return 0;
}

A possible three-address translation, using the PARAM/CALL convention:

add:
    t1 = a + b
    RETURN t1

main:
    x = 10
    y = 20
    PARAM x
    PARAM y
    t2 = CALL add, 2
    z = t2
    RETURN 0
Here:
PARAM x and PARAM y push the values of x and y onto the parameter stack.
Summary
Intermediate code generation for procedures involves managing function and procedure calls, handling
parameter passing mechanisms (by value or reference), managing function return values, and maintaining
activation records to support nested function calls and proper memory management during program
execution. This process plays a crucial role in transforming high-level language constructs into an intermediate
representation suitable for subsequent stages of compilation and optimization.