lecture27.1 - Global Optimization

The lecture discusses code optimization techniques in compiler design, focusing on improving code efficiency and performance through various methods such as high-level and advanced-level optimizations. Key topics include eliminating redundant computations, efficient memory usage, and the importance of maintaining program correctness. The document also highlights the limitations of compiler optimizations and the significance of intermediate representations for effective code transformation.

CMPE 152

Compiler Design

Lecture 27
Ahmed Ezzat

Optimization & Runtime Environment
Outline

⚫ Code Optimizations Topics
⚫ Compiler Code Optimization
❑ High-level
❑ Advanced-level
⚫ Summary


Where We Are

Source Code → Lexical Analysis → Syntax Analysis → Semantic Analysis →
IR Generation → IR Optimization → Code Generation → Optimization →
Machine Code
Code Optimizations Topics:
High-level Overview

⚫ Goal: improve code to be faster and/or smaller (more efficient)
❑ Optimized code
❖ Executes faster
❖ Uses memory efficiently
❖ Yields better performance
❑ Compilers can be designed to provide code optimization.
❑ Users should focus only on optimizations not provided by the
compiler, such as the choice of algorithm, i.e., choosing a faster
and/or less memory-intensive algorithm.
Code Optimizations Topics:
High-level Overview

❑ Requirements
❖ Improved code produces the same output
❖ Improved code has the same side effects
❖ Improved code is as correct as the original
❑ Example tasks
❖ Eliminate redundant computations
❖ Use registers efficiently
❖ Eliminate loops, or move computation out of loops when appropriate


Code Optimizations Topics:
Where Optimization can Occur

⚫ Programmer
❑ Choice of algorithm
❑ Intelligent coding
⚫ Compiler
❑ Choice of instructions
❑ Moving code
❑ Reordering code
❑ Strength reduction*
❑ Must be faithful to the original program
⚫ Processor
❑ Pipelining
❑ Multiple execution units
❑ Memory accesses
❑ Branches
❑ Caches
⚫ Rest of system
❑ Uncontrollable

* Strength reduction: an optimization where a function is computed more
efficiently using previous values of the function, or where an expensive
operation is replaced with a less expensive one (e.g., replacing a
multiplication with a shift).
Code Optimizations Topics:
Overview

⚫ Machine-independent optimizations
❑ Code motion
❑ Reduction in strength
❑ Common sub-expression sharing
⚫ Tuning: identifying performance bottlenecks
⚫ Machine-dependent optimizations
❑ Efficient use of registers
❑ Pointer code
❑ Loop unrolling
❑ Enabling instruction-level parallelism
⚫ Understanding processor optimization
❑ Translation of instructions into operations
❑ Out-of-order (dynamic) execution: the execution order factors in the
availability of data
❑ Branches
❑ Caches and blocking


Code Optimizations Topics:
Limitations of Compiler Optimization

⚫ Operate Under Fundamental Constraint


– Must not cause any change in program behavior under any possible
condition
– Often prevents making optimizations that would only affect behavior
under pathological conditions
⚫ Behavior that may be obvious to the programmer can be
obfuscated by languages and coding styles
– E.g., data ranges may be more limited than variable types suggest
⚫ Most analysis is performed only within procedures (code block)
– Whole-program analysis is too expensive in most cases
⚫ Most analysis is based only on static information
– Compiler has difficulty anticipating run-time inputs
⚫ When in doubt, the compiler must be conservative
Code Optimizations Topics:
Machine-Independent Optimizations

⚫ Optimizations you should do regardless of processor /


compiler
⚫ Code Motion
– Reduce the frequency with which a computation is performed
⚫ If it will always produce the same result
⚫ Especially moving code out of a loop

Before:

for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        a[n*i + j] = b[j];

After:

for (i = 0; i < n; i++) {
    int ni = n*i;
    for (j = 0; j < n; j++)
        a[ni + j] = b[j];
}


Code Optimizations Topics:
Machine-Independent Optimizations

⚫ Most compilers do a good job with array code + simple loop structures

Before:

for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        a[n*i + j] = b[j];

After:

for (i = 0; i < n; i++) {
    int ni = n*i;
    int *p = a + ni;
    for (j = 0; j < n; j++)
        *p++ = b[j];
}

⚫ Code generated by GCC:

imull %ebx,%eax          # i*n
movl 8(%ebp),%edi        # a
leal (%edi,%eax,4),%edx  # p = a+i*n (scaled by 4)
# Inner Loop
.L40:
movl 12(%ebp),%edi       # b
movl (%edi,%ecx,4),%eax  # b+j (scaled by 4)
movl %eax,(%edx)         # *p = b[j]
addl $4,%edx             # p++ (scaled by 4)
incl %ecx                # j++
cmpl %ebx,%ecx           # j : n (reversed)
jl .L40                  # loop if j<n


Code Optimizations Topics:
Machine-Independent Optimizations

⚫ Strength reduction
❑ Replace a costly operation with a simpler one
❑ Shift and add instead of multiply or divide: 16*x → x << 4
⚫ Utility is machine-dependent
⚫ Depends on the cost of the multiply or divide instruction
⚫ On a Pentium II or III, an integer multiply requires only 4 CPU cycles
❑ Recognize a sequence of products

Before:

for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        a[n*i + j] = b[j];

After:

int ni = 0;
for (i = 0; i < n; i++) {
    for (j = 0; j < n; j++)
        a[ni + j] = b[j];
    ni += n;
}


Code Optimizations Topics:
Machine-Independent Optimizations

⚫ Share Common Sub-expressions
❑ Reuse portions of expressions
❑ Compilers are often unsophisticated about exploiting arithmetic
properties

Before (3 multiplications: i*n, (i-1)*n, (i+1)*n):

/* Sum neighbors of i,j */
up    = val[(i-1)*n + j];
down  = val[(i+1)*n + j];
left  = val[i*n + j-1];
right = val[i*n + j+1];
sum   = up + down + left + right;

After (1 multiplication: i*n):

int inj = i*n + j;
up    = val[inj - n];
down  = val[inj + n];
left  = val[inj - 1];
right = val[inj + 1];
sum   = up + down + left + right;


Code Optimizations Topics:
Machine-dependent Optimizations

⚫ Efficient use of registers: reading and writing registers is much
faster than reading/writing memory

⚫ Limitations
❑ The compiler is not always able to determine whether a variable can
be held in a register
❑ Possibility of aliasing
❑ See example later


Code Optimizations Topics:
Machine-dependent Optimizations

⚫ Pointer vs. Array Code Inner Loops:


❑ Array Code
.L24: # Loop:
addl (%eax,%edx,4),%ecx # sum += data[i]
incl %edx # i++
cmpl %esi,%edx # i:length
jl .L24 # if < goto Loop
❑ Pointer Code
.L30: # Loop:
addl (%eax),%ecx # sum += *data
addl $4,%eax # data++
cmpl %edx,%eax # data:dend
jb .L30 # if < goto Loop
❑ Performance:
➢ Array code: 4 instructions in 2 clock cycles
➢ Pointer code: almost the same 4 instructions in 3 clock cycles
Code Optimizations Topics:
Machine-dependent Optimizations

⚫ Inner Loops:

Before:

int sum = 0;
for (i = 0; i < n; i++)
    sum += v[i];
*dest = sum;

After:

void combine4p(vec_ptr v, int *dest)
{
    int length = vec_length(v);
    int *data = get_vec_start(v);
    int *dend = data + length;
    int sum = 0;
    while (data < dend) {
        sum += *data;
        data++;
    }
    *dest = sum;
}

⚫ Optimization
❑ Use pointers rather than array references
❑ Compiled with -O2
➢ Oops! We're not making progress here!


Compiler Code Optimizations:

⚫ The optimizer sits between the front end and the code generator
(back end):
❑ Works with intermediate code.
❑ Can do control-flow analysis.
❑ Can do data-flow analysis.
❑ Performs transformations to improve the intermediate code.


Compiler Code Optimizations:
High-level

⚫ Optimizations provided by a compiler at a high level include:
❑ Inlining small functions
❑ Code hoisting (moving code outside loops)
❑ Dead store elimination
❑ Eliminating common sub-expressions
❑ Loop unrolling
❑ Loop optimizations: code motion, induction variable elimination
(reducing adds/subtracts in a loop), and reduction in strength
Compiler Code Optimizations:
High-level

⚫ Example of Induction Variable Elimination:

Before:

int a[SIZE];
int b[SIZE];

void f (void)
{
    int i1, i2, i3;
    for (i1 = 0, i2 = 0, i3 = 0;
         i1 < SIZE; i1++)
        a[i2++] = b[i3++];
    return;
}

After:

int a[SIZE];
int b[SIZE];

void f (void)
{
    int i1;
    for (i1 = 0; i1 < SIZE; i1++)
        a[i1] = b[i1];
    return;
}


Compiler Code Optimizations:
High-level

⚫ Inlining small functions:
❑ Repeatedly inserting the function body instead of calling it saves
the calling overhead and enables further optimizations.
❑ Inlining large functions would make the executable too large.


Compiler Code Optimizations:
High-level

⚫ Code hoisting:
❑ Moving computations outside loops saves computing time
❑ In the following example, (2.0 * PI) is an invariant expression;
there is no reason to re-compute it 100 times:

DO i = 1, 100
  ARRAY(i) = 2.0 * PI * i
END DO

❑ By introducing a temporary variable 't', it can be transformed to:

t = 2.0 * PI
DO i = 1, 100
  ARRAY(i) = t * i
END DO


Compiler Code Optimizations:
High-level

⚫ Dead Store elimination:


❑ If the compiler detects variables that are never used, it
may safely ignore many of the operations that compute
their values.



Compiler Code Optimizations:
High-level

⚫ Eliminating common sub-expressions:
❑ Optimizing compilers perform this quite well:
X = A * LOG(Y) + (LOG(Y) ** 2)
❑ Introduce an explicit temporary variable t:
t = LOG(Y)
X = A * t + (t ** 2)
❑ This saves one 'heavy' function call by eliminating the common
sub-expression LOG(Y); after also factoring out the exponentiation:
t = LOG(Y)
X = (A + t) * t


Compiler Code Optimizations:
High-level

⚫ Loop unrolling:
❑ The loop exit checks cost CPU time.
❑ Loop unrolling tries to get rid of the checks completely or to
reduce their number.
❑ If you know a loop is performed only a certain number of times, or
that the number of repetitions is a multiple of a constant, you can
unroll the loop.


Compiler Code Optimizations:
High-level

⚫ Loop unrolling (Example):

// old loop
for (int i = 0; i < 3; i++) {
    color_map[n+i] = i;
}

// unrolled version – eliminates the check (i < 3)
int i = 0;
color_map[n+i] = i;
i++;
color_map[n+i] = i;
i++;
color_map[n+i] = i;
Compiler Code Optimizations:
High-level

⚫ Loop optimization: Code Motion


❑ Any code inside a loop that always computes the same
value can be moved outside/before the loop.
❑ Example:
while (i <= limit-2)
    do {loop code}

where the loop code doesn't change the limit variable. The
subtraction limit-2 would be re-computed on every iteration. Code
motion would substitute:

t = limit-2;
while (i <= t)
    do {loop code}
Compiler Code Optimizations:
High-level

⚫ In Summary:
❑ Compilers can provide some code optimization.
❑ Programmers do not have to worry about such optimizations.
❑ Program definition must be preserved.



Compiler Code Optimizations:
Advanced-level

⚫ Intermediate Representation (IR):
❑ The best place for optimizations
❑ Each instruction is simple
❑ By analyzing sequences of IR, one can accumulate information about
definitions and uses of names
❑ Can detect patterns, match them, and replace them with better code


Code Optimizations:
Advanced-level

⚫ Control Flow Graph:


❑ Basic blocks are nodes
❑ Flow of control as edges
❑ Capture the dynamic structure (returns,
branches, jumps)



Code Optimizations:
Advanced-level

⚫ Basic Blocks:
❑ A list of instructions
❑ Once the block is entered, each instruction in that
basic block is executed exactly once
❑ Only the first statement can be reached from
outside the block (no jumping in)
❑ Execution continues from first instruction to last in
the block (no jumping out)



Code Optimizations:
Advanced-level

⚫ Basic Blocks (Contd.):


❑ Begins (Leader statements)
❖ Procedure entry point
❖ Label (branch target)
❖ Instruction after branch or return
❑ Ends
❖ Jump
❖ Branch (conditional or unconditional)
❖ Return



Code Optimizations:
Advanced-level

⚫ Local optimization within the basic block:
❑ Rearrange code, or replace it with different code that does the
same thing
❑ Reordering within a block is okay because we know everything gets
executed exactly once
❑ Some local techniques generalize across blocks


Code Optimizations:
Advanced-level

⚫ Global optimization is hard:


❑ Need to be able to handle multiple
predecessors/successors for a basic block.
❑ Need to be able to handle multiple paths through the
control-flow graph and may need to iterate multiple
times to compute the final value (but the analysis still
needs to terminate)
❑ Need to be able to assign each basic block a
reasonable default value before we've analyzed it.



Code Optimizations:
Advanced-level

⚫ Structure-preserving transformations on Basic Blocks:
❑ Common sub-expression computation elimination
❖ A common sub-expression E means that E was previously computed, and
the values of the variables in E have not changed since the previous
computation
❑ Dead code elimination
❖ Code that is never reached
❖ Code that does not affect the result of the program


Code Optimizations:
Advanced-level

⚫ Structure-preserving transformations on Basic Blocks (Contd.):
❑ Renaming of temporary variables
❖ Produce normal-form blocks: blocks in which temporary variables are
not reused. That is, any statement t = b + c can be changed to
u = b + c, with all uses of this instance of t renamed to u, without
changing the value of the basic block. Such a basic block is called a
normal-form block.
❖ Helps to decide whether two blocks are equivalent
❖ Allows interchange of statements
❑ Interchange of two independent adjacent statements
Summary

⚫ How do your code and data look during execution?
⚫ Interaction among the compiler, OS, and target machine
(assembly code)
⚫ Two main themes with memory management:
❑ Allocation of storage locations
❑ Memory management & access to variables and data


END
