0% found this document useful (0 votes)
39 views

Lecture 8

The document discusses computer architecture and performance metrics like CPI (cycles per instruction) and MIPS (millions of instructions per second). It provides examples of calculating CPI and performance improvements for a RISC processor based on reducing load/branch times. It also compares the performance and CPI of two code sequences depending on the instruction mix and counts. MIPS is defined as a popular metric but also noted to depend on the program and instruction set.

Uploaded by

King Of Luck
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
39 views

Lecture 8

The document discusses computer architecture and performance metrics like CPI (cycles per instruction) and MIPS (millions of instructions per second). It provides examples of calculating CPI and performance improvements for a RISC processor based on reducing load/branch times. It also compares the performance and CPI of two code sequences depending on the instruction mix and counts. MIPS is defined as a popular metric but also noted to depend on the program and instruction set.

Uploaded by

King Of Luck
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 43

Computer Architecture CSE 3322

Lecture 8

TEST 1 – Tuesday March 3


Lectures 1 - 8, Ch 1,2

Web Site
crystal.uta.edu/~cse3322
TEST 1 – Tuesday March 3
Lectures 1 - 8, Ch 1,2
• HW Due Feb 24
– 1.4.1 p.60
– 1.4.4 p.60
– 1.4.6 p.60
– 1.5.2 p.60-61
– 1.5.4 p.61
– 1.5.5 p.61
CPI
“Average clock cycles per instruction”
CPI = Clock Cycles / Instruction Count
= (CPU Time * Clock Rate) / Instruction Count

CPU Time = Instruction Count x CPI / Clock Rate


= Instruction Count x CPI x Clock Cycle Time

Average CPI = SUM of CPI (i) * I(i) for i=1, n


Instruction Count

Average CPI = SUM of CPI(i) * F(i) for i = 1, n


F(i) is the Instruction Frequency

Invest Resources where time is Spent!


Example (RISC
processor)
Base Machine (Reg / Reg)
Op Freq Cycles F(i)CPI(i) % Time
ALU 50% 1
Load 20% 5
Store 10% 3
Branch 20% 2

Typical Mix
Example (RISC
processor)
Base Machine (Reg / Reg)
Op Freq Cycles F(i)CPI(i) % Time
ALU 50% 1 .5
Load 20% 5 1.0
Store 10% 3 .3
Branch 20% 2 .4
Typical Mix
2.2 = CPI ave
Example (RISC processor)
Base Machine (Reg / Reg)
Op Freq Cycles F(i)CPI(i) % Time
ALU 50% 1 .5
Load 20% 5 1.0
Store 10% 3 .3
Branch 20% 2 .4
Typical Mix
2.2 = CPI ave

CPU Time(i) = Instr Cnt(i) * CPI(i) * Clk Cycle Time


CPU Time Inst Cnt * CPI ave * Clk Cycle Time

% Time = F(i) * CPI(i) / CPI ave


Example (RISC
processor)
Base Machine (Reg / Reg)
Op Freq Cycles F(i)CPI(i) % Time
ALU 50% 1 .5 23%
Load 20% 5 1.0 45%
Store 10% 3 .3 14%
Branch 20% 2 .4 18%
Typical Mix
2.2 = CPI ave

CPU Time(i) = Instr Cnt(i) * CPI(i) * Clk Cycle Time


CPU Time Inst Cnt * CPI ave * Clk Cycle Time

% Time = F(i) * CPI(i) / CPI ave


Example (RISC
processor)
Base Machine (Reg / Reg)
Op Freq Cycles F(i)CPI(i) % Time
ALU 50% 1 .5 23%
Load 20% 5 1.0 45%
Store 10% 3 .3 14%
Branch 20% 2 .4 18%
Typical Mix
2.2 = CPI ave
How much faster would the machine be if a better data cache
reduced the average load time to 2 cycles?
Example (RISC
processor)
Base Machine (Reg / Reg)
Op Freq Cycles F(i)CPI(i) % Time
ALU 50% 1 .5 23%
Load 20% 5 (2) 1.0 (.4) 45%
Store 10% 3 .3 14%
Branch 20% 2 .4 18%
Typical Mix
2.2 (1.6)
How much faster would the machine be if a better data cache
reduced the average load time to 2 cycles?

2.2/1.6 = 1.375

CPU Time = Inst Cnt * CPI ave * Clk Cycle Time


Example (RISC
processor)
Base Machine (Reg / Reg)
Op Freq Cycles F(i)CPI(i) % Time
ALU 50% 1 .5 23%
Load 20% 5 1.0 45%
Store 10% 3 .3 14%
Branch 20% 2 .4 18%
Typical Mix
2.2
How much faster would the machine be if a better data cache
reduced the average load time to 2 cycles? CPI = 1.6

How does this compare with using branch prediction to shave a


cycle off the branch time?
Example (RISC
processor)
Base Machine (Reg / Reg)
Op Freq Cycles F(i)CPI(i) % Time
ALU 50% 1 .5 23%
Load 20% 5 1.0 45%
Store 10% 3 .3 14%
Branch 20% 2 (1) .4 (.2) 18%
Typical Mix
2.2 (2.0)
How much faster would the machine be if a better data cache
reduced the average load time to 2 cycles? CPI = 1.6

How does this compare with using branch prediction to shave a


cycle off the branch time? CPI = 2.0
Example (RISC
processor)
Base Machine (Reg / Reg)
Op Freq Cycles F(i)CPI(i) % Time
ALU 50% 1 .5 23%
Load 20% 5 1.0 45%
Store 10% 3 .3 14%
Branch 20% 2 .4 18%
Typical Mix
2.2
How much faster would the machine be if a better data cache
reduced the average load time to 2 cycles? CPI = 1.6

How does this compare with using branch prediction to shave a


cycle off the branch time? CPI = 2.0

What if two ALU instructions could be executed at once?


Example (RISC
processor)
Base Machine (Reg / Reg)
Op Freq Cycles F(i)CPI(i) % Time
ALU 50% 1 (.5) .5 (.25) 23%
Load 20% 5 1.0 45%
Store 10% 3 .3 14%
Branch 20% 2 .4 18%
Typical Mix
2.2 (1.95)
How much faster would the machine be if a better data cache
reduced the average load time to 2 cycles? CPI = 1.6

How does this compare with using branch prediction to shave a


cycle off the branch time? CPI = 2.0

What if two ALU instructions could be executed at once?


CPI=1.95
A compiler designer is trying to decide between two code sequences for a
particular machine. Based on the hardware implementation, there are
three different classes of instructions:
Class A has 1 cycle
Class B has 2 cycles
Class C has 3 cycles

The first code sequence has 5 instructions:


2 of A, 1 of B, and 2 of C
The second sequence has 6 instructions:
4 of A, 1 of B, and 1 of C.

Which sequence will be faster? How much?


What is the CPI for each sequence?
A compiler designer is trying to decide between two code sequences for a
particular machine. Based on the hardware implementation, there are three
different classes of instructions:
Class A has 1 cycle
Class B has 2 cycles
Class C has 3 cycles

The first code sequence has 5 instructions:


2 of A, 1 of B, and 2 of C 2*1+1*2+2*3 = 10 The second sequence
has 6 instructions:
4 of A, 1 of B, and 1 of C. 4*1+1*2+1*3 = 9
Which sequence will be faster? How much?
What is the CPI for each sequence?
A compiler designer is trying to decide between two code sequences for a
particular machine. Based on the hardware implementation, there are three
different classes of instructions:
Class A has 1 cycle
Class B has 2 cycles
Class C has 3 cycles

The first code sequence has 5 instructions:


2 of A, 1 of B, and 2 of C 2*1+1*2+2*3 = 10
The second sequence has 6 instructions:
4 of A, 1 of B, and 1 of C. 4*1+1*2+1*3 = 9 Which sequence will be
faster? How much? 10 / 9 = 1.11
What is the CPI for each sequence?
A compiler designer is trying to decide between two code sequences for a particular
machine. Based on the hardware implementation, there are three different classes of
instructions:
Class A has 1 cycle
Class B has 2 cycles
Class C has 3 cycles

The first code sequence has 5 instructions:


2 of A, 1 of B, and 2 of C 2*1+1*2+2*3 = 10
The second sequence has 6 instructions:
4 of A, 1 of B, and 1 of C. 4*1+1*2+1*3 = 9 Which sequence will be faster?
How much? 10 / 9 = 1.11
What is the CPI for each sequence? 10/5 = 2
9/6 = 1.5
A popular performance metric is MIPS, the number
of millions of instructions per second.

For a given program,

Instruction Count
MIPS =
Execution time x 10 6
A popular performance metric is MIPS, the number
of millions of instructions per second.

For a given program,

Instruction Count
MIPS =
Execution time x 10 6

1. Cannot compare if instruction set is different


A popular performance metric is MIPS, the number
of millions of instructions per second.

For a given program,

Instruction Count
MIPS =
Execution time x 10 6

1. Cannot compare if instruction set is different


2. Highly dependent on the program
A popular performance metric is MIPS, the number
of millions of instructions per second.

For a given program,

Instruction Count
MIPS =
Execution time x 10 6

1. Cannot compare if instruction set is different


2. Highly dependent on the program
3. Can be inversely proportional to performance
MIPS example
Two different compilers are being tested for a 100 MHz.
machine with three different classes of instructions:
Class A has 1 cycle,Class B has 2 cycles, Class C has 3
cycles
Instruction counts ( billions)
Code from A B C
Compiler 1 5 1 1
Compiler 2 10 1 1

• Which sequence will be faster according to MIPS?


• Which sequence will be faster according to execution
time?
MIPS example
Two different compilers are being tested for a 100 MHz.
machine with three different classes of instructions:
Class A Class B Class
C
CPI 1 2 3
Instruction counts ( billions)
Code from A B C Total
Compiler 1 5 1 1 7
Compiler 2 10 1 1 12
CPU cycles Exec Time MIPS
Compiler 1 5+1x2+1x3=10 billion
Compiler 2
MIPS example
Two different compilers are being tested for a 100 MHz.
machine with three different classes of instructions:
Class A Class B Class
C
CPI 1 2 3
Instruction counts ( billions)
Code from A B C Total
Compiler 1 5 1 1 7
Compiler 2 10 1 1 12
CPU cycles Exec Time MIPS
Compiler 1 10 billion
Compiler 2 15 billion
MIPS example
Two different compilers are being tested for a 100 MHz.
machine with three different classes of instructions:
Class A Class B Class
C
CPI 1 2 3
Instruction counts ( billions)
Code from A B C Total
Compiler 1 5 1 1 7
Compiler 2 10 1 1 12
CPU cycles Exec Time MIPS
Compiler 1 10 billion 1010x10-8=100
Compiler 2 15 billion
MIPS example
Two different compilers are being tested for a 100 MHz.
machine with three different classes of instructions:
Class A Class B Class
C
CPI 1 2 3
Instruction counts ( billions)
Code from A B C Total
Compiler 1 5 1 1 7
Compiler 2 10 1 1 12
CPU cycles Exec Time MIPS
Compiler 1 10 billion 100 sec
Compiler 2 15 billion 150 sec
MIPS example
Two different compilers are being tested for a 100 MHz.
machine with three different classes of instructions:
Class A Class B Class
C
CPI 1 2 3
Instruction counts ( billions)
Code from A B C Total
Compiler 1 5 1 1 7
Compiler 2 10 1 1 12
CPU cycles Exec Time MIPS
Compiler 1 10 billion 100 sec 7x103/100
Compiler 2 15 billion 150 sec
MIPS example
Two different compilers are being tested for a 100 MHz.
machine with three different classes of instructions:
Class A Class B Class C
CPI 1 2 3
Instruction counts ( billions)
Code from A B C Total
Compiler 1 5 1 1 7
Compiler 2 10 1 1 12
CPU cycles Exec Time MIPS
Compiler 1 10 billion 100 sec 70
Compiler 2 15 billion 150 sec 12x103/150
MIPS example
Two different compilers are being tested for a 100 MHz.
machine with three different classes of instructions:
Class A Class B Class
C
CPI 1 2 3
Instruction counts ( billions)
Code from A B C Total
Compiler 1 5 1 1 7
Compiler 2 10 1 1 12
CPU cycles Exec Time MIPS
Compiler 1 10 billion 100 sec 70
Compiler 2 15 billion 150 sec 80
Benchmarks
• Performance best determined by running a
real application
– Use programs typical of expected workload
– Or, typical of expected class of applications
e.g., compilers/editors, scientific
applications, graphics, etc.
Benchmarks
• Performance best determined by running a
real application
– Use programs typical of expected workload
– Or, typical of expected class of applications
e.g., compilers/editors, scientific
applications, graphics, etc.
• Small benchmarks
– nice for architects and designers
– easy to standardize
– can be abused
SPEC CPU 2006

• SPEC - System Performance Evaluation Cooperative


• 12 Integer Benchmarks
• 17 Floating Point Benchmarks
• Fig 1.20 P.49
Amdahl's Law
Execution Time After Improvement =
Execution Time Unaffected +
( Execution Time Affected / Amount of Improvement )
Amdahl's Law
Execution Time After Improvement =
Execution Time Unaffected +
( Execution Time Affected / Amount of Improvement )

• Example:
Suppose a program runs in 100 seconds on a
machine, with multiply responsible for 80 seconds of this
time. How much do we have to improve the speed of
multiplication if we want the program to run 4 times
faster?
Amdahl's Law
Execution Time After Improvement =
Execution Time Unaffected +
( Execution Time Affected / Amount of Improvement )

• Example:
Suppose a program runs in 100 seconds on a machine, with
multiply responsible for 80 seconds of this time. How much do
we have to improve the speed of multiplication if we want the
program to run 4 times faster?
Improved Time = (100 – 80) + 80/n = 100/4
Amdahl's Law
Execution Time After Improvement =
Execution Time Unaffected +
( Execution Time Affected / Amount of Improvement )

• Example:
Suppose a program runs in 100 seconds on a machine, with
multiply responsible for 80 seconds of this time. How much do we have
to improve the speed of multiplication if we want the program to run 4
times faster?
Improved Time = (100 – 80) + 80/n = 100/4
20 + 80/n = 25
80 = 5n ; n = 16
Amdahl's Law
Execution Time After Improvement =
Execution Time Unaffected +
( Execution Time Affected / Amount of Improvement )

• Example:
Suppose a program runs in 100 seconds on a
machine, with multiply responsible for 80 seconds of this
time. How much do we have to improve the speed of
multiplication if we want the program to run 4 times faster?
How about making it 5 times faster?
Amdahl's Law
Execution Time After Improvement =
Execution Time Unaffected +
( Execution Time Affected / Amount of Improvement )

• Example:
Suppose a program runs in 100 seconds on a machine, with
multiply responsible for 80 seconds of this time. How much do
we have to improve the speed of multiplication if we want the
program to run 4 times faster?
How about making it 5 times faster?
Improved Time = (100 –80) + 80/n = 100/5
Amdahl's Law
Execution Time After Improvement =
Execution Time Unaffected +
( Execution Time Affected / Amount of Improvement )

• Example:
Suppose a program runs in 100 seconds on a machine, with multiply
responsible for 80 seconds of this time. How much do we have to improve
the speed of multiplication if we want the program to run 4 times faster?
How about making it 5 times faster?
Improved Time = (100 –80) + 80/n = 100/5
20 + 80/n = 20
80/n = 0 Impossible!
Remember

• Performance is specific to a particular program/s


– Total execution time is a consistent summary
of performance
Remember
• Performance is specific to a particular program/s
– Total execution time is a consistent summary of
performance

• For a given architecture performance increases come


from:
– increases in clock rate (without adverse CPI affects)
Remember
• Performance is specific to a particular program/s
– Total execution time is a consistent summary of
performance

• For a given architecture performance increases come from:


– increases in clock rate (without adverse CPI affects)
– improvements in processor organization that lower CPI
Remember
• Performance is specific to a particular program/s
– Total execution time is a consistent summary of performance

• For a given architecture performance increases come from:


– increases in clock rate (without adverse CPI affects)
– improvements in processor organization that lower CPI
– compiler enhancements that lower CPI and/or instruction
count

You might also like