Lecture 12: MEASURING CPU PERFORMANCE
DR. KAMALIKA DATTA
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, NIT MEGHALAYA

Introduction
• Most processors execute instructions in a synchronous manner using a clock that runs at a constant clock rate or frequency f.
• Clock cycle time C is the reciprocal of the clock rate f:
  C = 1 / f
• The clock rate f depends on two factors:
  a) The implementation technology used.
  b) The CPU organization used.
• A machine instruction typically consists of a number of elementary micro-operations that vary in number and complexity depending on the instruction and the CPU organization used.
• A micro-operation is an elementary hardware operation that can be carried out in one clock cycle.
  – Register transfer operations, arithmetic and logic operations, etc.
• Thus a single machine instruction may take one or more CPU cycles to complete.
  – We can characterize an instruction by its Cycles Per Instruction (CPI).
• Average CPI of a program:
  – Average CPI of all instructions executed in the program on a given processor.
  – Different instructions can have different CPIs.

Execution Time
• For a given program compiled to run on a specific machine, we can define the following parameters:
  a) The total number of instructions executed, or instruction count (IC).
  b) The average number of cycles per instruction (CPI).
  c) The clock cycle time (C) of the machine.
• The total execution time can be computed as:
  XT = IC x CPI x C
• How do we evaluate and compare the performances of several machines?
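The relation XT = IC x CPI x C can be wrapped in a small helper for the later examples. A minimal sketch; the numbers below are illustrative, not values from the lecture:

```python
def execution_time(ic, cpi, clock_rate_hz):
    """XT = IC x CPI x C, with the cycle time C = 1 / f folded in."""
    return ic * cpi / clock_rate_hz

# Illustrative values: 50 million instructions, CPI 2.0, 1 GHz clock
print(execution_time(50_000_000, 2.0, 1e9))  # 0.1 (seconds)
```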
24/07/17
• IC depends on:
  – Program used, compiler, ISA
• CPI depends on:
  – Program used, compiler, ISA, CPU organization
• C depends on:
  – Technology used to implement the CPU
• Unfortunately, it is very difficult to change one parameter in complete isolation from the others.
  – Basic technologies are interdependent.

• A tradeoff:
  – RISC: increases the number of instructions/program, but decreases CPI and clock cycle time because the instructions, and hence the implementations, are simple.
  – CISC: decreases the number of instructions/program, but increases CPI and clock cycle time because many instructions are more complex.
• Overall, it has been found through experimentation that RISC architecture gives better performance.
Example 2
• Suppose that a machine A executes a program with an average CPI of 2.3. Consider another machine B (with the same instruction set and a better compiler) that executes the same program with 20% fewer instructions and with a CPI of 1.7 at 1.2 GHz. What should be the clock rate of A so that the two machines have the same performance?
  We must have: ICA x CPIA x CA = ICB x CPIB x CB
  Hence: ICA x 2.3 x CA = 0.80 x ICA x 1.7 x (1 / (1.2 x 10^9))
  We get: CA = 0.49 x 10^-9 sec
  Thus, clock rate of A = 1 / CA = 2.04 GHz

Example 3
• Consider the earlier example with IC = 50,000,000, average CPI = 2.7, and clock rate = 2.0 GHz. Suppose we use a new compiler on the same program, for which:
  – New IC = 40,000,000
  – New CPI = 3.0 (i.e. the new compiler is using more complex instructions)
  – Also we have a faster CPU implementation, with clock rate = 2.4 GHz.
  Speedup = XTold / XTnew
          = (50,000,000 x 2.7 x 0.5 x 10^-9) / (40,000,000 x 3.0 x 0.4167 x 10^-9)
          = 1.35 → 35% faster
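Both worked examples can be checked numerically. A sketch that reproduces the slides' arithmetic:

```python
# Example 2: IC_A x 2.3 x C_A = 0.80 x IC_A x 1.7 x (1 / 1.2e9); IC_A cancels
ca = 0.80 * 1.7 / (2.3 * 1.2e9)
print(ca)            # ~0.4928e-9 s (the slide rounds to 0.49 ns)
print(1 / ca / 1e9)  # ~2.03 GHz (2.04 GHz when using the rounded 0.49 ns)

# Example 3: speedup = XT_old / XT_new, with XT = IC x CPI / f
xt_old = 50_000_000 * 2.7 / 2.0e9
xt_new = 40_000_000 * 3.0 / 2.4e9
print(xt_old / xt_new)  # -> 1.35, i.e. 35% faster
```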
Instruction Frequency and CPI
• For a program containing n instruction types, let ICi be the number of executed instructions of type i, and CPIi the cycles per instruction for that type:
  Instruction Count (IC) = Σ(i=1..n) ICi
  CPU clock cycles = Σ(i=1..n) (ICi x CPIi)
  CPI = CPU clock cycles / IC = Σ(i=1..n) ((ICi / IC) x CPIi)
• CPI can also be expressed in terms of the frequencies of the various instruction types that are executed in a program.
  – Fi denotes the frequency of execution of instruction type i:
    Fi = ICi / IC
  – Then: CPI = Σ(i=1..n) (Fi x CPIi)

Example 5
• Suppose for an implementation of a RISC ISA there are four instruction types, with their frequency of occurrence (for a typical mix of programs) and CPI as shown in the table below.

  Type     Frequency   CPI
  Load     20 %        4
  Store    8 %         3
  ALU      60 %        1
  Branch   12 %        2

  CPI = (0.20 x 4) + (0.08 x 3) + (0.60 x 1) + (0.12 x 2) = 1.88
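Example 5's frequency-weighted CPI can be sketched in a few lines:

```python
def average_cpi(mix):
    """CPI = sum of Fi x CPIi over the instruction mix."""
    return sum(f * c for f, c in mix)

# (frequency, CPI) pairs from Example 5
mix = [(0.20, 4), (0.08, 3), (0.60, 1), (0.12, 2)]
print(average_cpi(mix))  # -> 1.88
```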
Example 6
• Suppose that a program is running on a machine with the following instruction types, CPI values, and frequencies of occurrence.

  Type   CPI   Frequency
  A      1.3   60 %
  B      2.2   10 %
  C      2.0   30 %

  The CPU designer gives two options: (a) reduce the CPI of instruction type A to 1.1, and (b) reduce the CPI of instruction type B to 1.6. Which one is better?
  Average CPI for (a): 0.60 x 1.1 + 0.10 x 2.2 + 0.30 x 2.0 = 1.48
  Average CPI for (b): 0.60 x 1.3 + 0.10 x 1.6 + 0.30 x 2.0 = 1.54
  Option (a) is better.

END OF LECTURE 12
Lecture 13: CHOICE OF BENCHMARKS

Introduction
• Basic concept:
  – How do we compare the performances of two or more computer systems?
  – We need to execute some programs and measure the execution times.
• A set of standard programs used for comparison is called a benchmark.
• Various metrics have been proposed to carry out the evaluation.
  – To be discussed next.
b) MFLOPS (Million Floating Point Operations Per Second)
• Simply counts the number of floating-point operations executed per second.
• More suitable for applications that involve a lot of floating-point computations.
• Here again, different machines implement different floating-point operations.
• Different floating-point operations take different times.
  – Division is much slower than addition.
• A compiler performs no floating-point operations and hence has a MFLOPS rating of 0.
• Hence, this metric is not very suitable for comparisons across machines and also across programs.

Example 1
• Consider a processor with three instruction classes A, B and C, with the corresponding CPI values being 1, 2 and 3 respectively. The processor runs at a clock rate of 1 GHz. For a given program written in C, two compilers produce the following executed instruction counts.

  Instruction Count (in millions)
               For ICA   For ICB   For ICC
  Compiler 1      7         2         1
  Compiler 2     12         1         1

  Compute the MIPS rating and the CPU time for the two program versions.
MIPS = Clock Rate (MHz) / CPI
CPI = CPU Execution Cycles / Instruction Count
CPU Time = Instruction Count x CPI / Clock Rate

• Solution:
  – For compiler 1:
    CPI1 = (7 x 1 + 2 x 2 + 1 x 3) / (7 + 2 + 1) = 14 / 10 = 1.40
    MIPS Rating1 = 1000 MHz / 1.40 = 714.3 MIPS
    CPU Time1 = ((7 + 2 + 1) x 10^6 x 1.40) / (1 x 10^9) = 0.014 sec
  – For compiler 2:
    CPI2 = (12 x 1 + 1 x 2 + 1 x 3) / (12 + 1 + 1) = 17 / 14 = 1.21
    MIPS Rating2 = 1000 MHz / 1.21 = 826.4 MIPS
    CPU Time2 = ((12 + 1 + 1) x 10^6 x 1.21) / (1 x 10^9) = 0.017 sec
  The MIPS rating indicates that compiler 2 is faster, while in reality the reverse is true.

Example 2
• A loop in C:
    for (k=0; k<1000; k++) {
        A[k] = A[k] + s;
    }
• MIPS32 code ($t1 = address of s, $t3 = s, $t2 points to A[0]):
          LW   $t3, 0($t1)
          ADDI $t6, $t2, 4000
    Loop: LW   $t4, 0($t2)
          ADD  $t5, $t4, $t3
          SW   $t5, 0($t2)
          ADDI $t2, $t2, 4
          BNE  $t6, $t2, Loop
• The code is executed on a processor that runs at 1 GHz (C = 1 nsec).
• There are four instruction types, with CPI values as shown in the table:

  Instruction Type   CPI
  ALU                2
  LOAD               5
  STORE              6
  BRANCH             3

• We show some calculations next.
• The code has 2 instructions before the loop and 5 instructions in the body of the loop, which executes 1000 times.
  – Total instruction count IC = 5 x 1000 + 2 = 5002.
• Number of instructions executed and fraction Fi for each instruction type:
  – ICALU = 1 + 2 x 1000 = 2001, FALU = 2001 / 5002 = 0.4 = 40%
  – ICLOAD = 1 + 1 x 1000 = 1001, FLOAD = 1001 / 5002 = 0.2 = 20%
  – ICSTORE = 1000, FSTORE = 1000 / 5002 = 0.2 = 20%
  – ICBRANCH = 1000, FBRANCH = 1000 / 5002 = 0.2 = 20%
• Total CPU clock cycles = 2001 x 2 + 1001 x 5 + 1000 x 6 + 1000 x 3 = 18,007 cycles
• Average CPI = CPU clock cycles / IC = 18007 / 5002 = 3.6
• Execution time = IC x CPI x C = 5002 x 3.6 x 1 x 10^-9 = 18.0 μsec
• MIPS rating = Clock Rate (MHz) / CPI = 1000 MHz / 3.6 = 277.8 MIPS
• The processor achieves its peak MIPS rating when executing a program that only has instructions of the type with the lowest CPI (i.e. ALU, which has CPI = 2).
  – Peak MIPS rating = Clock Rate (MHz) / CPIALU = 1000 MHz / 2 = 500 MIPS
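The counting argument can be replayed programmatically. A sketch mirroring the slide's numbers:

```python
# Instruction counts: 2 setup instructions + 5-instruction body x 1000 iterations
counts = {"ALU": 1 + 2 * 1000, "LOAD": 1 + 1 * 1000,
          "STORE": 1000, "BRANCH": 1000}
cpi_table = {"ALU": 2, "LOAD": 5, "STORE": 6, "BRANCH": 3}

ic = sum(counts.values())                               # 5002
cycles = sum(counts[t] * cpi_table[t] for t in counts)  # 18007
cpi = cycles / ic                                       # ~3.6
exec_time = ic * cpi * 1e-9                             # 18.0 usec at C = 1 ns
mips = 1000 / cpi                                       # ~277.8 MIPS
print(ic, cycles, round(cpi, 1), round(mips, 1))
```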
SPEC95 Programs (Floating-Point)

  Benchmark   Description
  tomcatv     A mesh generation program
  swim        Shallow water modeling
  su2cor      Quantum physics Monte Carlo simulation
  hydro2d     Solving hydrodynamic Navier-Stokes equations
  mgrid       Multigrid solver on 3D potential field
  applu       Solving parabolic/elliptical differential equations
  turb3d      Simulates turbulence in a cube
  apsi        Solver for distribution of pollutant
  fpppp       Quantum chemistry simulation
  wave5       Simulation of plasma physics

CINT2000 (Integer)
  1. 164.gzip : Compression
  2. 175.vpr : FPGA placement / routing
  3. 176.gcc : C compiler
  4. 181.mcf : Combinatorial optimization
  5. 186.crafty : Chess playing
  6. 197.parser : Word processing
  7. 252.eon : Computer visualization
  8. 253.perlbmk : PERL interpreter
  9. 254.gap : Group theory interpreter
  10. 255.vortex : Object-oriented database
  11. 256.bzip2 : Compression
  12. 300.twolf : VLSI place / route
Lecture 14: SUMMARIZING PERFORMANCE RESULTS

Introduction
• We have seen the need for using real programs for benchmarking.
  – Benchmarks consist of a suite of programs.
• How do we consolidate all the run times and come up with a single metric that can be used for comparison?
  – Machine X may run benchmark B1 faster, while machine Y may run benchmark B2 faster.
  – Is X faster than Y, or is Y faster than X?
• We shall discuss several measures that are used for consolidation.
Example 1
• Weighted Arithmetic Mean (WAM):
  – It is computed as the sum of the products of weighting factors and execution times.
  – If Wi denotes the weighting factor of program i, we can write:
    Weighted Arithmetic Mean = Σ(i=1..n) (Wi x XTi)

              CPU A (in secs)   CPU B (in secs)   CPU C (in secs)
  Program P1        1                10                25
  Program P2      500               250                10
  Total Time      501               260                35

• If W1 = 0.50 and W2 = 0.50, we get WAMA = 250.5, WAMB = 130, WAMC = 17.5
• If W1 = 0.90 and W2 = 0.10, we get WAMA = 50.9, WAMB = 34, WAMC = 23.5
• If W1 = 0.10 and W2 = 0.90, we get WAMA = 450.1, WAMB = 226, WAMC = 11.5
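The weighted-arithmetic-mean table can be reproduced with a short script. A sketch:

```python
def wam(weights, times):
    """Weighted arithmetic mean: sum of Wi x XTi."""
    return sum(w * t for w, t in zip(weights, times))

# Execution times of programs P1 and P2 on CPUs A, B, C (from the table)
times = {"A": [1, 500], "B": [10, 250], "C": [25, 10]}
for w in [(0.5, 0.5), (0.9, 0.1), (0.1, 0.9)]:
    print(w, {cpu: wam(w, t) for cpu, t in times.items()})
```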
(c) Normalized Execution Time
• As an alternative, we can normalize all execution times to a reference machine, and then take the average of the normalized execution times.
  – This approach is followed in the SPEC benchmarks, where a SPARCstation is taken as the reference machine.
• Here, XTRi denotes the execution time of the i-th program, normalized to the reference machine.
• The average normalized execution time can be expressed as either an arithmetic or a geometric mean:
  Normalized Arithmetic Mean = (1/n) Σ(i=1..n) XTRi
  Normalized Geometric Mean = (Π(i=1..n) XTRi)^(1/n)
Example 2

              CPU A (in secs)   CPU B (in secs)   CPU C (in secs)
  Program P1        1                10                25
  Program P2      500               250                10
  Total Time      501               260                35

                    Normalized to A       Normalized to B       Normalized to C
                    A     B     C         A     B     C         A      B      C
  Program P1       1.0  10.0  25.0       0.1   1.0   2.5       0.04   0.4    1.0
  Program P2       1.0   0.5   0.02      2.0   1.0   0.04      50.0   25.0   1.0
  Arithmetic mean  1.0   5.25 12.51      1.05  1.0   1.27      25.02  12.7   1.0
  Geometric mean   1.0   2.24  0.71      0.45  1.0   0.32      1.41   3.16   1.0

• Summary:
  – In contrast to arithmetic means, geometric means of normalized execution times are consistent no matter which machine is the reference.
  – Hence, the arithmetic mean should not be used to average normalized execution times.
  – One drawback of the geometric mean is that it does not predict execution times.
    • It can also encourage hardware and software designers to focus their attention on those benchmarks where performance is easiest to improve, rather than on the ones that are the slowest.
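The consistency claim can be checked numerically: normalizing to each machine in turn changes the geometric means, but not the ratios between machines. A sketch (`math.prod` needs Python 3.8+):

```python
import math

def geometric_mean(xs):
    """n-th root of the product of n values."""
    return math.prod(xs) ** (1 / len(xs))

times = {"A": [1, 500], "B": [10, 250], "C": [25, 10]}

# Normalize to each reference machine in turn; the RATIOS between
# machines stay the same regardless of the reference chosen.
for ref in times:
    norm = {cpu: geometric_mean([t / r for t, r in zip(ts, times[ref])])
            for cpu, ts in times.items()}
    print("ref", ref, {cpu: round(v, 2) for cpu, v in norm.items()})
```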
END OF LECTURE 14

Lecture 15: AMDAHL'S LAW (PART 1)
• Amdahl's law concerns the speedup achievable from an improvement in a computation that affects a fraction F of the computation, where the improvement has a speedup of S.

  Before improvement:  | 1 – F |   F   |
  After improvement:   | 1 – F | F / S |

• Amdahl's law demonstrates the law of diminishing returns.
• An example:
  – Suppose we are improving a part of the computer system that affects only 25% of the overall task.
  – The improvement can be very little or extremely large.
  – With "infinite" speedup, the 25% of the task can be done in "zero" time.
  – Maximum possible speedup = XTorig / XTnew = 1 / (1 – 0.25) = 1.33
  We can never get a speedup of more than 1.33.
10
24/07/17
1 1 1.00 50 1.32
Speedup = 2 1.14 100 1.33
(1 – F) + F / S
5 1.25 1000 1.33
• As S à ∞, Speedup à 1 / (1 – F) 10 1.29 100,000 1.33
– The frac4on F limits the maximum speedup that can be obtained.
61 2 62 2
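The formula and its saturation behaviour can be sketched in a few lines:

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of the work is sped up by factor s."""
    return 1.0 / ((1.0 - f) + f / s)

# Reproduce the table for F = 0.25
for s in (1, 2, 5, 10, 50, 100, 1000, 100_000):
    print(s, round(amdahl_speedup(0.25, s), 2))
# The speedup saturates at 1 / (1 - 0.25) = 1.33 as s grows
```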
• Some examples:
  – We make 10% of a program 90X faster: speedup = 1 / (0.9 + 0.1 / 90) = 1.11
  – We make 90% of a program 10X faster: speedup = 1 / (0.1 + 0.9 / 10) = 5.26
  – We make 25% of a program 25X faster: speedup = 1 / (0.75 + 0.25 / 25) = 1.32
  – We make 50% of a program 20X faster: speedup = 1 / (0.5 + 0.5 / 20) = 1.90
  – We make 90% of a program 50X faster: speedup = 1 / (0.1 + 0.9 / 50) = 8.47

Example 1
• Suppose we are running a set of programs on a RISC processor, for which the following instruction mix is observed:

  Operation   Frequency   CPIi   Wi x CPIi   % Time
  Load         20 %        5      1.00        0.48
  Store         8 %        3      0.24        0.12
  ALU          60 %        1      0.60        0.29
  Branch       12 %        2      0.24        0.11

  CPI = 2.08; the % Time of each type = (Wi x CPIi) / 2.08
Example 2
• The execution time of a program on a machine is found to be 50 seconds, out of which 42 seconds is consumed by multiply operations. It is required to make the program run 5 times faster. By how much must the speed of the multiplier be improved?
  – Here, F = 42 / 50 = 0.84
  – According to Amdahl's law:
    5 = 1 / (0.16 + 0.84 / S)
    or, 0.80 + 4.2 / S = 1
    or, S = 21

Example 2a
• For the same program, it is now required to make it run 8 times faster. By how much must the speed of the multiplier be improved?
  – Here, F = 42 / 50 = 0.84
  – According to Amdahl's law:
    8 = 1 / (0.16 + 0.84 / S)
    or, 1.28 + 6.72 / S = 1
    or, S = –24
  No amount of speed improvement in the multiplier can achieve this. Maximum speedup achievable: 1 / (1 – F) = 6.25
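The algebra in Examples 2 and 2a can be inverted directly: solving Speedup = 1 / ((1 – F) + F / S) for S gives S = F / (1/Speedup – (1 – F)), which becomes negative (i.e. infeasible) once the target exceeds 1 / (1 – F). A sketch:

```python
def required_component_speedup(f, target):
    """Solve target = 1/((1-f) + f/S) for S; None if the target is unreachable."""
    denom = 1.0 / target - (1.0 - f)
    return f / denom if denom > 0 else None

print(round(required_component_speedup(0.84, 5), 2))  # 21.0
print(required_component_speedup(0.84, 8))            # None: the limit is 1/0.16 = 6.25
```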
Lecture 16: AMDAHL'S LAW (PART 2)

Example 1
• The total execution time of a typical program is made up of 60% CPU time and 40% I/O time. Which of the following alternatives is better?
  a) Increase the CPU speed by 50%
  b) Reduce the I/O time by half
  Assume that there is no overlap between CPU and I/O operations.
• Increase CPU speed by 50%:
  – Here, F = 0.60 and S = 1.5
  – Speedup = 1 / (0.40 + 0.60 / 1.5) = 1.25
• Reduce the I/O time by half:
  – Here, F = 0.40 and S = 2
  – Speedup = 1 / (0.60 + 0.40 / 2) = 1.25
Both alternatives give the same overall speedup of 1.25.

Example 2
• Suppose that a compute-intensive bioinformatics program is running on a given machine X and takes 10 days to run. The program spends 25% of its time doing integer instructions, and 40% of its time doing I/O. Which of the following two alternatives provides a better tradeoff?
  a) Use an optimizing compiler that reduces the number of integer instructions by 30% (assume all integer instructions take the same time).
  b) Optimize the I/O subsystem to reduce the latency of I/O operations from 10 μsec to 5 μsec (that is, a speedup of 2).
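Both examples reduce to plugging F and S into Amdahl's law. A sketch (Example 2's solution slide is not reproduced in this text, so those two lines are straightforward working, not quoted lecture results):

```python
def speedup(f, s):
    """Amdahl's law: fraction f of execution sped up by factor s."""
    return 1.0 / ((1.0 - f) + f / s)

# Example 1: both alternatives give the same result
print(speedup(0.60, 1.5))   # CPU 50% faster  -> 1.25
print(speedup(0.40, 2.0))   # I/O time halved -> 1.25

# Example 2: 30% fewer integer instructions means that part runs in
# 0.7x the time, i.e. S = 1/0.7; the I/O option is the better tradeoff
print(round(speedup(0.25, 1 / 0.7), 2))  # 1.08
print(round(speedup(0.40, 2.0), 2))      # 1.25
```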
Extension to Multiple Enhancements
• Suppose we carry out multiple optimizations to a program:
  – Optimization 1 speeds up a fraction F1 of the program by a factor S1
  – Optimization 2 speeds up a fraction F2 of the program by a factor S2

  Before:  | 1 – F1 – F2 |   F1    |   F2    |
  After:   | 1 – F1 – F2 | F1 / S1 | F2 / S2 |

  Speedup = 1 / ((1 – F1 – F2) + F1 / S1 + F2 / S2)

• In the calculation as shown, it is assumed that F1 and F2 are disjoint.
  – S1 and S2 do not apply to the same portion of execution.
• If that is not so, we have to treat the overlap as a separate portion of execution and measure its speedup independently.
  – Define F1only, F2only, and F1&2, with speedups S1only, S2only, and S1&2.

  Before:  | 1 – F1only – F2only – F1&2 |     F1only      |      F1&2     |     F2only      |
  After:   | 1 – F1only – F2only – F1&2 | F1only / S1only | F1&2 / S1&2   | F2only / S2only |

  Speedup = 1 / ((1 – F1only – F2only – F1&2) + F1only / S1only + F2only / S2only + F1&2 / S1&2)
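The two-enhancement formula generalizes naturally to any list of disjoint (fraction, speedup) pairs. A sketch with illustrative numbers (not from the lecture):

```python
def multi_speedup(enhancements):
    """Overall speedup for a list of disjoint (fraction, speedup) enhancements."""
    untouched = 1.0 - sum(f for f, _ in enhancements)
    return 1.0 / (untouched + sum(f / s for f, s in enhancements))

# Illustrative: 20% of execution sped up 3x, a disjoint 50% sped up 2x
print(round(multi_speedup([(0.20, 3), (0.50, 2)]), 2))  # 1.62
```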
Example 4
• Now we consider two levels of cache memory, L1-cache and L2-cache. Assumptions:
  – Without the cache, memory operations take 30% of execution time.
  – The L1-cache speeds up 80% of the memory operations by a factor of 4.
  – The L2-cache speeds up 50% of the remaining 20% of memory operations by a factor of 2.
  We want to find out the overall speedup.
• Solution:
  – Memory operations = 0.3
  – FL1 = 0.3 x 0.8 = 0.24, SL1 = 4
  – FL2 = 0.3 x (1 – 0.8) x 0.5 = 0.03, SL2 = 2
  Speedup = 1 / ((1 – FL1 – FL2) + FL1 / SL1 + FL2 / SL2)
          = 1 / ((1 – 0.24 – 0.03) + 0.24 / 4 + 0.03 / 2)
          = 1.24
END OF LECTURE 16