Solution
1.7
a. Class A: 10^5 instr. Class B: 2 × 10^5 instr. Class C: 5 × 10^5 instr. Class D: 2 × 10^5 instr.
Time = No. instr. × CPI/clock rate
Total time P1 = (10^5 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3)/(2.5 × 10^9) = 10.4 × 10^-4 s
Total time P2 = (10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2)/(3 × 10^9) = 6.66 × 10^-4 s
CPI(P1) = 10.4 × 10^-4 × 2.5 × 10^9/10^6 = 2.6
CPI(P2) = 6.66 × 10^-4 × 3 × 10^9/10^6 = 2.0
b. clock cycles(P1) = 10^5 × 1 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3 = 26 × 10^5
clock cycles(P2) = 10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2 = 20 × 10^5
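The arithmetic above can be checked with a short script. This is only a sketch: the per-class CPIs and clock rates are read off the sums in parts a and b, and the names `counts`, `processors`, and `summarize` are illustrative.

```python
# Instruction counts for classes A, B, C, D, as listed in part a.
counts = [1e5, 2e5, 5e5, 2e5]

# Per-class CPI and clock rate for each processor, read off the sums above.
processors = {
    "P1": {"cpi": [1, 2, 3, 3], "clock_hz": 2.5e9},
    "P2": {"cpi": [2, 2, 2, 2], "clock_hz": 3.0e9},
}

def summarize(cpi, clock_hz):
    """Return (clock cycles, execution time in seconds, average CPI)."""
    cycles = sum(n * c for n, c in zip(counts, cpi))
    time_s = cycles / clock_hz
    avg_cpi = cycles / sum(counts)
    return cycles, time_s, avg_cpi

results = {name: summarize(**p) for name, p in processors.items()}
for name, (cycles, time_s, avg_cpi) in results.items():
    print(f"{name}: cycles={cycles:.3g}, time={time_s:.3e} s, CPI={avg_cpi:.2f}")
```

Computing the cycle count first and deriving both time and average CPI from it avoids the rounding that creeps in when CPI is recovered from an already rounded time.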
1.8
a. CPI = T_exec × f/No. instr.
Compiler A CPI = 1.1
Compiler B CPI = 1.25
b. f_B/f_A = (No. instr.(B) × CPI(B))/(No. instr.(A) × CPI(A)) = 1.37
c. T_A/T_new = 1.67
T_B/T_new = 2.27
1.9
1.9.1 C = 2 × DP/(V^2 × F)
Pentium 4: C = 3.2E-8 F
Core i5 Ivy Bridge: C = 2.9E-8 F
1.9.2 Pentium 4: 10/100 = 10%
Core i5 Ivy Bridge: 30/70 = 42.9%
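As a rough check of 1.9.1 and 1.9.2, the sketch below recomputes both quantities. It assumes the dynamic power is total minus static power (100 - 10 W and 70 - 30 W, from the ratios in 1.9.2) and takes the voltages and frequencies from the figures that appear in 1.9.3; the function and variable names are illustrative.

```python
# (total power W, static power W, voltage V, clock frequency Hz)
chips = {
    "Pentium 4": (100.0, 10.0, 1.25, 3.6e9),
    "Core i5 Ivy Bridge": (70.0, 30.0, 0.9, 3.4e9),
}

def capacitive_load(total_w, static_w, volts, freq_hz):
    """C = 2 * DP / (V^2 * F), where DP = total - static power."""
    dynamic_w = total_w - static_w
    return 2.0 * dynamic_w / (volts ** 2 * freq_hz)

def static_share(total_w, static_w):
    """Static power as a fraction of total dissipated power."""
    return static_w / total_w

loads = {name: capacitive_load(*vals) for name, vals in chips.items()}
shares = {name: static_share(vals[0], vals[1]) for name, vals in chips.items()}
for name in chips:
    print(f"{name}: C = {loads[name]:.2e} F, static share = {shares[name]:.1%}")
```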
1.9.3 (S_new + D_new)/(S_old + D_old) = 0.90
D_new = C × V_new^2 × F
S_old = V_old × I
S_new = V_new × I
Therefore:
V_new = [D_new/(C × F)]^(1/2)
D_new = 0.90 × (S_old + D_old) - S_new
S_new = V_new × (S_old/V_old)
Pentium 4:
S_new = V_new × (10/1.25) = V_new × 8
D_new = 0.90 × 100 - V_new × 8 = 90 - V_new × 8
V_new = [(90 - V_new × 8)/(3.2E-8 × 3.6E9)]^(1/2)
V_new = 0.85 V
Core i5:
S_new = V_new × (30/0.9) = V_new × 33.3
D_new = 0.90 × 70 - V_new × 33.3 = 63 - V_new × 33.3
V_new = [(63 - V_new × 33.3)/(2.9E-8 × 3.4E9)]^(1/2)
V_new = 0.64 V
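The implicit equations for V_new above can be solved in closed form. Squaring both sides and substituting D_new and S_new gives the quadratic C × F × V^2 + (S_old/V_old) × V - 0.90 × (S_old + D_old) = 0, whose positive root is the new voltage. The sketch below assumes the rounded capacitive loads from 1.9.1 and the same dynamic power expression used in 1.9.3; `new_voltage` is an illustrative name.

```python
import math

def new_voltage(c_farads, freq_hz, s_old_w, d_old_w, v_old, target=0.90):
    """Positive root of C*F*V^2 + (S_old/V_old)*V - target*(S_old + D_old) = 0."""
    a = c_farads * freq_hz
    b = s_old_w / v_old              # leakage current I = S_old / V_old
    c = -target * (s_old_w + d_old_w)
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

v_p4 = new_voltage(3.2e-8, 3.6e9, s_old_w=10, d_old_w=90, v_old=1.25)
v_i5 = new_voltage(2.9e-8, 3.4e9, s_old_w=30, d_old_w=40, v_old=0.9)
print(f"Pentium 4: {v_p4:.3f} V, Core i5: {v_i5:.3f} V")
```

With these inputs the roots come out at about 0.850 V and 0.648 V, so the Core i5 figure quoted above is the truncated rather than the rounded value.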
1.11
1.11.1 die area_15cm = wafer area/dies per wafer = π × 7.5^2/84 = 2.10 cm^2
yield_15cm = 1/(1 + (0.020 × 2.10/2))^2 = 0.9593
die area_20cm = wafer area/dies per wafer = π × 10^2/100 = 3.14 cm^2
yield_20cm = 1/(1 + (0.031 × 3.14/2))^2 = 0.9093
1.11.2 cost/die_15cm = 12/(84 × 0.9593) = 0.1489
cost/die_20cm = 15/(100 × 0.9093) = 0.1650
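The die area, yield, and cost figures in 1.11.1 and 1.11.2 follow one pattern, sketched below. The wafer parameters are the ones that appear in the formulas above; the function names are illustrative, and because the script keeps the unrounded die area its results can differ from the text in the fourth decimal place.

```python
import math

def die_area(wafer_diameter_cm, dies_per_wafer):
    """Wafer area divided evenly among the dies, in cm^2."""
    return math.pi * (wafer_diameter_cm / 2) ** 2 / dies_per_wafer

def die_yield(defects_per_cm2, area_cm2):
    """Yield model used above: 1 / (1 + defects * area / 2)^2."""
    return 1.0 / (1.0 + defects_per_cm2 * area_cm2 / 2.0) ** 2

def cost_per_die(wafer_cost, dies_per_wafer, yield_fraction):
    """Wafer cost spread over the dies that actually work."""
    return wafer_cost / (dies_per_wafer * yield_fraction)

# (diameter cm, dies per wafer, defects per cm^2, wafer cost)
wafers = {"15cm": (15.0, 84, 0.020, 12.0), "20cm": (20.0, 100, 0.031, 15.0)}

report = {}
for name, (diameter, dies, defects, cost) in wafers.items():
    area = die_area(diameter, dies)
    y = die_yield(defects, area)
    report[name] = (area, y, cost_per_die(cost, dies, y))
    print(f"{name}: area={area:.2f} cm^2, yield={y:.4f}, cost/die={report[name][2]:.4f}")
```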
1.11.3 die area_15cm = wafer area/dies per wafer = π × 7.5^2/(84 × 1.1) = 1.91 cm^2
yield_15cm = 1/(1 + (0.020 × 1.15 × 1.91/2))^2 = 0.9575
die area_20cm = wafer area/dies per wafer = π × 10^2/(100 × 1.1) = 2.86 cm^2
yield_20cm = 1/(1 + (0.031 × 1.15 × 2.86/2))^2 = 0.9053
1.11.4 defects per area_0.92 = (1 - y^0.5)/(y^0.5 × die_area/2) = (1 - 0.92^0.5)/(0.92^0.5 × 2/2) = 0.043 defects/cm^2
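The expression in 1.11.4 is the yield formula solved for the defect density. A minimal sketch, with the illustrative name `defects_per_area` and the 2 cm^2 die area used above:

```python
import math

def defects_per_area(target_yield, die_area_cm2):
    """Invert yield = 1 / (1 + d * area / 2)^2 for the defect density d."""
    root = math.sqrt(target_yield)
    return (1.0 - root) / (root * die_area_cm2 / 2.0)

d_092 = defects_per_area(0.92, 2.0)
print(f"yield 0.92 -> {d_092:.3f} defects/cm^2")

# Round trip: plugging d back into the yield formula recovers the target.
recovered = 1.0 / (1.0 + d_092 * 2.0 / 2.0) ** 2
```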
Clock rate_new = No. instr. × 0.85 × CPI/(0.80 × CPU time) = (0.85/0.80) × clock rate_old = 3.18 GHz
1.13
1.13.1 T(P1) = 5 × 10^9 × 0.9/(4 × 10^9) = 1.125 s
T(P2) = 10^9 × 0.75/(3 × 10^9) = 0.25 s
clock rate(P1) > clock rate(P2), performance(P1) < performance(P2)
1.13.2 T(P1) = No. instr. × CPI/clock rate
T(P1) = 2.25 × 10^-1 s
T(P2) = N × 0.75/(3 × 10^9), then N = 9 × 10^8
1.13.3 MIPS = Clock rate × 10^-6/CPI
MIPS(P1) = 4 × 10^9 × 10^-6/0.9 = 4.44 × 10^3
MIPS(P2) = 3 × 10^9 × 10^-6/0.75 = 4.0 × 10^3
MIPS(P1) > MIPS(P2), performance(P1) < performance(P2) (from 1.13.1)
1.13.4 MFLOPS = No. FP operations × 10^-6/T
MFLOPS(P1) = 0.4 × 5E9 × 1E-6/1.125 = 1.78E3
MFLOPS(P2) = 0.4 × 1E9 × 1E-6/0.25 = 1.60E3
MFLOPS(P1) > MFLOPS(P2), performance(P1) < performance(P2) (from 1.13.1)
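The point of 1.13 is that clock rate, MIPS, and MFLOPS can all rank P1 ahead of P2 while execution time ranks it behind. The sketch below reproduces the three metrics from the instruction counts, CPIs, clock rates, and 40% FP share used above; the function names are illustrative.

```python
def exec_time(instr, cpi, clock_hz):
    """Execution time in seconds: instructions * CPI / clock rate."""
    return instr * cpi / clock_hz

def mips(cpi, clock_hz):
    """Millions of instructions per second."""
    return clock_hz * 1e-6 / cpi

def mflops(fp_ops, time_s):
    """Millions of floating-point operations per second."""
    return fp_ops * 1e-6 / time_s

# (instruction count, CPI, clock rate in Hz)
p1 = (5e9, 0.9, 4e9)
p2 = (1e9, 0.75, 3e9)
FP_SHARE = 0.4  # fraction of instructions that are FP operations

t1, t2 = exec_time(*p1), exec_time(*p2)
mips1, mips2 = mips(p1[1], p1[2]), mips(p2[1], p2[2])
mflops1, mflops2 = mflops(FP_SHARE * p1[0], t1), mflops(FP_SHARE * p2[0], t2)

print(f"T: {t1} s vs {t2} s")
print(f"MIPS: {mips1:.0f} vs {mips2:.0f}")
print(f"MFLOPS: {mflops1:.0f} vs {mflops2:.0f}")
```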
1.14
1.14.1 T_fp = 70 × 0.8 = 56 s. T_new = 56 + 85 + 55 + 40 = 236 s. Reduction: 5.6%
1.14.2 T_new = 250 × 0.8 = 200 s, T_fp + T_l/s + T_branch = 165 s, T_int = 35 s. Reduction time INT: 58.8%
1.14.3 T_new = 250 × 0.8 = 200 s, T_fp + T_int + T_l/s = 210 s. NO: 210 s already exceeds the 200 s target, so reducing branch time alone cannot achieve it.
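All three parts of 1.14 ask how much one class must shrink while the others stay fixed. The sketch below takes the per-class times from the sums above (FP 70 s, INT 85 s, L/S 55 s, branch 40 s); `required_time` is an illustrative helper, and a negative result means the target is out of reach.

```python
times = {"fp": 70.0, "int": 85.0, "ls": 55.0, "branch": 40.0}
total = sum(times.values())  # 250 s

def required_time(cls, target_total):
    """Time class `cls` must take for the total to equal target_total,
    with every other class unchanged. Negative means not achievable."""
    others = total - times[cls]
    return target_total - others

# 1.14.1: FP time cut by 20%.
new_total = total - 0.2 * times["fp"]
total_reduction = (total - new_total) / total

# 1.14.2: total cut by 20% through INT alone.
int_needed = required_time("int", 0.8 * total)
int_reduction = (times["int"] - int_needed) / times["int"]

# 1.14.3: total cut by 20% through branch alone.
branch_needed = required_time("branch", 0.8 * total)

print(new_total, f"{total_reduction:.1%}")
print(int_needed, f"{int_reduction:.1%}")
print(branch_needed, "feasible" if branch_needed >= 0 else "not feasible")
```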
1.15
1.15.1 Clock cycles = CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.
T_CPU = clock cycles/clock rate = clock cycles/(2 × 10^9)
clock cycles = 512 × 10^6; T_CPU = 0.256 s
To halve the number of clock cycles by improving the CPI of FP instructions:
CPI_improved fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr. = clock cycles/2
CPI_improved fp = (clock cycles/2 - (CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.))/No. FP instr.
CPI_improved fp = (256 - 462)/50 < 0 ==> not possible
1.15.2 Using the clock cycle data from 1.15.1.
To halve the number of clock cycles by improving the CPI of L/S instructions:
CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_improved l/s × No. L/S instr. + CPI_branch × No. branch instr. = clock cycles/2
CPI_improved l/s = (clock cycles/2 - (CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_branch × No. branch instr.))/No. L/S instr.
The non-L/S cycles are 512 - 4 × 80 = 192 (in millions), so:
CPI_improved l/s = (256 - 192)/80 = 0.8
1.15.3 Clock cycles = CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.
T_CPU = clock cycles/clock rate = clock cycles/(2 × 10^9)
CPI_int = 0.6 × 1 = 0.6; CPI_fp = 0.6 × 1 = 0.6; CPI_l/s = 0.7 × 4 = 2.8; CPI_branch = 0.7 × 2 = 1.4
T_CPU (before improv.) = 0.256 s; T_CPU (after improv.) = 0.171 s
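The cycle accounting in 1.15 can be sketched as follows. The FP and L/S instruction counts (50 and 80 million) and the base CPIs (1, 1, 4, 2) appear in the working above; the INT and branch counts (110 and 16 million) are an assumption taken from the exercise statement and chosen to be consistent with the 512 × 10^6 total. All names are illustrative.

```python
CLOCK_HZ = 2e9

# Instruction counts per class. INT and branch counts are assumed from the
# exercise statement; they are not listed in the solution text itself.
counts = {"fp": 50e6, "int": 110e6, "ls": 80e6, "branch": 16e6}
cpi = {"fp": 1.0, "int": 1.0, "ls": 4.0, "branch": 2.0}

def total_cycles(cpi_by_class):
    """Sum of CPI * instruction count over all classes."""
    return sum(cpi_by_class[k] * counts[k] for k in counts)

def cpu_time(cpi_by_class):
    """CPU time in seconds at the fixed clock rate."""
    return total_cycles(cpi_by_class) / CLOCK_HZ

def required_cpi(cls, speedup):
    """CPI that class `cls` alone would need to reach the given speedup.
    A negative result means the speedup is not achievable this way."""
    target = total_cycles(cpi) / speedup
    others = total_cycles(cpi) - cpi[cls] * counts[cls]
    return (target - others) / counts[cls]

# 1.15.3: INT and FP CPIs reduced by 40%, L/S and branch CPIs by 30%.
improved = {"fp": 0.6 * cpi["fp"], "int": 0.6 * cpi["int"],
            "ls": 0.7 * cpi["ls"], "branch": 0.7 * cpi["branch"]}

print(f"before: {cpu_time(cpi):.3f} s, after: {cpu_time(improved):.3f} s")
for cls in ("fp", "ls"):
    print(f"CPI needed in {cls} for 2x speedup: {required_cpi(cls, 2):.3f}")
```

Keeping the counts and CPIs in dictionaries keyed by class lets the same `required_cpi` helper answer both the FP and the L/S question.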