0% found this document useful (0 votes)
45 views14 pages

Homework4 v2 Solution

This document contains a homework assignment on cache performance for a computer architecture course. The homework has 4 questions that assess students' understanding of [1] direct-mapped and set-associative caches, [2] calculating cache parameters like block size from address sizes, [3] filling out cache configuration details, and [4] calculating performance improvement from eliminating cache misses. The homework is worth 100 marks and must be completed in pairs and submitted before the due date.

Uploaded by

razi haider
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
45 views14 pages

Homework4 v2 Solution

This document contains a homework assignment on cache performance for a computer architecture course. The homework has 4 questions that assess students' understanding of [1] direct-mapped and set-associative caches, [2] calculating cache parameters like block size from address sizes, [3] filling out cache configuration details, and [4] calculating performance improvement from eliminating cache misses. The homework is worth 100 marks and must be completed in pairs and submitted before the due date.

Uploaded by

razi haider
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 14

Habib University

Dhanani School of Science and Engineering

Computer Architecture
EE 371 / CS 330 / CE 321 (Spring 2023)

Homework 4
Total marks: 100 Marks obtained:

Student Name: Student ID: Section:

Purpose:
The purpose of this assignment is to help you in understanding the performance of cache operations.

Instructions:
1. This assignment should be done in pairs.
2. All questions should be answered in black ink only.
3. Scan your answer sheet and upload it on HU LMS before the due date.

Grading Criteria:
1. Your assignments will be checked by instructor/TA.
2. You can also be asked to give a viva where you will be judged whether you understood the question
yourself or not. If you are unable to answer correctly to the question you have attempted right, you may
lose your marks.
3. Zero will be given if the assignment is found to be plagiarized.
4. Untidy work will result in reduction of your points.

Submission policy:
1. No submission will be accepted after the instructor releases the solution on HU LMS.

CLO Assessment:
This assignment assesses students for the following course learning outcomes.

CLO
Course Learning Outcomes
Assessed
Explain the role of ISA in modern processors and instruction encodings and assembly
CLO 1
language programming
CLO 2 Explain the architecture and working of a single cycle processor
CLO 3 Design the architecture to mitigate issues of a pipelined processor
CLO 4 Analyze the performance of cache operations ✓

1
Question 1 [10 Marks]

Caches are important to providing a high-performance memory hierarchy to processors. Below is


a list of 32-bit memory address references, given as word addresses.
0x03, 0xb4, 0x2b, 0x02, 0xbf, 0x58, 0xbe, 0x0e, 0xb5, 0x2c, 0xba, 0xfd

(a) [05 Marks] For each of these references, identify the binary word address, the tag, and the
index given a direct-mapped cache with 16 one-word blocks. Also list whether each
reference is a hit or a miss, assuming the cache is initially empty. First insertion is already
done so that you may get the idea.

1 word per block = no offset bits


Cache size = 16 blocks = 24 = 4 bits for index
Tag bits = 32 – 4 = 28 bits

Word Address Binary Address Tag Index Hit/Miss


0x03 0000 0011 0 3 M
0xb4 1011 0100 b 4 M
0x2b 0010 1011 2 b M
0x02 0000 0010 0 2 M
0xbf 1011 1111 b f M
0x58 0101 1000 5 8 M
0xbe 1011 1110 b e M
0x0e 0000 1110 0 e M
0xb5 1011 0101 b 5 M
0x2c 0010 1100 2 c M
0xba 1011 1010 b a M
0xfd 1111 1101 f d M

2
(b) [05 Marks] For each of these references, identify the binary word address, the tag, the
index, and the offset given a direct-mapped cache with two-word blocks and a total size of
eight blocks. Also list if each reference is a hit or a miss, assuming the cache is initially
empty.

2 words per block = 21 = 1 bit offset


Cache size = 8 blocks = 23 = 3 bits for index
Tag bits = 32 – 3 – 1 = 28 bits

Word Binary
Tag Index Offset Hit/Miss
Address Address
0x03 0000 0011 0 1 1 M
0xb4 1011 0100 b 2 0 M
0x2b 0010 1011 2 5 1 M
0x02 0000 0010 0 1 0 H
0xbf 1011 1111 b 7 1 M
0x58 0101 1000 5 4 0 M
0xbe 1011 1110 b 7 0 H
0x0e 0000 1110 0 7 0 M
0xb5 1011 0101 b 2 1 H
0x2c 0010 1100 2 6 0 M
0xba 1011 1010 b 5 0 M
0xfd 1111 1101 f 6 1 M

3
Question 2 [10 Marks]

For a direct-mapped cache design with a 64-bit address, the following bits of the address are
used to access the cache.
Tag Index Offset
63-10 9-5 4-0

Offset = 5 bits
Index = 5 bits
Tag = 54 bits

a) [03 Marks] What is the cache block size (in words)?

Offset = 5 bits = 25 bytes in a block = 32 bytes in a block = 8 words in a block.

b) [03 Marks] How many blocks does the cache have?

There are 5 bits for index which means there are 25 blocks = 32 blocks.

c) [04 Marks] What is the ratio between total bits required for such a cache implementation
over the data storage bits?

The total number of bits in a direct-mapped cache is:


2n * (block size + tag size + valid field size)
25 * ((32*8) + 54 + 1) = 32 * (311) = 9952 bits

The cache stores 32 * 32 * 8 = 8192 bits of data

Total bits required for cache implementation / data storage bits


= 9952 / 8192 = 1.21

4
Question 3 [15 Marks]

Considering the address size of 64-bits, fill in the data for difference types of caches:

Data per Associativity Offset


Blocks Sets Tag Bits Index Bits
block -ways Bits
Fully
Associative 8 8 words -- 8 way 59 - 5
Cache
Direct Mapped
16 8 words -- 1 way 55 4 5
Cache
Set Associative
32 8 words 4 8 way 57 2 5
Cache
Direct Mapped
64 8 words -- 1 way 53 6 5
Cache
Set Associative
128 8 words 32 4 way 54 5 5
Cache
Set Associative
256 8 words 32 8 way 54 5 5
Cache
Fully
Associative 512 8 words -- 512 way 59 - 5
Cache
Direct Mapped
1024 8 words -- 1 way 49 10 5
Cache
Set Associative
2048 8 words 64 32 way 53 6 5
Cache
Direct Mapped
4096 8 words -- 1 way 47 12 5
Cache

5
Question 4 [05 Marks]

Assume the miss rate of an instruction cache is 4% and the miss rate of the data cache is 6%. If a
processor has a CPI of 3 without any memory stalls, and the miss penalty is 100 cycles for all
misses, determine how much faster a processor would run with a perfect cache that never missed.
Assume the frequency of all loads and stores is 26%.

The number of memory miss cycles for instructions in terms of the Instruction count (I) is
Instruction miss cycles = I * 4% * 100 = I * 4

As the frequency of all loads and stores is 26%, we can find the number of memory miss cycles
for data references:

Data miss cycles = I * 26% * 6% * 100 = I * 1.56

The total number of memory-stall cycles is 4 I + 1.56 I = 5.56 I

This is more than five cycles of memory stall per instruction.


Accordingly, the total CPI including memory stalls is 3 + 5.56 = 8.56.
Since there is no change in instruction count or clock rate, the ratio of the CPU execution times is:

𝐶𝑃𝑈 𝑡𝑖𝑚𝑒 𝑤𝑖𝑡ℎ 𝑠𝑡𝑎𝑙𝑙𝑠 𝐼 ∗ 𝐶𝑃𝐼 𝑠𝑡𝑎𝑙𝑙 ∗ 𝐶𝑙𝑜𝑐𝑘 𝑐𝑦𝑐𝑙𝑒 8.56


= = = 2.85
𝐶𝑃𝑈 𝑡𝑖𝑚𝑒 𝑤𝑖𝑡ℎ 𝑝𝑒𝑟𝑓𝑒𝑐𝑡 𝑐𝑎𝑐ℎ𝑒 𝐼 ∗ 𝐶𝑃𝐼𝑝𝑒𝑟𝑓𝑒𝑐𝑡 ∗ 𝐶𝑙𝑜𝑐𝑘 𝑐𝑦𝑐𝑙𝑒 3

The performance with the perfect cache is better by 2.85 times.

6
Question 5 [10 Marks]
We are given 4 arrays of size 6. Each element in an array is of 32 bits i.e., one word. Following is
the data stored in the array:

A = (25, 48, 43, 30, 47, 36)


B = (16, 29, 35, 38, 32, 41)
C = (24, 33, 5, 39, 10, 14)
D = (23, 7, 11, 44, 42, 22)

The array data is arranged in main memory as follows:

00000 A[0]
00001 A[1]
00010 A[2]
00011 A[3]
00100 A[4]
00101 A[5]
00110
00111
01000 B[0]
01001 B[1]
01010 B[2]
01011 B[3]
01100 B[4]
01101 B[5]
01110
01111
10000 C[0]
10001 C[1]
10010 C[2]
10011 C[3]
10100 C[4]
10101 C[5]
10110
10111
11000 D[0]
11001 D[1]
11010 D[2]
11011 D[3]
11100 D[4]
11101 D[5]
11110
11111

7
We are given a direct mapped cache which contains 8 blocks (each block will contain one word).
Insert the following elements in cache one by one and also mention whether it was a hit or a miss.
Assume that the first block of the cache will be populated by the first element of the array and so
on. First insertion is already done so that you may get the idea.

Data to
be Hit/Miss Cache Index
Inserted
0 1 2 3 4 5 6 7
A[0] M A[0]
A[1] M A[0] A[1]
A[2] M A[0] A[1] A[2]
A[1] H A[0] A[1] A[2]
A[5] M A[0] A[1] A[2] A[5]
B[5] M A[0] A[1] A[2] B[5]
B[4] M A[0] A[1] A[2] B[4] B[5]
B[3] M A[0] A[1] A[2] B[3] B[4] B[5]
B[3] H A[0] A[1] A[2] B[3] B[4] B[5]
B[4] H A[0] A[1] A[2] B[3] B[4] B[5]
D[1] M A[0] D[1] A[2] B[3] B[4] B[5]
D[2] M A[0] D[1] D[2] B[3] B[4] B[5]
D[3] M A[0] D[1] D[2] D[3] B[4] B[5]
D[4] M A[0] D[1] D[2] D[3] D[4] B[5]
C[3] M A[0] D[1] D[2] C[3] D[4] B[5]
C[2] M A[0] D[1] C[2] C[3] D[4] B[5]
C[4] M A[0] D[1] C[2] C[3] C[4] B[5]
C[2] H A[0] D[1] C[2] C[3] C[4] B[5]

What is the Hit Ratio and the Miss Ratio in the above case?

Hit ratio = number of hits / number of accesses = 4/18 = 0.22 = 22%


Miss ratio = number of misses / number of accesses = 14/18 = 0.78 = 78%

8
Question 6 [10 Marks]

Whenever an element from an array is accessed, it is most probable that some other remaining
elements of the array are also accessed. Repeat the same task as in Question 5 but this time design
a cache with 4 blocks in which each block can accommodate 2 words. The first insertion is done
again so that you may get the idea.

Data to
be Hit/Miss Cache Index
Inserted
0 1 2 3
A[0] M A[0] A[1]
A[1] H A[0] A[1]
A[2] M A[0] A[1] A[2] A[3]
A[1] H A[0] A[1] A[2] A[3]
A[5] M A[0] A[1] A[2] A[3] A[4] A[5]
B[5] M A[0] A[1] A[2] A[3] B[4] B[5]
B[4] H A[0] A[1] A[2] A[3] B[4] B[5]
B[3] M A[0] A[1] B[2] B[3] B[4] B[5]
B[3] H A[0] A[1] B[2] B[3] B[4] B[5]
B[4] H A[0] A[1] B[2] B[3] B[4] B[5]
D[1] M D[0] D[1] B[2] B[3] B[4] B[5]
D[2] M D[0] D[1] D[2] D[3] B[4] B[5]
D[3] H D[0] D[1] D[2] D[3] B[4] B[5]
D[4] M D[0] D[1] D[2] D[3] D[4] D[5]
C[3] M D[0] D[1] C[2] C[3] D[4] D[5]
C[2] H D[0] D[1] C[2] C[3] D[4] D[5]
C[4] M D[0] D[1] C[2] C[3] C[4] C[5]
C[2] H D[0] D[1] C[2] C[3] C[4] C[5]

What is the Hit Ratio and the Miss Ratio in this case? Is it better than the previous? Does loading
the two array elements into the cache helps us in accessing the elements fast?

Hit ratio = number of hits / number of accesses = 8/18 = 0.44 = 44%


Miss ratio = number of misses / number of accesses = 10/18 = 0.56 = 56%

The hit rate has improved in this case. Yes, loading two array elements into the cache helps in
faster access time.
Loading the entire array into cache will improve the spatial locality.

9
Question 7 [20 Marks]

a) [05 Marks] Repeat Question 5, this time using a fully associative cache containing 8
blocks. Use LRU replacement scheme for eviction.

Data to
be Hit/Miss Cache Index
Inserted
0 1 2 3 4 5 6 7
A[0] M A[0]
A[1] M A[0] A[1]
A[2] M A[0] A[1] A[2]
A[1] H A[0] A[1] A[2]
A[5] M A[0] A[1] A[2] A[5]
B[5] M A[0] A[1] A[2] A[5] B[5]
B[4] M A[0] A[1] A[2] A[5] B[5] B[4]
B[3] M A[0] A[1] A[2] A[5] B[5] B[4] B[3]
B[3] H A[0] A[1] A[2] A[5] B[5] B[4] B[3]
B[4] H A[0] A[1] A[2] A[5] B[5] B[4] B[3]
D[1] M A[0] A[1] A[2] A[5] B[5] B[4] B[3] D[1]
D[2] M D[2] A[1] A[2] A[5] B[5] B[4] B[3] D[1]
D[3] M D[2] A[1] D[3] A[5] B[5] B[4] B[3] D[1]
D[4] M D[2] D[4] D[3] A[5] B[5] B[4] B[3] D[1]
C[3] M D[2] D[4] D[3] C[3] B[5] B[4] B[3] D[1]
C[2] M D[2] D[4] D[3] C[3] C[2] B[4] B[3] D[1]
C[4] M D[2] D[4] D[3] C[3] C[2] B[4] C[4] D[1]
C[2] H D[2] D[4] D[3] C[3] C[2] B[4] C[4] D[1]

Hit ratio = number of hits / number of accesses = 4/18 = 0.22 = 22%


Miss ratio = number of misses / number of accesses = 14/18 = 0.78 = 78%

10
b) [05 Marks] Repeat Question 6, this time using a fully associative cache containing 4 blocks
in which each block can accommodate 2 words. Use LRU replacement scheme for eviction.

Data to
be Hit/Miss Cache Index
Inserted
0 1 2 3
A[0] M A[0] A[1]
A[1] H A[0] A[1]
A[2] M A[0] A[1] A[2] A[3]
A[1] H A[0] A[1] A[2] A[3]
A[5] M A[0] A[1] A[2] A[3] A[4] A[5]
B[5] M A[0] A[1] A[2] A[3] A[4] A[5] B[4] B[5]
B[4] H A[0] A[1] A[2] A[3] A[4] A[5] B[4] B[5]
B[3] M A[0] A[1] B[2] B[3] A[4] A[5] B[4] B[5]
B[3] H A[0] A[1] B[2] B[3] A[4] A[5] B[4] B[5]
B[4] H A[0] A[1] B[2] B[3] A[4] A[5] B[4] B[5]
D[1] M D[0] D[1] B[2] B[3] A[4] A[5] B[4] B[5]
D[2] M D[0] D[1] B[2] B[3] D[2] D[3] B[4] B[5]
D[3] H D[0] D[1] B[2] B[3] D[2] D[3] B[4] B[5]
D[4] M D[0] D[1] D[4] D[5] D[2] D[3] B[4] B[5]
C[3] M D[0] D[1] D[4] D[5] D[2] D[3] C[2] C[3]
C[2] H D[0] D[1] D[4] D[5] D[2] D[3] C[2] C[3]
C[4] M C[4] C[5] D[4] D[5] D[2] D[3] C[2] C[3]
C[2] H C[4] C[5] D[4] D[5] D[2] D[3] C[2] C[3]

Hit ratio = number of hits / number of accesses = 8/18 = 0.44 = 44%


Miss ratio = number of misses / number of accesses = 10/18 = 0.56 = 56%

c) [05 Marks] Compare the Hit & Miss Ratio for both the cases (i.e., loading a single element
vs. loading two elements) for both the caches. Which cache helps us in accessing elements
faster? and in which case? [Hint: To answer which cache helps us in accessing the elements
faster, compare the Hit Ratios for both the caches]

Hit ratio(single element) = 22%


Miss ratio(single element) = 78%

Hit ratio(two elements) = 44%


Miss ratio(two elements) = 56%

Fully associative cache with two words per block, loading two array elements is faster.

11
d) [05 Marks] Repeat Question 6, this time using a two-way set associative cache with 2 sets
of 2 blocks. Use LRU replacement scheme for eviction.

1 bit for index

Data to
be Hit/Miss Cache Index
Inserted
Set 0 Set 1
Block 0 Block 1 Block 0 Block 1
A[0] M A[0] A[1]
A[1] H A[0] A[1]
A[2] M A[0] A[1] A[2] A[3]
A[1] H A[0] A[1] A[2] A[3]
A[5] M A[0] A[1] A[2] A[3] A[4] A[5]
B[5] M A[0] A[1] A[2] A[3] A[4] A[5] B[4] B[5]
B[4] H A[0] A[1] A[2] A[3] A[4] A[5] B[4] B[5]
B[3] M A[0] A[1] A[2] A[3] B[2] B[3] B[4] B[5]
B[3] H A[0] A[1] A[2] A[3] B[2] B[3] B[4] B[5]
B[4] H A[0] A[1] A[2] A[3] B[2] B[3] B[4] B[5]
D[1] M A[0] A[1] A[2] A[3] D[0] D[1] B[4] B[5]
D[2] M A[0] A[1] D[2] D[3] D[0] D[1] B[4] B[5]
D[3] H A[0] A[1] D[2] D[3] D[0] D[1] B[4] B[5]
D[4] M D[4] D[5] D[2] D[3] D[0] D[1] B[4] B[5]
C[3] M D[4] D[5] D[2] D[3] D[0] D[1] C[2] C[3]
C[2] H D[4] D[5] D[2] D[3] D[0] D[1] C[2] C[3]
C[4] M D[4] D[5] C[4] C[5] D[0] D[1] C[2] C[3]
C[2] H D[4] D[5] C[4] C[5] D[0] D[1] C[2] C[3]

12
Question 8 [20 Marks]

In this exercise, we will examine how replacement schemes affect miss rates. Assume a two-way
set associative cache with four one-word blocks. Consider the following word address sequence:
0, 1, 2, 3, 4, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0.

a) [05 Marks] Assuming an LRU replacement scheme, which accesses are hits?

Word Cache
Hit/Miss Cache Index
Address Index
Set 0 Set 1
Block 0 Block 1 Block 0 Block 1
0 0 M MEM[0]
1 1 M MEM[0] MEM[1]
2 0 M MEM[0] MEM[2] MEM[1]
3 1 M MEM[0] MEM[2] MEM[1] MEM[3]
4 0 M MEM[4] MEM[2] MEM[1] MEM[3]
2 0 H MEM[4] MEM[2] MEM[1] MEM[3]
3 1 H MEM[4] MEM[2] MEM[1] MEM[3]
4 0 H MEM[4] MEM[2] MEM[1] MEM[3]
5 1 M MEM[4] MEM[2] MEM[5] MEM[3]
6 0 M MEM[4] MEM[6] MEM[5] MEM[3]
7 1 M MEM[4] MEM[6] MEM[5] MEM[7]
0 0 M MEM[0] MEM[6] MEM[5] MEM[7]
1 1 M MEM[0] MEM[6] MEM[1] MEM[7]
2 0 M MEM[0] MEM[2] MEM[1] MEM[7]
3 1 M MEM[0] MEM[2] MEM[1] MEM[3]
4 0 M MEM[4] MEM[2] MEM[1] MEM[3]
5 1 M MEM[4] MEM[2] MEM[5] MEM[3]
6 0 M MEM[4] MEM[6] MEM[5] MEM[3]
7 1 M MEM[4] MEM[6] MEM[5] MEM[7]
0 0 M MEM[0] MEM[6] MEM[5] MEM[7]

Block Address Cache Set


0 (0 modulo 2) = 0
1 (1 modulo 2) = 1
2 (2 modulo 2) = 0
3 (3 modulo 2) = 1
4 (4 modulo 2) = 0
5 (5 modulo 2) = 1
6 (6 modulo 2) = 0
7 (7 modulo 2) = 1

Hit rate = number of hits / number of accesses = 3/20 = 0.15 = 15%


Miss rate = number of misses / number of accesses = 17/20 = 0.85 = 85%

13
b) [05 Marks] Most Recently Used (MRU) is a cache replacement scheme which removes
the most recently used items first. A MRU scheme is good in situations in which the older
an item is, the more likely it is to be accessed. Assuming an MRU (most recently used)
replacement scheme, which accesses are hits?

Word Cache
Hit/Miss Cache Index
Address Index
Set 0 Set 1
Block 0 Block 1 Block 0 Block 1
0 0 M MEM[0]
1 1 M MEM[0] MEM[1]
2 0 M MEM[0] MEM[2] MEM[1]
3 1 M MEM[0] MEM[2] MEM[1] MEM[3]
4 0 M MEM[0] MEM[4] MEM[1] MEM[3]
2 0 M MEM[0] MEM[2] MEM[1] MEM[3]
3 1 H MEM[0] MEM[2] MEM[1] MEM[3]
4 0 M MEM[0] MEM[4] MEM[1] MEM[3]
5 1 M MEM[0] MEM[4] MEM[1] MEM[5]
6 0 M MEM[0] MEM[6] MEM[1] MEM[5]
7 1 M MEM[0] MEM[6] MEM[1] MEM[7]
0 0 H MEM[0] MEM[6] MEM[1] MEM[7]
1 1 H MEM[0] MEM[6] MEM[1] MEM[7]
2 0 M MEM[2] MEM[6] MEM[1] MEM[7]
3 1 M MEM[2] MEM[6] MEM[3] MEM[7]
4 0 M MEM[4] MEM[6] MEM[3] MEM[7]
5 1 M MEM[4] MEM[6] MEM[5] MEM[7]
6 0 H MEM[4] MEM[6] MEM[5] MEM[7]
7 1 H MEM[4] MEM[6] MEM[5] MEM[7]
0 0 M MEM[4] MEM[0] MEM[5] MEM[7]

Hit rate = number of hits / number of accesses = 5/20 = 0.25 = 25%


Miss rate = number of misses / number of accesses = 15/20 = 0.75 = 75%

c) [05 Marks] Describe an optimal replacement scheme for this sequence. Which accesses
are hits using this policy?

By comparing the Hit rates of LRU and MRU schemes, we can say that MRU is optimal
replacement scheme for the given sequence.

d) [05 Marks] Describe why it is difficult to implement a cache replacement scheme that is
optimal for all address sequences.

The best block to evict is the one that will cause the fewest misses in the future.
Unfortunately, a cache controller cannot know the future! Our best alternative is to make
a good prediction.

14

You might also like