Homework4 v2 Solution
Homework4 v2 Solution
Computer Architecture
EE 371 / CS 330 / CE 321 (Spring 2023)
Homework 4
Total marks: 100 Marks obtained:
Purpose:
The purpose of this assignment is to help you in understanding the performance of cache operations.
Instructions:
1. This assignment should be done in pairs.
2. All questions should be answered in black ink only.
3. Scan your answer sheet and upload it on HU LMS before the due date.
Grading Criteria:
1. Your assignments will be checked by instructor/TA.
2. You can also be asked to give a viva where you will be judged whether you understood the question
yourself or not. If you are unable to answer correctly to the question you have attempted right, you may
lose your marks.
3. Zero will be given if the assignment is found to be plagiarized.
4. Untidy work will result in reduction of your points.
Submission policy:
1. No submission will be accepted after the instructor releases the solution on HU LMS.
CLO Assessment:
This assignment assesses students for the following course learning outcomes.
CLO
Course Learning Outcomes
Assessed
Explain the role of ISA in modern processors and instruction encodings and assembly
CLO 1
language programming
CLO 2 Explain the architecture and working of a single cycle processor
CLO 3 Design the architecture to mitigate issues of a pipelined processor
CLO 4 Analyze the performance of cache operations ✓
1
Question 1 [10 Marks]
(a) [05 Marks] For each of these references, identify the binary word address, the tag, and the
index given a direct-mapped cache with 16 one-word blocks. Also list whether each
reference is a hit or a miss, assuming the cache is initially empty. First insertion is already
done so that you may get the idea.
2
(b) [05 Marks] For each of these references, identify the binary word address, the tag, the
index, and the offset given a direct-mapped cache with two-word blocks and a total size of
eight blocks. Also list if each reference is a hit or a miss, assuming the cache is initially
empty.
Word Binary
Tag Index Offset Hit/Miss
Address Address
0x03 0000 0011 0 1 1 M
0xb4 1011 0100 b 2 0 M
0x2b 0010 1011 2 5 1 M
0x02 0000 0010 0 1 0 H
0xbf 1011 1111 b 7 1 M
0x58 0101 1000 5 4 0 M
0xbe 1011 1110 b 7 0 H
0x0e 0000 1110 0 7 0 M
0xb5 1011 0101 b 2 1 H
0x2c 0010 1100 2 6 0 M
0xba 1011 1010 b 5 0 M
0xfd 1111 1101 f 6 1 M
3
Question 2 [10 Marks]
For a direct-mapped cache design with a 64-bit address, the following bits of the address are
used to access the cache.
Tag Index Offset
63-10 9-5 4-0
Offset = 5 bits
Index = 5 bits
Tag = 54 bits
There are 5 bits for index which means there are 25 blocks = 32 blocks.
c) [04 Marks] What is the ratio between total bits required for such a cache implementation
over the data storage bits?
4
Question 3 [15 Marks]
Considering the address size of 64-bits, fill in the data for difference types of caches:
5
Question 4 [05 Marks]
Assume the miss rate of an instruction cache is 4% and the miss rate of the data cache is 6%. If a
processor has a CPI of 3 without any memory stalls, and the miss penalty is 100 cycles for all
misses, determine how much faster a processor would run with a perfect cache that never missed.
Assume the frequency of all loads and stores is 26%.
The number of memory miss cycles for instructions in terms of the Instruction count (I) is
Instruction miss cycles = I * 4% * 100 = I * 4
As the frequency of all loads and stores is 26%, we can find the number of memory miss cycles
for data references:
6
Question 5 [10 Marks]
We are given 4 arrays of size 6. Each element in an array is of 32 bits i.e., one word. Following is
the data stored in the array:
00000 A[0]
00001 A[1]
00010 A[2]
00011 A[3]
00100 A[4]
00101 A[5]
00110
00111
01000 B[0]
01001 B[1]
01010 B[2]
01011 B[3]
01100 B[4]
01101 B[5]
01110
01111
10000 C[0]
10001 C[1]
10010 C[2]
10011 C[3]
10100 C[4]
10101 C[5]
10110
10111
11000 D[0]
11001 D[1]
11010 D[2]
11011 D[3]
11100 D[4]
11101 D[5]
11110
11111
7
We are given a direct mapped cache which contains 8 blocks (each block will contain one word).
Insert the following elements in cache one by one and also mention whether it was a hit or a miss.
Assume that the first block of the cache will be populated by the first element of the array and so
on. First insertion is already done so that you may get the idea.
Data to
be Hit/Miss Cache Index
Inserted
0 1 2 3 4 5 6 7
A[0] M A[0]
A[1] M A[0] A[1]
A[2] M A[0] A[1] A[2]
A[1] H A[0] A[1] A[2]
A[5] M A[0] A[1] A[2] A[5]
B[5] M A[0] A[1] A[2] B[5]
B[4] M A[0] A[1] A[2] B[4] B[5]
B[3] M A[0] A[1] A[2] B[3] B[4] B[5]
B[3] H A[0] A[1] A[2] B[3] B[4] B[5]
B[4] H A[0] A[1] A[2] B[3] B[4] B[5]
D[1] M A[0] D[1] A[2] B[3] B[4] B[5]
D[2] M A[0] D[1] D[2] B[3] B[4] B[5]
D[3] M A[0] D[1] D[2] D[3] B[4] B[5]
D[4] M A[0] D[1] D[2] D[3] D[4] B[5]
C[3] M A[0] D[1] D[2] C[3] D[4] B[5]
C[2] M A[0] D[1] C[2] C[3] D[4] B[5]
C[4] M A[0] D[1] C[2] C[3] C[4] B[5]
C[2] H A[0] D[1] C[2] C[3] C[4] B[5]
What is the Hit Ratio and the Miss Ratio in the above case?
8
Question 6 [10 Marks]
Whenever an element from an array is accessed, it is most probable that some other remaining
elements of the array are also accessed. Repeat the same task as in Question 5 but this time design
a cache with 4 blocks in which each block can accommodate 2 words. The first insertion is done
again so that you may get the idea.
Data to
be Hit/Miss Cache Index
Inserted
0 1 2 3
A[0] M A[0] A[1]
A[1] H A[0] A[1]
A[2] M A[0] A[1] A[2] A[3]
A[1] H A[0] A[1] A[2] A[3]
A[5] M A[0] A[1] A[2] A[3] A[4] A[5]
B[5] M A[0] A[1] A[2] A[3] B[4] B[5]
B[4] H A[0] A[1] A[2] A[3] B[4] B[5]
B[3] M A[0] A[1] B[2] B[3] B[4] B[5]
B[3] H A[0] A[1] B[2] B[3] B[4] B[5]
B[4] H A[0] A[1] B[2] B[3] B[4] B[5]
D[1] M D[0] D[1] B[2] B[3] B[4] B[5]
D[2] M D[0] D[1] D[2] D[3] B[4] B[5]
D[3] H D[0] D[1] D[2] D[3] B[4] B[5]
D[4] M D[0] D[1] D[2] D[3] D[4] D[5]
C[3] M D[0] D[1] C[2] C[3] D[4] D[5]
C[2] H D[0] D[1] C[2] C[3] D[4] D[5]
C[4] M D[0] D[1] C[2] C[3] C[4] C[5]
C[2] H D[0] D[1] C[2] C[3] C[4] C[5]
What is the Hit Ratio and the Miss Ratio in this case? Is it better than the previous? Does loading
the two array elements into the cache helps us in accessing the elements fast?
The hit rate has improved in this case. Yes, loading two array elements into the cache helps in
faster access time.
Loading the entire array into cache will improve the spatial locality.
9
Question 7 [20 Marks]
a) [05 Marks] Repeat Question 5, this time using a fully associative cache containing 8
blocks. Use LRU replacement scheme for eviction.
Data to
be Hit/Miss Cache Index
Inserted
0 1 2 3 4 5 6 7
A[0] M A[0]
A[1] M A[0] A[1]
A[2] M A[0] A[1] A[2]
A[1] H A[0] A[1] A[2]
A[5] M A[0] A[1] A[2] A[5]
B[5] M A[0] A[1] A[2] A[5] B[5]
B[4] M A[0] A[1] A[2] A[5] B[5] B[4]
B[3] M A[0] A[1] A[2] A[5] B[5] B[4] B[3]
B[3] H A[0] A[1] A[2] A[5] B[5] B[4] B[3]
B[4] H A[0] A[1] A[2] A[5] B[5] B[4] B[3]
D[1] M A[0] A[1] A[2] A[5] B[5] B[4] B[3] D[1]
D[2] M D[2] A[1] A[2] A[5] B[5] B[4] B[3] D[1]
D[3] M D[2] A[1] D[3] A[5] B[5] B[4] B[3] D[1]
D[4] M D[2] D[4] D[3] A[5] B[5] B[4] B[3] D[1]
C[3] M D[2] D[4] D[3] C[3] B[5] B[4] B[3] D[1]
C[2] M D[2] D[4] D[3] C[3] C[2] B[4] B[3] D[1]
C[4] M D[2] D[4] D[3] C[3] C[2] B[4] C[4] D[1]
C[2] H D[2] D[4] D[3] C[3] C[2] B[4] C[4] D[1]
10
b) [05 Marks] Repeat Question 6, this time using a fully associative cache containing 4 blocks
in which each block can accommodate 2 words. Use LRU replacement scheme for eviction.
Data to
be Hit/Miss Cache Index
Inserted
0 1 2 3
A[0] M A[0] A[1]
A[1] H A[0] A[1]
A[2] M A[0] A[1] A[2] A[3]
A[1] H A[0] A[1] A[2] A[3]
A[5] M A[0] A[1] A[2] A[3] A[4] A[5]
B[5] M A[0] A[1] A[2] A[3] A[4] A[5] B[4] B[5]
B[4] H A[0] A[1] A[2] A[3] A[4] A[5] B[4] B[5]
B[3] M A[0] A[1] B[2] B[3] A[4] A[5] B[4] B[5]
B[3] H A[0] A[1] B[2] B[3] A[4] A[5] B[4] B[5]
B[4] H A[0] A[1] B[2] B[3] A[4] A[5] B[4] B[5]
D[1] M D[0] D[1] B[2] B[3] A[4] A[5] B[4] B[5]
D[2] M D[0] D[1] B[2] B[3] D[2] D[3] B[4] B[5]
D[3] H D[0] D[1] B[2] B[3] D[2] D[3] B[4] B[5]
D[4] M D[0] D[1] D[4] D[5] D[2] D[3] B[4] B[5]
C[3] M D[0] D[1] D[4] D[5] D[2] D[3] C[2] C[3]
C[2] H D[0] D[1] D[4] D[5] D[2] D[3] C[2] C[3]
C[4] M C[4] C[5] D[4] D[5] D[2] D[3] C[2] C[3]
C[2] H C[4] C[5] D[4] D[5] D[2] D[3] C[2] C[3]
c) [05 Marks] Compare the Hit & Miss Ratio for both the cases (i.e., loading a single element
vs. loading two elements) for both the caches. Which cache helps us in accessing elements
faster? and in which case? [Hint: To answer which cache helps us in accessing the elements
faster, compare the Hit Ratios for both the caches]
Fully associative cache with two words per block, loading two array elements is faster.
11
d) [05 Marks] Repeat Question 6, this time using a two-way set associative cache with 2 sets
of 2 blocks. Use LRU replacement scheme for eviction.
Data to
be Hit/Miss Cache Index
Inserted
Set 0 Set 1
Block 0 Block 1 Block 0 Block 1
A[0] M A[0] A[1]
A[1] H A[0] A[1]
A[2] M A[0] A[1] A[2] A[3]
A[1] H A[0] A[1] A[2] A[3]
A[5] M A[0] A[1] A[2] A[3] A[4] A[5]
B[5] M A[0] A[1] A[2] A[3] A[4] A[5] B[4] B[5]
B[4] H A[0] A[1] A[2] A[3] A[4] A[5] B[4] B[5]
B[3] M A[0] A[1] A[2] A[3] B[2] B[3] B[4] B[5]
B[3] H A[0] A[1] A[2] A[3] B[2] B[3] B[4] B[5]
B[4] H A[0] A[1] A[2] A[3] B[2] B[3] B[4] B[5]
D[1] M A[0] A[1] A[2] A[3] D[0] D[1] B[4] B[5]
D[2] M A[0] A[1] D[2] D[3] D[0] D[1] B[4] B[5]
D[3] H A[0] A[1] D[2] D[3] D[0] D[1] B[4] B[5]
D[4] M D[4] D[5] D[2] D[3] D[0] D[1] B[4] B[5]
C[3] M D[4] D[5] D[2] D[3] D[0] D[1] C[2] C[3]
C[2] H D[4] D[5] D[2] D[3] D[0] D[1] C[2] C[3]
C[4] M D[4] D[5] C[4] C[5] D[0] D[1] C[2] C[3]
C[2] H D[4] D[5] C[4] C[5] D[0] D[1] C[2] C[3]
12
Question 8 [20 Marks]
In this exercise, we will examine how replacement schemes affect miss rates. Assume a two-way
set associative cache with four one-word blocks. Consider the following word address sequence:
0, 1, 2, 3, 4, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0.
a) [05 Marks] Assuming an LRU replacement scheme, which accesses are hits?
Word Cache
Hit/Miss Cache Index
Address Index
Set 0 Set 1
Block 0 Block 1 Block 0 Block 1
0 0 M MEM[0]
1 1 M MEM[0] MEM[1]
2 0 M MEM[0] MEM[2] MEM[1]
3 1 M MEM[0] MEM[2] MEM[1] MEM[3]
4 0 M MEM[4] MEM[2] MEM[1] MEM[3]
2 0 H MEM[4] MEM[2] MEM[1] MEM[3]
3 1 H MEM[4] MEM[2] MEM[1] MEM[3]
4 0 H MEM[4] MEM[2] MEM[1] MEM[3]
5 1 M MEM[4] MEM[2] MEM[5] MEM[3]
6 0 M MEM[4] MEM[6] MEM[5] MEM[3]
7 1 M MEM[4] MEM[6] MEM[5] MEM[7]
0 0 M MEM[0] MEM[6] MEM[5] MEM[7]
1 1 M MEM[0] MEM[6] MEM[1] MEM[7]
2 0 M MEM[0] MEM[2] MEM[1] MEM[7]
3 1 M MEM[0] MEM[2] MEM[1] MEM[3]
4 0 M MEM[4] MEM[2] MEM[1] MEM[3]
5 1 M MEM[4] MEM[2] MEM[5] MEM[3]
6 0 M MEM[4] MEM[6] MEM[5] MEM[3]
7 1 M MEM[4] MEM[6] MEM[5] MEM[7]
0 0 M MEM[0] MEM[6] MEM[5] MEM[7]
13
b) [05 Marks] Most Recently Used (MRU) is a cache replacement scheme which removes
the most recently used items first. A MRU scheme is good in situations in which the older
an item is, the more likely it is to be accessed. Assuming an MRU (most recently used)
replacement scheme, which accesses are hits?
Word Cache
Hit/Miss Cache Index
Address Index
Set 0 Set 1
Block 0 Block 1 Block 0 Block 1
0 0 M MEM[0]
1 1 M MEM[0] MEM[1]
2 0 M MEM[0] MEM[2] MEM[1]
3 1 M MEM[0] MEM[2] MEM[1] MEM[3]
4 0 M MEM[0] MEM[4] MEM[1] MEM[3]
2 0 M MEM[0] MEM[2] MEM[1] MEM[3]
3 1 H MEM[0] MEM[2] MEM[1] MEM[3]
4 0 M MEM[0] MEM[4] MEM[1] MEM[3]
5 1 M MEM[0] MEM[4] MEM[1] MEM[5]
6 0 M MEM[0] MEM[6] MEM[1] MEM[5]
7 1 M MEM[0] MEM[6] MEM[1] MEM[7]
0 0 H MEM[0] MEM[6] MEM[1] MEM[7]
1 1 H MEM[0] MEM[6] MEM[1] MEM[7]
2 0 M MEM[2] MEM[6] MEM[1] MEM[7]
3 1 M MEM[2] MEM[6] MEM[3] MEM[7]
4 0 M MEM[4] MEM[6] MEM[3] MEM[7]
5 1 M MEM[4] MEM[6] MEM[5] MEM[7]
6 0 H MEM[4] MEM[6] MEM[5] MEM[7]
7 1 H MEM[4] MEM[6] MEM[5] MEM[7]
0 0 M MEM[4] MEM[0] MEM[5] MEM[7]
c) [05 Marks] Describe an optimal replacement scheme for this sequence. Which accesses
are hits using this policy?
By comparing the Hit rates of LRU and MRU schemes, we can say that MRU is optimal
replacement scheme for the given sequence.
d) [05 Marks] Describe why it is difficult to implement a cache replacement scheme that is
optimal for all address sequences.
The best block to evict is the one that will cause the fewest misses in the future.
Unfortunately, a cache controller cannot know the future! Our best alternative is to make
a good prediction.
14