
UNIT-IV

Study Materials
B.Tech IV-Semester School of Technology Academic Year: 2024-25

Computer Organization and Architecture

Unit IV: Syllabus


The Memory System: Memory systems hierarchy - Main memory organization - Types of main
memory - Memory interleaving and its characteristics and performance - Cache memories: address
mapping - line size - replacement and policies - coherence - Virtual memory systems - TLB -
Reliability of memory systems - error detecting and error correcting systems.

1 Memory Hierarchy Design and its Characteristics


In computer system design, the Memory Hierarchy is an organization of memory that minimizes
access time. The Memory Hierarchy was developed based on a program behavior known as locality
of reference (the same data or nearby data is likely to be accessed again and again). The figure
below demonstrates the different levels of the memory hierarchy.

1.1 Why Memory Hierarchy is Required in the System?


Memory Hierarchy helps in optimizing the memory available in the computer. There are multiple
levels of memory, each with a different size, cost, and speed. Some types of memory, such as cache
and main memory, are faster than other types but have smaller capacity and are more expensive,
whereas other types of memory offer larger capacity but are slower. Access time is also not the
same across memory types: some provide faster access, whereas others are slower.

2 Types of Memory Hierarchy


This Memory Hierarchy Design is divided into 2 main types:

• External Memory or Secondary Memory: Comprises magnetic disk, optical disk, and
magnetic tape, i.e., peripheral storage devices that are accessible to the processor via an I/O
module.

• Internal Memory or Primary Memory: Comprises main memory, cache memory, and
CPU registers. This is directly accessible by the processor.


3 Memory Hierarchy Design


1. Registers:
Registers are small, high-speed memory units located in the CPU. They are used to store
the most frequently used data and instructions. Registers have the fastest access time and
the smallest storage capacity, typically ranging from 16 to 64 bits.

2. Cache Memory:
Cache memory is a small, fast memory unit located close to the CPU. It stores frequently
used data and instructions that have been recently accessed from the main memory. Cache
memory is designed to minimize the time it takes to access data by providing the CPU with
quick access to frequently used data.

3. Main Memory:
Main memory, also known as RAM (Random Access Memory), is the primary memory of a
computer system. It has a larger storage capacity than cache memory, but it is slower. Main
memory is used to store data and instructions that are currently in use by the CPU.

Types of Main Memory

• Static RAM: Static RAM stores the binary information in flip-flops, and the information
remains valid as long as power is supplied. Static RAM has a faster access time and is used
in implementing cache memory.

• Dynamic RAM: It stores the binary information as a charge on a capacitor. It requires
refreshing circuitry to maintain the charge on the capacitors every few milliseconds. It
contains more memory cells per unit area as compared to SRAM.

4. Secondary Storage:
Secondary storage, such as hard disk drives (HDD) and solid-state drives (SSD) , is a non-
volatile memory unit that has a larger storage capacity than main memory. It is used to
store data and instructions that are not currently in use by the CPU. Secondary storage has
the slowest access time and is typically the least expensive type of memory in the memory
hierarchy.


5. Magnetic Disk:
Magnetic disks are circular plates fabricated from metal or plastic and coated with a
magnetizable material. Magnetic disks rotate at high speed inside the computer and are
frequently used.

6. Magnetic Tape:
Magnetic tape is a magnetic recording medium covered with a plastic film. Magnetic tape is
generally used for the backup of data. In the case of magnetic tape, the access time is slower
because the tape must be wound to the required position before the data can be accessed.

4 Characteristics of Memory Hierarchy


• Capacity: It is the total volume of information the memory can store. As we move from
top to bottom in the Hierarchy, the capacity increases.

• Access Time: It is the time interval between the read/write request and the availability of
the data. As we move from top to bottom in the Hierarchy, the access time increases.

• Performance: The Memory Hierarchy design ensures that frequently accessed data is stored
in faster memory to improve system performance.

• Cost Per Bit: As we move from bottom to top in the Hierarchy, the cost per bit increases
i.e. Internal Memory is costlier than External Memory.

5 System-Supported Memory Standards


According to the memory Hierarchy, the system-supported memory standards are defined below:

Advantages of Memory Hierarchy

• Performance: Frequently used data is stored in faster memory (like cache), reducing access
time and improving overall system performance.

• Cost Efficiency: By combining small, fast memory (like registers and cache) with larger,
slower memory (like RAM and HDD), the system achieves a balance between cost and
performance. This saves the consumer both money and time.

• Optimized Resource Utilization: Combines the benefits of small, fast memory and large,
cost-effective storage to maximize system performance.

• Efficient Data Management: Frequently accessed data is kept closer to the CPU, while
less frequently used data is stored in larger, slower memory, ensuring efficient data handling.

Disadvantages of Memory Hierarchy

• Complex Design: Managing and coordinating data across different levels of the hierarchy
adds complexity to the system’s design and operation.


• Cost: Faster memory components like registers and cache are expensive, limiting their size
and increasing the overall cost of the system.

• Latency: Accessing data stored in slower memory (like secondary or tertiary storage)
increases the latency and reduces system performance.

• Maintenance Overhead: Managing and maintaining different types of memory adds
overhead in terms of hardware and software.


6 Memory Interleaving
Memory interleaving is a technique that divides memory into a number of modules such that
successive words in the address space are placed in different modules.

Consecutive Word in a Module:

Figure 1: Consecutive Word in a Module

Let us assume 16 Data’s to be Transferred to the Four Module. Where Module 00 be Module 1,
Module 01 be Module 2, Module 10 be Module 3 & Module 11 be Module 4. Also, 10, 20, 30. . . .130
are the data to be transferred.

From the figure above in Module 1, 10 [Data] is transferred then 20, 30 & finally, 40 which are the
Data. That means the data are added consecutively in the Module till its max capacity.

Most significant bit (MSB) provides the Address of the Module & the least significant bit (LSB)
provides the address of the data in the module.


For example, to read the data 90, the processor issues the address 1000. The high-order bits 10
indicate that the data is in module 10 (Module 3), and the low-order bits 00 give the location of
90 within Module 3. So,

Module 1 Contains Data : 10, 20, 30, 40


Module 2 Contains Data : 50, 60, 70, 80
Module 3 Contains Data : 90, 100, 110, 120
Module 4 Contains Data : 130, 140, 150, 160

Consecutive Word in Consecutive Module:

Figure 2: Consecutive Word in Consecutive Module

Now again assume 16 data words are to be transferred to the four modules, but this time the
consecutive data are placed in consecutive modules. That is, data 10 is placed in Module 1, 20 in
Module 2, and so on.

The least significant bits (LSB) of the address select the module and the most significant bits
(MSB) select the word within the module.

For example, to read the data 90, the processor again issues the address 1000. The low-order bits
00 indicate that the data is in module 00 (Module 1), and the high-order bits 10 give the location
of 90 within Module 1. That is,

Module 1 Contains Data : 10, 50, 90, 130


Module 2 Contains Data : 20, 60, 100, 140
Module 3 Contains Data : 30, 70, 110, 150
Module 4 Contains Data : 40, 80, 120, 160

Why do we use Memory Interleaving? [Advantages]:


Whenever the processor requests data from the main memory, a block (chunk) of data is transferred
to the cache and then to the processor. So whenever a cache miss occurs, the data has to be fetched
from the main memory. But main memory is relatively slower than the cache, so interleaving is
used to improve the effective access time of the main memory.

We can access all four modules at the same time, thus achieving parallelism. In the arrangement of
Figure 2, the word within a module is selected using the higher-order bits, and consecutive words
can be fetched from different modules simultaneously. This method uses memory effectively.
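These two address layouts can be checked with a short sketch. The Python snippet below is only
an illustration (the 4-bit addresses, the four modules, and the data values 10, 20, ..., 160 are taken
from the example above); it splits an address into module and word fields for both schemes:

# Minimal sketch of the two interleaving schemes from the example above.
# 16 words of data, 4 modules, 4-bit addresses (2 module bits + 2 word bits).
data = [10 * (i + 1) for i in range(16)]          # 10, 20, ..., 160

def high_order(addr):
    """Consecutive words in a module: MSBs select the module."""
    module = (addr >> 2) & 0b11                   # upper 2 bits
    word = addr & 0b11                            # lower 2 bits
    return module, word

def low_order(addr):
    """Consecutive words in consecutive modules: LSBs select the module."""
    module = addr & 0b11                          # lower 2 bits
    word = (addr >> 2) & 0b11                     # upper 2 bits
    return module, word

addr = 0b1000                                     # the address used in the example
m, w = high_order(addr)
print("high-order:", m, w, data[m * 4 + w])       # module 2 (Module 3), word 0 -> 90
m, w = low_order(addr)
print("low-order: ", m, w, data[w * 4 + m])       # module 0 (Module 1), word 2 -> 90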

7 Cache Memory in Computer Organization


Cache memory is a small, high-speed storage area in a computer. The cache is a smaller and faster
memory that stores copies of the data from frequently used main memory locations. There are
various independent caches in a CPU, which store instructions and data.

• The most important use of cache memory is that it is used to reduce the average time to
access data from the main memory.

• The concept of cache works because there exists locality of reference (the same items or
nearby items are more likely to be accessed next) in processes.

By storing this information closer to the CPU, cache memory helps speed up the overall processing
time. Cache memory is much faster than the main memory (RAM). When the CPU needs data,
it first checks the cache. If the data is there, the CPU can access it quickly. If not, it must fetch
the data from the slower main memory.

7.1 Characteristics of Cache Memory

• Extremely fast memory type that acts as a buffer between RAM and the CPU.

• Holds frequently requested data and instructions, ensuring that they are immediately available
to the CPU when needed.

• Costlier than main memory or disk memory but more economical than CPU registers.

• Used to speed up processing and synchronize with the high-speed CPU.


7.2 Levels of Memory:


• Level 1 or Registers: Registers hold the data and instructions on which the CPU is
working at the moment. The most commonly used registers are the accumulator, program
counter, address registers, etc.

• Level 2 or Cache memory: It is a small, very fast memory with a shorter access time
than main memory, where data is temporarily stored for faster access.

• Level 3 or Main Memory: It is the memory on which the computer works currently. It
is smaller than secondary storage, and once power is off the data no longer stays in this
memory.

• Level 4 or Secondary Memory: It is external memory that is not as fast as the main
memory but data stays permanently in this memory.

7.3 Cache Performance


When the processor needs to read or write a location in the main memory, it first checks for a
corresponding entry in the cache.

• If the processor finds that the memory location is in the cache, a Cache Hit has occurred and
data is read from the cache.

• If the processor does not find the memory location in the cache, a cache miss has occurred.
For a cache miss, the cache allocates a new entry and copies in data from the main memory,
then the request is fulfilled from the contents of the cache.

The performance of cache memory is frequently measured in terms of a quantity called Hit ratio.

• Hit Ratio(H) = hit / (hit + miss) = no. of hits/total accesses

• Miss Ratio = miss / (hit + miss) = no. of miss/total accesses = 1 - hit ratio(H)

Cache performance can be improved by using a larger cache block size and higher associativity,
and by reducing the miss rate, the miss penalty, and the time to hit in the cache.


Problem 1: Cache Hit and Miss Ratio


A processor accesses memory 1000 times, out of which 850 accesses are hits. Calculate:

1. Cache hit ratio

2. Cache miss ratio

Solution:

• Cache Hit Ratio = (Number of Cache Hits) / (Total Memory Accesses)

  Cache Hit Ratio = 850 / 1000 = 0.85 or 85%

• Cache Miss Ratio = (Number of Cache Misses) / (Total Memory Accesses)

  Cache Miss Ratio = (1000 − 850) / 1000 = 0.15 or 15%

Problem 2: Average Memory Access Time (AMAT)


Consider

• Cache Hit Time = 2 ns

• Main Memory Access Time = 50 ns

• Hit Ratio = 90%

Calculate the average memory access time (AMAT).

Solution:
We know that AMAT = Hit Time + (Miss Ratio × Miss Penalty)

• Miss Ratio = 1 - Hit Ratio = 1 - 0.90 = 0.10

• Miss Penalty = Main Memory Access Time = 50 ns

AMAT = 2 + (0.10 × 50) = 2 + 5 = 7 ns
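These formulas can be packaged into two small helper functions. The sketch below is illustrative
only (the function names are my own); it reproduces the numbers from Problems 1 and 2:

def hit_miss_ratio(hits, total_accesses):
    """Return (hit ratio, miss ratio) for a given access count."""
    hit_ratio = hits / total_accesses
    return hit_ratio, 1.0 - hit_ratio

def amat(hit_time, miss_penalty, hit_ratio):
    """Average memory access time = hit time + miss ratio * miss penalty."""
    return hit_time + (1.0 - hit_ratio) * miss_penalty

print(hit_miss_ratio(850, 1000))        # (0.85, 0.15)  -> Problem 1
print(amat(2, 50, 0.90), "ns")          # 7.0 ns        -> Problem 2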

Advantages:

• Cache Memory is faster in comparison to main memory and secondary memory.

• Programs stored by Cache Memory can be executed in less time.

• The data access time of Cache Memory is less than that of the main memory.

• Cache Memory stores data and instructions that are regularly used by the CPU, therefore
it increases the performance of the CPU.


Disadvantages:

• Cache Memory is costlier than primary memory and secondary memory.

• Data is stored on a temporary basis in Cache Memory.

• Whenever the system is turned off, data and instructions stored in cache memory get destroyed.

• The high cost of cache memory increases the price of the Computer System.

8 Cache Mapping
Cache mapping refers to the method used to store data from main memory into the cache. It
determines how data from memory is mapped to specific locations in the cache.

There are three different types of mapping used for the purpose of cache memory, which are as
follows:

• Direct Mapping

• Fully Associative Mapping

• Set-Associative Mapping

8.1 Direct Mapping


Direct mapping is a simple and commonly used cache mapping technique where each block of main
memory is mapped to exactly one location in the cache, called a cache line. If two memory blocks
map to the same cache line, one will overwrite the other, leading to potential cache misses. Direct
mapping’s performance is directly proportional to the hit ratio.

Memory block is assigned to cache line using the formula below:

i = j modulo m = j % m
where,
i = cache line number
j = main memory block number
m = number of lines in the cache

For example, consider a memory with 8 blocks(j) and a cache with 4 lines(m). Using direct
mapping, block 0 of memory might be stored in cache line 0, block 1 in line 1, block 2 in line 2,
and block 3 in line 3. If block 4 of memory is accessed, it would be mapped to cache line 0 (as i
= j modulo m i.e. i = 4 % 4 = 0), replacing memory block 0.
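The block-to-line assignment i = j mod m is easy to check. A minimal sketch, using the 8-block,
4-line sizes from the example above:

def cache_line(block_number, num_lines):
    """Direct mapping: memory block j maps to cache line j mod m."""
    return block_number % num_lines

num_lines = 4                                  # m
for j in range(8):                             # 8 main-memory blocks
    print(f"block {j} -> line {cache_line(j, num_lines)}")
# block 4 maps to line 0, so it replaces block 0 if block 0 is resident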

The Main Memory consists of memory blocks, and these blocks are made up of a fixed number of
words. A typical address in main memory is split into two parts:

1. Index Field: It represents the block number. The index field bits tell us the location of the
block where a word can be.


2. Block Offset: It represents the word within a memory block. These bits determine the
location of the word in a memory block.

The Cache Memory consists of cache lines. These cache lines have the same size as memory blocks.
The address in cache memory consists of:

1. Block Offset: This is the same block offset we use in Main Memory.

2. Index: It represents the cache line number. This part of the memory address determines
which cache line (or slot) the data will be placed in.

3. Tag: The Tag is the remaining part of the address that uniquely identifies which block is
currently occupying the cache line.

The index field in main memory maps directly to the index in cache memory, which determines the
cache line where the block will be stored. The block offset in both main memory and cache memory
indicates the exact word within the block. In the cache, the tag identifies which memory block
is currently stored in the cache line. This mapping ensures that each memory block is mapped
to exactly one cache line, and the data is accessed using the tag and index while the block offset
specifies the exact word in the block.

Advantages of Direct-mapping:

• It requires very little hardware complexity.

• It is a very simple method of implementation.

• Access time is very low.

Disadvantages of Direct-mapping:

• Cache performance is unpredictable in direct mapping.

• Handling of spatial locality is poor.

• Use of cache space is inefficient.

• Conflict misses are high.


Problem 3: Direct-Mapped Cache Addressing


A 64 KB cache uses 16-byte blocks in a direct-mapped cache. The system has a 32-bit address.
Find:

1. Number of Cache Blocks

2. Number of Index Bits

3. Number of Tag Bits

Solution:

1. Number of Cache Blocks:

   Number of Cache Blocks = Total Cache Size / Block Size = (64 × 1024) / 16 = 4096 blocks

2. Index Bits = log2 (Number of Blocks) = log2 (4096) = 12 bits

3. Offset Bits = log2 (Block Size) = log2 (16) = 4 bits

4. Tag Bits = Total Address Bits - (Index Bits + Offset Bits)

Tag Bits = 32 − (12 + 4) = 16 bits
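The bit-field arithmetic of Problem 3 can be expressed as a small helper. This is only a sketch (the
function name is my own) and assumes the cache size and block size are powers of two:

from math import log2

def direct_mapped_fields(cache_size, block_size, address_bits):
    """Return (offset bits, index bits, tag bits) for a direct-mapped cache."""
    num_blocks = cache_size // block_size
    offset_bits = int(log2(block_size))
    index_bits = int(log2(num_blocks))
    tag_bits = address_bits - index_bits - offset_bits
    return offset_bits, index_bits, tag_bits

# Problem 3: 64 KB cache, 16-byte blocks, 32-bit addresses
print(direct_mapped_fields(64 * 1024, 16, 32))   # (4, 12, 16)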

Problem 4: Direct-Mapped Cache with Conflict Misses


A computer system uses a direct-mapped cache with the following parameters:

• Cache Size: 16 KB

• Block Size: 128 bytes

• Addressing: Memory addresses are given in decimal.

A program accesses the following memory addresses (in decimal):

{1024, 2048, 3072, 4096, 5120, 6144, 7168, 8192}

Tasks:

1. Compute the index for each memory address.

2. Determine if any conflict misses occur in the direct-mapped cache.

Solution:

• Cache Index Calculation


A direct-mapped cache determines the cache block using:


Index = (Memory Address / Block Size) mod (Cache Size / Block Size)

– Cache Size = 16 × 1024 = 16384 bytes

– Block Size = 128 bytes

– Number of Cache Blocks = 16384 / 128 = 128

• Computing the Cache Index for Each Address:

Memory Address (Decimal)    Block Number         Cache Index
1024                        1024/128 = 8         8 mod 128 = 8
2048                        2048/128 = 16        16 mod 128 = 16
3072                        3072/128 = 24        24 mod 128 = 24
4096                        4096/128 = 32        32 mod 128 = 32
5120                        5120/128 = 40        40 mod 128 = 40
6144                        6144/128 = 48        48 mod 128 = 48
7168                        7168/128 = 56        56 mod 128 = 56
8192                        8192/128 = 64        64 mod 128 = 64

• Conflict Miss Analysis:

– Each memory address maps to a different cache index.

– Since there are no multiple addresses mapping to the same index, no conflict misses
occur.

– Each memory block has a unique location in the cache, so all accesses result in compulsory
(cold) misses, but not conflict misses.

• Conclusion: No conflict misses occur because all addresses map to different cache indices.
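The index column of the table above can be reproduced automatically. A minimal sketch under
the Problem 4 parameters (16 KB cache, 128-byte blocks); a conflict would show up as a repeated
index:

cache_size = 16 * 1024
block_size = 128
num_blocks = cache_size // block_size            # 128 cache lines

addresses = [1024, 2048, 3072, 4096, 5120, 6144, 7168, 8192]
indices = [(addr // block_size) % num_blocks for addr in addresses]
print(indices)                                   # [8, 16, 24, 32, 40, 48, 56, 64]

# A conflict would show up as a repeated index; here all indices are distinct.
print("conflict misses possible:", len(indices) != len(set(indices)))   # False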

8.2 Associative / Fully Associative Mapping


Fully associative mapping is a type of cache mapping where any block of main memory can be
stored in any cache line. Unlike direct-mapped cache, where each memory block is restricted to a
specific cache line based on its index, fully associative mapping gives the cache the flexibility to
place a memory block in any available cache line. This improves the hit ratio but requires a more
complex system for searching and managing cache lines.

The address structure of cache memory in fully associative mapping is different from direct
mapping. In fully associative mapping, the cache does not have an index field. It has only a tag,
which is the same as the index field (block number) of the memory address. Any block of memory
can be placed in any cache line. This flexibility means that there is no fixed position for memory
blocks in the cache.

To determine whether a block is present in the cache, the tag is compared with the tags stored in
all cache lines. If a match is found, it is a cache hit, and the data is retrieved from that cache line.
If no match is found, it’s a cache miss, and the required data is fetched from main memory.
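A lookup in a fully associative cache therefore amounts to comparing the incoming tag against
every stored tag. The sketch below is a software illustration only (in real hardware these
comparisons are done in parallel by comparators), and the tag values are hypothetical:

def fully_associative_lookup(cache_tags, tag):
    """Return the cache line holding `tag`, or None on a miss.
    cache_tags[i] is the tag stored in line i (None if the line is empty)."""
    for line, stored_tag in enumerate(cache_tags):
        if stored_tag == tag:
            return line          # cache hit
    return None                  # cache miss: fetch the block from main memory

cache_tags = [0x1A, 0x2B, None, 0x3C]
print(fully_associative_lookup(cache_tags, 0x2B))   # 1 (hit in line 1)
print(fully_associative_lookup(cache_tags, 0x4D))   # None (miss)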


Advantages of Associative Mapping:

• It is the fastest and most flexible technique.

• It solves the conflict miss problem

Disadvantages of Associative Mapping:

• The comparison time for searching a block is very high.

• It is expensive.

Problem 5: Fully Associative Cache Address Calculation


A computer system has a fully associative cache with the following specifications:

• Cache Size: 32 KB

• Block Size: 64 bytes

• Memory Address: 32-bit


Determine the following:

1. Number of Blocks in the Cache

2. Offset Bits

3. Tag Bits

Solution:

1. Number of Blocks in Cache: The number of blocks in the cache is given by:

   Number of Blocks = Cache Size / Block Size

   Substituting the given values:

   Number of Blocks = (32 × 1024) / 64 = 32768 / 64 = 512

2. Offset Bits Calculation: The number of offset bits is determined by the block size:

   Offset Bits = log2(Block Size) = log2(64) = 6 bits

3. Tag Bits Calculation: Since the system uses a 32-bit address and there is no index field in a
fully associative cache, the tag is computed as:

   Tag Bits = Total Address Bits − Offset Bits = 32 − 6 = 26 bits

4. Final Answer

• Number of Blocks in Cache: 512

• Offset Bits: 6 bits

• Tag Bits: 26 bits

Conclusion: In a fully associative cache, the memory address consists of an offset to locate
data within a block and a tag to uniquely identify the memory block.


8.3 Set-Associative Mapping


Set-associative mapping is a compromise between direct-mapped and fully-associative mapping in
cache systems. It combines the flexibility of fully associative mapping with the efficiency of direct
mapping. In this scheme, multiple cache lines (typically 2, 4, or more) are grouped into sets.

v=m/k
where,
m = number of cache lines in the cache memory
k = number of cache lines we want in each set
v = number of sets

As in direct mapping, each memory block maps to a specific set, but within that set it can be
placed in any cache line.

i = j modulo v = j % v
where,
j = main memory block number
v = number of sets
i = cache line set number

The Cache address structure is as follows:

This reduces the conflict misses that occur in direct mapping while still limiting the search space
compared to fully-associative mapping.

For example, consider a 2-way set associative cache, which means 2 cache lines make a set in this
cache structure. There are 8 memory blocks and 4 cache lines, thus the number of sets will be 4/2
= 2 sets. Using direct mapping strategy first, block 0 will be in set 0, block 1 in set 1, block 2
in set 0 and so on. Then, the tag is used to search through all cache lines in that set to find the
correct block (Associative Mapping).
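The set computation for this 2-way example can be written out directly. A minimal sketch using
the m = 4, k = 2 values from the example:

m = 4              # cache lines
k = 2              # lines per set (2-way)
v = m // k         # number of sets = 2

for j in range(8):                 # 8 main-memory blocks
    print(f"block {j} -> set {j % v}")
# block 0 -> set 0, block 1 -> set 1, block 2 -> set 0, ...
# within a set, the tag is compared against both lines (associative search)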

Advantages of Set-Associative Mapping:

• It has a higher hit rate than direct mapping.

• Conflict Misses are very few.

Disadvantages of Set-Associative Mapping:

• It is very expensive.

• It is slower than direct mapping.


8.4 Application of Cache Memory


• Primary Cache: A primary cache is always located on the processor chip. This cache is
small and its access time is comparable to that of processor registers.

• Secondary Cache: Secondary cache is placed between the primary cache and the rest of
the memory. It is referred to as the level 2 (L2) cache. Often, the Level 2 cache is also housed
on the processor chip.

• Spatial Locality of Reference: Spatial locality of reference says that an element close to
a recently referenced location is likely to be accessed soon, so the next reference tends to fall
in close proximity to the previous point of reference.

• Temporal Locality of Reference: Temporal locality of reference says that a recently
used word is likely to be used again soon, so replacement schemes such as Least Recently
Used (LRU) keep recently used items in the cache. When a miss occurs on a word, not only
that word but the complete block containing it is loaded, because spatial locality suggests
that the neighbouring words will be referenced next.


Problem 6: Set-Associative Cache Access


A 128 KB cache is 4-way set-associative with a 32-byte block size. Determine:

1. Number of Sets

2. Index Bits

3. Tag Bits (for a 32-bit address)

1. Number of Cache Blocks:

   Number of Cache Blocks = Total Cache Size / Block Size = (128 × 1024) / 32 = 4096 blocks

2. Number of Sets (since it is 4-way set associative):

   Number of Sets = 4096 / 4 = 1024 sets

3. Index Bits = log2(Number of Sets) = log2(1024) = 10 bits

4. Offset Bits = log2(Block Size) = log2(32) = 5 bits

5. Tag Bits = Total Address Bits - (Index Bits + Offset Bits) = 32 − (10 + 5) = 17 bits

Problem 7: Consider a 1 MB Cache Memory (CM) and a 4 GB Main Memory (MM), both
partitioned into 64 KB blocks.

(i) How many bits are required for physical address if word size is 1B

(ii) How many blocks are there in MM and CM

(iii) How many TAG bits are required for Direct, Associative, and 8-way set associative mapping

(iv) How much TAG memory in bytes is required in each case.

Solution: Given Data

• Cache Memory (CM) Size: 1 MB = 2^20 bytes

• Main Memory (MM) Size: 4 GB = 2^32 bytes

• Block Size: 64 KB = 2^16 bytes

• Word Size: 1 B

(i) Physical Address Bits:


Since the total main memory size is 4 GB = 2^32 bytes, the number of bits required for
addressing is:

   Physical Address Bits = log2(2^32) = 32 bits

(ii) Number of Blocks in MM and CM

The number of blocks is calculated as:

   Total Blocks in MM = 2^32 / 2^16 = 2^16 = 65536 blocks

   Total Blocks in CM = 2^20 / 2^16 = 2^4 = 16 blocks

(iii) TAG Bits for Different Mappings:

• Direct Mapping

   Index bits = log2(16) = 4 bits

   TAG bits = 32 − (4 + 16) = 12 bits

• Fully Associative Mapping

   TAG bits = 32 − 16 = 16 bits

• 8-Way Set Associative Mapping

   Number of Sets = 16 / 8 = 2

   Index bits = log2(2) = 1 bit

   TAG bits = 32 − (1 + 16) = 15 bits

(iv) TAG Memory in Bytes

• Direct Mapping

   TAG Memory = 16 × 12 = 192 bits = 192 / 8 = 24 bytes

• Fully Associative Mapping

   TAG Memory = 16 × 16 = 256 bits = 256 / 8 = 32 bytes

• 8-Way Set Associative Mapping

   TAG Memory = 16 × 15 = 240 bits = 240 / 8 = 30 bytes


Final Summary

Mapping Type               Index Bits    TAG Bits    TAG Memory (Bytes)
Direct Mapping             4             12          24
Fully Associative          0             16          32
8-Way Set Associative      1             15          30
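The summary table can be reproduced with a few lines of arithmetic. The sketch below (the
function name is my own) uses the 16 cache lines, 32-bit addresses, and 16 offset bits derived
above:

from math import log2

cache_lines = 16          # blocks in the 1 MB cache (Problem 7)
address_bits = 32
offset_bits = 16          # 64 KB blocks

def tag_info(ways):
    """Return (index bits, tag bits, tag memory in bytes) for the given associativity."""
    sets = cache_lines // ways
    index_bits = int(log2(sets))
    tag_bits = address_bits - index_bits - offset_bits
    tag_memory_bytes = cache_lines * tag_bits / 8
    return index_bits, tag_bits, tag_memory_bytes

print(tag_info(1))             # direct mapped       -> (4, 12, 24.0)
print(tag_info(cache_lines))   # fully associative   -> (0, 16, 32.0)
print(tag_info(8))             # 8-way set assoc.    -> (1, 15, 30.0)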

9 Cache Write Policies


When memory write operations are performed, the CPU first writes into the cache memory. The
modifications made by the CPU during write operations on data held in the cache need to be
written back to main memory or to auxiliary memory.

If a cache line has not been modified, then it can be overwritten immediately; however, if one or
more words have been written to a cache line, then main memory must be updated before replacing
the cache line.

There are two main potential write problems:

• If an I/O module is able to read/write memory directly and the cache has been modified,
a memory read cannot be serviced correctly until the modified data is written back. If the
I/O module writes to memory, then the corresponding cache line becomes invalid.

• If multiple processors each have their own cache, if one processor modifies its cache, then the
cache lines of the other processors could be invalid.

The two popular cache write policies are: Write-through and Write-back.

1. Write-through:

• Each write operation involves writing to the main memory.

• If the processor has to wait for the write operation to be complete, it slows down the
processor.

• Processor does not depend on the results of the write operation.

• Write buffer can be included for temporary storage of write requests.

• Processor places each write request into the buffer and continues execution.

• If a subsequent Read request references data which is still in the write buffer, then this
data is referenced in the write buffer.

Write-Through Advantages:

• Simpler to implement, but to be effective it requires a write buffer, so the processor does not
have to wait for the lower level of the memory hierarchy (to avoid write stalls).

• The read misses are cheaper because they do not require any write to the lower level of
the memory hierarchy


• Memory always up to date.

2. Write-back:

• Block is written back to the main memory when it is replaced due to a miss.

• If the processor waits for this write to complete, before reading the new block, it is
slowed down.

• Fast write buffer can hold the block to be written, and the new block can be read first.

• The dirty bit is set when we write to the cache; this indicates that the cache is now
inconsistent with main memory.

• The dirty bit for a cache slot is cleared when main memory is updated.

Write-Back Advantages:

• The block can be written by the processor at the frequency at which the cache, and not
the main memory, can accept it.

• Multiple writes to the same block require only a single write to the main memory.

Note:

• Comparing the write-through and write-back data cache policies, the write-back policy is
faster because main memory is written only once even when a block is modified many times.

• An important note is that this choice of write policy is used only with the data cache, not
with the instruction cache.
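The difference between the two policies can be illustrated with a toy model that simply counts
writes to main memory. This is only a sketch (a single block, no real addressing or cache structure);
it shows why repeated writes to the same block cost one memory write under write-back but one
write per store under write-through:

def memory_writes(policy, stores_to_block, block_evicted=True):
    """Count main-memory writes for `stores_to_block` CPU stores to one cache block."""
    if policy == "write-through":
        return stores_to_block                 # every store goes to main memory
    elif policy == "write-back":
        # the block is written back once, on eviction, if it is dirty
        return 1 if (stores_to_block > 0 and block_evicted) else 0
    raise ValueError("unknown policy")

print(memory_writes("write-through", 10))   # 10 memory writes
print(memory_writes("write-back", 10))      # 1 memory write (on eviction)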


9.1 Additional Points


Cache coherency: keeps copies of the same word in other caches up to date using some coherence
technique. This is an active field of research.

Cache Line Size: Cache line sizes between 8 and 64 bytes seem to produce optimum results.
Prefetching

• New data are brought into the processor when they are first needed.

• The processor has to wait until the data transfer is complete.

• Prefetch the data into the cache before they are actually needed, or before a read miss
occurs.

• Prefetching can be accomplished through software by including a special prefetch instruction
in the machine language of the processor.

– Inclusion of prefetch instructions increases the length of the programs.

• Prefetching can also be accomplished using hardware:

– Circuitry that attempts to discover patterns in memory references and then prefetches
according to this pattern.

Lockup-Free Cache

• Prefetching scheme does not work if it stops other accesses to the cache until the prefetch is
completed.

• A cache of this type is said to be “locked” while it services a miss.

• Cache structure which supports multiple outstanding misses is called a lockup free cache.

• Since more than one miss can be serviced at a time, a lockup-free cache must include circuits
that keep track of all the outstanding misses.

• Special registers may hold the necessary information about these misses.

Multilevel Caches:

• An on-chip cache reduces the processor’s external bus activity. Further, an off-chip cache is
usually desirable. This is the typical level 1 (L1) and level 2 (L2) cache design where the L2
cache is composed of static RAM.

• As chip densities have increased, the L2 cache has been moved onto the on-chip area and an
additional L3 cache has been added.

Unified vs Split Caches: Recent cache designs have gone from a unified cache to a split cache
design (one for instructions and one for data).


Unified caches have the following advantages:

1. unified caches typically have a higher hit rate

2. only one cache is designed and implemented

Split caches have the following advantages:

• parallel instruction execution and prefetching is better handled because of the elimination of
contention between the instruction fetch/decode unit and execution unit.

10 Cache Replacement Algorithms


Replacement algorithms are used when there is no available space in the cache in which to place
new data. Four of the most common cache replacement algorithms are described below.

Replacement algorithms are only needed for the associative and set-associative techniques.

1. First-in, First-out(FIFO): Evict the page that has been in the cache the longest time.

2. Least recently used (LRU): Evict the page whose last request occurred furthest in the
past.(least recently used page)

3. Random: Choose a page at random to evict from the cache.

4. Least Frequently Used (LFU): The LFU algorithm selects for replacement the item that
has been least frequently used by the CPU.

10.1 The FIFO Page Replacement Policy


• In the FIFO page replacement policy, the page that was brought in earliest will be replaced.
As you will see in the next example, the FIFO policy will replace the pages in a round-robin
manner (i.e., cyclically).

• Implementing the FIFO page replacement policy:

– Queue-based approach: A circular queue is maintained in memory.

– Use an index variable to point to the memory frame that will be replaced.

– Set this index variable to 0 initially.

– Increase the index by ONE each time a page is brought in from disk. The page will be
stored in the memory frame pointed to by the index variable.

• In all the examples of the different page replacement policies, I will use the following page
request summary:

Page request summary: 0 4 1 4 2 4 3 4 2 4 0 4 1 4 2 4 3 4

and 3 memory frames.


• The following figure shows the behavior of the program in paging using the FIFO page
replacement policy:

• We can see notably bad replacement decisions made in this short example (see figure above).
Due to program locality, it is not a good idea to replace a page that has just been used.

• There are a total of 11 page read operations to satisfy the total of 18 page requests.
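The figure is not reproduced here, but the fault count above can be checked with a short simulation.
The sketch below is illustrative (the function name is my own); it runs the FIFO policy on the
page request summary above with 3 frames and reports 11 faults:

from collections import deque

def fifo_faults(requests, num_frames):
    """Simulate FIFO page replacement and return the number of page faults."""
    frames = deque()                      # oldest page at the left
    faults = 0
    for page in requests:
        if page not in frames:            # page fault
            faults += 1
            if len(frames) == num_frames:
                frames.popleft()          # evict the page brought in earliest
            frames.append(page)
    return faults

requests = [0, 4, 1, 4, 2, 4, 3, 4, 2, 4, 0, 4, 1, 4, 2, 4, 3, 4]
print(fifo_faults(requests, 3))           # 11 page faults for 18 requests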

Advantage of FIFO replacement policy:

• Easy to implement in hardware (round robin)

• Low overhead (no access tracking)

• Predictable behavior

• FIFO abstraction and the mapping into simple processor instructions enables low-power
processors to efficiently manage incoming and outgoing data.

• Hardware FIFO consumes, generates, and moves data at high data rates characteristic of
network switches and converters.

• Mitigates latency penalties for memory access and avoids undesirable results of clogging of
the queued data pipeline.


Disadvantage of FIFO replacement policy:

• Can sometimes make dumb replacement decisions.

• Ignores usage patterns completely

• Generally worse hit rates than LRU/LFU

• Can evict frequently used data if it happens to be old

• No adaptation to temporal or spatial locality

Applications:

• Instruction caches

• Hardware queues (e.g., network packet buffering)

• Simple embedded systems with limited memory

10.2 The LRU Page Replacement Policy


• In the Least Recently Used (LRU) page replacement policy, the page that is used least
recently will be replaced.

• Implementation:

– Counter-based approach: A counter increments on every memory access; each cache
block stores its last access counter value.

– Stack-based approach: Cache blocks are maintained in an ordered list; when accessed,
a block moves to the top of the list.

– Add a register to every page frame that contains the last time the page in that frame
was accessed.

– Use a "logical clock" that advances by 1 tick each time a memory reference is made.

– Each time a page is referenced, update its register.

• The following figure shows the behavior of the program in paging using the LRU page
replacement policy:

– We can see that the bad replacement decisions made by FIFO are not present in LRU.

– There are a total of 9 page read operations to satisfy the total of 18 page requests - that
is almost a 20% improvement over FIFO in such a short experiment (see the simulation
sketch after this list).

(I only want to make the point here that the page replacement policy can affect system
performance. I do not want to get into the question of "how much better is LRU than
FIFO".)


• In fact, it has been shown empirically that LRU is the preferred page replacement policy

• Note1: LRU is probably the most effective
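The same check can be done for LRU. A minimal sketch (again, the function name is my own)
that runs the LRU policy on the same 18-request string with 3 frames; it reports 9 faults, matching
the count quoted above:

def lru_faults(requests, num_frames):
    """Simulate LRU page replacement and return the number of page faults."""
    frames = []                            # least recently used page at index 0
    faults = 0
    for page in requests:
        if page in frames:
            frames.remove(page)            # hit: move page to most-recent position
        else:
            faults += 1
            if len(frames) == num_frames:
                frames.pop(0)              # evict the least recently used page
        frames.append(page)
    return faults

requests = [0, 4, 1, 4, 2, 4, 3, 4, 2, 4, 0, 4, 1, 4, 2, 4, 3, 4]
print(lru_faults(requests, 3))             # 9 page faults for 18 requests
# The same function gives 6 faults for Problem-02's string: 4, 7, 6, 1, 7, 6, 1, 2, 7, 2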

Advantage of LRU replacement policy:

• Good for general-purpose caching in CPUs, databases, and memory hierarchies.

• Performs well when temporal locality (recently accessed data is reused soon) is high.

Disadvantage of LRU replacement policy:

• High implementation complexity due to maintaining timestamps or linked lists.

• Cache thrashing can occur in certain workloads (e.g., cyclic access patterns).

• To identify the page to replace, you need to find the minimum time stamp value in all the
registers... Lots of work...

There is a more efficient scheme that approximates the behavior of LRU and runs more efficiently.
That method is "second chance" - covered next.

Applications:

• CPU cache hierarchy (L1, L2, L3)


• Virtual memory page replacement

• Database and file system caching

10.3 Random Replacement (RR) Policy


RR chooses a random cache block for eviction when new data needs to be loaded.

Implementation:

• Pseudo-random number generator (PRNG): Used to select a random index in the
cache.

Note 2: Simulations have shown that random replacement is only slightly inferior to algorithms
based on usage.


11 Problems
11.1 Problem-01:
A system uses 3 page frames for storing process pages in main memory. It uses the First in First
out (FIFO) page replacement policy. Assume that all the page frames are initially empty. What
is the total number of page faults that will occur while processing the page reference string given
below:

4 , 7, 6, 1, 7, 6, 1, 2, 7, 2

Also calculate the hit ratio and miss ratio.

Solution:

Total number of references = 10

From here,

Total number of page faults occurred = 6

Calculating Hit ratio:

Total number of page hits = Total number of references − Total number of page misses or page faults
= 10 − 6
= 4

Thus, Hit ratio = Total number of page hits / Total number of references
= 4/10
= 0.4 or 40%

Calculating Miss ratio:

Total number of page misses or page faults = 6


Thus, Miss ratio = Total number of page misses / Total number of references
= 6/10
= 0.6 or 60%

Alternatively,
Miss ratio = 1 − Hit ratio
= 1 − 0.4
= 0.6 or 60%

11.2 Problem-02:
A system uses 3 page frames for storing process pages in main memory. It uses the Least Recently
Used (LRU) page replacement policy. Assume that all the page frames are initially empty. What
is the total number of page faults that will occur while processing the page reference string given
below-

4 , 7, 6, 1, 7, 6, 1, 2, 7, 2

Also calculate the hit ratio and miss ratio.

Solution:
Total number of references = 10

From here,

Total number of page faults occurred = 6

Calculating Hit ratio:

Total number of page hits = Total number of references − Total number of page misses or page faults
= 10 − 6
= 4

Thus, Hit ratio = Total number of page hits / Total number of references
= 4/10
= 0.4 or 40%

Calculating Miss ratio:


Total number of page misses or page faults = 6

Thus, Miss ratio = Total number of page misses / Total number of references
= 6/10
= 0.6 or 60%

Alternatively,
Miss ratio = 1 − Hit ratio
= 1 − 0.4
= 0.6 or 60%

Dr. Thirumalesu Kudithi, and Keerthi N, SOT, TAU, Chittoor, A.P.
