Unit_4_COA_MemorySystem-1
Study Materials
B.Tech IV-Semester School of Technology Academic Year: 2024-25
2. Cache Memory:
Cache memory is a small, fast memory unit located close to the CPU. It stores frequently
used data and instructions that have been recently accessed from the main memory. Cache
memory is designed to minimize the time it takes to access data by providing the CPU with
quick access to frequently used data.
3. Main Memory:
Main memory, also known as RAM (Random Access Memory), is the primary memory of a
computer system. It has a larger storage capacity than cache memory, but it is slower. Main
memory is used to store data and instructions that are currently in use by the CPU.
• Static RAM: Static RAM stores binary information in flip-flops, and the information remains valid as long as power is supplied. Static RAM has a faster access time and is used to implement cache memory.
4. Secondary Storage:
Secondary storage, such as hard disk drives (HDD) and solid-state drives (SSD) , is a non-
volatile memory unit that has a larger storage capacity than main memory. It is used to
store data and instructions that are not currently in use by the CPU. Secondary storage has
the slowest access time and is typically the least expensive type of memory in the memory
hierarchy.
5. Magnetic Disk:
Magnetic disks are circular plates fabricated from metal or plastic and coated with a magnetizable material. They rotate at high speed inside the drive and are frequently used as online secondary storage.
6. Magnetic Tape:
Magnetic tape is a plastic film coated with a magnetic recording material. It is generally used for the backup of data. Because a tape is accessed sequentially, its access time is longer, since the drive needs some amount of time to reach the required position on the strip.
• Access Time: It is the time interval between the read/write request and the availability of
the data. As we move from top to bottom in the Hierarchy, the access time increases.
• Performance: The Memory Hierarchy design ensures that frequently accessed data is stored in faster memory to improve system performance.
• Cost Per Bit: As we move from bottom to top in the Hierarchy, the cost per bit increases
i.e. Internal Memory is costlier than External Memory.
• Performance: Frequently used data is stored in faster memory (like cache), reducing access
time and improving overall system performance.
• Cost Efficiency: By combining small, fast memory (like registers and cache) with larger, slower memory (like RAM and HDD), the system achieves a good balance between cost and performance.
• Optimized Resource Utilization: Combines the benefits of small, fast memory and large,
cost-effective storage to maximize system performance.
• Efficient Data Management: Frequently accessed data is kept closer to the CPU, while
less frequently used data is stored in larger, slower memory, ensuring efficient data handling.
• Complex Design: Managing and coordinating data across different levels of the hierarchy
adds complexity to the system’s design and operation.
• Cost: Faster memory components like registers and cache are expensive, limiting their size
and increasing the overall cost of the system.
• Latency: Accessing data stored in slower memory (like secondary or tertiary storage)
increases the latency and reduces system performance.
6 Memory Interleaving
Memory interleaving is a technique that divides memory into a number of modules such that successive words in the address space are placed in different modules. This allows several memory accesses to overlap and proceed in parallel.
Let us assume 16 data words are to be distributed over four modules, where module 00 is Module 1, module 01 is Module 2, module 10 is Module 3 and module 11 is Module 4, and 10, 20, 30, . . . , 130 are the data words to be transferred.
From the figure above, in Module 1 the data word 10 is stored first, then 20, 30 and finally 40. That is, consecutive data words are placed in the same module until it reaches its capacity.
Here the most significant bits (MSB) of the address select the module, and the least significant bits (LSB) give the address of the word within the module.
For example, to fetch the data word 90 the processor issues the address 1000. The bits 10 indicate that the word is in module 10 (Module 3), and the bits 00 give the location of 90 within Module 10 (Module 3).
Now assume again that 16 data words are distributed over the four modules, but this time consecutive data words are placed in consecutive modules: 10 goes to Module 1, 20 to Module 2, and so on.
Here the least significant bits (LSB) of the address select the module, and the most significant bits (MSB) give the address of the word within the module.
For example, to fetch the data word 90 the processor again issues the address 1000. The bits 00 indicate that the word is in module 00 (Module 1), and the bits 10 give the location of 90 within Module 00 (Module 1). Because successive addresses fall in different modules, all four modules can be accessed at the same time, achieving parallelism. As Figure 2 shows, the word within a module is selected using the higher-order bits. This method uses the memory bandwidth effectively.
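To make the address split concrete, the short Python sketch below decodes a 4-bit address into a module number and a word offset for both the high-order scheme (MSBs select the module) and the low-order scheme (LSBs select the module) described above. The 16-word, 4-module configuration is taken from the example; the function names are illustrative only.

MODULE_BITS = 2   # 4 modules -> 2 module-select bits
WORD_BITS = 2     # 4 words per module -> 2 word-select bits

def high_order_decode(addr):
    """MSBs select the module, LSBs select the word (consecutive words stay in one module)."""
    module = addr >> WORD_BITS              # top 2 bits
    word = addr & ((1 << WORD_BITS) - 1)    # bottom 2 bits
    return module, word

def low_order_decode(addr):
    """LSBs select the module, MSBs select the word (consecutive words go to different modules)."""
    module = addr & ((1 << MODULE_BITS) - 1)  # bottom 2 bits
    word = addr >> MODULE_BITS                # top 2 bits
    return module, word

# Address 1000 (binary) = 8, the location of data word 90 in the example.
print(high_order_decode(0b1000))  # (2, 0): module 10 -> Module 3, word 00
print(low_order_decode(0b1000))   # (0, 2): module 00 -> Module 1, word 10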
• The most important use of cache memory is that it is used to reduce the average time to
access data from the main memory.
• The concept of cache works because there exists locality of reference (the same items or
nearby items are more likely to be accessed next) in processes.
By storing this information closer to the CPU, cache memory helps speed up the overall processing
time. Cache memory is much faster than the main memory (RAM). When the CPU needs data,
it first checks the cache. If the data is there, the CPU can access it quickly. If not, it must fetch
the data from the slower main memory.
• Extremely fast memory type that acts as a buffer between RAM and the CPU.
• Holds frequently requested data and instructions, ensuring that they are immediately available
to the CPU when needed.
• Costlier than main memory or disk memory but more economical than CPU registers.
• Level 2 or Cache Memory: The fastest memory after the processor registers; data is temporarily stored here for faster access.
• Level 3 or Main Memory: It is the memory on which the computer works currently. It is smaller than secondary storage, and once power is off the data no longer stays in this memory.
• Level 4 or Secondary Memory: It is external memory that is not as fast as the main
memory but data stays permanently in this memory.
• If the processor finds that the memory location is in the cache, a Cache Hit has occurred and
data is read from the cache.
• If the processor does not find the memory location in the cache, a cache miss has occurred.
For a cache miss, the cache allocates a new entry and copies in data from the main memory,
then the request is fulfilled from the contents of the cache.
The performance of cache memory is frequently measured in terms of a quantity called Hit ratio.
• Hit Ratio(H) = hit / (hit + miss) = no. of hits / total accesses
• Miss Ratio = miss / (hit + miss) = no. of misses / total accesses = 1 - Hit Ratio(H)
Cache performance can be improved by using a larger cache block size, using higher associativity, reducing the miss rate, reducing the miss penalty, and reducing the time to hit in the cache.
Solution:
We know that AMAT = Hit Time + (Miss Ratio × Miss Penalty)
AMAT = 2 + (0.10 × 50) = 2 + 5 = 7 ns
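For quick checking of such problems, the same formula can be expressed as a small Python helper (a minimal sketch; the argument names are illustrative):

def amat(hit_time_ns, miss_ratio, miss_penalty_ns):
    """Average Memory Access Time = Hit Time + Miss Ratio x Miss Penalty."""
    return hit_time_ns + miss_ratio * miss_penalty_ns

# Values from the worked solution: 2 ns hit time, 10% miss ratio, 50 ns miss penalty.
print(amat(2, 0.10, 50))  # 7.0 ns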
Advantages:
• The data access time of Cache Memory is less than that of the main memory.
• Cache Memory stores data and instructions that are regularly used by the CPU, therefore it increases the performance of the CPU.
Disadvantages:
• Whenever the system is turned off, data and instructions stored in cache memory are lost.
• The high cost of cache memory increases the price of the Computer System.
8 Cache Mapping
Cache mapping refers to the method used to store data from main memory into the cache. It
determines how data from memory is mapped to specific locations in the cache.
There are three different types of mapping used for cache memory, which are as follows:
• Direct Mapping
• Fully Associative Mapping
• Set-Associative Mapping
In direct mapping, each block of main memory maps to exactly one cache line, given by:
i = j modulo m = j % m
where,
i = cache line number
j = main memory block number
m = number of lines in the cache
For example, consider a memory with 8 blocks(j) and a cache with 4 lines(m). Using direct
mapping, block 0 of memory might be stored in cache line 0, block 1 in line 1, block 2 in line 2,
and block 3 in line 3. If block 4 of memory is accessed, it would be mapped to cache line 0 (as i
= j modulo m i.e. i = 4 % 4 = 0), replacing memory block 0.
The Main Memory consists of memory blocks, and these blocks are made up of a fixed number of words. A typical address in main memory is split into two parts:
1. Index Field: It represents the block number; its bits tell us the location of the block where a word can be found.
2. Block Offset: It represents the word within a memory block; these bits determine the location of the word in the block.
The Cache Memory consists of cache lines. These cache lines have the same size as memory blocks. The address in cache memory consists of:
1. Block Offset: This is the same block offset we use in Main Memory.
2. Index: It represents the cache line number. This part of the memory address determines which cache line (or slot) the data will be placed in.
3. Tag: The Tag is the remaining part of the address that uniquely identifies which block is
currently occupying the cache line.
The index field in main memory maps directly to the index in cache memory, which determines the
cache line where the block will be stored. The block offset in both main memory and cache memory
indicates the exact word within the block. In the cache, the tag identifies which memory block
is currently stored in the cache line. This mapping ensures that each memory block is mapped
to exactly one cache line, and the data is accessed using the tag and index while the block offset
specifies the exact word in the block.
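As an illustration of this breakdown, the short Python sketch below computes the cache line for a block (i = j mod m) and splits a byte address into tag, index and offset fields. It assumes byte addressing and power-of-two sizes; the function names and the example address are illustrative only.

def direct_map_line(block_number, num_lines):
    """Direct mapping: cache line i = block number j modulo number of lines m."""
    return block_number % num_lines

def split_address(addr, num_lines, block_size):
    """Split a byte address into (tag, index, block offset) for a direct-mapped cache."""
    offset_bits = (block_size - 1).bit_length()   # log2(block size)
    index_bits = (num_lines - 1).bit_length()     # log2(number of cache lines)
    offset = addr & (block_size - 1)
    index = (addr >> offset_bits) & (num_lines - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

print(direct_map_line(4, 4))              # 0 -> block 4 replaces block 0 in cache line 0
print(split_address(0b1101_0110, 4, 16))  # example 8-bit address, 4 lines, 16-byte blocks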
Advantages of Direct-mapping:
Disadvantages of Direct-mapping:
Solution:
• Cache Size: 16 KB
Tasks:
Solution:
– Since there are no multiple addresses mapping to the same index, no conflict misses
occur.
– Each memory block has a unique location in the cache, so all accesses result in compulsory
(cold) misses, but not conflict misses.
• Conclusion: No conflict misses occur because all addresses map to different cache indices.
The address structure of cache memory in fully associative mapping is different from direct mapping. In fully associative mapping, the cache address does not have an index field; it has only a tag, which corresponds to the block number field of the memory address. Any block of memory can be placed in any cache line. This flexibility means that there is no fixed position for memory blocks in the cache.
To determine whether a block is present in the cache, the tag is compared with the tags stored in
all cache lines. If a match is found, it is a cache hit, and the data is retrieved from that cache line.
If no match is found, it’s a cache miss, and the required data is fetched from main memory.
• It is expensive.
• Cache Size: 32 KB
• Block Size: 64 B
• Physical Address: 32 bits
Tasks:
1. Number of Blocks in Cache
2. Offset Bits
3. Tag Bits
Solution:
1. Number of Blocks in Cache: The number of blocks in the cache is given by:
Number of Blocks = Cache Size / Block Size    (2)
Number of Blocks = (32 × 1024) / 64 = 32768 / 64 = 512    (3)
2. Offset Bits Calculation: The number of offset bits is determined by the block size:
Offset Bits = log2(Block Size) = log2(64) = 6 bits
3. Tag Bits Calculation: Since the system uses a 32-bit address and there is no index field in a fully associative cache, the tag is computed as:
Tag Bits = Address Bits − Offset Bits = 32 − 6 = 26 bits
4. Final Answer: Offset Bits = 6, Tag Bits = 26
Conclusion: In a fully associative cache, the memory address consists of an offset to locate
data within a block and a tag to uniquely identify the memory block.
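The arithmetic above can be reproduced with a few lines of Python; this minimal sketch simply takes the cache size, block size, and address width given in the problem as its inputs.

import math

cache_size = 32 * 1024   # 32 KB
block_size = 64          # 64 B
address_bits = 32

num_blocks = cache_size // block_size      # 512 cache lines
offset_bits = int(math.log2(block_size))   # 6 bits to select a byte within a block
tag_bits = address_bits - offset_bits      # 26 bits (no index field in fully associative)

print(num_blocks, offset_bits, tag_bits)   # 512 6 26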
v = m / k
where,
m = number of cache lines in the cache memory
k = number of cache lines we want in each set
v = number of sets
As in direct mapping, each memory block maps to a specific set, but within that set it can be placed in any cache line.
i = j modulo v = j % v
where,
j = main memory block number
v = number of sets
i = cache line set number
This reduces the conflict misses that occur in direct mapping while still limiting the search space
compared to fully-associative mapping.
For example, consider a 2-way set associative cache, which means 2 cache lines make a set in this
cache structure. There are 8 memory blocks and 4 cache lines, thus the number of sets will be 4/2
= 2 sets. Using direct mapping strategy first, block 0 will be in set 0, block 1 in set 1, block 2
in set 0 and so on. Then, the tag is used to search through all cache lines in that set to find the
correct block (Associative Mapping).
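The set-selection step can be sketched in a few lines of Python. The example below uses the 2-way cache with 4 lines described above, so v = 4/2 = 2 sets; the function name is illustrative only.

def set_index(block_number, num_lines, ways):
    """Set-associative mapping: set i = block number j modulo number of sets v = m / k."""
    num_sets = num_lines // ways
    return block_number % num_sets

# 4 cache lines, 2-way set associative -> 2 sets.
for block in range(8):
    print(f"memory block {block} -> set {set_index(block, 4, 2)}")
# Blocks 0, 2, 4, 6 map to set 0; blocks 1, 3, 5, 7 map to set 1.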
• It is very expensive.
• Secondary Cache: Secondary cache is placed between the primary cache and the rest of
the memory. It is referred to as the level 2 (L2) cache. Often, the Level 2 cache is also housed
on the processor chip.
• Spatial Locality of Reference: There is a high chance that an element close to a recently referenced location will be accessed next, so data in the neighbourhood of the point of reference is likely to be needed soon.
• Temporal Locality of Reference: A recently accessed item is likely to be accessed again in the near future, which is why replacement policies such as Least Recently Used (LRU) work well. When a miss (or page fault) occurs for a word, the whole block (or page) containing that word is loaded, rather than just the word itself, because the spatial locality rule says that the words next to it are likely to be referenced soon.
1. Number of Sets
2. Index Bits
Number of Sets = 4096 / 4 = 1024 sets
Index Bits = log2(1024) = 10 bits
Problem 7: Consider a 1 MB Cache Memory (CM) and a 4 GB Main Memory (MM), both partitioned into 64 KB blocks.
(i) How many bits are required for the physical address if the word size is 1 B?
(iii) How many TAG bits are required for Direct, Associative, and 8-way Set-Associative mapping?
(iv) How much TAG memory in bytes is required for each case?
• Word Size: 1B
Since the total main memory size is 4 GB = 2^32 bytes, the number of bits required for addressing is:
Physical Address Bits = log2(2^32) = 32 bits    (7)
• Direct Mapping
Number of cache lines = 1 MB / 64 KB = 16, Block offset = log2(64 KB) = 16 bits
Index bits = log2(16) = 4 bits    (10)
TAG bits = 32 − (4 + 16) = 12 bits    (11)
• Direct Mapping
TAG Memory = 16 × 12 = 192 bits = 192 / 8 = 24 bytes    (16)
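The direct-mapping numbers above can be checked with the short sketch below, which uses only the sizes given in the problem (1 MB cache, 4 GB main memory, 64 KB blocks). The fully associative and 8-way values it also prints are computed the same way and are not taken from the text above.

import math

cache_size = 1 * 1024**2      # 1 MB
main_memory = 4 * 1024**3     # 4 GB
block_size = 64 * 1024        # 64 KB

address_bits = int(math.log2(main_memory))   # 32
offset_bits = int(math.log2(block_size))     # 16
lines = cache_size // block_size             # 16 cache lines

def tag_info(ways):
    """Return (TAG bits, TAG memory in bytes) for a cache with the given associativity."""
    sets = lines // ways
    index_bits = int(math.log2(sets))
    tag_bits = address_bits - offset_bits - index_bits
    tag_memory_bytes = lines * tag_bits / 8
    return tag_bits, tag_memory_bytes

print(tag_info(1))       # direct mapped:         (12, 24.0)
print(tag_info(lines))   # fully associative:     (16, 32.0)
print(tag_info(8))       # 8-way set associative: (15, 30.0)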
If a cache line has not been modified, then it can be overwritten immediately; however, if one or
more words have been written to a cache line, then main memory must be updated before replacing
the cache line.
• If an I/O module is able to read/write memory directly and the cache has been modified, a memory read cannot be served correctly right away. Conversely, if the I/O module writes to memory, the corresponding cache line becomes invalid.
• If multiple processors each have their own cache, if one processor modifies its cache, then the
cache lines of the other processors could be invalid.
The two popular cache write policies are Write-through and Write-back; a small sketch contrasting them follows the notes below.
1. Write-through:
• Every write operation updates both the cache and the main memory, so the two always hold the same data.
• If the processor has to wait for the write operation to be complete, it slows down the processor. A write buffer avoids this:
• The processor places each write request into the buffer and continues execution.
• If a subsequent Read request references data which is still in the write buffer, then this
data is referenced in the write buffer.
• The read misses are cheaper because they do not require any write to the lower level of
the memory hierarchy
2. Write-back:
• Block is written back to the main memory when it is replaced due to a miss.
• If the processor waits for this write to complete, before reading the new block, it is
slowed down.
• Fast write buffer can hold the block to be written, and the new block can be read first.
• Dirty bit is set when we write to the cache, this indicates the cache is now inconsistent
with main memory.
• The block can be written by the processor at the frequency at which the cache, and not
the main memory, can accept it.
• Multiple writes to the same block require only a single write to the main memory.
Note:
• An important note is that write policies apply only to the data cache, not to the instruction cache, since instructions are not modified.
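As mentioned above, here is a minimal sketch of the two policies. It models a single cache line with a dirty bit: write-through pushes every store straight to memory, while write-back only sets the dirty bit and defers the memory update until the line is evicted. Names and structure are illustrative, not a real cache controller.

class CacheLine:
    def __init__(self):
        self.data = None
        self.dirty = False   # only meaningful for write-back

def write_through(line, value, memory, addr):
    """Update the cache line and main memory on every write."""
    line.data = value
    memory[addr] = value          # memory is always consistent

def write_back(line, value):
    """Update only the cache line; mark it dirty so memory is updated on eviction."""
    line.data = value
    line.dirty = True             # cache is now inconsistent with main memory

def evict(line, memory, addr):
    """On replacement, write the block back only if it was modified."""
    if line.dirty:
        memory[addr] = line.data  # one memory write covers many earlier stores
        line.dirty = False

With write-back, several writes to the same block cost only a single memory write at eviction, which is exactly the advantage listed above.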
Cache Line Size: Cache line sizes between 8 and 64 bytes seem to produce optimum results.
Prefetching
• New data are brought into the processor when they are first needed.
• Prefetch the data into the cache before they are actually needed, or before a Read miss occurs.
– Circuitry that attempts to discover patterns in memory references and then prefetches
according to this pattern.
Lockup-Free Cache
• Prefetching scheme does not work if it stops other accesses to the cache until the prefetch is
completed.
• Cache structure which supports multiple outstanding misses is called a lockup free cache.
• Since only one miss can be serviced at a time, a lockup free cache must include circuits that
keep track of all the outstanding misses.
• Special registers may hold the necessary information about these misses.
Multilevel Caches:
• An on-chip cache reduces the processor’s external bus activity. Further, an off-chip cache is
usually desirable. This is the typical level 1 (L1) and level 2 (L2) cache design where the L2
cache is composed of static RAM.
• As chip densities have increased, the L2 cache has been moved onto the on-chip area and an
additional L3 cache has been added.
Unified vs Split Caches: Recent cache designs have gone from a unified cache to a split cache
design (one for instructions and one for data).
• Parallel instruction execution and prefetching are better handled because of the elimination of contention between the instruction fetch/decode unit and the execution unit.
Replacement algorithms are only needed for associative and set associative techniques.
1. First-in, First-out(FIFO): Evict the page that has been in the cache the longest time.
2. Least recently used (LRU): Evict the page whose last request occurred furthest in the
past.(least recently used page)
4. Least Frequently Used (LFU): The LFU algorithm selects for replacement the item that has been least frequently used by the CPU.
– Use an index variable to point to the memory frame that will be replaced.
– Increase the index by ONE each time a page is brought in from disk. The page will be stored in the memory frame pointed to by the index variable.
• In all the examples of the different page replacement policies, I will use the following page
request summary:
• The following figure shows the behavior of the program in paging using the FIFO page
replacement policy:
• We can see notably bad replacement decisions made in this short example (see figure above). Due to program locality, it is not a good idea to replace a page that was used very recently.
• There are a total of 11 page read operations to satisfy the total of 18 page requests.
• Predictable behavior
• The FIFO abstraction and its mapping into simple processor instructions enable low-power processors to efficiently manage incoming and outgoing data.
• Hardware FIFO consumes, generates, and moves data at high data rates characteristic of
network switches and converters.
• Mitigates latency penalties for memory access and avoids undesirable results of clogging of
the queued data pipeline.
Applications:
• Instruction caches
• Implementation:
– Stack-based approach: Cache blocks are maintained in an ordered list; when accessed,
a block moves to the top of the list.
– Add a register to every page frame that contains the last time the page in that frame was accessed.
– Use a "logical clock" that advances by 1 tick each time a memory reference is made.
• The following figure shows the behavior of the program in paging using the LRU page
replacement policy:
– We can see that the notably bad replacement decisions made by FIFO are not present in LRU.
– There are a total of 9 page read operations to satisfy the total of 18 page requests - that
is almost a 20% improvement over FIFO in such a short experiment.
(I only want to make the point here that page replacement policy can affect the system
performance. I do not want to get into the question of ”how much better is LRU than
FIFO”).
• In fact, it has been shown empirically that LRU is the preferred page replacement policy.
• Performs well when temporal locality (recently accessed data is reused soon) is high.
• Cache thrashing can occur in certain workloads (e.g., cyclic access patterns).
• To identify the page to replace, you need to find the minimum time stamp value in all the
registers... Lots of work...
There is a more efficient scheme that approximates the behavior of LRU while running more efficiently. That method is "second chance", covered next.
Applications:
Implementation:
Note 2: Simulations have shown that random replacement is only slightly inferior to algorithms based on usage.
11 Problems
11.1 Problem-01:
A system uses 3 page frames for storing process pages in main memory. It uses the First in First
out (FIFO) page replacement policy. Assume that all the page frames are initially empty. What
is the total number of page faults that will occur while processing the page reference string given
below:
4 , 7, 6, 1, 7, 6, 1, 2, 7, 2
Solution:
Total number of references = 10
Tracing the reference string with 3 frames under FIFO gives a total of 6 page faults.
From here,
Total number of page hits = Total number of references − Total number of page misses (page faults)
= 10 − 6
= 4
Thus, Hit ratio = Total number of page hits / Total number of references
= 4 / 10
= 0.4 or 40%
Thus, Miss ratio = Total number of page misses / Total number of references
= 6 / 10
= 0.6 or 60%
Alternatively,
Miss ratio = 1 − Hit ratio
= 1 − 0.4
= 0.6 or 60%
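The fault count used above can be reproduced with a short FIFO simulation; this is only a sketch, using a deque as the frame queue, with illustrative names.

from collections import deque

def fifo_faults(reference_string, num_frames):
    """Count page faults for FIFO replacement with the given number of frames."""
    frames = deque()          # front of the deque holds the oldest page
    faults = 0
    for page in reference_string:
        if page not in frames:
            faults += 1
            if len(frames) == num_frames:
                frames.popleft()   # evict the page that was loaded first
            frames.append(page)
    return faults

refs = [4, 7, 6, 1, 7, 6, 1, 2, 7, 2]
print(fifo_faults(refs, 3))   # 6 page faults -> hit ratio 4/10 = 40%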
11.2 Problem-02:
A system uses 3 page frames for storing process pages in main memory. It uses the Least Recently
Used (LRU) page replacement policy. Assume that all the page frames are initially empty. What
is the total number of page faults that will occur while processing the page reference string given
below-
4 , 7, 6, 1, 7, 6, 1, 2, 7, 2
Solution:
Total number of references = 10
Tracing the reference string with 3 frames under LRU gives a total of 6 page faults.
From here,
Total number of page hits = Total number of references − Total number of page misses (page faults)
= 10 − 6
= 4
Thus, Hit ratio = Total number of page hits / Total number of references
= 4 / 10
= 0.4 or 40%
Thus, Miss ratio = Total number of page misses / Total number of references
= 6 / 10
= 0.6 or 60%
Alternatively,
Miss ratio = 1 − Hit ratio
= 1 − 0.4
= 0.6 or 60%
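A matching LRU simulation is sketched below, using an OrderedDict to keep pages ordered from least to most recently used; it reproduces the 6 faults assumed in the solution. The names are illustrative only.

from collections import OrderedDict

def lru_faults(reference_string, num_frames):
    """Count page faults for LRU replacement with the given number of frames."""
    frames = OrderedDict()    # keys ordered from least to most recently used
    faults = 0
    for page in reference_string:
        if page in frames:
            frames.move_to_end(page)          # mark as most recently used
        else:
            faults += 1
            if len(frames) == num_frames:
                frames.popitem(last=False)    # evict the least recently used page
            frames[page] = True
    return faults

refs = [4, 7, 6, 1, 7, 6, 1, 2, 7, 2]
print(lru_faults(refs, 3))    # 6 page faults -> miss ratio 6/10 = 60%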