Memory Hierarchy and Cache Memory
A memory element is a set of storage devices that stores binary data in the form of bits. In general, memory can be classified into two categories: volatile and non-volatile.
Memory Hierarchy
Primary Memory
Primary memory is also known as internal memory, and it is accessible by the processor directly. It includes main memory, cache memory, and the CPU registers.
Secondary Memory
Secondary memory is also known as external memory, and it is accessible by the processor through an input/output module. It includes optical disks, magnetic disks, and magnetic tape.
Performance
Early computer systems were designed without a memory hierarchy, and the speed gap between main memory and the CPU registers grew because of the large disparity in access time, which lowered system performance. An improvement was therefore necessary, and the memory hierarchy model was designed to increase the system's performance.
Capacity
The capacity of the memory hierarchy is the total amount of data the memory can store. Whenever we move from top to bottom in the memory hierarchy, the capacity increases.
Access Time
The access time in the memory hierarchy is the interval between a request to read or write and the moment the data becomes available. Whenever we move from top to bottom in the memory hierarchy, the access time increases.
Cost per Bit
When we move from bottom to top in the memory hierarchy, the cost per bit increases, which means internal memory is expensive compared with external memory.
Cache Memory
Cache memory is usually located on the processor itself, though occasionally it is a separate IC (integrated circuit); in either case it is divided into levels. The cache holds chunks of data that are frequently used from main memory. A single-core processor typically has two, or occasionally more, cache levels. Current multi-core processors commonly have three levels: two levels private to each core and one level shared among the cores.
Main Memory
Main memory is the memory unit that communicates directly with the CPU. It is the main storage unit of the computer: a fast, large memory used for storing data throughout the computer's operation. It is made up of RAM as well as ROM.
Magnetic Disks
Magnetic disks are circular plates made of plastic or metal and coated with magnetizable material. Frequently both surfaces of a disk are used, and several disks may be stacked on one spindle with read/write heads available for every surface. All the disks rotate together at high speed. Bits are stored on the magnetized surface in spots along concentric circles called tracks, and the tracks are usually divided into sections called sectors.
Magnetic Tape
Magnetic tape is a conventional magnetic recording medium, made of a thin magnetizable coating on a long, narrow strip of plastic film. It is mainly used to back up large amounts of data. Whenever the computer needs to access a tape, the tape must first be mounted; once access is finished, it is unmounted. Memory access is slower on magnetic tape, and accessing a tape can take a few minutes.
Advantages of Memory Hierarchy
The need for a memory hierarchy follows from the characteristics above: it narrows the speed gap between the CPU and main memory, provides large total capacity, and keeps the overall cost per bit low.
Memory Interleaving
In an interleaved memory system there are multiple memory banks that take turns supplying data, so the CPU can access alternate sections immediately without waiting for a single bank to finish its cycle.
For example, with four memory banks, data at addresses 0, 1, 2 and 3 can be accessed simultaneously because the addresses reside in separate memory banks. Hence we do not have to wait for one data fetch to complete before beginning the next operation.
An interleaved memory with n banks is said to be n-way interleaved. With two banks of DRAM, for instance, the interleaved system logically appears as one bank of memory that is twice as large.
In an interleaved arrangement with 2 memory banks, the first long word of bank 0 is followed by the first long word of bank 1, which is followed by the second long word of bank 0, then the second long word of bank 1, and so on.
Organizing two physical banks of n long words in this way, all even long words of the logical bank are located in physical bank 0, and all odd long words are located in physical bank 1.
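The even/odd placement just described can be sketched in a few lines of Python; the two-bank count and the long-word addressing granularity are illustrative assumptions, not something fixed by the text.

# Map a logical long-word address to (physical bank, offset within that bank)
# for a two-way low-order interleaved memory: even words go to bank 0, odd to bank 1.
NUM_BANKS = 2  # illustrative; any power of two behaves the same way

def interleave(addr):
    bank = addr % NUM_BANKS       # low-order bit selects the bank
    offset = addr // NUM_BANKS    # remaining bits select the word inside the bank
    return bank, offset

for addr in range(8):
    bank, offset = interleave(addr)
    print("logical word", addr, "-> bank", bank, "offset", offset)

Running this prints bank 0, 1, 0, 1, ... for consecutive logical words, which is exactly the alternating layout described above.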
Why do we use Memory Interleaving?
When the processor requests data from the main memory, a block (chunk) of data is transferred to the cache and then to the processor. So whenever a cache miss occurs, the data has to be fetched from the main memory. But main memory is relatively slow compared with the cache, so interleaving is used to improve the effective access time of the main memory.
For example, with four modules interleaved on the low-order address bits, all four modules can be accessed at the same time, achieving parallelism; the higher-order bits then select the word within the chosen module. This method uses memory bandwidth effectively.
In high-order memory interleaving, the most significant bits of the memory address decide the memory bank in which a particular location resides, while the least significant bits are sent as the address within each chip. One problem with this scheme, which is also known as memory banking, is that consecutive addresses tend to be in the same chip, so the maximum rate of data transfer is limited by the memory cycle time.
In low-order interleaving, the least significant bits of the address select the memory bank (module). Consecutive memory addresses then fall in different memory modules, allowing memory to be accessed faster than a single bank's cycle time.
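To make the difference concrete, the small sketch below computes which bank a run of consecutive addresses falls into under each scheme; the four-bank, four-words-per-bank configuration is an assumption chosen only for illustration.

# Which bank does each address hit under high-order vs. low-order interleaving?
NUM_BANKS = 4
WORDS_PER_BANK = 4

def high_order_bank(addr):
    # Most significant bits select the bank, so consecutive addresses
    # stay in the same bank (memory banking).
    return addr // WORDS_PER_BANK

def low_order_bank(addr):
    # Least significant bits select the bank, so consecutive addresses
    # spread across banks and can be fetched in parallel.
    return addr % NUM_BANKS

addrs = list(range(8))
print("addresses :", addrs)
print("high order:", [high_order_bank(a) for a in addrs])  # 0 0 0 0 1 1 1 1
print("low order :", [low_order_bank(a) for a in addrs])   # 0 1 2 3 0 1 2 3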
Benefits of Interleaved Memory
An instruction pipeline may require an instruction and its operands at the same time from main memory, which is not possible with the traditional method of memory access. Similarly, an arithmetic pipeline requires two operands to be fetched simultaneously from main memory. Memory interleaving resolves this problem.
Data in DRAM is stored in units of pages. Each DRAM bank has a row buffer that serves as a cache for accessing any page in the bank. Before a page in the DRAM bank is read, it is first loaded into the row buffer. If the page is read directly from the row buffer, the access has the shortest memory latency of one memory cycle. A row-buffer miss, also called a row-buffer conflict, is slower because the new page has to be loaded into the row buffer before it is read. Row-buffer misses occur as access requests to different memory pages in the same bank are serviced, and each conflict incurs a substantial delay. In contrast, memory accesses to different banks can proceed in parallel with high throughput.
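A toy model of this behaviour, assuming one open row per bank and made-up latency figures (real DRAM timings differ), shows why accesses that keep touching different rows of one bank are far slower than the same accesses spread over two banks.

# Count row-buffer hits and total cycles for a stream of (bank, row) accesses.
HIT_CYCLES = 1        # page already sits in the row buffer (illustrative number)
CONFLICT_CYCLES = 3   # new page must be loaded into the row buffer first (illustrative)

def simulate(accesses):
    open_row = {}              # bank -> row currently held in its row buffer
    hits, cycles = 0, 0
    for bank, row in accesses:
        if open_row.get(bank) == row:
            hits += 1
            cycles += HIT_CYCLES
        else:                  # row-buffer conflict: reload the row buffer
            open_row[bank] = row
            cycles += CONFLICT_CYCLES
    return hits, cycles

same_bank = [(0, r % 2) for r in range(8)]   # alternate rows within one bank
two_banks = [(r % 2, 0) for r in range(8)]   # same accesses spread over two banks
print("one bank  (hits, cycles):", simulate(same_bank))   # (0, 24)
print("two banks (hits, cycles):", simulate(two_banks))   # (6, 12)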
In traditional layouts, memory banks can be allocated contiguous blocks of memory addresses, which is very simple for the memory controller and gives performance equal to interleaving in completely random-access scenarios. However, memory reads are rarely random because of locality of reference, and optimizing for nearby accesses gives far better performance in interleaved layouts.
The way memory is addressed does not affect the access time for memory locations that are already cached; it affects only memory locations that need to be retrieved from DRAM.
Cache Memory
Cache memory is a chip-based computer component that makes retrieving data from the
computer's memory more efficient. It acts as a temporary storage area that the computer's
processor can retrieve data from easily. This temporary storage area, known as a cache,
is more readily available to the processor than the computer's main memory source,
typically some form of DRAM.
In order to be close to the processor, cache memory needs to be much smaller than main
memory. Consequently, it has less storage space. It is also more expensive than main
memory, as it is a more complex chip that yields higher performance.
What it sacrifices in size and price, it makes up for in speed. Cache memory operates
between 10 to 100 times faster than RAM, requiring only a few nanoseconds to respond
to a CPU request.
The name of the actual hardware that is used for cache memory is high-speed static
random-access memory (SRAM). The name of the hardware that is used in a computer's
main memory is dynamic random-access memory (DRAM).
Cache memory is not to be confused with the broader term cache. Caches are temporary
stores of data that can exist in both hardware and software. Cache memory refers to the
specific hardware component that allows computers to create caches at various levels of
the network.
Cache memory is fast and expensive. Traditionally, it is categorized as "levels" that describe its
closeness and accessibility to the microprocessor. There are three general cache levels:
L1 cache, or primary cache, is extremely fast but relatively small, and is usually embedded in
the processor chip as CPU cache.
L2 cache, or secondary cache, is often more capacious than L1. L2 cache may be embedded on
the CPU, or it can be on a separate chip or coprocessor and have a high-speed alternative
system bus connecting the cache and CPU. That way it doesn't get slowed by traffic on the
main system bus.
Level 3 (L3) cache is specialized memory developed to improve the performance of L1 and
L2. L1 or L2 can be significantly faster than L3, though L3 is usually double the speed of
DRAM. With multicore processors, each core can have dedicated L1 and L2 cache, but they
can share an L3 cache. If an L3 cache references an instruction, it is usually elevated to a higher
level of cache.
In the past, L1, L2 and L3 caches have been created using combined processor and
motherboard components. Recently, the trend has been toward consolidating all three levels of
memory caching on the CPU itself. That's why the primary means for increasing cache size has
begun to shift from the acquisition of a specific motherboard with different chipsets and bus
architectures to buying a CPU with the right amount of integrated L1, L2 and L3 cache.
Contrary to popular belief, implementing flash or more dynamic RAM (DRAM) on a system
won't increase cache memory. This can be confusing since the terms memory caching (hard
disk buffering) and cache memory are often used interchangeably. Memory caching, using
DRAM or flash to buffer disk reads, is meant to improve storage I/O by caching data that is
frequently referenced in a buffer ahead of slower magnetic disk or tape. Cache memory, on the
other hand, provides read buffering for the CPU.
A diagram of the architecture and data flow of a typical cache memory unit.
Fully associative cache mapping is similar to direct mapping in structure but allows a
memory block to be mapped to any cache location rather than to a prespecified cache
memory location as is the case with direct mapping.
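The contrast can be sketched as follows; the line count, block size, and address used are assumptions chosen only for illustration.

# Direct-mapped: the index field fixes the single line a block may occupy.
# Fully associative: a block may occupy any line, so every stored tag is checked.
NUM_LINES = 8     # illustrative number of cache lines
BLOCK_SIZE = 16   # illustrative block size in bytes

def direct_mapped_hit(cache_lines, addr):
    block = addr // BLOCK_SIZE
    index = block % NUM_LINES          # prespecified line for this block
    tag = block // NUM_LINES
    return cache_lines[index] == tag   # hit only if that one line holds the block

def fully_associative_hit(cached_blocks, addr):
    block = addr // BLOCK_SIZE         # the whole block number acts as the tag
    return block in cached_blocks      # hit if any line holds the block

block = 0x240 // BLOCK_SIZE
cache_lines = [None] * NUM_LINES
cache_lines[block % NUM_LINES] = block // NUM_LINES
cached_blocks = {block}
print(direct_mapped_hit(cache_lines, 0x240))        # True
print(fully_associative_hit(cached_blocks, 0x240))  # True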
Write-through. Data is written to both the cache and main memory at the same time.
Write-back. Data is only written to the cache initially. Data may then be written to
main memory, but this does not need to happen and does not inhibit the interaction
from taking place.
The way data is written to the cache impacts data consistency and efficiency. For example,
when using write-through, more writing needs to happen, which causes latency upfront. When
using write-back, operations may be more efficient, but data may not be consistent between the
main and cache memories.
One way a computer determines data consistency is by examining the dirty bit in memory. The
dirty bit is an extra bit included in memory blocks that indicates whether the information has
been modified. If data reaches the processor's register file with an active dirty bit, it means that
it is not up to date and there are more recent versions elsewhere. This scenario is more likely to
happen in a write-back scenario, because the data is written to the two storage areas
asynchronously.
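A minimal sketch of the two write policies for a single cache line, with a dirty bit marking write-back data that has not yet reached main memory (the class and function names are hypothetical, chosen only for this illustration):

class Line:
    def __init__(self):
        self.value = None
        self.dirty = False

def write_through(line, memory, addr, value):
    line.value = value      # cache and main memory are updated together
    memory[addr] = value

def write_back(line, memory, addr, value):
    line.value = value      # only the cache is updated now...
    line.dirty = True       # ...the dirty bit records the pending write

def evict(line, memory, addr):
    if line.dirty:          # modified data is copied out before the line is reused
        memory[addr] = line.value
        line.dirty = False

memory = {0x10: 0}
line = Line()
write_back(line, memory, 0x10, 42)
print(memory[0x10], line.dirty)   # 0 True  -> main memory is stale until eviction
evict(line, memory, 0x10)
print(memory[0x10], line.dirty)   # 42 False -> memory and cache agree again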
Translation lookaside buffers (TLBs) are also specialized memory caches whose function is to
record virtual address to physical address translations.
Still other caches are not, technically speaking, memory caches at all. Disk caches, for instance,
can use DRAM or flash memory to provide data caching similar to what memory caches do
with CPU instructions. If data is frequently accessed from the disk, it is cached into DRAM or
flash-based silicon storage technology for faster access time and response.
Specialized caches are also available for applications such as web browsers, databases, network
address binding and client-side Network File System protocol support. These types of caches
might be distributed across multiple networked hosts to provide greater scalability or
performance to an application that uses them.
A depiction of the memory hierarchy and how it functions
Locality
The ability of cache memory to improve a computer's performance relies on the concept of
locality of reference. Locality describes various situations that make a system more predictable.
Cache memory takes advantage of these situations to create a pattern of memory access that it
can rely upon.
There are several types of locality. Two key ones for cache are temporal locality, where recently accessed data is likely to be accessed again soon, and spatial locality, where data located near recently accessed data is likely to be accessed next.
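Spatial locality in particular is easy to observe from ordinary code. The sketch below sums the same row-major matrix twice, once row by row (neighbouring elements are touched together) and once column by column; the sizes are arbitrary, the absolute timings will vary by machine, and the gap is far larger in lower-level languages than in Python.

import time

N = 1000
matrix = [[1] * N for _ in range(N)]   # row-major nested lists

def row_major_sum(m):
    return sum(m[i][j] for i in range(N) for j in range(N))

def col_major_sum(m):
    return sum(m[i][j] for j in range(N) for i in range(N))

for name, fn in (("row-major", row_major_sum), ("column-major", col_major_sum)):
    start = time.perf_counter()
    fn(matrix)
    print(name, round(time.perf_counter() - start, 3), "seconds")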
Performance
Cache memory is important because it improves the efficiency of data retrieval. It stores
program instructions and data that are used repeatedly in the operation of programs or
information that the CPU is likely to need next. The computer processor can access this
information more quickly from the cache than from the main memory. Fast access to these
instructions increases the overall speed of the program.
Aside from its main function of improving performance, cache memory is a valuable resource for evaluating a computer's overall performance. Users can do this by looking at the cache's hit-to-miss ratio. Cache hits are instances in which the system successfully retrieves data from the cache. A cache miss is when the system looks for the data in the cache, cannot find it, and looks somewhere else instead. In some cases, users can improve the hit-to-miss ratio by adjusting the cache memory block size -- the size of the data units stored.
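One way to see the block-size effect is a tiny direct-mapped simulation over a sequential access pattern; the cache size, block sizes, and access stream below are illustrative assumptions.

# Hit ratio of a small direct-mapped cache for a sequential byte stream,
# tried with two different block sizes.
def hit_ratio(block_size, num_lines=64, accesses=range(4096)):
    lines = [None] * num_lines
    hits = 0
    for addr in accesses:
        block = addr // block_size
        index = block % num_lines
        if lines[index] == block:
            hits += 1
        else:
            lines[index] = block   # miss: the block is brought into the cache
    return hits / len(accesses)

for size in (16, 64):
    print("block size", size, "bytes -> hit ratio", round(hit_ratio(size), 3))

With a purely sequential stream, larger blocks raise the hit ratio because one miss brings in more soon-to-be-used bytes; with other access patterns the effect can go the other way, which is why the adjustment helps only in some cases.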
Improved performance and ability to monitor performance are not just about improving general
convenience for the user. As technology advances and is increasingly relied upon in mission-
critical scenarios, having speed and reliability becomes crucial. Even a few milliseconds of
latency could potentially lead to enormous expenses, depending on the situation.
DRAM serves as a computer's main memory, holding the data retrieved from storage that the processor performs calculations on. Both DRAM and cache memory are volatile memories that lose their contents when the power is turned off. DRAM is installed on the motherboard, and the CPU accesses it through a bus connection.
DRAM is usually about half as fast as L1, L2 or L3 cache memory, and much less expensive. It
provides faster data access than flash storage, hard disk drives (HDD) and tape storage. It came
into use in the last few decades to provide a place to store frequently accessed disk data to
improve I/O performance.
DRAM must be refreshed every few milliseconds. Cache memory, which also is a type of
random-access memory, does not need to be refreshed. It is built directly into the CPU to give
the processor the fastest possible access to memory locations and provides nanosecond speed
access time to frequently referenced instructions and data. SRAM is faster than DRAM, but
because it's a more complex chip, it's also more expensive to make.
A computer has a limited amount of DRAM and even less cache memory. When a large program
or multiple programs are running, it's possible for memory to be fully used. To compensate for a
shortage of physical memory, the computer's operating system (OS) can create virtual memory.
To do this, the OS temporarily transfers inactive data from DRAM to disk storage. This approach
increases virtual address space by using active memory in DRAM and inactive memory in HDDs
to form contiguous addresses that hold both an application and its data. Virtual memory lets a
computer run larger programs or multiple programs simultaneously, and each program operates as
though it has unlimited memory.
In order to copy virtual memory into physical memory, the OS divides memory into pages, which are kept in page files or swap files on disk. When a page is needed, the OS copies it from the disk to main memory and translates the virtual memory address into a physical one. These translations are handled by a memory management unit (MMU).
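A compact sketch of that translation step, assuming 4 KB pages and a made-up page table (the values are hypothetical and ignore permissions, TLB lookups, and other details an MMU handles):

PAGE_SIZE = 4096  # 4 KB pages, assumed here for illustration

# Hypothetical page table: virtual page number -> physical frame number.
page_table = {0: 5, 1: 2, 2: 9}

def translate(virtual_addr):
    page = virtual_addr // PAGE_SIZE
    offset = virtual_addr % PAGE_SIZE
    if page not in page_table:
        raise LookupError("page fault: the OS must load the page from disk")
    return page_table[page] * PAGE_SIZE + offset

print(hex(translate(0x1234)))   # virtual page 1 -> frame 2, so 0x2234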
Mainframes used an early version of cache memory, but the technology as it is known today began
to be developed with the advent of microcomputers. With early PCs, processor performance
increased much faster than memory performance, and memory became a bottleneck, slowing
systems.
In the 1980s, the idea took hold that a small amount of more expensive, faster SRAM could be
used to improve the performance of the less expensive, slower main memory. Initially, the memory
cache was separate from the system processor and not always included in the chipset. Early PCs
typically had from 16 KB to 128 KB of cache memory.
With 486 processors, Intel added 8 KB of memory to the CPU as Level 1 (L1) memory. As much
as 256 KB of external Level 2 (L2) cache memory was used in these systems. Pentium processors
saw the external cache memory double again to 512 KB on the high end. They also split the
internal cache memory into two caches: one for instructions and the other for data.
Processors based on Intel's P6 microarchitecture, introduced in 1995, were the first to incorporate
L2 cache memory into the CPU and enable all of a system's cache memory to run at the same clock
speed as the processor. Prior to the P6, L2 memory external to the CPU was accessed at a much
slower clock speed than the rate at which the processor ran and slowed system performance
considerably.
Early memory cache controllers used a write-through cache architecture, where data written into the cache was also immediately updated in RAM. This approach minimized data loss, but also slowed operations. With later 486-based PCs, the write-back cache architecture was developed, where RAM isn't updated immediately. Instead, data is stored in the cache, and RAM is updated only at specific intervals or under certain circumstances where data is missing or old.