Memory Hierarchy and Cache Memory

The document discusses memory hierarchy in computer architecture. It describes how memory is divided into a hierarchy with different levels based on speed and usage, from fastest to slowest: registers, cache, main memory, magnetic disks, and magnetic tapes. This hierarchy improves performance by allowing faster memory levels to be accessed more quickly while larger, slower memory levels provide greater storage capacity.


Memory Hierarchy in Computer Architecture

A computer system is built around a processor together with a large amount of memory. The main problem is that these parts are expensive, so the system's memory is organized as a memory hierarchy: several levels of memory with different performance characteristics that together serve one purpose, reducing the average access time. The memory hierarchy was developed by exploiting the access behavior of typical programs. This article gives an overview of the memory hierarchy in computer architecture.

What is Memory Hierarchy?


The memory in a computer can be divided into five levels based on speed and use. The processor moves from one level to another according to its requirements. The five levels are registers, cache, main memory, magnetic disks, and magnetic tapes. The first three levels are volatile memories, which means they automatically lose their stored data when power is removed, whereas the last two levels are non-volatile and store data permanently.

A memory element is a storage device that holds binary data in the form of bits. In general, memory storage is classified into two categories: volatile and non-volatile.

Memory Hierarchy in Computer Architecture


The memory hierarchy design of a computer system includes several kinds of storage devices. Most computers are built with extra storage so they can work with more data than the main memory alone can hold. The memory hierarchy diagram is usually drawn as a pyramid, and the design is divided into two categories: primary (internal) memory and secondary (external) memory.

Memory Hierarchy
Primary Memory

Primary memory, also known as internal memory, is directly accessible by the processor. It includes main memory, cache, and the CPU registers.

Secondary Memory

Secondary memory, also known as external memory, is accessed by the processor through an input/output module. It includes optical disks, magnetic disks, and magnetic tape.

Characteristics of Memory Hierarchy


The main characteristics of the memory hierarchy are the following.

Performance

Earlier computer systems were designed without a memory hierarchy, and the speed gap between main memory and the CPU registers kept growing because of the huge disparity in access times, which lowered overall system performance. An improvement was therefore necessary, and the memory hierarchy model was introduced to increase system performance.

Capacity

Capacity is the total amount of data the memory can store. Moving from top to bottom of the memory hierarchy, the capacity increases.

Access Time

Access time is the interval between a request to read or write and the moment the data becomes available. Moving from top to bottom of the memory hierarchy, the access time increases.

Cost per bit

Moving from bottom to top of the memory hierarchy, the cost per bit increases, which means internal memory is more expensive than external memory.

Memory Hierarchy Design


The memory hierarchy in a computer mainly includes the following levels.
Registers
A register is static RAM (SRAM) inside the processor that holds a data word, typically 64 or 128 bits. The program counter is the most important register and is found in every processor. Most processors also use a status word register and an accumulator: the status word register supports decision making, while the accumulator stores data such as the result of an arithmetic operation. Complex instruction set computers (CISC) have a set of registers for addressing main memory, and reduced instruction set computers (RISC) have more registers.

Cache Memory

Cache memory is usually located inside the processor, although occasionally it is a separate IC (integrated circuit); it is organized into levels. The cache holds chunks of frequently used data from main memory. A single-core processor typically has two or more levels of cache. Modern multi-core processors commonly have three levels: two levels private to each core and one level shared among all the cores.

Main Memory

Main memory is the memory unit that communicates directly with the CPU. It is the main storage unit of the computer: a fast, large memory used for storing data throughout the operation of the computer. It is made up of RAM as well as ROM.

Magnetic Disks

Magnetic disks are circular plates made of plastic or metal and coated with magnetized material. Often both faces of a disk are used, and several disks may be stacked on one spindle with read/write heads available for every surface. All the disks rotate together at high speed. Bits are stored on the magnetized surface in spots along concentric circles called tracks, which are usually divided into sections named sectors.

Magnetic Tape

Magnetic tape is a standard magnetic recording medium: a thin magnetizable coating on a long, narrow strip of plastic film. It is mainly used to back up huge amounts of data. Whenever the computer needs to access a tape, it first mounts the tape to access the data; once the access is finished, the tape is unmounted. Memory access is slow on magnetic tape, and accessing a tape can take a few minutes.
Advantages of Memory Hierarchy
The benefits of a memory hierarchy include the following.

• Memory distribution is simple and economical
• Removes external fragmentation
• Data can be spread across the hierarchy
• Permits demand paging & pre-paging
• Swapping is more efficient

What is Interleaved Memory?


Interleaved memory is designed to compensate for the relatively slow speed of dynamic
random-access memory (DRAM) or core memory by spreading memory addresses evenly
across memory banks. In this way, contiguous memory reads and writes use each memory
bank, resulting in higher memory throughput due to reduced waiting for memory banks to
become ready for the operations.

Interleaved memory is different from multi-channel memory architectures, primarily because interleaved memory does not add more channels between the main memory and the memory controller. However, channel interleaving is also possible, for example in Freescale i.MX6 processors, which allow interleaving to be done between two channels. With interleaved memory, memory addresses are allocated to the memory banks in turn.

Example of Interleaved Memory


Interleaving is an abstraction technique that divides memory into a number of modules such that successive words in the address space are placed in different modules. Suppose we have 4 memory banks, each containing 256 bytes. The block-oriented scheme (no interleaving) assigns virtual addresses 0 to 255 to the first bank and 256 to 511 to the second bank. In interleaved memory, virtual address 0 is in the first bank, 1 in the second bank, 2 in the third, 3 in the fourth, and then 4 is back in the first bank again.
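To make the mapping concrete, here is a minimal sketch in Python of the two schemes, assuming the 4 banks of 256 bytes from the example above (the function names are ours, chosen only for illustration).

```python
# Sketch of the two address-to-bank mappings from the example above.
# Assumes 4 banks of 256 bytes each (1 KB total), as in the text.

NUM_BANKS = 4
BANK_SIZE = 256  # bytes per bank

def block_oriented_bank(addr: int) -> int:
    """No interleaving: addresses 0-255 -> bank 0, 256-511 -> bank 1, ..."""
    return addr // BANK_SIZE

def interleaved_bank(addr: int) -> int:
    """Low-order interleaving: consecutive addresses rotate through the banks."""
    return addr % NUM_BANKS

for addr in (0, 1, 2, 3, 4, 255, 256):
    print(f"addr {addr:3d}: block-oriented -> bank {block_oriented_bank(addr)}, "
          f"interleaved -> bank {interleaved_bank(addr)}")
```

Running the sketch shows addresses 0 to 3 landing in four different banks under interleaving, while the block-oriented scheme places them all in bank 0.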

Hence, the CPU can move on to the next access immediately, without waiting for a single memory bank to become ready; the multiple memory banks take turns supplying the data.

In the above example of 4 memory banks, data with virtual addresses 0, 1, 2 and 3 can be
accessed simultaneously as they reside in separate memory banks. Hence we do not have to
wait to complete a data fetch to begin the next operation.

An interleaved memory with n banks is said to be n-way interleaved. In a two-way interleaved memory system there are still two banks of DRAM, but logically the system appears to be one bank of memory that is twice as large.

In the interleaved bank representation below with 2 memory banks, the first long word of bank 0 is followed by the first long word of bank 1, followed by the second long word of bank 0, followed by the second long word of bank 1, and so on.

The following image shows the organization of two physical banks of n long words. All even
long words of the logical bank are located in physical bank 0, and all odd long words are
located in physical bank 1.
Why do we use Memory Interleaving?
When the processor requests data from main memory, a block (chunk) of data is transferred to the cache and then to the processor. Whenever a cache miss occurs, the data has to be fetched from main memory, but main memory is relatively slow compared with the cache. Interleaving is therefore used to improve the effective access time of main memory.

For example, with four modules we can access all four at the same time, thus achieving parallelism. The lower-order address bits select the module, while the higher-order bits select the data within it. This method uses memory effectively.

Types of Interleaved Memory


There are two types of memory interleaving:

1. High order interleaving: 

In high-order memory interleaving, the most significant bits of the memory address decide which memory bank holds a particular location, and the least significant bits are sent as the address within each chip. (In low-order interleaving, by contrast, the least significant bits of the address decide the memory bank.) One problem with high-order interleaving is that consecutive addresses tend to be in the same chip, so the maximum rate of data transfer is limited by the memory cycle time. High-order interleaving is also known as memory banking.

2. Low order interleaving: 

In low-order interleaving, the least significant bits select the memory bank (module). In this scheme, consecutive memory addresses are in different memory modules, allowing memory accesses to proceed faster than the cycle time of a single bank; the sketch below shows how the address bits are split in the two schemes.
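The following sketch splits an address into bank-select and within-bank fields under each scheme; the 4-bank, 256-bytes-per-bank geometry is an assumption carried over from the earlier example.

```python
# High-order vs. low-order interleaving for a 10-bit address space:
# 4 banks (2 bank-select bits) x 256 bytes per bank (8 offset bits).

BANK_BITS = 2     # log2(number of banks)
OFFSET_BITS = 8   # log2(bytes per bank)

def high_order(addr: int):
    """Most significant bits pick the bank; consecutive addresses share a bank."""
    bank = addr >> OFFSET_BITS
    offset = addr & ((1 << OFFSET_BITS) - 1)
    return bank, offset

def low_order(addr: int):
    """Least significant bits pick the bank; consecutive addresses alternate banks."""
    bank = addr & ((1 << BANK_BITS) - 1)
    offset = addr >> BANK_BITS
    return bank, offset

for addr in (0, 1, 2, 3, 4):
    print(f"addr {addr}: high-order {high_order(addr)}, low-order {low_order(addr)}")
```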
Benefits of Interleaved Memory
An instruction pipeline may require an instruction and its operands from main memory at the same time, which is not possible with the traditional method of memory access. Similarly, an arithmetic pipeline may require two operands to be fetched from main memory simultaneously. Memory interleaving overcomes this problem.

o It allows simultaneous access to different modules of memory. The modular memory technique lets the CPU initiate an access in one module while the other modules are still busy completing their own read or write operations. Interleaved memory therefore honours every memory request independently of the state of the other modules.
o For this reason, interleaved memory makes a system more responsive and faster than a non-interleaved one. Additionally, with simultaneous memory access, CPU waiting time decreases and throughput increases. Interleaved memory is especially useful in systems with pipelining and vector processing.
o In an interleaved memory, consecutive memory addresses are spread across different memory modules. For example, in a byte-addressable 4-way interleaved memory, if byte 0 is in the first module, then byte 1 is in the second module, byte 2 in the third module, byte 3 in the fourth module, byte 4 falls in the first module again, and so on.
o In an n-way interleaved memory, main memory is divided into n banks and the system can access n operands or instructions simultaneously from the n different memory banks. This kind of memory access can reduce the memory access time by a factor close to the number of memory banks. In this scheme, memory location i is found in bank i mod n.
Interleaving DRAM
Main memory is usually composed of a collection of DRAM memory chips, where several chips can be grouped together to form a memory bank. With a memory controller that supports interleaving, it is then possible to lay out these memory banks so that they are interleaved.

Data in DRAM is stored in units of pages. Each DRAM bank has a row buffer that serves as a cache for accessing any page in the bank. Before a page in the DRAM bank is read, it is first loaded into the row buffer. If the page is read directly from the row buffer (a row-buffer hit), the access has the shortest memory latency, one memory cycle. A row-buffer miss, also called a row-buffer conflict, is slower because the new page has to be loaded into the row buffer before it can be read. Row-buffer misses occur when access requests to different memory pages in the same bank are serviced, and a row-buffer conflict incurs a substantial delay for that memory access. In contrast, memory accesses to different banks can proceed in parallel with high throughput.
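The toy simulation below illustrates this behaviour; the single-bank model and the cycle counts are made-up illustrative assumptions, not figures for any particular DRAM device.

```python
# Toy model of DRAM row-buffer behaviour.
# Assumed (illustrative) costs: 1 cycle for a row-buffer hit,
# 4 extra cycles to open a new row after a row-buffer conflict.

ROW_HIT_CYCLES = 1
ROW_MISS_PENALTY = 4

class DramBank:
    def __init__(self):
        self.open_row = None  # row currently held in the row buffer

    def access(self, row: int) -> int:
        """Return the cycles spent servicing an access to the given row."""
        if row == self.open_row:
            return ROW_HIT_CYCLES          # row-buffer hit
        self.open_row = row                # conflict: load the new row first
        return ROW_HIT_CYCLES + ROW_MISS_PENALTY

bank = DramBank()
accesses = [0, 0, 0, 7, 7, 0]   # rows touched within one bank
total = sum(bank.access(r) for r in accesses)
print("cycles for", accesses, "=", total)   # switching rows (0->7, 7->0) adds delay
```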

In traditional layouts, memory banks can be allocated a contiguous block of memory addresses,
which is very simple for the memory controller and gives an equal performance in completely
random-access scenarios compared to performance levels achieved through interleaving.
However, memory reads are rarely random due to the locality of reference, and optimizing for
close together access gives far better performance in interleaved layouts.

The way memory is addressed does not affect the access time for memory locations that are already cached; it affects only memory locations that need to be retrieved from DRAM.

Cache Memory

Cache memory is a chip-based computer component that makes retrieving data from the
computer's memory more efficient. It acts as a temporary storage area that the computer's
processor can retrieve data from easily. This temporary storage area, known as a cache,
is more readily available to the processor than the computer's main memory source,
typically some form of DRAM.

Cache memory is sometimes called CPU (central processing unit) memory because it is typically integrated directly into the CPU chip or placed on a separate chip that has a separate bus interconnect with the CPU. Therefore, it is more accessible to the processor, and able to increase efficiency, because it's physically close to the processor.

In order to be close to the processor, cache memory needs to be much smaller than main
memory. Consequently, it has less storage space. It is also more expensive than main
memory, as it is a more complex chip that yields higher performance.
What it sacrifices in size and price, it makes up for in speed. Cache memory operates
between 10 to 100 times faster than RAM, requiring only a few nanoseconds to respond
to a CPU request.

The name of the actual hardware that is used for cache memory is high-speed static
random-access memory (SRAM). The name of the hardware that is used in a computer's
main memory is dynamic random-access memory (DRAM).

Cache memory is not to be confused with the broader term cache. Caches are temporary
stores of data that can exist in both hardware and software. Cache memory refers to the
specific hardware component that allows computers to create caches at various levels of
the network.

Types of cache memory

Cache memory is fast and expensive. Traditionally, it is categorized as "levels" that describe its
closeness and accessibility to the microprocessor. There are three general cache levels:

L1 cache, or primary cache, is extremely fast but relatively small, and is usually embedded in
the processor chip as CPU cache.

L2 cache, or secondary cache, is often more capacious than L1. L2 cache may be embedded on
the CPU, or it can be on a separate chip or coprocessor and have a high-speed alternative
system bus connecting the cache and CPU. That way it doesn't get slowed by traffic on the
main system bus.

Level 3 (L3) cache is specialized memory developed to improve the performance of L1 and
L2. L1 or L2 can be significantly faster than L3, though L3 is usually double the speed of
DRAM. With multicore processors, each core can have dedicated L1 and L2 cache, but they
can share an L3 cache. If an L3 cache references an instruction, it is usually elevated to a higher
level of cache.

In the past, L1, L2 and L3 caches have been created using combined processor and
motherboard components. Recently, the trend has been toward consolidating all three levels of
memory caching on the CPU itself. That's why the primary means for increasing cache size has
begun to shift from the acquisition of a specific motherboard with different chipsets and bus
architectures to buying a CPU with the right amount of integrated L1, L2 and L3 cache.

Contrary to popular belief, implementing flash or more dynamic RAM (DRAM) on a system
won't increase cache memory. This can be confusing since the terms memory caching (hard
disk buffering) and cache memory are often used interchangeably. Memory caching, using
DRAM or flash to buffer disk reads, is meant to improve storage I/O by caching data that is
frequently referenced in a buffer ahead of slower magnetic disk or tape. Cache memory, on the
other hand, provides read buffering for the CPU.

A diagram of the architecture and data flow of a typical cache memory unit.

Cache memory mapping


Caching configurations continue to evolve, but cache memory traditionally works under three different configurations:

• Direct mapped cache has each block mapped to exactly one cache memory location. Conceptually, a direct mapped cache is like rows in a table with three columns: the cache block that contains the actual data fetched and stored, a tag with all or part of the address of the data that was fetched, and a flag bit that indicates whether the row entry holds valid data.

• Fully associative cache mapping is similar to direct mapping in structure but allows a memory block to be mapped to any cache location rather than to a prespecified cache memory location, as is the case with direct mapping.

• Set associative cache mapping can be viewed as a compromise between direct mapping and fully associative mapping in which each block is mapped to a subset of cache locations. It is sometimes called N-way set associative mapping, which provides for a location in main memory to be cached to any of "N" locations in the L1 cache.
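To show how the mappings differ in practice, the sketch below splits an address into an index (or set) and a tag; the block size, number of cache blocks, and associativity are arbitrary illustrative values, not figures from the text.

```python
# How a memory address selects a cache location under the mappings above.
# Illustrative geometry: 64-byte blocks, 1024 blocks in the cache.

BLOCK_BITS = 6          # 64-byte blocks
NUM_BLOCKS = 1024       # cache capacity in blocks

def direct_mapped(addr: int):
    """Each block maps to exactly one cache line: index = block number mod lines."""
    block = addr >> BLOCK_BITS
    index = block % NUM_BLOCKS          # the single possible location
    tag = block // NUM_BLOCKS           # stored alongside the data for matching
    return index, tag

def set_associative(addr: int, ways: int = 4):
    """N-way set associative: the block may sit in any of `ways` lines of one set."""
    block = addr >> BLOCK_BITS
    num_sets = NUM_BLOCKS // ways
    set_index = block % num_sets
    tag = block // num_sets
    return set_index, tag

# A fully associative cache is the limiting case: a single set, the block may go
# anywhere, and the whole block number becomes the tag.
addr = 0x12345678
print("direct mapped  (index, tag):", direct_mapped(addr))
print("4-way set assoc (set, tag): ", set_associative(addr))
```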
Data writing policies
Data can be written to memory using a variety of techniques, but the two main ones involving
cache memory are:

 Write-through. Data is written to both the cache and main memory at the same time.
 Write-back. Data is only written to the cache initially. Data may then be written to
main memory, but this does not need to happen and does not inhibit the interaction
from taking place.

The way data is written to the cache impacts data consistency and efficiency. For example,
when using write-through, more writing needs to happen, which causes latency upfront. When
using write-back, operations may be more efficient, but data may not be consistent between the
main and cache memories.

One way a computer determines data consistency is by examining the dirty bit. The dirty bit is an extra bit included in each cache block that indicates whether the block has been modified since it was brought into the cache. If a block's dirty bit is set, the copy in main memory is not up to date and the cache holds a more recent version. This situation arises with write-back caching, because the data is written to the two storage areas asynchronously.
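A minimal sketch of the two policies, including the dirty bit used by write-back. The one-block "cache" is a deliberately simplified assumption made only to keep the example short.

```python
# Minimal illustration of write-through vs. write-back for a single cached block.

class WriteThroughCache:
    def __init__(self):
        self.cache_value = None
        self.memory_value = None

    def write(self, value):
        # Every write goes to both the cache and main memory immediately.
        self.cache_value = value
        self.memory_value = value

class WriteBackCache:
    def __init__(self):
        self.cache_value = None
        self.memory_value = None
        self.dirty = False          # dirty bit: cache differs from memory

    def write(self, value):
        # Writes go only to the cache; memory is updated later.
        self.cache_value = value
        self.dirty = True

    def evict(self):
        # On eviction (or a periodic flush) a dirty block is written back.
        if self.dirty:
            self.memory_value = self.cache_value
            self.dirty = False

wb = WriteBackCache()
wb.write(42)
print(wb.cache_value, wb.memory_value, wb.dirty)   # 42 None True  -> inconsistent
wb.evict()
print(wb.cache_value, wb.memory_value, wb.dirty)   # 42 42 False   -> consistent again
```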

Specialization and functionality


In addition to instruction and data caches, other caches are designed to provide specialized
system functions. According to some definitions, the L3 cache's shared design makes it a
specialized cache. Other definitions keep the instruction cache and the data cache separate and
refer to each as a specialized cache.

Translation lookaside buffers (TLBs) are also specialized memory caches whose function is to
record virtual address to physical address translations.

Still other caches are not, technically speaking, memory caches at all. Disk caches, for instance,
can use DRAM or flash memory to provide data caching similar to what memory caches do
with CPU instructions. If data is frequently accessed from the disk, it is cached into DRAM or
flash-based silicon storage technology for faster access time and response.

Specialized caches are also available for applications such as web browsers, databases, network
address binding and client-side Network File System protocol support. These types of caches
might be distributed across multiple networked hosts to provide greater scalability or
performance to an application that uses them.
A depiction of the memory hierarchy and how it functions

Locality
The ability of cache memory to improve a computer's performance relies on the concept of
locality of reference. Locality describes various situations that make a system more predictable.
Cache memory takes advantage of these situations to create a pattern of memory access that it
can rely upon.

There are several types of locality. Two key ones for cache are:

• Temporal locality. This is when the same resources are accessed repeatedly in a short amount of time.
• Spatial locality. This refers to accessing various data or resources that are near each other.
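As a simple illustration of both kinds of locality in ordinary code (the array size is an arbitrary choice):

```python
# Temporal locality: `total` and `i` are reused on every iteration,
# so they stay in the fastest levels of the hierarchy.
# Spatial locality: data[i] and data[i + 1] sit next to each other in memory,
# so fetching one cache block brings in several upcoming elements.

data = list(range(1_000_000))

total = 0
for i in range(len(data)):   # sequential walk -> spatial locality
    total += data[i]         # repeated use of `total` -> temporal locality

print(total)
```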

Performance
Cache memory is important because it improves the efficiency of data retrieval. It stores
program instructions and data that are used repeatedly in the operation of programs or
information that the CPU is likely to need next. The computer processor can access this
information more quickly from the cache than from the main memory. Fast access to these
instructions increases the overall speed of the program.
Aside from its main function of improving performance, cache memory is a valuable resource for evaluating a computer's overall performance. Users can do this by looking at the cache's hit-to-miss ratio. Cache hits are instances in which the system successfully retrieves data from the cache. A cache miss is when the system looks for the data in the cache, can't find it, and looks somewhere else instead. In some cases, users can improve the hit-to-miss ratio by adjusting the cache memory block size -- the size of data units stored.
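As a worked example of the hit-to-miss ratio and its effect on performance, the sketch below computes an average memory access time; the hit/miss counts and the latency figures are made-up illustrative numbers, not measurements.

```python
# Hit ratio and average access time with illustrative (assumed) latencies.

hits, misses = 950, 50            # counts observed over some interval
cache_time_ns = 2                 # assumed cache access latency
memory_time_ns = 100              # assumed main-memory access latency

hit_ratio = hits / (hits + misses)
# On a miss we pay the cache lookup plus the main-memory access.
avg_access_ns = (hit_ratio * cache_time_ns
                 + (1 - hit_ratio) * (cache_time_ns + memory_time_ns))

print(f"hit ratio = {hit_ratio:.2%}")                    # 95.00%
print(f"average access time = {avg_access_ns:.1f} ns")   # 7.0 ns
```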

Improved performance and ability to monitor performance are not just about improving general
convenience for the user. As technology advances and is increasingly relied upon in mission-
critical scenarios, having speed and reliability becomes crucial. Even a few milliseconds of
latency could potentially lead to enormous expenses, depending on the situation.

A chart comparing cache memory to other memory types

Cache vs. main memory

DRAM serves as a computer's main memory, holding the data and instructions that the processor works on after they are retrieved from storage. Both DRAM and cache memory are volatile memories that lose their contents when the power is turned off. DRAM is installed on the motherboard, and the CPU accesses it through a bus connection.

DRAM is usually about half as fast as L1, L2 or L3 cache memory, and much less expensive. It
provides faster data access than flash storage, hard disk drives (HDD) and tape storage. It came
into use in the last few decades to provide a place to store frequently accessed disk data to
improve I/O performance.

DRAM must be refreshed every few milliseconds. Cache memory, which also is a type of
random-access memory, does not need to be refreshed. It is built directly into the CPU to give
the processor the fastest possible access to memory locations and provides nanosecond speed
access time to frequently referenced instructions and data. SRAM is faster than DRAM, but
because it's a more complex chip, it's also more expensive to make.

An example of dynamic RAM.

Cache vs. virtual memory

A computer has a limited amount of DRAM and even less cache memory. When a large program
or multiple programs are running, it's possible for memory to be fully used. To compensate for a
shortage of physical memory, the computer's operating system (OS) can create virtual memory.

To do this, the OS temporarily transfers inactive data from DRAM to disk storage. This approach
increases virtual address space by using active memory in DRAM and inactive memory in HDDs
to form contiguous addresses that hold both an application and its data. Virtual memory lets a
computer run larger programs or multiple programs simultaneously, and each program operates as
though it has unlimited memory.

In order to copy virtual memory into physical memory, the OS divides memory into page files or
swap files that contain a certain number of addresses. Those pages are stored on a disk and when
they're needed, the OS copies them from the disk to main memory and translates the virtual
memory address into a physical one. These translations are handled by a memory management unit
(MMU).
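The sketch below shows the arithmetic of such a translation for a simple single-level page table; the 4 KB page size and the tiny hand-written page table are illustrative assumptions, not a description of a real MMU.

```python
# Simple single-level page-table translation (the job the MMU performs).
# Assumes 4 KB pages, i.e. a 12-bit page offset.

PAGE_SIZE = 4096
OFFSET_BITS = 12

# virtual page number -> physical frame number (a tiny, made-up page table)
page_table = {0: 5, 1: 9, 2: 3}

def translate(virtual_addr: int) -> int:
    vpn = virtual_addr >> OFFSET_BITS          # virtual page number
    offset = virtual_addr & (PAGE_SIZE - 1)    # offset stays the same
    if vpn not in page_table:
        raise KeyError("page fault: page not in physical memory")
    frame = page_table[vpn]
    return (frame << OFFSET_BITS) | offset

va = 0x1ABC                                    # page 1, offset 0xABC
print(hex(translate(va)))                      # 0x9abc: frame 9, same offset
```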

Implementation and history

Mainframes used an early version of cache memory, but the technology as it is known today began
to be developed with the advent of microcomputers. With early PCs, processor performance
increased much faster than memory performance, and memory became a bottleneck, slowing
systems.

In the 1980s, the idea took hold that a small amount of more expensive, faster SRAM could be
used to improve the performance of the less expensive, slower main memory. Initially, the memory
cache was separate from the system processor and not always included in the chipset. Early PCs
typically had from 16 KB to 128 KB of cache memory.

With 486 processors, Intel added 8 KB of memory to the CPU as Level 1 (L1) memory. As much
as 256 KB of external Level 2 (L2) cache memory was used in these systems. Pentium processors
saw the external cache memory double again to 512 KB on the high end. They also split the
internal cache memory into two caches: one for instructions and the other for data.

Processors based on Intel's P6 microarchitecture, introduced in 1995, were the first to incorporate
L2 cache memory into the CPU and enable all of a system's cache memory to run at the same clock
speed as the processor. Prior to the P6, L2 memory external to the CPU was accessed at a much
slower clock speed than the rate at which the processor ran and slowed system performance
considerably.

Early memory cache controllers used a write-through cache architecture, where data written into the cache was also immediately updated in RAM. This approach minimized data loss, but also slowed operations. With later 486-based PCs, the write-back cache architecture was developed, where RAM isn't updated immediately. Instead, data is stored in the cache and RAM is updated only at specific intervals or under certain circumstances where data is missing or old.
