Computer Architecture 1.1
By: Nguyen Thi Hoang Lan
Online: <http://cnx.org/content/col10761/1.1/>
CONNEXIONS
Course Information
1.1 Syllabus
1.1.5 5. Prerequisites
Circuits and Electronics, Digital Electronics, Digital System, Analysis and Design of Digital Integrated
Circuits, Computation Structures, Programming.
• Multilevel Memories;
• Cache Memory;
• Internal Memory;
• External Memory;
• Input/Output;
• Operating System Support;
• Virtual Memory;
• Advanced Architectures.
1.1.9 9. Assessment
• Mid-term grade: 0.4 (40%)
1.2 Calendar
Course Calendar
Table 1.1
1.3 Exercise
1. -720
2. 0.645
Lecture Notes
2.1 Module 1: Introduction to Organization and Architecture of Computer
a set of components and their interrelationships. The behavior at each level depends only on a simplified, abstracted characterization of the system at the next lower level. At each level, the designer is concerned with structure and function.
In terms of description, we have two choices: starting at the bottom and building up to a complete description, or starting with a top view and decomposing the system, describing its structure and function, and proceeding to successively lower layers of the hierarchy. The approach taken in this course follows the latter.
Functions
In general terms, there are four main functions of a computer:
• Data processing
• Data storage
• Data movement
• Control
The computer, of course, must be able to process data. The data may take a wide variety of forms, and
the range of processing requirements is broad. However, we shall see that there are only a few fundamental
methods or types of data processing.
It is also essential that a computer store data. Even if the computer is processing data on the fly (i.e.,
data come in and get processed, and the results go right out), the computer must temporarily store at least
those pieces of data that are being worked on at any given moment. Thus, there is at least a short-term
data storage function. Files of data are stored on the computer for subsequent retrieval and update.
The computer must be able to move data between itself and the outside world. The computer's operating
environment consists of devices that serve as either sources or destinations of data. When data are received
from or delivered to a device that is directly connected to the computer, the process is known as input-output
(I/O), and the device is referred to as a peripheral. When data are moved over longer distances, to or from
a remote device, the process is known as data communications.
Finally, there must be control of these three functions. Ultimately, this control is exercised by the
individual who provides the computer with instructions. Within the computer system, a control unit man-
ages the computer's resources and orchestrates the performance of its functional parts in response to those
instructions.
At this general level of discussion, the number of possible operations that can be performed is few. Figure 2.2 depicts the four possible types of operations. The computer can function as a data movement device (Figure 2.2(a)), simply transferring data from one peripheral or communications line to another. It can also function as a data storage device (Figure 2.2(b)), with data transferred from the external environment to computer storage (read) and vice versa (write). The final two diagrams show operations involving data processing, on data either in storage or en route between storage and the external environment.
Structure
Figure 2.3 is the simplest possible depiction of a computer. The computer is an entity that interacts in
some fashion with its external environment. In general, all of its linkages to the external environment can
be classified as peripheral devices or communication lines. We will have something to say about both types
of linkages.
• Central Processing Unit (CPU): Controls the operation of the computer and performs its data pro-
cessing functions. Often simply referred to as processor.
• Main Memory: Stores data.
• I/O: Moves data between the computer and its external environment.
• System Interconnection: Some mechanism that provides for communication among CPU, main memory,
and I/O.
There may be one or more of each of the above components. Traditionally, there has been just a single
CPU. In recent years, there has been increasing use of multiple processors in a single system. Each of these
components will be examined in some detail in later lectures. However, for our purpose, the most interesting
and in some ways the most complex component is the CPU; its structure is depicted in Figure 2.4. Its major
structural components are:
• Control Unit (CU): Controls the operation of the CPU and hence the computer.
• Arithmetic and Logic Unit (ALU): Performs the computer's data processing functions.
• Registers: Provide storage internal to the CPU.
• CPU Interconnection: Some mechanism that provides for communication among the control unit, ALU, and registers.
The ENIAC (Electronic Numerical Integrator And Computer), designed by and constructed under the supervision of John Mauchly and John Presper Eckert at the University of Pennsylvania, was the world's first general-purpose electronic digital computer. The project was a response to U.S. wartime needs. Mauchly, a professor of electrical engineering at the University of Pennsylvania, and Eckert, one of his graduate students, proposed to build a general-purpose computer using vacuum tubes. In 1943, this proposal was accepted by the Army, and work began on the ENIAC. The resulting machine was enormous, weighing 30 tons, occupying 15,000 square feet of floor space, and containing more than 18,000 vacuum tubes. When operating, it consumed 140 kilowatts of power. It was also substantially faster than any electromechanical computer, being capable of 5000 additions per second.
The ENIAC was a decimal rather than a binary machine. That is, numbers were represented in decimal
form and arithmetic was performed in the decimal system. Its memory consisted of 20 accumulators, each
capable of holding a 10-digit decimal number. Each digit was represented by a ring of 10 vacuum tubes. At
any time, only one vacuum tube was in the ON state, representing one of the 10 digits. The major drawback
of the ENIAC was that it had to be programmed manually by setting switches and plugging and unplugging
cables.
The ENIAC was completed in 1946, too late to be used in the war effort. Instead, its first task was to
perform a series of complex calculations that were used to help determine the feasibility of the H-bomb. The
ENIAC continued to be used until 1955.
As was mentioned, the task of entering and altering programs for the ENIAC was extremely tedious. The
programming process could be facilitated if the program could be represented in a form suitable for storing
in memory alongside the data. Then, a computer could get its instructions by reading them from memory,
and a program could be set or altered by setting the values of a portion of memory.
This idea, known as the Stored-program concept, is usually attributed to the ENIAC designers, most
notably the mathematician John von Neumann, who was a consultant on the ENIAC project. The idea was
also developed at about the same time by Turing. The first publication of the idea was in a 1945 proposal
by von Neumann for a new computer, the EDVAC (Electronic Discrete Variable Computer).
In 1946, von Neumann and his colleagues began the design of a new stored-program computer, referred
to as the IAS computer, at the Princeton Institute for Advanced Studies. The IAS computer, although not
completed until 1952, is the prototype of all subsequent general-purpose computers. Figure 2.5 shows the
general structure of the IAS computer. It consists of:
• A main memory, which stores both data and instructions
• An arithmetic and logic unit (ALU) capable of operating on binary data
• A control unit, which interprets the instructions in memory and causes them to be executed
• Input/output (I/O) equipment operated by the control unit
Commercial Computers
The 1950s saw the birth of the computer industry with two companies, Sperry and IBM, dominating the
marketplace.
In 1947, Eckert and Mauchly formed the Eckert-Mauchly Computer Corporation to manufacture computers commercially. Their first successful machine was the UNIVAC I (Universal Automatic Computer), which was commissioned by the Bureau of the Census for the 1950 calculations. The Eckert-Mauchly Computer Corporation became part of the UNIVAC division of the Sperry-Rand Corporation, which went on to
build a series of successor machines.
The UNIVAC II, which had greater memory capacity and higher performance than the UNIVAC I, was
delivered in the late 1950s and illustrates several trends that have remained characteristic of the computer
industry. First, advances in technology allow companies to continue to build larger, more powerful computers.
Second, each company tries to make its new machines upward compatible with the older machines. This
means that the programs written for the older machines can be executed on the new machine. This strategy
is adopted in the hopes of retaining the customer base; that is, when a customer decides to buy a newer
machine, he is likely to get it from the same company to avoid losing the investment in programs.
The UNIVAC division also began development of the 1100 series of computers, which was to be its bread and butter. This series illustrates a distinction that existed at one time. The first model, the UNIVAC 1103, and its successors for many years were primarily intended for scientific applications, involving long and complex calculations. Other companies concentrated on business applications, which involved processing large amounts of text data. This split has largely disappeared, but it was evident for a number of years.
IBM, which was then the major manufacturer of punched-card processing equipment, delivered its first electronic stored-program computer, the 701, in 1953. The 701 was intended primarily for scientific applications. In 1955, IBM introduced the companion 702 product, which had a number of hardware features that suited it to business applications. These were the first of a long series of 700/7000 computers that established IBM as the overwhelmingly dominant computer manufacturer.
Perhaps the two most important members of the third generation of computers are the IBM System/360 and the DEC PDP-8.
Suppose that there is a small set of basic logic components that can be combined in various ways to store binary data and to perform arithmetic and logical operations on that data. If there is a particular computation to be performed, a configuration of logic components designed specifically for that computation could be constructed. We can think of the process of connecting the various components in the desired configuration as a form of programming. The resulting "program" is in the form of hardware and is termed a hardwired program.
Now consider another alternative. Suppose we construct a general-purpose configuration of arithmetic and logic functions; this set of hardware will perform various functions on data depending on control signals applied to the hardware. In the original case of customized hardware, the system accepts data and produces results (Figure 1a). With general-purpose hardware, the system accepts data and control signals and produces results. Thus, instead of rewiring the hardware for each new program, the programmer merely needs to supply a new set of control signals.
How shall control signals be supplied? The answer is simple but subtle. The entire program is actually
a sequence of steps. At each step, some arithmetic or logical operation is performed on some data. For each
step, a new set of control signals is needed. Let us provide a unique code for each possible set of control
signals, and let us add to the general-purpose hardware a segment that can accept a code and generate
control signals (Figure 1b).
Programming is now much easier. Instead of rewiring the hardware for each new program, all we need to do is provide a new sequence of codes. Each code is, in effect, an instruction, and part of the hardware interprets each instruction and generates control signals. To distinguish this new method of programming, a sequence of codes or instructions is called software.
Figure 2.6
Data and instructions must be put into the system; this is the role of an input module. A means of reporting results is needed, and this is in the form of an output module. Taken together, these are referred to as I/O components.
One more component is needed. An input device will bring instructions and data in sequentially, but a
program is not invariably executed sequentially: it may jump around. Similarly, operations on data may
require access to more than just one element at a time in a predetermined sequence. Thus, there must be a
place to store temporarily both instructions and data. That module is called memory, or main memory to
distinguish it from external storage or peripheral devices. Von Neumann pointed out that the same memory
could be used to store both instructions and data.
Figure 2 illustrates these top-level components and suggests the interactions among them. The CPU
exchanges data with memory. For this purpose, it typically makes use of two internal (to the CPU) registers:
a memory address register (MAR), which specifies the address in memory for the next read or write, and a memory buffer register (MBR), which contains the data to be written into memory or receives the data read from memory. Similarly, an I/O address register (I/OAR) specifies a particular I/O device. An I/O buffer register (I/OBR) is used for the exchange of data between an I/O module and the CPU.
A memory module consists of a set of locations, defined by sequentially numbered addresses. Each location contains a binary number that can be interpreted as either an instruction or data. An I/O module transfers data from external devices to the CPU and memory, and vice versa. It contains internal buffers for temporarily holding these data until they can be sent on.
Having looked briefly at these major components, we now turn to an overview of how these components function together to execute programs.
Figure 2.7
The processing required for a single instruction is called an instruction cycle. Using the simplified two-step description given previously, the instruction cycle is depicted in Figure 3.
Figure 2.8
At the beginning of each instruction cycle, the processor fetches an instruction from memory. In a typical
processor, a register called the program counter (PC) holds the address of the instruction to be fetched next.
Unless told otherwise, the processor always increments the PC after each instruction fetch so that it will
fetch the next instruction in sequence. The fetched instruction is loaded into a register in the processor
known as the instruction register (IR). The instruction contains bits that specify the action the processor is
to take. The processor interprets the instruction and performs the required action.
• Processor-memory: Data may be transferred from processor to memory or from memory to processor.
• Processor-I/O: Data may be transferred to or from a peripheral device by transferring between the
processor and an I/O module.
• Data processing: The processor may perform some arithmetic or logic operation on data.
• Control: An instruction may specify that the sequence of execution be altered.
• An instruction's execution may involve a combination of these actions.
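The fetch-interpret-execute behavior just described can be sketched in a few lines of Python. The machine simulated below is hypothetical (a 16-bit word with a 4-bit opcode, a 12-bit address, and three illustrative opcodes); these details are assumptions made for illustration only, not a description of any real processor.

def run(memory, pc=0, steps=10):
    ac = 0                       # accumulator
    for _ in range(steps):
        ir = memory[pc]          # fetch: load the next instruction into the IR
        pc += 1                  # increment the program counter
        opcode = ir >> 12        # decode: upper 4 bits are the opcode
        addr = ir & 0x0FFF       # lower 12 bits are the address
        if opcode == 0x1:        # LOAD: AC <- M[addr]
            ac = memory[addr]
        elif opcode == 0x5:      # ADD: AC <- AC + M[addr]
            ac = (ac + memory[addr]) & 0xFFFF
        elif opcode == 0x2:      # STORE: M[addr] <- AC
            memory[addr] = ac
        else:                    # anything else: treat as HALT
            break
    return memory

mem = [0] * 4096
mem[0] = 0x1940                  # LOAD  AC <- M[0x940]
mem[1] = 0x5941                  # ADD   AC <- AC + M[0x941]
mem[2] = 0x2941                  # STORE M[0x941] <- AC
mem[0x940] = 3
mem[0x941] = 2
print(run(mem)[0x941])           # prints 5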
Figure 2.9
• Instruction address calculation (iac): Determine the address of the next instruction to be executed.
Usually, this involves adding a fixed number to the address of the previous instruction. For example, if
each instruction is 16 bits long and memory is organized into 16-bit words, then add 1 to the previous
address. If, instead, memory is organized as individually addressable 8-bit bytes, then add 2 to the
previous address.
• Instruction fetch (if): Read instruction from its memory location into the processor.
• Instruction operation decoding (iod): Analyze instruction to determine the type of operation to be performed and the operand(s) to be used.
• Operand address calculation (oac): If the operation involves reference to an operand in memory or available via I/O, then determine the address of the operand.
• Operand fetch (of): Fetch the operand from memory or read it in from I/O.
• Data operation (do): Perform the operation indicated in the instruction.
• Operand store (os): Write the result into memory or out to I/O.
Interrupts are provided primarily as a way to improve processing efficiency.
For example, most external devices are much slower than the processor. Suppose that the processor is
transferring data to a printer using the instruction cycle scheme of Figure 3. After each write operation, the
processor must pause and remain idle until the printer catches up. The length of this pause may be on the
order of many hundreds or even thousands of instruction cycles that do not involve memory. Clearly, this is
a very wasteful use of the processor. Figure 5a illustrates this state of affairs.
Figure 2.10
• A sequence of instructions, labeled 4 in the figure, to prepare for the actual I/O operation. This may
include copying the data to be output into a special buer and preparing the parameters for a device
command.
• The actual I/O command. Without the use of interrupts, once this command is issued, the program
must wait for the I/O device to perform the requested function (or periodically poll the device). The
program might wait by simply repeatedly performing a test operation to determine if the I/O operation
is done.
• A sequence of instructions, labeled 5 in the figure, to complete the operation. This may include setting a flag indicating the success or failure of the operation.
Because the I/O operation may take a relatively long time to complete, the I/O program is hung up waiting for the operation to complete; hence, the user program is stopped at the point of the WRITE call for a considerable period of time.
Figure 2.11
• It suspends execution of the current program being executed and saves its context. This means saving
the address of the next instruction to be executed (current contents of the program counter) and any
other data relevant to the processor's current activity.
• It sets the program counter to the starting address of an interrupt handler routine.
Figure 2.12
Figure 2.13
The first is to disable interrupts while an interrupt is being processed. A disabled interrupt simply means that the processor can and will ignore that interrupt request signal. If an interrupt occurs during this time, it generally remains pending and will be checked by the processor after the processor has enabled interrupts.
Thus, when a user program is executing and an interrupt occurs, interrupts are disabled immediately. After
the interrupt handler routine completes, interrupts are enabled before resuming the user program, and the
processor checks to see if additional interrupts have occurred. This approach is nice and simple, as interrupts
are handled in strict sequential order (Figure 2.10).
Figure 2.14
Figure 2.15
• Memory
• Input/Output
• CPU
• Memory to processor: The processor reads an instruction or a unit of data from memory.
• Processor to memory: The processor writes a unit of data to memory.
• I/O to processor: The processor reads data from an I/O device via an I/O module.
Over the years, a number of interconnection structures have been tried. By far the most common is the bus
and various multiple-bus structures.
· Memory write: Causes data on the bus to be written into the addressed location.
· Memory read: Causes data from the addressed location to be placed on the bus.
· I/O write: Causes data on the bus to be output to the addressed I/O port.
· I/O read: Causes data from the addressed I/O port to be placed on the bus.
· Transfer ACK: Indicates that data have been accepted from or placed on the bus.
· Bus request: Indicates that a module needs to gain control of the bus.
· Bus grant: Indicates that a requesting module has been granted control of the bus.
· Interrupt request: Indicates that an interrupt is pending.
· Interrupt ACK: Acknowledges that the pending interrupt has been recognized.
· Clock: Used to synchronize operations.
· Reset: Initializes all modules.
Figure 2.17
• In general, the more devices attached to the bus, the greater the bus length and hence the greater the
propagation delay. This delay determines the time it takes for devices to coordinate the use of the bus.
When control of the bus passes from one device to another frequently, these propagation delays can
noticeably affect performance.
• The bus may become a bottleneck as the aggregate data transfer demand approaches the capacity of
the bus. This problem can be countered to some extent by increasing the data rate that the bus can
carry and by using wider buses (e.g., increasing the data bus from 32 to 64 bits). However, because the data rates generated by attached devices (e.g., graphics and video controllers, network interfaces) are
growing rapidly, this is a race that a single bus is ultimately destined to lose.
Accordingly, most computer systems use multiple buses, generally laid out in a hierarchy. A typical tradi-
tional structure is shown in Figure 13. There is a local bus that connects the processor to a cache memory
and that may support one or more local devices. The cache memory controller connects the cache not only
to this local bus, but to a system bus to which are attached all of the main memory modules.
It is possible to connect I/O controllers directly onto the system bus. A more efficient solution is to make use of one or more expansion buses for this purpose. An expansion bus interface buffers data transfers between the system bus and the I/O controllers on the expansion bus. This arrangement allows the system to support a wide variety of I/O devices and at the same time insulate memory-to-processor traffic from I/O traffic.
Traditional (ISA) (with cache):
Figure 2.18
2.2.4.3.3 Timing:
Synchronous
Asynchronous
2.3 Module 3: Computer Arithmetic
Figure 2.19
The ones complement of a number is obtained by flipping the number's bits one at a time. For example, the value 01001001 becomes 10110110.
Suppose you are working with B-bit numbers. Then the ones complement of the number N is 2^B − 1 − N.
• Twos complement
Like sign magnitude, twos complement representation uses the most significant bit as a sign bit, making it easy to test whether an integer is positive or negative. It differs from the sign-magnitude representation in the way that the other bits are interpreted.
Consider an n-bit integer, A, in twos complement representation. If A is positive, then the sign bit a_(n−1) is zero. The remaining bits represent the magnitude of the number in the same fashion as for sign magnitude:
A = Σ (i = 0 to n−2) a_i·2^i, if a_(n−1) = 0
The number zero is identified as positive and therefore has a 0 sign bit and a magnitude of all 0s. We can see that the range of positive integers that may be represented is from 0 (all of the magnitude bits are 0) through 2^(n−1) − 1 (all of the magnitude bits are 1). Any larger number would require more bits.
For a negative number A (A < 0), the sign bit a_(n−1) is one. The remaining n−1 bits can take on any one of 2^(n−1) values. Therefore, the range of negative integers that can be represented is from −1 to −2^(n−1).
This is the convention used in twos complement representation, yielding the following expression for
negative and positive numbers:
A = −2^(n−1)·a_(n−1) + Σ (i = 0 to n−2) a_i·2^i
The range of A is from −2^(n−1) to 2^(n−1) − 1.
Example: Using 8 bits to represent:
+50 = 0011 0010
-70 = 1011 1010
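As a quick check of these formulas, the following Python sketch encodes and decodes n-bit twos complement values (the helper names are ours, chosen for illustration):

def to_twos(value, n=8):
    # wrap a signed integer into its n-bit twos complement bit pattern
    assert -(1 << (n - 1)) <= value <= (1 << (n - 1)) - 1, "out of range"
    return value & ((1 << n) - 1)

def from_twos(bits, n=8):
    # A = -2^(n-1)*a_(n-1) + sum of a_i*2^i
    sign = bits >> (n - 1)
    return bits - (sign << n)

print(format(to_twos(50), '08b'))    # 00110010
print(format(to_twos(-70), '08b'))   # 10111010
print(from_twos(0b10111010))         # -70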
Figure 2.20
Figure 2.21
Figure 2.22
Figure 2.23
Figure 2.24
Figure 2.25
• Multiply as above
• If signs were different, negate answer
2.3.3.4.2 Solution 2:
• Booth's algorithm:
Figure 2.26
Figure 2.27
Figure 2.28
(A fixed value, called the bias, is subtracted from the biased exponent field to get the true exponent value (E). Typically, the bias equals 2^(k−1) − 1, where k is the number of bits in the binary exponent.)
• The base B is implicit and need not be stored because it is the same for all numbers.
Figure 2.29
Not all bit patterns in the IEEE formats are interpreted in the usual way; instead, some bit patterns are
used to represent special values. Three special cases arise:
Figure 2.30
Single precision (32 bits)
A single-precision binary floating-point number is stored in 32 bits.
The number has value v:
v = s × 2^e × m
Where
s = +1 (positive numbers) when the sign bit is 0
s = −1 (negative numbers) when the sign bit is 1
e = Exp − 127 (in other words, the exponent is stored with 127 added to it, also called "biased with 127")
m = 1.fraction in binary (that is, the significand is the binary number 1 followed by the radix point followed by the binary bits of the fraction). Therefore, 1 ≤ m < 2.
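The sketch below decodes a 32-bit pattern according to these rules, using Python's struct module to obtain the bit pattern of a float. It handles only the normal (non-special) case and is meant as an illustration of the formula, not a full IEEE 754 decoder.

import struct

def decode_single(x):
    # v = s * 2^(Exp - 127) * 1.fraction, for normalised numbers
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    sign = -1 if (bits >> 31) else 1
    exp = (bits >> 23) & 0xFF        # biased exponent Exp
    frac = bits & 0x7FFFFF           # 23 fraction bits
    m = 1 + frac / 2**23             # implicit leading 1
    return sign * 2**(exp - 127) * m

print(decode_single(-6.25))          # -6.25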
Figure 2.31
• X1 ± X2 = (M1 × R^(E1−E2) ± M2) × R^(E2) (assuming E1 ≥ E2)
• X1 × X2 = (M1 × M2) × R^(E1+E2)
• X1 / X2 = (M1 / M2) × R^(E1−E2)
For addition and subtraction, it is necessary to ensure that both operands have the same exponent value. This may require shifting the radix point on one of the operands to achieve alignment. Multiplication and division are more straightforward.
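A toy illustration of the alignment step for addition, using base R = 10 and (mantissa, exponent) pairs. This simplified representation is an assumption for illustration only, not the IEEE format; here the operand with the smaller exponent is shifted right, which is equivalent to the expression above.

def fp_add(x1, x2, R=10):
    # add two floating-point numbers given as (mantissa, exponent) pairs
    (m1, e1), (m2, e2) = x1, x2
    if e1 < e2:                      # make sure e1 >= e2
        (m1, e1), (m2, e2) = (m2, e2), (m1, e1)
    m2_aligned = m2 / R**(e1 - e2)   # align the smaller operand
    return (m1 + m2_aligned, e1)     # result may still need normalising

print(fp_add((1.234, 3), (5.0, 1)))  # roughly (1.284, 3), i.e. 1234 + 50 = 1284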
A floating-point operation may produce one of these conditions:
• Exponent overflow: A positive exponent exceeds the maximum possible exponent value. In some systems, this may be designated as +∞ or −∞.
• Exponent underflow: A negative exponent is less than the minimum possible exponent value (e.g., -200 is less than -127). This means that the number is too small to be represented, and it may be reported as 0.
• Significand underflow: In the process of aligning significands, digits may flow off the right end of the significand. Some form of rounding is required.
• Significand overflow: The addition of two significands of the same sign may result in a carry out of the most significant bit. This can be fixed by realignment.
2.4 Module 4: Instruction Set: Characteristics and Functions
• Operation code: Specifies the operation to be performed (e.g., ADD, I/O). The operation is specified by a binary code, known as the operation code, or opcode.
• Source operand reference: The operation may involve one or more source operands, that is, operands
that are inputs for the operation.
• Result operand reference: The operation may produce a result.
• Next instruction reference: This tells the CPU where to fetch the next instruction after the execution
of this instruction is complete.
The next instruction to be fetched is located in main memory or, in the case of a virtual memory system, in
either main memory or secondary memory (disk). In most cases, the next instruction to be fetched imme-
diately follows the current instruction. In those cases, there is no explicit reference to the next instruction.
Source and result operands can be in one of three areas:
• Main or virtual memory: As with next instruction references, the main or virtual memory address
must be supplied.
• CPU register: With rare exceptions, a CPU contains one or more registers that may be referenced
by machine instructions. If only one register exists, reference to it may be implicit. If more than one
register exists, then each register is assigned a unique number, and the instruction must contain the
number of the desired register.
• I/O device: The instruction must specify the I/O module and device for the operation. If memory-mapped I/O is used, this is just another main or virtual memory address.
Figure 2.32
instructions. Opcodes are represented by abbreviations, called mnemonics, that indicate the operation.
Common examples include
ADD Add
SUB Subtract
MPY Multiply
DIV Divide
Table 2.1
Figure 2.33
With this simple example to guide us, let us consider the types of instructions that must be included in
a practical computer. A computer should have a set of instructions that allows the user to formulate any
data processing task. Another way to view it is to consider the capabilities of a high-level programming
language. Any program written in a high-level language must be translated into machine language to be
executed. Thus, the set of machine instructions must be sufficient to express any of the instructions from a
high-level language. With this in mind we can categorize instruction types as follows:
• Three addresses:
Example: a = b + c
• Three-address instruction formats are not common, because they require a relatively long instruction
format to hold the three address references.
• Two addresses:
Example: a = a + b
• The two-address format reduces the space requirement but also introduces some awkwardness. To
avoid altering the value of an operand, a MOVE instruction is used to move one of the values to a
result or temporary location before performing the operation.
• One address:
• A second address must be implicit. This was common in earlier machines, with the implied address being a CPU register known as the accumulator, or AC. The accumulator contains one of the operands and is used to store the result.
• Zero addresses
• Zero-address instructions are applicable to a special memory organization, called a stack. A stack is a last-in-first-out set of locations.
• Fewer addresses per instruction result in more primitive instructions, which requires a less complex
CPU.
• It also results in instructions of shorter length. On the other hand, programs contain more total
instructions, which in general results in longer execution times and longer, more complex programs
Multiple-address instructions:
• Operation repertoire: How many and which operations to provide, and how complex operations should
be
• Data types: The various types of data upon which operations are performed
• Instruction format: Instruction length (in bits), number of addresses, size of various fields, and so on.
• Registers: Number of CPU registers that can be referenced by instructions, and their use.
• Addressing: The mode or modes by which the address of an operand is specified
• Addresses
• Numbers
• Characters
• Logical data
• First, we may sometimes wish to store an array of Boolean or binary data items, in which each item can take on only the values 1 (true) and 0 (false). With logical data, memory can be used most efficiently for this storage.
• Second, there are occasions when we wish to manipulate the bits of a data item.
• Data transfer
• Arithmetic
• Logical
• Conversion
• I/O
• System control
• Transfer of control
• The location of the source and destination operands must be specified. Each location could be memory, a register, or the top of the stack.
In terms of CPU action, data transfer operations are perhaps the simplest type. If both source and destination are registers, then the CPU simply causes data to be transferred from one register to another; this is an operation internal to the CPU. If one or both operands are in memory, then the CPU must perform some or all of the following actions:
1. Calculate the memory address, based on the address mode
2. If the address refers to virtual memory, translate from virtual to actual memory address.
3. Determine whether the addressed item is in cache.
4. If not, issue a command to the memory module.
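A hedged sketch of these four steps in Python follows; the page table, cache, and memory contents below are invented purely for illustration.

PAGE_SIZE = 4096
page_table = {0: 7}                       # virtual page -> physical frame (toy data)
cache = {}                                # physical address -> cached word (empty here)
memory = {0x7044: 123}                    # physical memory contents (toy data)

def load(base_reg, displacement):
    vaddr = base_reg + displacement                  # 1. calculate address (displacement mode)
    frame = page_table[vaddr // PAGE_SIZE]           # 2. translate virtual -> physical
    paddr = frame * PAGE_SIZE + vaddr % PAGE_SIZE
    if paddr in cache:                               # 3. is the item in the cache?
        return cache[paddr]
    return memory[paddr]                             # 4. if not, go to the memory module

print(load(0x0040, 0x4))                  # virtual 0x44 -> physical 0x7044 -> 123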
Example:
Table 2.2
Figure 2.34
Figure 2.35
The above figure illustrates the use of procedures to construct a program. In this example, there is a main program starting at location 4000. This program includes a call to procedure PROC1, starting at location 4500. When this call instruction is encountered, the CPU suspends execution of the main program and begins execution of PROC1 by fetching the next instruction from location 4500. Within PROC1, there are two calls to PROC2 at location 4800. In each case, the execution of PROC1 is suspended and PROC2 is executed. The RETURN statement causes the CPU to go back to the calling program and continue execution at the instruction after the corresponding CALL instruction. This behavior is illustrated in the right of this figure.
Several points are worth noting:
1. A procedure can be called from more than one location.
2. A procedure call can appear in a procedure. This allows the nesting of procedures to an arbitrary
depth.
3. Each procedure call is matched by a return in the called program.
Because we would like to be able to call a procedure from a variety of points, the CPU must somehow
save the return address so that the return can take place appropriately. There are three common places for
storing the return address:
• Register
• Start of called procedure
• Top of stack
• Immediate
• Direct
• Indirect
• Register
• Register indirect
• Displacement
Figure 2.36
The advantage of immediate addressing is that no memory reference other than the instruction fetch is
required to obtain the operand, thus saving one memory or cache cycle in the instruction cycle.
The disadvantage is that the size of the number is restricted to the size of the address field, which, in most instruction sets, is small compared with the word length.
Figure 2.37
The technique was common in earlier generations of computers but is not common on contemporary archi-
tectures. It requires only one memory reference and no special calculation. The obvious limitation is that it
provides only a limited address space.
Figure 2.38
Figure 2.39
The disadvantage of register addressing is that the address space is very limited.
Figure 2.40
The advantages and limitations of register indirect addressing are basically the same as for indirect
addressing. In both cases, the address space limitation (limited range of addresses) of the address field is overcome by having that field refer to a word-length location containing an address. In addition, register
indirect addressing uses one less memory reference than indirect addressing.
• A = base value
• R = register that holds displacement
Figure 2.41
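To make the addressing modes above concrete, here is a small Python sketch that resolves the operand for each mode; the memory and register contents are invented for illustration.

memory = {100: 42, 42: 7, 110: 55}
registers = {'R1': 42, 'R2': 10}

def operand(mode, field, base=None):
    if mode == 'immediate':
        return field                              # operand = A
    if mode == 'direct':
        return memory[field]                      # EA = A
    if mode == 'indirect':
        return memory[memory[field]]              # EA = (A)
    if mode == 'register':
        return registers[field]                   # operand held in a register
    if mode == 'register_indirect':
        return memory[registers[field]]           # EA = (R)
    if mode == 'displacement':
        return memory[base + registers[field]]    # EA = A + (R)
    raise ValueError(mode)

print(operand('immediate', 5))                    # 5
print(operand('direct', 100))                     # 42
print(operand('indirect', 100))                   # 7  (memory[42])
print(operand('displacement', 'R2', base=100))    # 55 (memory[100 + 10])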
2.5 Module 5: CPU Structure and Functions
Processor Organization
To understand the organization of the CPU, let us consider the requirements placed on the CPU, the
things that it must do:
To do these things, it should be clear that the CPU needs to store some data temporarily. It must remember
the location of the last instruction so that it can know where to get the next instruction. It needs to store
instructions and data temporarily while an instruction is being executed. In other words, the CPU needs a
small internal memory.
Figure 2.42 is a simplified view of a CPU, indicating its connection to the rest of the system via the system bus. You will recall (Lecture 1) that the major components of the CPU are an arithmetic and logic unit (ALU) and a control unit (CU). The ALU does the actual computation or processing of data. The control unit controls the movement of data and instructions into and out of the CPU and controls the operation of the ALU. In addition, the figure shows a minimal internal memory, consisting of a set of storage locations, called registers.
Figure 2.43 is a slightly more detailed view of the CPU. The data transfer and logic control paths
are indicated, including an element labeled internal CPU-bus. This element is needed to transfer data
between the various registers and the ALU because the ALU in fact operates only on data in the internal
CPU memory.
There is not a clean separation of registers into these two categories. For example, on some machines the
program counter is user visible (e.g., Pentium), but on many it is not (e.g., PowerPC). For purposes of the
following discussion, however, we will use these categories.
• General purpose
• Data
• Address
• Condition codes
General-purpose registers: can be assigned to a variety of functions by the programmer. Sometimes their use within the instruction set is orthogonal to the operation. That is, any general-purpose register can contain the operand for any opcode. This provides true general-purpose register use. Often, however, there are restrictions. For example, there may be dedicated registers for floating-point and stack operations. In some cases, general-purpose registers can be used for addressing functions (e.g., register indirect, displacement). In other cases, there is a partial or clean separation between data registers and address registers.
Data registers may be used only to hold data and cannot be employed in the calculation of an operand
address.
Address registers may themselves be somewhat general purpose, or they may be devoted to a particular
addressing mode. Examples include the following:
• Segment pointers: In a machine with segmented addressing, a segment register holds the address of
the base of the segment. There may be multiple registers: for example, one for the operating system
and one for the current process.
• Index registers: These are used for indexed addressing and may be autoindexed.
• Stack pointer: If there is user-visible stack addressing, then typically the stack is in memory and there is a dedicated register that points to the top of the stack. This allows implicit addressing; that is, push, pop, and other stack instructions need not contain an explicit stack operand.
Condition codes register (also referred to as flags): Condition codes are bits set by the CPU hardware as the result of operations. For example, an arithmetic operation may produce a positive, negative, zero, or overflow result. In addition to the result itself being stored in a register or memory, a condition code is also set. The code may subsequently be tested as part of a conditional branch operation.
Typically, the CPU updates the PC after each instruction fetch so that the PC always points to the next
instruction to be executed. A branch or skip instruction will also modify the contents of the PC. The fetched
instruction is loaded into an IR, where the opcode and operand specifiers are analyzed. Data are exchanged
with memory using the MAR and MBR. In a bus-organized system, the MAR connects directly to the
address bus, and the MBR connects directly to the data bus. User-visible registers, in turn, exchange data
with the MBR.
The four registers just mentioned are used for the movement of data between the CPU and memory.
Within the CPU, data must be presented to the ALU for processing. The ALU may have direct access to the
MBR and user-visible registers. Alternatively, there may be additional buffering registers at the boundary to the ALU; these registers serve as input and output registers for the ALU and exchange data with the MBR and user-visible registers.
All CPU designs include a register or set of registers, often known as the program status word (PSW),
that contain status information. The PSW typically contains condition codes plus other status information. Common fields or flags include the following:
• Sign: Contains the sign bit of the result of the last arithmetic operation.
• Zero: Set when the result is 0.
• Carry: Set if an operation resulted in a carry (addition) into or borrow (subtraction) out of a high-order bit. Used for multiword arithmetic operations.
• Equal: Set if a logical compare result is equality.
• Overflow: Used to indicate arithmetic overflow.
• Interrupt enable/disable: Used to enable or disable interrupts.
• Supervisor: Indicates whether the CPU is executing in supervisor or user mode. Certain privileged
instructions can be executed only in supervisor mode, and certain areas of memory can be accessed
only in supervisor mode.
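As an illustration of how several of these flags are set, the following sketch updates sign, zero, carry, and overflow after an 8-bit addition; the function and flag dictionary are hypothetical, not part of any particular PSW layout.

def add_and_set_flags(a, b):
    raw = a + b
    result = raw & 0xFF
    flags = {
        'sign': bool(result & 0x80),      # most significant bit of the result
        'zero': result == 0,
        'carry': raw > 0xFF,              # carry out of bit 7
        # overflow: both operands have the same sign but the result's sign differs
        'overflow': ((a ^ result) & (b ^ result) & 0x80) != 0,
    }
    return result, flags

print(add_and_set_flags(0x7F, 0x01))      # result 0x80: sign and overflow set
print(add_and_set_flags(0xFF, 0x01))      # result 0x00: zero and carry set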
A number of other registers related to status and control might be found in a particular CPU design. In
addition to the PSW, there may be a pointer to a block of memory containing additional status information
(e.g., process control blocks).
Example Register Organizations:
2.6 Module 6: Control Unit Operation
2.6.1 1. Micro-Operation
The execution of a program consists of the sequential execution of instructions. Each instruction is executed
during an instruction cycle made up of shorter sub-cycles (e.g., fetch, indirect, execute, interrupt). The
performance of each sub-cycle involves one or more shorter operations, that is, micro-operations.
Micro-operations are the functional, or atomic, operations of a processor. In this section, we will examine
micro-operations to gain an understanding of how the events of any instruction cycle can be described as a
sequence of such micro-operations (Figure 6.1).
Figure 2.45
• Memory address register (MAR): Is connected to the address lines of the system bus. It specifies the address in memory for a read or write operation.
• Memory buffer register (MBR): Is connected to the data lines of the system bus. It contains the value to be stored in memory or the last value read from memory.
• Program counter (PC): Holds the address of the next instruction to be fetched.
• Instruction register (IR): Holds the last instruction fetched.
Let us look at the sequence of events for the fetch cycle from the point of view of its effect on the processor
registers. An example appears in Figure 6.2.
Figure 2.46
• At the beginning of the fetch cycle, the address of the next instruction to be executed is in the program
counter (PC); in this case, the address is 1100100.
• The first step is to move that address to the memory address register (MAR) because this is the only register connected to the address lines of the system bus.
• The second step is to bring in the instruction. The desired address (in the MAR) is placed on the address bus, the control unit issues a READ command on the control bus, and the result appears on the data bus and is copied into the memory buffer register (MBR). We also need to increment the PC
by 1 to get ready for the next instruction. Because these two actions (read word from memory, add 1
to PC) do not interfere with each other, we can do them simultaneously to save time.
• The third step is to move the contents of the MBR to the instruction register (IR). This frees up the
MBR for use during a possible indirect cycle.
Thus, the simple fetch cycle actually consists of three steps and four micro-operations. Each micro-operation
involves the movement of data into or out of a register. So long as these movements do not interfere with
one another, several of them can take place during one step, saving time. Symbolically, we can write this
sequence of events as follows:
t1: MAR <= (PC)
t2: MBR <= Memory
    PC <= (PC) + 1
t3: IR <= (MBR)
Note that the second and third micro-operations both take place during the second time unit. The third
micro-operation could have been grouped with the fourth without affecting the fetch operation:
t1: MAR <= (PC)
t2: MBR <= Memory
t3: PC <= (PC) + 1
    IR <= (MBR)
The groupings of micro-operations must follow two simple rules:
1. The proper sequence of events must be followed. Thus (MAR <= (PC)) must precede (MBR <=
Memory) because the memory read operation makes use of the address in the MAR.
2. Conflicts must be avoided. One should not attempt to read to and write from the same register in
one time unit, because the results would be unpredictable. For example, the micro-operations (MBR <=
Memory) and (IR <= MBR) should not occur during the same time unit.
A final point worth noting is that one of the micro-operations involves an addition. To avoid duplication
of circuitry, this addition could be performed by the ALU. The use of the ALU may involve additional
micro-operations, depending on the functionality of the ALU and the organization of the processor.
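The grouping of these register transfers into time units can be mimicked directly. The Python sketch below executes the three-step version of the fetch cycle; the register names match the text, while the dictionary layout and memory contents are ours.

regs = {'PC': 0b1100100, 'MAR': 0, 'MBR': 0, 'IR': 0}
memory = {0b1100100: 0x1940}             # arbitrary instruction word at address 1100100

def fetch_cycle(regs, memory):
    regs['MAR'] = regs['PC']             # t1: MAR <= (PC)
    regs['MBR'] = memory[regs['MAR']]    # t2: MBR <= Memory
    regs['PC'] = regs['PC'] + 1          #     PC  <= (PC) + 1
    regs['IR'] = regs['MBR']             # t3: IR  <= (MBR)
    return regs

print(fetch_cycle(regs, memory))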
The address in the PC at the start of the instruction is the address of the next instruction in sequence. This is saved at the address designated in the IR. The latter address is also incremented to provide the address of the instruction for the next instruction cycle.
Figure 2.47
Thus, the flowchart of Figure 6.3 defines the complete sequence of micro-operations, depending only on the instruction sequence and the interrupt pattern. Of course, this is a simplified example. The flowchart for an actual processor would be more complex. In any case, we have reached the point in our discussion in which the operation of the processor is defined as the performance of a sequence of micro-operations. We can now consider how the control unit causes this sequence to occur.
We have already performed steps 1 and 2. Let us summarize the results. First, the basic functional elements
of the processor are the following:
• ALU
• Registers
• Internal data paths
• External data paths
• Control unit
Some thought should convince you that this is a complete list. The ALU is the functional essence of the computer. Registers are used to store data internal to the processor. Some registers contain status information needed to manage instruction sequencing (e.g., a program status word). Others contain data that go to or come from the ALU, memory, and I/O modules. Internal data paths are used to move data between registers and between registers and the ALU. External data paths link registers to memory and I/O modules, often by means of a system bus. The control unit causes operations to happen within the processor.
The execution of a program consists of operations involving these processor elements. As we have seen,
these operations consist of a sequence of micro-operations. All micro-operations fall into one of the following
categories:
All of the micro-operations needed to perform one instruction cycle, including all of the micro-operations to
execute every instruction in the instruction set, fall into one of these categories.
We can now be somewhat more explicit about the way in which the control unit functions. The control
unit performs two basic tasks:
• Sequencing: The control unit causes the processor to step through a series of micro-operations in the
proper sequence, based on the program being executed.
• Execution: The control unit causes each micro-operation to be performed.
The preceding is a functional description of what the control unit does. The key to how the control unit
operates is the use of control signals.
Figure 2.48
• Clock: This is how the control unit "keeps time." The control unit causes one micro-operation (or a set of simultaneous micro-operations) to be performed for each clock pulse. This is sometimes referred to as the processor cycle time, or the clock cycle time.
• Instruction register: The opcode of the current instruction is used to determine which micro-operations to perform during the execute cycle.
• Flags: These are needed by the control unit to determine the status of the processor and the outcome of previous ALU operations. For example, for the increment-and-skip-if-zero (ISZ) instruction, the control unit will increment the PC if the zero flag is set.
• Control signals from control bus: The control bus portion of the system bus provides signals to the control unit, such as interrupt signals and acknowledgments.
• Control signals within the processor: These are of two types: those that cause data to be moved from one register to another, and those that activate specific ALU functions.
• Control signals to control bus: These are also of two types: control signals to memory, and control signals to the I/O modules.
The new element that has been introduced in this figure is the control signal. Three types of control signals are used: those that activate an ALU function, those that activate a data path, and those that are signals on the external system bus or other external interface. All of these signals are ultimately applied directly as binary inputs to individual logic gates.
Let us consider again the fetch cycle to see how the control unit maintains control. The control unit
keeps track of where it is in the instruction cycle. At a given point, it knows that the fetch cycle is to be
performed next. The first step is to transfer the contents of the PC to the MAR. The control unit does this
by activating the control signal that opens the gates between the bits of the PC and the bits of the MAR.
The next step is to read a word from memory into the MBR and increment the PC. The control unit does
this by sending the following control signals simultaneously:
• A control signal that opens gates, allowing the contents of the MAR onto the address bus
• A memory read control signal on the control bus
• A control signal that opens the gates, allowing the contents of the data bus to be stored in the MBR
• Control signals to logic that add 1 to the contents of the PC and store the result back to the PC
Following this, the control unit sends a control signal that opens gates between the MBR and the IR.
This completes the fetch cycle except for one thing: The control unit must decide whether to perform
an indirect cycle or an execute cycle next. To decide this, it examines the IR to see if an indirect memory
reference is made.
The indirect and interrupt cycles work similarly. For the execute cycle, the control unit begins by examining the opcode and, on the basis of that, decides which sequence of micro-operations to perform for the execute cycle.
A Control Signals Example
To illustrate the functioning of the control unit, let us examine a simple example. Figure 6.5 illustrates
the example.
Figure 2.49
• Data paths: The control unit controls the internal flow of data. For example, on instruction fetch, the contents of the memory buffer register are transferred to the instruction register. For each path to be controlled, there is a gate (indicated by a circle in the figure). A control signal from the control unit temporarily opens the gate to let data pass.
• ALU: The control unit controls the operation of the ALU by a set of control signals. These signals
activate various logic devices and gates within the ALU.
• System bus: The control unit sends control signals out onto the control lines of the system bus (e.g.,
memory READ).
The control unit must maintain knowledge of where it is in the instruction cycle. Using this knowledge, and
by reading all of its inputs, the control unit emits a sequence of control signals that causes micro-operations
to occur.
2.7 Module 7: Microprogramming
• Micro-operation execution: Each step of the instruction cycle can be decomposed into micro-operation
primitives that are performed in a precise time sequence. Each micro-operation is initiated and con-
trolled based on the use of control signals / lines coming from the control unit.
• Micro-instruction: Each instruction of the processor is translated into a sequence of lower-level micro-instructions. The process of translation and execution is referred to as microprogramming.
• Microprogramming: The concept of microprogramming was developed by Maurice Wilkes in 1951, using diode matrices for the memory element. A microprogram consists of a sequence of micro-instructions.
• Microprogrammed Control Unit: a relatively simple logic circuit that is capable of sequencing through micro-instructions and generating control signals to execute each micro-instruction.
• Control Unit: The control unit is an important portion of the processor.
The control unit issues control signals external to the processor to cause data exchange with memory and the I/O units. The control unit also issues control signals internal to the processor to move data between registers and to perform the ALU and other internal operations in the processor. In a microprogrammed control unit, the control signals generated by a micro-instruction are used to control register transfers and ALU operations. Control unit design is then the collection and the implementation of all of the needed control signals for the micro-instruction executions.
Figure 2.51
• The principal advantages are higher speed operation and smaller implementations (component counts)
• The modifications to the design can be hard to do
• This approach is favored in RISC style designs
• Sequencing technique
Figure 2.52
Figure 7.3. Branch Control unit of Microprogrammed Control Unit with a single address field
• Address generation
The problem is to consider the various ways in which the next address can be derived or computed. The various techniques of address generation are given in the following.
Figure 2.53
Figure 2.54
• Vertical microprogramming
Width is narrow: n control signals can be encoded into log2(n) control bits
Limited ability to express parallelism
Considerable encoding of control information requires external memory word decoder to identify the
exact control line being manipulated
• Horizontal microprogramming
• Compromise technique
• Microprogramming applications
8
2.8 Module 8: Instruction Pipelining
2.8.1 1. Pipelining
2.8.1.1 1.1 Basic concepts
An instruction has a number of stages. The various stages can be worked on simultaneously through various blocks of production. This is a pipeline. This process is also referred to as instruction pipelining. Figure 8.1 shows a pipeline of two independent stages: instruction fetch and instruction execution. The first stage fetches an instruction and buffers it. While the second stage is executing the instruction, the first stage takes advantage of any unused memory cycles to fetch and buffer the next instruction. This process will speed up instruction execution.
Figure 2.55
Figure 2.56
Tk = [k + (n − 1)]τ
(where n is the number of instructions, k is the number of pipeline stages, and τ is the cycle time per stage)
• The speedup factor for the instruction pipeline compared to execution without the pipeline is defined as:
Sk = T1 / Tk = nkτ / ([k + (n − 1)]τ) = nk / (k + (n − 1))
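Plugging numbers into the speedup formula gives a quick feel for its behaviour (the values of n and k below are assumed for illustration):

def speedup(n, k):
    # speedup of a k-stage pipeline over sequential execution for n instructions
    return (n * k) / (k + (n - 1))

for n in (10, 100, 1000):
    print(n, round(speedup(n, k=6), 2))   # 4.0, 5.71, 5.97
# As n grows, the speedup approaches k (here 6).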
• Pipeline depth
• Data dependencies
Pipelining must ensure that computed results are the same as if computation was performed in strict sequential order.
With multiple stages, two instructions in execution in the pipeline may have data dependencies. So
we must design the pipeline to prevent this.
Data dependency examples:
A = B + C
D = E + A
C = G x H
A = D / H
Data dependencies limit when an instruction can be input to the pipeline.
• Branching
One of the major problems in designing an instruction pipeline is assuring a steady flow of instructions to the initial stages of the pipeline. However, 15-20% of instructions in an assembly-level stream are (conditional) branches. Of these, 60-70% take the branch to a target address. Until the instruction is actually executed, it is impossible to determine whether the branch will be taken or not.
- Impact of the branch is that pipeline never really operates at its full capacity.
The average time to complete a pipelined instruction becomes
Tave = (1 − pb)·1 + pb[pt(1 + b) + (1 − pt)·1]
(where pb is the probability that an instruction is a branch, pt the probability that a branch is taken, and b the branch penalty in cycles)
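Evaluating that expression for assumed values of pb, pt, and b (chosen to match the 15-20% and 60-70% figures quoted above) gives, for example:

def t_ave(pb, pt, b):
    # average cycles per instruction in the presence of conditional branches
    return (1 - pb) * 1 + pb * (pt * (1 + b) + (1 - pt) * 1)

print(t_ave(pb=0.2, pt=0.65, b=3))        # about 1.39 cycles per instruction on average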
A number of techniques can be used to minimize the impact of the branch instruction (the branch
penalty).
- Several approaches have been taken for dealing with conditional branches:
+ Multiple streams
+ Prefetch branch target
+ Loop buffer
+ Branch prediction
+ Delayed branch
• Multiple streams
- Replicate the initial portions of the pipeline and fetch both possible next instructions
- Increases chance of memory contention
- Must support multiple streams for each instruction in the pipeline
• Prefetch branch target
- When the branch instruction is decoded, begin to fetch the branch target instruction and place it in a second prefetch buffer
- If the branch is not taken, the sequential instructions are already in the pipe, so there is no loss of performance
- If the branch is taken, the next instruction has been prefetched and results in minimal branch penalty (don't have to incur a memory read operation at the end of the branch to fetch the instruction)
• Branch prediction
- Make a good guess as to which instruction will be executed next and start that one down the pipeline.
- Static guesses: make the guess without considering the runtime history of the program
Branch never taken
Branch always taken
Predict based on the opcode
- Dynamic guesses: track the history of conditional branches in the program.
Taken/not taken switch; history table
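One common dynamic scheme keeps a 2-bit saturating counter per branch in a small history table. The sketch below is a generic illustration of that idea, not a description of any specific processor.

class BranchPredictor:
    # states 0-1 predict "not taken", states 2-3 predict "taken"
    def __init__(self, table_size=16):
        self.table = [2] * table_size     # start in weakly "taken"
        self.size = table_size

    def predict(self, pc):
        return self.table[pc % self.size] >= 2

    def update(self, pc, taken):
        i = pc % self.size
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)

bp = BranchPredictor()
for outcome in [True, True, False, True]: # outcome history of one branch at pc 0x40
    print(bp.predict(0x40), outcome)
    bp.update(0x40, outcome)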
Figure 2.57
• Delayed branch
- Minimize the branch penalty by finding valid instructions to execute in the pipeline while the branch address is being resolved.
- It is possible to improve performance by automatically rearranging instructions within a program, so that branch instructions occur later than actually desired.
- The compiler is tasked with reordering the instruction sequence to find enough independent instructions (with respect to the conditional branch) to feed into the pipeline after the branch so that the branch penalty is reduced to zero.
High performance can be obtained by subdividing the clock cycle into a number of subintervals
Higher clock frequency!
Subdivide the macro pipeline H/W stages into smaller (thus faster) substages and clock data through
at the higher clock rate
Time to complete individual instructions does not change
Degree of parallelism goes up
Perceived speedup goes up
It must ensure computed results are the same as would be computed on a strictly sequential machine
Two instructions can not be executed in parallel if the (data) output of one is the input of the other or
if they both write to the same output location
Consider:
S1: A = B + C
S2: D = A + 1
S3: B = E + F
S4: A = E + 3
Resource dependencies:
In the above sequence of instructions, the adder unit gets a real workout!
Parallelism is limited by the number of adders in the ALU
Decode and execute stages are decoupled via an instruction buffer window
Decoded instructions are stored in the window awaiting execution
Functional units will take instructions from the window in an attempt to stay busy
This can result in out-of-order execution
S1: A = B + C
S2: D = E + 1
S3: G = E + F
S4: H = E * 3
The antidependence class of data dependencies must also be dealt with.
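As an illustration (my own, not from the notes), the sketch below classifies the pairwise hazards in the earlier sequence S1: A = B + C, S2: D = A + 1, S3: B = E + F, S4: A = E + 3 into true (RAW), anti (WAR) and output (WAW) dependencies:

# Instructions from the earlier example: (name, destination, source operands).
instrs = [("S1", "A", ["B", "C"]),
          ("S2", "D", ["A"]),
          ("S3", "B", ["E", "F"]),
          ("S4", "A", ["E"])]

# Compare every later instruction against every earlier one.
for i, (n1, d1, s1) in enumerate(instrs):
    for n2, d2, s2 in instrs[i + 1:]:
        if d1 in s2:
            print(f"{n1} -> {n2}: true (RAW) dependency on {d1}")
        if d2 in s1:
            print(f"{n1} -> {n2}: anti (WAR) dependency on {d2}")
        if d1 == d2:
            print(f"{n1} -> {n2}: output (WAW) dependency on {d1}")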
2.9 Module 9: Multilevel Memories
• Bits
The basic unit of memory is the binary digit called a bit. A bit may contain a 0 or 1. It is the simplest
possible unit
• Memory addresses
- Memories consist of a number of cells (or locations), each of which can store a piece of information. Each
location has a number, called its address, by which programs can refer to it. The cell is the smallest addressable unit.
- If an address has m bits, the maximum number of addressable cells is 2^m.
- Byte: 8-bits
- Bytes are grouped into words. The significance of a word is that most instructions operate on entire words.
A computer with 32 bits/word has 4 bytes/word.
• Byte ordering: the bytes of a word can be numbered from left to right (big endian) or from right to left (little endian)
• Transfer rate: Rate at which data is transferred to/from the memory device
• Access time:
For RAM, the time to address the unit and perform the transfer
For non-random access memory, the time to position the R/W head over the desired location
• Memory cycle time: Access time plus any other time required before a second access can be started
• Access technique: how are memory contents accessed
Random access:
Each location has a unique physical address
Locations can be accessed in any order and all access times are the same
What we term RAM is more aptly called
read/write memory, since this access technique applies to ROMs as well
Example: main memory
Sequential access:
Data does not have a unique address
Must read all data items in sequence until the desired item is found
Access times are highly variable
Example: tape drive units
Direct access:
Data items have unique addresses
Access is done using a combination of moving to a general memory area followed by a sequential
access to reach the desired data item
Example: disk drives
Associative access:
A variation of random access memory
Data items are accessed based on their contents rather than their actual location
Search all data items in parallel for a match to a given search pattern
All memory locations searched in parallel without regard to the size of the memory
Extremely fast for large memory sizes
Cost per bit is 5-10 times that of a normal RAM cell
Example: some cache memory units.
• Cache memory
The cache memories are high-speed buffers for holding recently accessed data and neighboring data in main
memory. The organization and operation of the cache provide an apparently fast memory system.
• External Memory
- Magnetic disks
- RAID technology disks
- Optical disks
- Magnetic tape
Figure 2.58
Registers internal to the CPU for temporary data storage (small in number but very fast)
Main memory for data and programs (relatively large and fast)
External permanent storage (much larger and much slower)
Figure 2.59
Example: Suppose that the processor has access to two levels of memory:
Two-level memory system
Level 1 access time of 1 us
Level 2 access time of 10 us
Average access time = H(1) + (1 - H)(10) us
where H is the fraction of all memory accesses that are found in the faster memory (e.g., the cache)
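A quick sketch of this calculation, using the 1 us and 10 us access times above (the hit ratios are illustrative):

t1, t2 = 1.0, 10.0                     # level-1 and level-2 access times in microseconds

def average_access_time(hit_ratio):
    """Fraction hit_ratio of accesses are satisfied by the faster level."""
    return hit_ratio * t1 + (1 - hit_ratio) * t2

for h in (0.5, 0.9, 0.95, 0.99):
    print(f"H = {h:.2f}: {average_access_time(h):.2f} us")
# As H approaches 1, the average approaches the fast level's 1 us access time.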
Figure 2.60
2.10 Module 10: Cache Memory
• Improving Performance
Figure 2.61
Figure 10.1. The cache is logically between the CPU and main memory in the memory hierarchy.
Physically, there are several possible places it could be located.
• Characteristics
Assume an access to main memory causes a block of K words to be transferred to the cache
The block transferred from main memory is stored in the cache as a single unit called a slot, line, or
page
Once copied to the cache, individual words within a line can be accessed by the CPU
Because of the high speeds involved with the cache, management of the data transfer and storage in
the cache is done in hardware
- If there are 2^n words in main memory, then there will be M = 2^n / K blocks in the memory
M will be much greater than the number of lines, C, in the cache
Every line of data in the cache must be tagged in some way to identify which main memory block it came from.
The line of data and its tag are stored in the cache.
Figure 2.62
Word id = 3 bits
Line id = 10 bits
Tag id = 7 bits
Where is the byte at main memory
location $ABCDE stored?
$ABCDE = 1010101 1110011011 110
Cache line $39B, word offset $6, tag $55
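A minimal sketch of that field extraction, assuming the 20-bit address and the 7/10/3 tag/line/word split used in the example:

def split_address(addr, word_bits=3, line_bits=10):
    """Break a main-memory address into (tag, line, word) for a direct-mapped cache."""
    word = addr & ((1 << word_bits) - 1)
    line = (addr >> word_bits) & ((1 << line_bits) - 1)
    tag = addr >> (word_bits + line_bits)
    return tag, line, word

tag, line, word = split_address(0xABCDE)
print(hex(tag), hex(line), hex(word))   # 0x55 0x39b 0x6, matching the worked example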
c/ Remarks
+ Easy to implement
+ Relatively inexpensive to implement
+ Easy to determine where a main memory reference can be found in cache
• Disadvantage
Figure 2.63
• Advantages
- Fast
- Flexible
• Disadvantage
• Implementation cost
Figure 2.64
2.10.4 4. Performances
2.10.4.1 4.1 Write policy
When a line is to be replaced, the original copy of the line in main memory must be updated if any addressable
unit in the line has been changed
Write through
+ Anytime a word in cache is changed, it is also changed in main memory
+ Both copies always agree
+ Generates lots of memory writes to main memory
Write back
+ During a write, only change the contents of the cache
+ Update main memory only when the cache line is to be replaced
+ Causes cache coherency problems: different values for the contents of an address are in the cache
and the main memory
+ Requires complex circuitry to avoid this problem
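To make the trade-off concrete, here is a toy sketch (my own illustration, not from the notes) counting how many main-memory writes each policy generates for the same sequence of stores that all hit in the cache:

# Each store hits a cache line; lines are identified by their number.
stores = [3, 3, 3, 7, 3, 7, 7]           # cache lines written by the program

# Write-through: every store also writes main memory.
write_through_mem_writes = len(stores)

# Write-back: a store only marks the line dirty; memory is written when a dirty
# line is eventually evicted (here we simply flush everything at the end).
dirty_lines = set(stores)
write_back_mem_writes = len(dirty_lines)

print(write_through_mem_writes, write_back_mem_writes)   # 7 vs 2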
2.11 Module 11: Internal Memory
• Static RAM
- Uses 5-10x more transistors than a similar dynamic cell, so packaging density is 10x lower
- Faster than a dynamic cell
ROM: Read-Only Memories
A read-only memory (ROM) contains a permanent pattern of data that cannot be changed.
- Permanent data storage
- ROMs: data is wired in during fabrication at a chip manufacturer's plant
- Purchased in lots of 10k or more
• Example:
Figure 2.65
Figure 2.66
A single error is a bit flip; multiple bit flips can occur in a word
2^M valid data words
2^(M+K) codeword combinations in the memory
Distribute the 2^M valid data words among the 2^(M+K) codeword combinations such that the distance
between valid words is sufficient to distinguish the error
Figure 2.67
Figure 11.3
Figure 2.68
Figure 11.4
Bit position n is checked by bits Ci such that the sum of the subscripts, i, equals n (e.g., position 10,
bit M6, is checked by bits C2 and C8)
To detect errors, compare the check bits read from memory to those computed during the read operation
(use XOR)
+ If the result of the XOR is 0000, no error
+ If non-zero, the numerical value of the result indicates the bit position in error
+ If the XOR result was 0110, bit position 6 (M3) is in error
Double error detection can be added by adding another check bit that implements a parity check for the
whole word of M+K bits. SEC and SEC-DED are generally enough protection in typical systems
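A minimal sketch of single-error correction with a Hamming code, assuming a (7,4) layout with check bits at positions 1, 2 and 4 (not necessarily the exact layout of Figures 11.3/11.4): the syndrome computed on readback is the XOR of the positions of all 1 bits and directly names the bit in error.

# Check bits sit at positions 1, 2 and 4; data bits at positions 3, 5, 6 and 7.
def encode(data_bits):                     # data_bits: four 0/1 values
    code = {3: data_bits[0], 5: data_bits[1], 6: data_bits[2], 7: data_bits[3]}
    for c in (1, 2, 4):                    # check bit c covers every position whose index has bit c set
        code[c] = sum(bit for pos, bit in code.items() if pos & c and pos != c) % 2
    return code

def syndrome(code):                        # XOR together the positions of all 1 bits
    s = 0
    for pos, bit in code.items():
        if bit:
            s ^= pos
    return s                               # 0 means no single-bit error, otherwise the bad position

word = encode([1, 0, 1, 1])
assert syndrome(word) == 0                 # a word read back intact has syndrome 0
word[6] ^= 1                               # simulate a single-bit memory error at position 6
print(syndrome(word))                      # 6 - the syndrome names the bit to correct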
2.12 Module 12: External Memory
• Principle
• Disk characteristics
- Single vs. multiple platters per drive (each platter has its own read/write head)
- Fixed vs. movable head. A fixed head has a head per track; a movable head uses one head per platter
- Removable vs. nonremovable platters. A removable platter can be removed from the disk drive for storage
or transfer to another machine
- Data accessing times:
+ Seek time: position the head over the correct track
+ Rotational latency: wait for the desired sector to come under the head
+ Access time: seek time plus rotational latency
+ Block transfer time: time to read the block (sector) off the disk and
transfer it to main memory.
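As a rough illustration, the following sketch computes these times for an assumed drive (the parameters are illustrative, not from the notes):

seek_time_ms = 4.0                        # assumed average seek time
rpm = 7200                                # assumed spindle speed
sectors_per_track = 500                   # assumed geometry

rotation_ms = 60_000 / rpm                # one full revolution: ~8.33 ms
rotational_latency_ms = rotation_ms / 2   # on average, wait half a revolution
transfer_ms = rotation_ms / sectors_per_track   # time for one sector to pass under the head

access_ms = seek_time_ms + rotational_latency_ms
block_time_ms = access_ms + transfer_ms
print(f"access time {access_ms:.2f} ms, one-sector block transfer {block_time_ms:.2f} ms")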
Figure 2.69
RAID 0
No redundancy techniques are used
Data is distributed over all disks in the array
Data is divided into strips for actual storage similar in operation to interleaved
memory data storage
RAID 0 can be used to support high data transfer rates by having the block transfer size be a multiple of
the strip
RAID 0 can support low response times by having the block transfer size equal a strip and supporting
multiple strip transfers in parallel
RAID 1
All disks are mirrored (duplicated). Data is stored on a disk and its mirror. Reads can be done from either
the disk or its mirror; writes must be done to both the disk and its mirror
RAID 2
All disks are used for every access; the disks are synchronized together
Data strips are small (a byte)
An error-correcting code is computed across all disks and stored on additional disks
Uses fewer disks than RAID 1 but is still expensive. The number of additional disks is
proportional to the log of the number of data disks
RAID 3
Like RAID 2, but only a single redundant disk is used: the parity drive
A parity bit is computed for the set of individual bits in the same position on all
disks. If a drive fails, the parity information on the redundant disk can be used to calculate the data from
the failed disk on the fly
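To illustrate the parity mechanism, here is a small sketch (my own illustration; the strip contents are made up) showing that the parity strip is the XOR of the data strips, so a failed drive can be rebuilt from the survivors:

from functools import reduce

data_disks = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"]      # same-position strips on 3 data disks
parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*data_disks))

# Disk 1 fails; rebuild its strip from the surviving disks plus the parity strip.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*survivors))
assert rebuilt == data_disks[1]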
• Basic operations
To write to the disk, a laser is used to burn pits into the track: write once!
During reads, a low-power laser illuminates the track and its pits
- In the track, pits reflect light differently than areas with no pits, thus allowing you to store 1s and 0s
Characteristics:
- Master disk is made using the laser
- Master is used to press copies in a mass production mechanical style
- Cheaper than production of information on magnetic disks
- Storage capacity is roughly 775 MB, or about 550 3.5-inch floppy disks
- The standard transfer rate is 176 KB/second
- Only economical for production of large quantities of disks
- Disks are removable and thus archival
- Slower than magnetic disks.
2.13 Module 13: Input/Output
1. I/O Overview
The computer system's I/O architecture is its interface to the outside world. The I/O components consist of
the buses, keyboards, CRT monitors, flat-panel displays, scanners, modems, etc. The I/O architecture consists of
the buses and a set of I/O modules; each module interfaces to the bus system or to the outside world.
- Programmed I/O
- Interrupt-driven I/O
- Direct Memory Access (DMA)
- Parallel I/O
- Serial I/O
2. Buses
The collection of paths that connect the system modules together form the interconnection structure.
• Bus Interconnection
• Multiple Buses:
Hierarchical
High-speed limited access buses close to the processor
Slower-speed general access buses farther away from the processor.
Figure 2.70
Figure 2.71
2.14 Module 14: Operating System Support
• Operating System
The Operating System (OS) is the program that manages the system's resources, provides
services to programmers, and schedules the execution of other programs.
The OS masks the details of the hardware from the programmer
The OS provides the programmer with a convenient interface for using the system, consisting of new instructions
called system calls
The OS mediates programmer and application-program requests for facilities and services
• Services provided
Interactive OS
User directly interacts with the OS through a keyboard / terminal
Batch OS
User programs are collected together (off-line) and submitted to the OS in a batch by an operator
Printed results of the program execution are returned to the user
This type of OS is not typical of current machines
This type was used in the mainframes of the 1960s-70s, when the cost of hardware was such that you wanted to
keep it busy all the time
The O/S was really a monitor program that focused on job scheduling
2.14.2 2. OS scheduling
• In a multiprogramming OS, multiple jobs are held in memory and alternate between using the CPU and waiting for I/O. Scheduling occurs at several levels:
High-level
Short-term
I/O
• High-level scheduling
Determines which jobs are admitted into the system for processing
Controls the degree of multiprogramming
Admitted jobs are added to the queue of pending jobs that is managed by the short-term scheduler
Works in batch or interactive modes
• Short-term scheduling
This OS segment runs frequently and determines which pending job will receive the
CPU's attention next
Based on the normal changes of state that a job/process goes through
A process is running in the CPU until:
+ It issues a service call to the OS (e.g., for I/O service) and is suspended until the request is satisfied
+ The process causes an interrupt and is suspended
+ An external event causes an interrupt
The short-term scheduler is then invoked to determine which process is serviced next.
Figure 2.72
2.15 Module 15: Virtual Memory
2.15.1 1. Partitioning
Partitioning of memory: as a job is brought into memory, it must be allocated a portion of the memory;
this is its partition. Partitions can be fixed size or variable size.
Fixed size:
This is a rigid division of memory into fixed-size regions
A process is assigned to the smallest available partition it fits into
This can be very wasteful of memory
Variable size
Allocate to a process only as much memory as it needs
Other memory can be used for other processes
Efficient use of memory, but over time it can lead to a large number of little leftover pieces of free memory
too small to use
Must consider use of memory compaction
• Effect of partitioning
Since a process can be swapped in and out of memory and end up in different partitions, addressing within
the process must not be tied to specific physical memory addresses
Program/process addresses are logical addresses relative to the starting address of the
program itself
Processor hardware must convert the logical address reference of the program into a physical address
that the machine can operate on
In effect, this is a form of base addressing, where the base address is the first address
of the program.
Figure 2.73
2.15.2 2. Paging
• Paging subdivides memory into small fixed-size chunks called frames or page frames
• Programs are divided into chunks of the same size, called pages
• Loading a program into memory requires the allocation of the required number of pages
• Wasted memory is limited to a fraction of the last page
• The page frames used in loading a process need not be contiguous
Each program has a page table associated with it that maps each program page to a memory page frame
Thus, a logical address in a program is interpreted as a page number and
an offset within the page, which the page table converts into a page frame number and offset
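To make the translation concrete, here is a minimal sketch (the page size, page-table contents and address are assumptions for illustration):

PAGE_SIZE = 4096                        # assumed page size in bytes
page_table = {0: 5, 1: 9, 2: 2}         # program page -> memory page frame

def translate(logical_addr):
    page, offset = divmod(logical_addr, PAGE_SIZE)
    frame = page_table[page]            # a missing entry would mean a page fault
    return frame * PAGE_SIZE + offset

print(hex(translate(0x1A2C)))           # page 1, offset 0xA2C -> frame 9 -> 0x9A2C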
Figure 2.74
A more sophisticated way of mapping addresses from the address space onto the actual memory addresses is
possible. The principle of address mapping in a paging system is shown in Figure 15.3.
The addresses that the program can refer to form the virtual address space; the actual hardwired (physical)
memory addresses form the physical address space.
Figure 2.75
Only the program pages that are required for the execution of the program are actually loaded, hence
the "demand" concept
Only a few (of the many potential) pages of any one program might be in memory at a time
Many active programs can have pages in memory, increasing the degree of multiprogramming
• With demand paging, it is possible for a program consisting of more pages than can (ever) fit into
memory to run
This presents a virtual memory system that seems unlimited in size to the programmer
It significantly simplifies program development
• Page tables can be large and are themselves subject to being stored in virtual memory
• It would seem that each virtual address reference causes 2 memory accesses: one to fetch the page table entry and one to fetch the data
• To avoid this delay, use a special cache to hold page table information
• Every memory access then requires consulting this cache for the translation first
Figure 2.77
Similar to the ones discussed for caching, the following replacement algorithms have been used:
LRU
FIFO
Random, etc.
2.15.5 5. Segmentation
In addition to partitioning and paging, segmentation can be used to subdivide
memory
Programmer subdivides the program into logical units called segments
Programs subdivided by function
Data array items grouped together as a unit
Advantages
Simplifies data structure handling
Supports code modification and maintenance
Supports sharing code segments
Provides a degree of protection
A memory reference is in the form of a segment ID and an offset
Segmentation can be combined with paging in a (segment ID, page number, page offset) scheme
Summary of virtual memory management:
Partitioning
Paging
Segmentation
Segmentation and paging
2.16 Module 16: Advanced Architectures
2.16.1 1. Introduction
In the previous sections of this course, we have concentrated on single-processor architectures and techniques
to improve upon their performance, such as:
Efficient algebraic hardware implementations
Enhanced processor operation through pipelined instruction execution and multiplicity of functional
units
Memory hierarchy
Control unit design
I/O operations
Through these techniques and implementation improvements, the processing power of a computer system
has increased by an order of magnitude every 5 years. We are (still) approaching performance bounds due
to physical limitations of the hardware.
Figure 2.78
Multiple processors
Shared, common memory system
Processors under the integrated control of a common operating system
Data is exchanged between processors by accessing common shared variable locations in memory
Common shared memory ultimately presents an overall system bottleneck that effectively limits the
size of these systems to a fairly small number of processors (dozens)
Figure 2.79
Multiple processors
Each processor has its own independent memory system
Processors under the integrated control of a common operating system
Data exchanged between processors via interprocessor messages
This definition does not agree with the one given in the text
• These bounds were based on runtime performance of applications and were not necessarily valid in
all cases
• They reinforced the computer industry's hesitancy to get into parallel processing
• SIMD Examples
Vector addition (see the sketch after this list):
C(I) = A(I) + B(I)
Complexity O(n) in SISD systems: for I = 1 to n do C(I) = A(I) + B(I)
Complexity O(1) in SIMD systems
Matrix multiply
A, B, and C are n-by-n matrices
Compute C= AxB
Complexity O(n^3) in SISD systems:
n^2 dot products, each of which is O(n)
Complexity O(n^2) in SIMD systems:
perform n dot products in parallel across the array
Image smoothing
Smooth an n-by-n pixel image to reduce noise
Each pixel is replaced by the average of itself
and its 8 nearest neighbors
Complexity O(n^2) in SISD systems
Complexity O(n) in SIMD systems
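A sketch of the vector-addition example above, contrasting the element-by-element SISD loop with a data-parallel formulation; NumPy is used here only as a stand-in for a SIMD array, and the code is my own illustration rather than part of the original notes:

import numpy as np

a = np.arange(8)
b = np.arange(8, 16)

# SISD view: one addition per step, O(n) steps.
c_sisd = np.empty_like(a)
for i in range(len(a)):
    c_sisd[i] = a[i] + b[i]

# SIMD view: conceptually all n additions issue at once, O(1) steps
# (NumPy maps this whole-array operation to vectorized machine code).
c_simd = a + b

assert (c_sisd == c_simd).all()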
Rather than forcing all processors to perform the same task at the same time, processors can be assigned
different tasks that, when taken as a whole, complete the assigned application
Process synchronization
Process scheduling
• Process synchronization targets keeping all processors busy and not suspended
• System examples
SIMD
Illiac IV
One of the first massively parallel systems, with 64 processors
Goodyear Staran: 256 bit-serial processors
A current system from Cray Computer Corp. uses a supercomputer (Cray-3) front end coupled to an SIMD
array of 1000s of processors
MIMD
Intel hypercube series:
Supported up to several hundred CISC processors
Next generation: Paragon
Cray Research T3D
Cray Y-MP coupled to a massive array of DEC Alphas
Target: sustained teraflop performance
2.16.4 4. Discussions
• Problems
• Reality check:
Outside of a limited number of high-profile applications, parallel processing is still a young discipline
Parallel software is still fairly sparse
It is risky for companies to adopt parallel strategies; many just wait for the next new SISD system.
Attributions
Module: "Syllabus"
By: Nguyen Thi Hoang Lan
URL: http://cnx.org/content/m29778/1.1/
Pages: 1-2
Copyright: Nguyen Thi Hoang Lan
License: http://creativecommons.org/licenses/by/3.0/
Module: "Calendar"
By: Nguyen Thi Hoang Lan
URL: http://cnx.org/content/m29307/1.1/
Pages: 2-4
Copyright: Nguyen Thi Hoang Lan
License: http://creativecommons.org/licenses/by/3.0/
Module: "Exercise"
By: Nguyen Thi Hoang Lan
URL: http://cnx.org/content/m29392/1.1/
Pages: 4-6
Copyright: Nguyen Thi Hoang Lan
License: http://creativecommons.org/licenses/by/3.0/
Module: "MICROPROGRAMMING"
Used here as: "Module 7: Microprogramming"
By: Nguyen Thi Hoang Lan
URL: http://cnx.org/content/m29711/1.1/
Pages: 84-91
Copyright: Nguyen Thi Hoang Lan
License: http://creativecommons.org/licenses/by/3.0/
Module: "Input/Output"
Used here as: "Module 13: Input/Output"
By: Nguyen Thi Hoang Lan
URL: http://cnx.org/content/m29420/1.1/
Pages: 120-124
Copyright: Nguyen Thi Hoang Lan
License: http://creativecommons.org/licenses/by/3.0/
Connexions's modular, interactive courses are in use worldwide by universities, community colleges, K-12
schools, distance learners, and lifelong learners. Connexions materials are in many languages, including
English, Spanish, Chinese, Japanese, Italian, Vietnamese, French, Portuguese, and Thai. Connexions is part
of an exciting new information distribution system that allows for Print on Demand Books. Connexions
has partnered with innovative on-demand publisher QOOP to accelerate the delivery of printed course
materials and textbooks into classrooms worldwide at lower prices than traditional academic publishers.