
Computer Organization and Architecture
Machine Instructions and Addressing Modes
Machine Instructions
Types of instructions: Zero address instructions, one address instructions, and two address instructions.
Zero address instructions: NOP
One address instructions: PUSH, POP
Two address instructions: ADD, MULT, SUB
Addressing Modes
An addressing mode specifies where the required object is present. The object may be an instruction or data.
Instruction format: Opcode | Address1 | Address2, each address field being m bits wide. With P two-address, Q one-address and R zero-address instructions, the total number of instruction encodings used is P x 2^(2m) + Q x 2^m + R (Total Instructions).

Addressing modes are basically classified into two types:
(a) Sequential Control Flow Addressing Modes: When the program is stored in sequential memory locations, the program counter itself points to the next instruction address; therefore such addressing modes are focused on data.
(b) Transfer of Control Flow Addressing Modes: When the program is stored in random memory locations (as with structured programming), special instructions and addressing modes are needed to calculate the next instruction address.

Sequential Control Flow Addressing Modes
Register Based Addressing Modes:
These addressing modes are used to access the data when it is available in the registers.
Register/Register Direct Addressing Mode: This addressing mode is used to access the local variables. The data is present in a register, and that register's address is available in the address field of the instruction. Therefore, the effective address is equal to the address field value. Data = [EA] = [Register Name]
Example: MOV R1, R2
Memory Based Addressing Modes: When the data is available in the memory, different memory based addressing modes are used to access the data. Under these modes, the EA is always a memory address.
Implied/Implicit Addressing Mode: In this mode the data is available in the opcode itself. Therefore, there is no effective address.
Example: Complement Accumulator (CMA) and all zero address instructions.
Immediate Addressing Mode:
This mode is used to access constants or to initialize registers to a constant value.
Here the data is present in the address field of the instruction.
The range of values that can be initialized is limited by the size of the address field.
If the address field size is n bits, the possible range of immediate constants (data) is 0 to 2^n - 1.
Example: MOV A, 10
Direct/Absolute Addressing Mode:
Used to access the static variables.
The data is present in memory; that memory cell's address is present in the address field of the instruction. Data = [EA] = [Memory Address]
One memory reference is required to read or write the data using the direct addressing mode.
Example: MOV R1, [1000]
Auto Indexed Addressing Mode:
This mode is used to access linear array elements. Thus, a "base address" is required to access the data.
The base address is maintained in the base register. Therefore, the EA is Base Register +/- Step Size.
The step size depends on the amount of data to be accessed from the memory. Data = [EA] = [[Base Register] +/- Step Size]
Auto indexed addressing is divided into auto increment and auto decrement, each of which can be applied before (pre) or after (post) the access:
Pre auto increment:  MOV R0, +(R1)
Post auto increment: MOV R0, (R1)+
Pre auto decrement:  MOV R0, -(R1)
Post auto decrement: MOV R0, (R1)-
Indirect Addressing Mode (array as parameter):
Used to implement pointers. The EA is available in either a register or memory.
This mode is divided into two types:
Register Indirect Addressing Mode: Here, the EA is present in the address register, and that register's name is available in the address field of the instruction.
EA = [Address Field Value] = [Register Name]
Memory Indirect Addressing Mode: Here, the EA is present in the memory, and that memory address is available in the address field of the instruction.
EA = [Address Field Value] = [Memory Address]
Data = [EA] = [[Memory Address]]
Two memory references are required in memory indirect addressing mode.
Indexed Addressing Mode:
Allows array indexing to be implemented.
EA = Base Address + Index Value
When a register holds the index value it is "Register Indexed Addressing Mode"; it is used to access a random array element.
In Indirect Indexed Addressing Mode, the base address is present in memory, and that memory address is present in the address field of the instruction.
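As a quick illustration of how the effective address differs across these modes, the following Python sketch computes the data fetched under immediate, register direct, direct, register indirect, memory indirect and indexed addressing. The register and memory contents are made-up values for illustration, not from the handbook.

    # Toy machine state (hypothetical values, for illustration only)
    registers = {"R1": 5, "R2": 3000}                    # register file
    memory = {1000: 55, 3000: 4000, 4000: 99, 3005: 21}  # word-addressed memory

    def immediate(value):            # data is in the address field itself
        return value

    def register_direct(reg):        # EA = register name, data = [register]
        return registers[reg]

    def direct(addr):                # EA = address field, data = M[EA] (1 memory ref)
        return memory[addr]

    def register_indirect(reg):      # EA = [register], data = M[EA]
        return memory[registers[reg]]

    def memory_indirect(addr):       # EA = M[address field], data = M[M[addr]] (2 memory refs)
        return memory[memory[addr]]

    def indexed(base, index_reg):    # EA = base address + index value
        return memory[base + registers[index_reg]]

    print(immediate(10))             # 10
    print(register_direct("R1"))     # 5
    print(direct(1000))              # 55
    print(register_indirect("R2"))   # 4000 = M[3000]
    print(memory_indirect(3000))     # 99   = M[M[3000]]
    print(indexed(3000, "R1"))       # 21   = M[3000 + 5]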
Transfer of Control Flow Addressing Modes
During the execution of selection statements (if, then, else, goto, switch), iterative statements (for loop, while loop, do-while loop) and subprograms, control is transferred from one location to another.
Transfer-of-control operations can be implemented using three possible mnemonics: (i) Jump, (ii) Branch and (iii) Skip.
Transfer-of-control instructions are divided into two types:
Unconditional Transfer-of-Control: While executing these instructions, program control is transferred to the target address without checking any condition. Example: HALT
Conditional Transfer-of-Control: While executing these instructions, the associated condition is evaluated. If it evaluates to true, the target address is loaded into the Program Counter (PC); otherwise there is no change in the PC.
The condition can be evaluated based on the status of the previous instruction.

Example: They are used to implement 'i, switch', 'for', 'do whil
and 'while' statements
while
Instruction set = opcode
Vector interrupt - Interrupting source supplies branch info at process
ssor
through an interrupted vector
INTAINTR

Computation of EA for the Next Instruction
Relative/PC Relative Addressing Mode
(Relocates data, computes the branch address at run time and reduces instruction size.)
This addressing mode is used to access an instruction within the same segment; therefore only the offset address is required.
The offset address is available in the address field of the instruction.
EA = Base Address + Offset Address = PC + Address Field Value
PC <- PC + IR[Address Field]
Position dependent; relocation at run time.
Base Register Addressing Mode (position independent):
This addressing mode is used to access an instruction in another segment. Therefore both a base address and an offset are required.
The base address is maintained in the base register and the offset address is maintained in the address field of the instruction.
EA = [Base Register] + IR[Address Field]
PC <- [Base Register] + IR[Address Field]
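A minimal sketch of how the next-instruction address is formed under the two modes above; the PC, base register and offset values are assumed for illustration.

    # PC-relative: EA = PC + offset from the instruction's address field
    pc = 2000
    offset = 24
    ea_pc_relative = pc + offset          # 2024, within the current segment

    # Base register: EA = [base register] + offset (position independent)
    base_register = 8000
    ea_base = base_register + offset      # 8024, possibly in another segment

    print(ea_pc_relative, ea_base)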

II. ALU and Data Path, CPU Control Design
A CPU is divided into two sections: the data section and the control section. The data section is also called the "data path" and the control section is the "control unit".
The data path contains registers and the ALU, which perform operations on data items. The control unit issues control signals to the data path.

ALU and Data Path
Types of data path: one bus data path, two bus data path and three bus data path.
In a one bus data path, the CPU registers and the ALU use the same bus for all incoming and outgoing data.

In a two bus data path, the general purpose registers are connected to two buses. Data can be transferred from two different registers to the inputs of the ALU at the same point of time.
In a three bus data path, two buses are used as source buses (they move data out of the registers) while the third is used as the destination bus (it moves data into the registers). A source bus is also called an in-bus and the destination bus is also called an out-bus.
[Figure: One-bus data path - the general purpose registers, PC, IR, MAR and MDR share a single internal bus connected to the ALU; MAR and MDR connect to the memory bus.]
[Figure: Processor with one bus - each register has control signals Rin (if Rin = 1, data flows from the bus into the register) and Rout (if Rout = 1, data flows from the register onto the bus); the control unit drives PC, IR, MAR and MDR; MAR/MDR connect to main memory over a bidirectional data bus; a MUX with a Select input feeds one ALU input from either the Y register or a constant, and TEMP/Z hold the ALU result.]
The ALU, the registers and their interconnecting path together are called the "ALU data path".
The constant input is used to increment the PC.
If Select = 0, the output of the MUX is Y; if Select = 1, the output of the MUX is the constant.
CPU Registers:
PC: It is used to hold the starting instruction address and, immediately afterwards, to point to the next instruction's address.
IR: It is used to hold the currently fetched instruction for decoding. As the instruction format in this register is predefined, it is not a user accessible register.
Accumulator: It is used to hold one of the ALU inputs.
MAR (Memory Address Register): It is used to carry the address. This register is directly connected to the address lines of the system bus.
MBR (Memory Buffer Register) or MDR (Memory Data Register): It is used to carry the data (binary sequence). This register is directly connected to the data lines of the system bus.
Micro-operation: It is a basic register to register transfer operation, also called an atomic operation or microinstruction. All micro-operations must complete in one clock cycle. Control signals are required to execute a micro-operation.
Example: R1 -> R2 is executed by enabling R1out, R2in.
Microprogram: A sequence of micro-operations is called a microprogram.
The fetch cycle microprogram is:
T1: PC -> MAR;              PCout, MARin
T2: M[MAR] -> MBR;          MARout, Read, MBRin
T3: MBR -> IR;              MBRout, IRin
    PC <- PC + step size;   PCout, PCin
The basic operations performed are:
Register transfer (R1 -> R2): R1out, R2in (only one clock cycle required).
ALU operation (R3 <- R1 + R2):
    R1out, Yin
    R2out, Select = Y, Add, Zin
    Zout, R3in
Minimum number of clocks = 3.
If two data paths are used, both operands can reach the ALU in the same cycle (R1out, R2out, Select, Add, Zin), followed by Zout, R3in.
Minimum number of clocks = 2.
Memory Read: R2 <- M[(R1)]
    R1out, MARin, Read
    Wait for memory function to complete (WMFC)
    MDRout, R2in
Here the number of clocks can be 2 or 3 depending on WMFC.
Memory Write: M[(R1)] <- R2
    R1out, MARin
    R2out, MDRin, Write
    WMFC
Minimum number of clocks = 3.
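The fetch-cycle micro-operations above can be mimicked with a tiny register-transfer simulation; the memory contents and step size below are assumed values, and each assignment stands in for one control-signal-driven transfer.

    # Toy register-transfer view of the fetch cycle (assumed memory contents)
    memory = {2000: "ADD R1, R2", 2004: "SUB R3, R4"}
    PC, MAR, MBR, IR = 2000, None, None, None
    step = 4

    MAR = PC                   # T1: PCout, MARin
    MBR = memory[MAR]          # T2: MARout, Read, MBRin  (M[MAR] -> MBR)
    IR = MBR                   # T3: MBRout, IRin
    PC = PC + step             #     PC <- PC + step size

    print(IR, PC)              # 'ADD R1, R2' 2004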
Control Unit
[Figure: registers PC, IR, R1, R2, ... connected to a common bus; Rin = 1 moves data from the bus into a register and Rout = 1 moves data from a register onto the bus. Example: R1 -> R2 is carried out by R1out, R2in - data flows from one register to another over the common bus.]
The purpose of the control unit is to provide the appropriate timing and control signals required to execute micro-operations.
Control signals are directly executed on the base machine.
A micro-program contains a sequence of micro-operations.
Each subcycle of the instruction cycle invokes its own micro-program.
An instruction cycle is used to execute one instruction.
System functionality is the execution of the program.
A program is a sequence of instructions along with data.
[Figure: Program execution using the control unit - a program is a sequence of instructions; each instruction is executed by an instruction cycle (fetch cycle and execution cycle); each cycle is realized as a micro-program of micro-operations, e.g. fetch = {MAR <- PC, PC <- PC + 1, READ, IR <- MDR}, whose control signals drive the base machine.]

Control Unit Design
Control unit design approaches: the hardwired approach and the microprogrammed approach. The microprogrammed approach is further divided into horizontal programming and vertical programming.
Hardwired Control Unit
The control signals are expressed as Sum-of-Products (SOP) expressions and are directly realized in independent hardware.
Example: Yin = T1 + T3.ADD + T5.BR
Here Yin is enabled during T1 for all instructions, during T3 for ADD, during T5 for BR, and so on. It can be realized directly with gates.
It is the fastest control unit design because of its independent circuits.
For a large number of instructions and control signals, the complexity of the hardware is high.
It is relatively inflexible to changes such as modification or correction; thus it is not used where designs are frequently revised or tested.
Used in real-time applications. Example: aircraft simulation, weather forecasting, etc.
Implemented in RISC processors.
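A hardwired control signal such as the Yin example above is just a Boolean sum of products over timing steps and decoded opcodes. The sketch below evaluates one such expression; the particular time steps T1/T3/T5 are illustrative assumptions, not the handbook's exact circuit.

    # Yin = T1 + T3.ADD + T5.BR  (sum-of-products realization, assumed timing steps)
    def y_in(t, opcode):
        T1, T3, T5 = (t == 1), (t == 3), (t == 5)
        return T1 or (T3 and opcode == "ADD") or (T5 and opcode == "BR")

    print(y_in(1, "SUB"))   # True  : enabled in T1 for every instruction
    print(y_in(3, "ADD"))   # True  : enabled in T3 only for ADD
    print(y_in(3, "SUB"))   # False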

Micro-programmed Control Unit
Control Memory (CM) is used to store the microprogram.
Control Memory (CM) is a permanent memory, i.e. ROM.
A microprogram contains a sequence of micro-operations (control words).
The format of a control word is:
  Branch Conditions | Flag Field | Control Signals (CS) | Control Memory Address
  - Branch Conditions: number of branch conditions.
  - Flag Field: flag information.
  - Control Signals (CS): stored either in decoded format (1 bit per CS, horizontal CW) or in encoded format (N control signals require log2 N bits, vertical CW).
  - Control Memory Address: the address of the next micro-operation (next control word); its width depends on the capacity of the CM.
Flexible for a large number of instructions and control signals.
Control signals are generated more slowly in comparison to the hardwired approach.
Based on the type of Control Word (CW) stored in the Control Memory (CM), the control unit is classified into two types:
Horizontal Microprogrammed Control Unit (HuPCU)
Vertical Microprogrammed Control Unit (VuPCU)

Horizontal uPCU
The control signals are represented in the decoded binary format, i.e. 1 bit per control signal.
Example: If 68 control signals are present in the processor, then 68 bits are required.
More than one control signal can be enabled at a time.
It supports a longer Control Word (CW).
It is used in parallel processing applications.
It allows a high degree of parallelism: if the degree is n, then n control signals are enabled at a time.
Faster than the vertical uPCU.
Requires no additional hardware (decoders).

Vertical uPCU
The control signals are represented in the encoded binary format.
For N control signals, log2 N bits (rounded up) are required.
Example: If N = 68, then ceil(log2 68) = 7 bits are required.
It supports a shorter Control Word (CW).
It supports easy implementation of new control signals, therefore it is more flexible.
It allows a low degree of parallelism, i.e. the degree of parallelism is 1 or 0.
Requires additional hardware (decoders) to generate the control signals, which implies it is slower than the horizontal uPCU.
Note:
The ascending order of control units w.r.t. speed: Vertical < Horizontal < Hardwired (highest).
The ascending order of control units w.r.t. flexibility: Hardwired (lowest) < Horizontal < Vertical.
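The 68-control-signal example works out as follows; a short sketch comparing the horizontal (decoded, 1 bit per signal) and vertical (encoded, ceil(log2 N) bits) control-word widths.

    import math

    N = 68                                   # number of control signals
    horizontal_bits = N                      # decoded format: 1 bit per control signal
    vertical_bits = math.ceil(math.log2(N))  # encoded format: ceil(log2 N) bits
    print(horizontal_bits, vertical_bits)    # 68 7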


RISC (Reduced Instruction Set Computer) vs CISC (Complex Instruction Set Computer):
1. RISC supports a rich register set; CISC supports a smaller number of registers.
2. RISC supports fewer addressing modes; CISC supports more addressing modes.
3. RISC supports fixed length instructions; CISC supports variable length instructions.
4. RISC executes one instruction per cycle (CPI = 1); CISC has CPI > 1.
5. RISC allows successful pipeline implementation (CPI = 1); CISC pipelining is less successful (CPI > 1).
6. Examples - RISC: Motorola, PowerPC, ARM (Advanced RISC Machine); CISC: Pentium.

III. Memory and I/O Interfaces
In isolated I/O, the CPU has distinct instructions for memory read/write and I/O read/write. The address used for accessing an I/O device may be the same as one of the addresses in the memory; however, separate lines are used to indicate an I/O device address.
In memory mapped I/O, the interface registers of the devices are mapped onto the address space of the memory.
Input Output Organisation
Input/output devices are very slow devices, therefore they are not directly connected to the system bus: I/O devices are electromechanical devices while the CPU is an electronic device, so they differ in operating mode, data transfer rate and word format.
To synchronize the I/O devices with the CPU and memory, a high speed interface (I/O module) is needed.
[Figure: I/O interface - control logic and a buffer between the I/O device and the processor.]
The high speed interface enables the corresponding device for the operation. After preparing the data, the I/O device transfers it to the buffer.
When the data is present in the buffer, the high speed interface generates a signal to the CPU and waits for the acknowledgment. After receiving the acknowledgment, it transfers the data to the CPU at a high rate. Thus, the speed gap is minimized.
Types of Interfaces:
Non-programmable: the configuration is fixed at design time and cannot be changed during operation. Example: buffer, latch.
Programmable: the configuration is defined in a Command Word Register (CWR); based on the value loaded into the CWR, the configuration is changed.
Programmable interfaces are further divided into:
Serial interface (bit by bit transmission). Example: 8251 USART (Universal Synchronous Asynchronous Receiver Transmitter).
Parallel interface (more than one bit transmitted at a time). Examples: 8255 PPI (Programmable Peripheral Interface), 8259 (Interrupt Controller), 8237/8257 (DMA controllers).
I/O Transfer Modes
Three different modes are available to transfer data from I/O to CPU/memory.
1. Programmed I/O
Here the CPU directly communicates with the I/O device (e.g. one byte at a time from a 1 KBps device). Hence, processor utilization is inefficient.
The processor keeps waiting until completion of the I/O operation; this waiting depends on the speed of the I/O device.
2. Interrupt Driven I/O
An interrupt controller (IC) is used as the high speed interface between the basic I/O devices and the CPU. Hence, processor utilization is good (efficient).
Here, the CPU communicates with the IC only. Thus, the execution/transfer time does not depend on the speed of the I/O device; rather it depends on the latency of the IC.
[Figure: several I/O devices of different speeds (1 KBps, 50 KBps, 10 MBps) connected byte-wise to the CPU through the interrupt controller, which prioritizes them by level.]
3. Direct Memory Access (DMA)
Bulk amounts of data are transferred from the I/O devices to the main memory without involvement of the CPU.
The secondary storage devices are connected to the system bus through the DMA controller; while a user program executes, the program is transferred from the auxiliary memory to the main memory page by page through the DMA.
[Figure: DMA controller (control logic + buffer) sitting between the I/O / secondary storage device and the system bus/CPU.]
The CPU initializes the DMA with the source and destination addresses, control signals and count value. Later it is busy with some other execution.
Bus priority can be resolved by daisy chaining (which gives a non-uniform priority to each device) or by polling.
The control logic interprets the device request and enables the corresponding device for the operation. After preparing the data, it transfers the data to the buffer.
When the data is available in the buffer, the DMA enables the HOLD signal to gain control of the bus and waits for the acknowledgment.
After receiving the HLDA signal, the DMA transfers the data from the I/O device to the main memory until the count becomes zero. After the transfer operation, it re-establishes the bus connection to the CPU.
[Figure: secondary storage connected through a DMA controller (8237/8257, control logic + buffer) that exchanges HOLD/HLDA signals with the CPU.]
During the DMA operation, the CPU is in two states: the busy state and the blocked/hold state.
While the data is being prepared, the CPU is busy with other executions. While the data is being transferred, the CPU is in the blocked state.
Let X = preparation time (depends on I/O speed), Y = transfer time (depends on main memory speed).
Then, % time the CPU is blocked = Y / (X + Y) x 100
      % time the CPU is busy    = X / (X + Y) x 100
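With the X (preparation) and Y (transfer) times defined above, the blocked and busy percentages can be checked with a short calculation; the sample times are assumed.

    # Assumed times: X = I/O preparation time, Y = DMA transfer time (same units)
    X, Y = 90.0, 10.0
    blocked = Y / (X + Y) * 100        # CPU is blocked only during the transfer
    busy = X / (X + Y) * 100           # CPU does other work while data is prepared
    print(blocked, busy)               # 10.0 90.0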


The DMA operates in 3 modes:
Burst Mode: After receiving the HLDA signal, a bulk amount of data is transferred to the main memory.
Cycle Stealing Mode: Before receiving the HLDA signal, it forcefully suspends the CPU operation and transfers very important data to the main memory.
Block Mode: After receiving the HLDA signal, the data is transferred to the main memory block by block.

IV. Instruction Pipelining
The set of phases for an instruction execution cycle:
Instruction Fetch (IF):
    MAR <- PC            [MAR <- 2000]
    PC <- PC + 1         [next instruction: 2004]
    READ control signal is enabled
    IR <- MDR
Instruction Decode (ID): to know the purpose of the instruction.
Operand Fetch (OF):
    Memory read: MAR <- LOC X
    Write into a general purpose register: R1 <- MDR
Execute and Store (EX):
    MAR <- LOC Z
    MDR <- R1
    Write into memory
Indirect Phase: with a direct address, the address field itself is the EA (e.g. ADD 300 gives EA = 300); with an indirect address, the address field points to a memory location holding the EA (e.g. ADD 300 where M[300] = 450 gives EA = 450).
Interrupt Phase: (2 memory cycles required)
Each instruction execution involves a sequence of micro-operations to be implemented by the processor.
A micro-operation is a register transfer operation; it completes in 1 clock cycle.
If T is the total time for the instruction cycle, then T is divided into T1, T2, T3, ... among the different phases.
The CPU operates in 2 modes:
User mode or non-privileged mode: executes the user program.
System mode/supervisor/kernel/privileged mode: executes the operating system to obtain system services.
Types of interrupts:
External or hardware interrupt: raised due to timers and I/O devices.
Software interrupt: raised due to switching from user mode to system mode or vice versa.
Internal interrupt: raised due to incorrect use of instructions and data. Example: division by zero, register overflow, invalid opcode.

Pipelining
Accepting a new input at one end before a previously accepted input appears as an output at the other end is pipelining.
Pipelining allows overlapping execution.
The characteristic of a successful pipeline is that for every new cycle a new input must be inserted into the pipeline, i.e. CPI = 1.
[Figure: a pipeline of stages/segments S1, S2, S3 separated by interface registers (latches/buffers), driven by a common clock, from the input end to the output end.]
Types of Pipelines
1. Linear pipeline: used to perform only one specific function.
2. Non-linear pipeline: used to perform multiple functionalities. It uses feed-forward and feed-backward connections.
3. Synchronous pipeline: on a common load/clock signal, all the registers transfer data to the next stages simultaneously.
4. Asynchronous pipeline: the data flow along the pipeline stages is controlled using a handshake protocol.

Performance Evaluation of a Pipeline Processor
Consider a K-segment pipeline with clock cycle time tp, used to execute n tasks. The first task is executed in the non-overlapping time span, so it requires K cycles to complete the operation.
The remaining (n - 1) tasks emerge from the pipe at the rate of one task per cycle, so (n - 1) tasks require (n - 1) cycles to complete. Thus, the execution time of the K-segment pipeline is
ET_pipe = K + (n - 1) cycles = (K + (n - 1)) * tp
The performance gain of the pipeline processor over the non-pipeline processor is
Speedup (S) = Performance_pipe / Performance_nonpipe = ET_nonpipe / ET_pipe = (n * tn) / ((K + (n - 1)) * tp)
For a large number of tasks or instructions, K + (n - 1) approaches n, so
S = (n * tn) / (n * tp) = tn / tp
When all instructions take the same number of cycles, one instruction's execution time equals the number of pipeline stages (K) times tp, i.e. tn = K * tp, so S = K * tp / tp = K.
Instruction Pipeline
The instruction pipeline operates on a stream of instructions by overlapping the phases of the instruction cycle, such as Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Execute (EX), Memory Access (MA), Write Back (WB) and so on.

Stall Cycle
A stall cycle is one during which no meaningful operation is performed for an instruction.
It arises due to:
(i) Uneven clock cycles needed by each stage for instructions.
(ii) Increased buffer overhead.
(iii) Memory operands.
(iv) Pipeline dependencies.
S_effective = S_ideal / (1 + Stall frequency x Stall cycles) = K (number of stages) / (1 + Stall frequency x Stall cycles)
The maximum speedup that can be achieved by using a pipelined processor is S_max = S_ideal = K, the number of stages.

Efficiency = n / (K + (n - 1))
Throughput = Number of tasks processed / Total time required to process the tasks = n / ((K + (n - 1)) * tp)
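Plugging sample numbers into the formulas above gives the following check; K, n and tp are assumed values, with tn = K * tp for the equivalent non-pipelined processor.

    # Assumed: K = 5 stages, n = 100 tasks, tp = 2 ns per pipeline cycle
    K, n, tp = 5, 100, 2.0
    tn = K * tp                              # one instruction, non-pipelined

    et_pipe = (K + (n - 1)) * tp             # (K + n - 1) cycles of tp
    et_nonpipe = n * tn
    speedup = et_nonpipe / et_pipe           # approaches K as n grows large
    efficiency = n / (K + (n - 1))
    throughput = n / et_pipe                 # tasks per ns

    print(round(speedup, 3), round(efficiency, 3), round(throughput, 4))
    # 4.808 0.962 0.4808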
Advanced Pipeline (RISC Pipeline)
In the RISC processor, an instruction pipeline is implemented to execute the instructions.
A RISC processor supports three categories of instructions:
Data Transfer: data transfer instructions are implemented using load and store.
    Load r0, 2(r1):   r0 <- M[2 + [r1]]
    Store 3(r1), r2:  M[3 + [r1]] <- r2
Data Manipulation (ALU Operation): ALU operations are performed directly on the registers.
    Add r0, r1, r2:   r0 <- r1 + r2
Branch Operation: the unconditional branch operation syntax is
    Jmp 1000;  PC <- target address
Conditional Branch Operation:
    JNZ r0, 2000:  if true, PC <- target address; if false, PC <- sequential address
In the RISC pipeline, branch instruction execution is completed at the end of the second stage, so the branch penalty = [2 - 1] = 1.
Register renaming can eliminate all WAR and WAW hazards.
Bypassing cannot handle all RAW hazards (e.g. a memory load instruction).
Control hazard penalties can be eliminated by dynamic branch prediction.
Dependencies in the Pipeline
Dependencies cause stalls in the pipeline. A stall is an extra cycle created in the pipeline, filled with a NOP.
There are three kinds of dependencies possible in the pipeline:
Structural dependency:
This dependency is present because of a resource conflict. The resource may be a memory, a register or a functional unit.
A conflict is an unsuccessful operation, so to make it successful the instruction is made to wait until the resource becomes available. This waiting creates stalls in the pipeline.
To minimize the number of stalls in the pipeline due to structural dependency, the "renaming" concept is used (register renaming decreases RAW-related stalls and removes WAR conflicts entirely).
Data dependency:
Consider the program segment i: instruction; j: instruction. A data dependency exists between i and j when "instruction j tries to read the data before instruction i writes it".
Control dependency:
While executing a transfer-of-control operation, program control is transferred from the current location to the target location.
If the current instruction is decoded as a data transfer or data manipulation operation, then the sequential instruction is the wanted instruction; otherwise the already fetched sequential instruction is unwanted.
The process of removing an unwanted instruction from the pipeline is called a flush or freeze.
The flush operation creates stalls in the pipeline.
The number of stall cycles created in the pipeline due to a branch operation is called the Branch Penalty.
Branch Penalty = (the stage at which the target address becomes available) - 1
To reduce the number of stalls in the pipeline due to branch operations, a branch prediction buffer, loop buffer or branch target buffer is used.
Branch target buffer: a high speed buffer maintained at the instruction fetch stage to store the predicted target addresses.
Out-of-order execution creates two more dependencies in the pipeline:
Anti dependency: it exists when instruction j tries to write the register before instruction i reads it.
Output dependency: it exists when instruction j tries to write the destination before instruction i writes it.
To handle the anti and output dependencies, the "register renaming" concept is used.
Register renaming means using temporary storage to hold the out-of-order execution output. After the dependent instruction completes its exception check, the register file is updated with the temporary storage content.

Hazards
A hazard is a delay. The delay creates extra cycles; these extra cycles without operation are called stalls.
Hazards are classified as:
Read-After-Write (RAW)
Write-After-Read (WAR)
Write-After-Write (WAW)
Consider a program segment where instruction j follows instruction i in the program order.
RAW: this hazard is created when instruction j tries to read the data before instruction i writes it (TRUE DATA DEPENDENCY).
WAR: this hazard is created when instruction j writes the data before instruction i reads it (ANTI DEPENDENCY).
WAW: this hazard is created when instruction j writes the data before instruction i writes it (OUTPUT DEPENDENCY).
Performance Evaluation of the Pipeline with Stalls
S = (CPI_nonpipe x Cycle Time_nonpipe) / (CPI_pipe x Cycle Time_pipe), for the ideal CPI_pipe
S = (CPI_nonpipe x Cycle Time_nonpipe) / ((Ideal CPI + No. of stall cycles per instruction) x Cycle Time_pipe)
When there is no setup time overhead, both cycle times are equal:
S = CPI_nonpipe / (1 + No. of stall cycles per instruction)
When all instructions take the same number of cycles, each instruction requires a number of cycles equal to the number of pipeline stages to complete execution:
S = Pipeline Depth / (1 + No. of stall cycles per instruction)
When efficiency is 100% there are no stalls, so S = Pipeline Depth.
Note: an instruction pipeline can create one or more stalls, and register renaming can eliminate some of them; so a statement that stalls always occur is false.
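The stall-adjusted speedup formula can be sanity-checked numerically; the pipeline depth and stall statistics below are assumed.

    # Assumed: 5-stage pipeline, 0.3 stall cycles per instruction on average
    pipeline_depth = 5
    stalls_per_instruction = 0.3

    speedup = pipeline_depth / (1 + stalls_per_instruction)
    print(round(speedup, 2))     # 3.85; equals the depth (5) only when there are no stalls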

V. Cache, Main Memory and Secondary Storage
Memory Organisation
[Figure: memory organisation - the processor registers (size of a register is 1 word) exchange words with the cache, which contains the most frequently used data and instructions and may have several levels; the cache exchanges blocks with main memory using a physical address and a hardware cache mapping; main memory exchanges pages with secondary storage using the logical address of the process.]
Memory Hierarchy
The memory hierarchy shows the accessing order of the memories existing in the system.
Average Memory Access Time = (total instruction fetch time + read/write time) / Total number of instructions

A smaller cache block size means a larger TAG (more cache TAG overhead) and less spatial locality is exploited, which worsens cache miss behaviour.
The purpose of the memory hierarchy is to bridge the speed mismatch between the fastest processor and the slowest memories at a reasonable cost.
[Figure: memory hierarchy pyramid - registers (flip-flops), cache (static RAM), main memory (dynamic RAM), magnetic disks, magnetic tapes (sequential access); going down the hierarchy the access time (T) and size (S) increase while the cost per bit (C) and frequency of access decrease.]
When the processor refers to the i-th level of memory, if the item is found it is a "hit", else a "miss".

Types of Memory Organisation
Based on the order of accessing the memory system, the memory organisation is divided into two types.
Simultaneous Access
In this organisation the CPU directly communicates with all the levels of memory.
Consider an n-level memory system. Let H1, H2, ..., Hn be the hit ratios of levels 1 to n and T1, T2, ..., Tn be their access times.
The average access time of the memory is
T_avg = H1*T1 + (1 - H1)*H2*T2 + ... + (1 - H1)(1 - H2)...(1 - Hn-1)*Hn*Tn
Throughput of the memory = 1 / T_avg words/sec
C_avg per bit = (C1*S1 + C2*S2 + ... + Cn*Sn) / (S1 + S2 + ... + Sn)
Hierarchical Access
The data from the 2nd level is transferred to the 1st level and the CPU accesses the data from the 1st level.
Consider an n-level memory system.
[Figure: processor connected to levels L1, L2, ..., Ln; on a hit at a level, the block/word is passed up toward the processor level by level.]
Let H1, H2, ..., Hn be the hit ratios and T1, T2, ..., Tn be the access times of the n memory levels.
T_avg = H1*T1 + (1 - H1)*H2*(T1 + T2) + (1 - H1)(1 - H2)*H3*(T1 + T2 + T3) + ... + (1 - H1)(1 - H2)...(1 - Hn-1)*(T1 + T2 + ... + Tn)
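For a two-level system the two averaging rules above reduce to the familiar forms; the hit ratio and access times below are assumed.

    # Assumed two-level system: cache (H1, T1) backed by main memory (T2, always hits)
    H1, T1, T2 = 0.9, 10.0, 100.0     # hit ratio, access times in ns

    t_simultaneous = H1 * T1 + (1 - H1) * T2          # levels accessed in parallel
    t_hierarchical = H1 * T1 + (1 - H1) * (T1 + T2)   # a miss pays T1 first, then T2

    print(t_simultaneous, t_hierarchical)             # 19.0 20.0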
Cache Memory
Cache is used as the intermediate memory between the CPU and the main memory.
It is a small and fast memory.
By placing the most frequently used data and instructions in the cache, the average access time can be minimized.
When a miss occurs, the processor directly obtains the data from main memory and a copy of it is sent to the cache for future references.
The cache memory works on the principle of Locality of Reference (LOR). LOR states that "the references to memory at any given interval of time tend to be confined to localized areas of memory".
The performance of the cache depends on:
Cache size (small)
Number of levels of cache
Cache block size
Cache mapping technique
Cache replacement policy
Cache updation policy
The cache is divided into equal parts called cache blocks/cache lines.
Data is transferred from main memory to cache in the form of blocks; thus both are divided into blocks of equal size.
Cache line size = Main memory block size
Number of cache lines = Cache size / Block size
Mapping Techniques
The process of transferring the data from the main memory to the cache memory is called mapping. There are three different mapping techniques: direct mapping, associative mapping and set associative mapping.

Direct Mapping
In this technique, a mapping function is used to transfer the data from main memory to cache memory.
The mapping function is K mod N = i
where K = main memory block number; N = number of cache lines; i = cache memory line number.
The physical address (in bits) is interpreted as: Tag | Block | Word
Accessing the same cache line from different tags is always a miss; replacement is done when a "conflict miss" occurs.
The hit ratio is low.
The higher order (tag) bits of the address are compared with the tag stored in the indexed cache line. If a match is not found, it is a miss. The delay of the tag comparator is called hit latency or hit delay. On a miss, the reference is forwarded to the main memory.
The main memory control logic interprets the request into its known format: Tag | Word.
The tag field is directly connected to the address logic of the main memory; therefore the corresponding block is enabled. Using the mapping function, the main memory block is transferred to the corresponding cache line along with the complete tag. Later the CPU accesses the data from the cache.
The number of bits in the tag is the tag size or tag length.
The maximum number of tag comparisons = 1.
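A short sketch of the Tag | Block | Word split for a direct-mapped cache, using assumed sizes (16-bit physical address, 64 lines, 16-byte blocks) and the K mod N placement rule:

    import math

    address_bits = 16                       # assumed physical address width
    num_lines = 64                          # N
    block_size = 16                         # bytes per block

    word_bits = int(math.log2(block_size))              # 4
    block_bits = int(math.log2(num_lines))               # 6 (selects the cache line)
    tag_bits = address_bits - block_bits - word_bits     # 6

    K = 200                                 # some main-memory block number
    line = K % num_lines                    # direct-mapped placement: K mod N
    print(tag_bits, block_bits, word_bits, line)          # 6 6 4 8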

Associative Mapping
A main memory block can be mapped to any location in the cache. Thus, no cache line address is used.
The data is transferred from main memory to cache along with the complete tag. The address representation is:
Physical address (in bits): Tag | Word
The replacement is done only when the cache is full, i.e. when a "capacity miss" occurs.
The block number is updated in the tag field.
The higher order bits of the address are compared sequentially with each cache tag; if a match is not found, it is a "miss".
The maximum number of tag comparisons = N (the number of cache lines).
The complexity of the tag comparator is higher.
Cache size = Tag memory size + Data memory size
Data memory size = Number of cache lines x Number of bits in the line
Set Associative Mapping
In this mapping the cache memory is organised into sets; each set is capable of holding multiple main memory blocks.
Number of sets S = N / P for a P-way set associative cache,
where N = number of cache lines and P = number of main memory blocks that can reside in each set.
The address interpretation is: Physical address (in bits): Tag | Set offset | Word offset
The mapping function used is K mod S = i
where K = main memory block number, S = number of cache sets, i = cache memory set number.
The replacement is done when the set is full.
The hit ratio and the complexity of the tag comparator are optimal (between direct and fully associative mapping).
Maximum number of tag comparisons = P.
When P = 1, set associative mapping reduces to direct mapping.
Multiplexers are needed to compare the existing tags in the set one by one with the CPU generated tag.
Tag memory size = Number of sets in the cache x Number of blocks in the set x Number of tag bits in each block
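The same kind of address breakdown for a P-way set associative cache, with the set chosen by K mod S; the cache parameters below are assumed.

    import math

    num_lines, P, block_size, address_bits = 64, 4, 16, 16   # assumed cache parameters
    S = num_lines // P                       # number of sets = N / P  -> 16

    word_bits = int(math.log2(block_size))            # 4
    set_bits = int(math.log2(S))                      # 4
    tag_bits = address_bits - set_bits - word_bits    # 8

    K = 200                                  # main-memory block number
    print(S, tag_bits, set_bits, word_bits, K % S)    # 16 8 4 4 8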

Compulsory, Capacity and Conflict Misses
Compulsory misses are caused by the first reference to a location in memory. Cache size and associativity make no difference to the number of compulsory misses. Pre-fetching can help here, as can larger cache block sizes (which are a form of pre-fetching).
Capacity misses are those misses that occur regardless of associativity or block size, solely due to the finite size of the cache.
Conflict misses are those misses that could have been avoided had the cache not evicted an entry earlier.
Doubling the associativity (capacity and line size constant):
Halves the number of sets.
Decreases conflict misses; no effect on compulsory and capacity misses.
Typically, higher associativity reduces conflict misses because there are more places to put the same element.
Halving the line size (associativity and number of sets constant):
Halves the capacity.
Increases compulsory and capacity misses but has no effect on conflict misses.
Shorter lines mean less "pre-fetching", which reduces the cache's ability to exploit spatial locality; the capacity has also been cut in half.
Doubling the number of sets (capacity and line size constant):
Halves the associativity.
Increases conflict misses but has no effect on compulsory and capacity misses.

Replacement Algorithms
When the cache is full, an existing block is replaced with the new block. Two replacement algorithms are commonly used:
FIFO (First-In-First-Out): it replaces the cache block that has been resident the longest (oldest time stamp) with the new block.
LRU (Least Recently Used): it replaces the cache block that has gone unreferenced for the longest time with the new block.

Updating Techniques
Coherence problem: the same address contains different values at different levels (the cache is updated while the main memory is not). This creates loss of data.
Two updating techniques are used: write through and write back.

Write Through
In this technique, the CPU performs the update simultaneously in the cache memory and the main memory. Thus, there is no coherence problem.
The word updation time = max(word updation time in cache, word updation time in main memory).
T_avg(read) = H*Tc + (1 - H)*(Tm + Tc)
    [read hit: read the data from cache; read miss: read-allocate the block from main memory, then read the data]
T_avg(write) = H*Tw + (1 - H)*(Tm + Tw)
    [write hit: update the word; write miss: write-allocate, then update the word]
The total access time of the memory, when both read and write cycles are considered, is:
T_avg(write through) = (fraction of reads) x T_avg(read) + (fraction of writes) x T_avg(write)
Throughput(write through) = 1 / T_avg(write through) words/sec
Since every write goes to main memory, the hit ratio for the write operation is effectively 1 and
T_avg(write) = Tm, where Tm = main memory access time.
Write Back
The CPU performs the write operation only on the cache. Coherence (inconsistency) is still present, but it does not create a problem because, before a cache block is replaced with a new block, the updates are written back into the main memory.
Each cache line contains one extra bit, called the modified bit (update/dirty bit). When the block is updated, the corresponding modified bit is set (1 = dirty, 0 = clean).
[Figure: write-back cache - the CPU exchanges words with the cache and the cache exchanges blocks with main memory; each line carries a modified bit; the read-time expressions differ for hierarchical and simultaneous organisations.]
T_avg(read) = H*Tc + (1 - H)*[%dirty*(T_writeback + Tm + Tc) + %clean*(Tm + Tc)]
    [read hit: read the data from cache; read miss on a dirty line: write back, read-allocate, read the data; read miss on a clean line: read-allocate, read the data]
T_avg(write) = H*Tc + (1 - H)*[%dirty*(T_writeback + Tm + Tc) + %clean*(Tm + Tc)]
    [write hit: write the data into cache; write miss on a dirty line: write back, write-allocate, write the data; write miss on a clean line: write-allocate, write the data]
T_avg(write back) = (fraction of reads) x T_avg(read) + (fraction of writes) x T_avg(write)
Throughput(write back) = 1 / T_avg(write back) words/sec
Multilevel Caches
To reduce the miss penalty, multilevel caches are used in system design.
[Figure: CPU -> L1 cache -> L2 cache -> main memory (MM).]
In multilevel caches, two kinds of miss rates are calculated:
(i) Local miss rate = No. of misses in the cache / Total no. of references to that cache
(ii) Global miss rate = No. of misses in the cache / Total no. of CPU generated references
The average access time can be calculated as
T_avg = Hit time(L1) + Miss rate(L1, local) x Miss penalty(L1)
Miss penalty(L1) = Hit time(L2) + Miss rate(L2) x Miss penalty(L2)
Miss penalty(L2) = MM access time
Average memory stall cycles = (No. of misses(L1) / Instruction x Hit time(L2)) + (No. of misses(L2) / Instruction x Miss penalty(L2))
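The two-level average access time formula above, evaluated with assumed hit times and miss rates:

    # Assumed: L1 hit 1 cycle, L2 hit 10 cycles, MM access 100 cycles,
    # L1 miss rate 10%, L2 local miss rate 20%
    hit_L1, hit_L2, mm = 1.0, 10.0, 100.0
    miss_L1, miss_L2 = 0.10, 0.20

    miss_penalty_L2 = mm
    miss_penalty_L1 = hit_L2 + miss_L2 * miss_penalty_L2   # 10 + 0.2*100 = 30
    t_avg = hit_L1 + miss_L1 * miss_penalty_L1              # 1 + 0.1*30 = 4
    print(t_avg)                                             # 4.0 cycles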
Note:
There are three types of caches based on whether the index or tag corresponds to physical or virtual addresses.
Physically indexed, physically tagged: caches use the physical address for both the index and the tag.
Virtually indexed, physically tagged: caches use the virtual address for the index and the physical address for the tag.
Virtually indexed, virtually tagged: caches use the virtual address for both the index and the tag.
