14. Computer Organization and Architecture

Machine Instructions and Addressing Modes

Machine Instructions
Types of instructions: zero-address instructions, one-address instructions, and two-address instructions.
Implied/Implicit Addressing Mode: In this mode the data is specified in the opcode itself. Therefore, no effective address field is available in the instruction.
Example: Complement Accumulator (CMA) and all zero-address instructions.
Immediate Addressing Mode:
This mode is used to access constants or to initialize registers to a constant value.
Here the data is present in the address field of the instruction itself.
The range of values is limited by the size of the address field. If the address field size is n bits, the possible range of immediate constants or data is 0 to 2^n - 1.
Example: MOV A, 10
Direct/Absolute Addressing Mode
Used to access static variables.
The data is present in memory; that memory cell's address is present in the address field of the instruction. Data = [EA] = [Memory Address]
One memory reference is required to read or write the data using the direct addressing mode.
Example: MOV R1, [1000]
Auto-Indexed Addressing Mode:
This mode is used to access linear array elements. Thus, a "base address" is required to access the data.
The base address is maintained in the base register. Therefore, the EA is Base Register ± Step Size.
The step size depends on the size of the data to be accessed from memory. Data = [EA] = [[Base Register] ± Step Size]
Auto-indexed AM variants: MOV R0, +(R1) (pre-increment), MOV R0, (R1)+ (post-increment), MOV R0, -(R1) (pre-decrement), MOV R0, (R1)- (post-decrement)
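The behaviour of the auto-indexed variants above can be sketched in Python. This is a minimal illustrative model, not real ISA semantics: memory is a dict of addresses, and the register/step values are assumptions.

```python
# Hypothetical sketch of auto-indexed addressing (names are illustrative).
# Memory is a dict mapping addresses to values; the base register walks a
# linear array whose elements are 'step' bytes wide.

def auto_post_increment(memory, regs, base_reg, step):
    """MOV R0, (Rb)+  --  use the base register, then increment it."""
    ea = regs[base_reg]            # effective address = current base value
    data = memory[ea]              # access the array element
    regs[base_reg] = ea + step     # base register advances by the step size
    return data

def auto_pre_decrement(memory, regs, base_reg, step):
    """MOV R0, -(Rb)  --  decrement the base register, then use it."""
    regs[base_reg] -= step         # base register retreats by the step size
    return memory[regs[base_reg]]  # access the array element

# Walk a 3-element array of 4-byte words starting at address 1000.
memory = {1000: 11, 1004: 22, 1008: 33}
regs = {"R1": 1000}
values = [auto_post_increment(memory, regs, "R1", 4) for _ in range(3)]
print(values)        # [11, 22, 33]
print(regs["R1"])    # 1012
```

Note how the base register ends up one step past the array, which is exactly what makes these modes convenient for sequential array traversal.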
Indirect Addressing Mode: (Array as parameter)
Used to implement pointers. The EA is available in either a register or memory.
This mode is divided into two types: register indirect and memory indirect.
Example: They are used to implement 'if', 'switch', 'for', 'do-while' and 'while' statements.
Instruction format: opcode + address field(s).
Vectored interrupt: the interrupting source supplies the branch information to the processor through an interrupt vector (INTA/INTR).
Direct (absolute) addresses are position dependent; relative addressing allows relocation at run time.
[Figure: Single-bus datapath - PC, IR, MAR, MDR, and the general-purpose registers connected to the ALU over one internal bus; main memory is reached through the memory bus. The control unit asserts Rin = 1 to transfer Bus -> R and Rout = 1 to transfer R -> Bus; a MUX selects between a constant and the Y register at the ALU input, controlling data flow into and out of each register.]
Microprogram: The sequence of micro-operations is called a microprogram.
The fetch cycle microprogram is:
T1: MAR <- PC       (PCout, MARin)
T2: MBR <- M[MAR]   (Read, MBRin)
T3: IR <- MBR       (MBRout, IRin)
    PC <- PC + step size
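The fetch-cycle micro-operations above can be traced with a minimal Python sketch. The register names follow the text; the memory contents and the step size of 4 are illustrative assumptions.

```python
# Minimal sketch of the fetch-cycle micro-operations (register names follow
# the text: PC, MAR, MBR, IR; memory is a dict of word addresses).

def fetch(regs, memory, step=4):
    regs["MAR"] = regs["PC"]            # T1: MAR <- PC
    regs["MBR"] = memory[regs["MAR"]]   # T2: MBR <- M[MAR]
    regs["IR"] = regs["MBR"]            # T3: IR <- MBR
    regs["PC"] += step                  #     PC <- PC + step size

regs = {"PC": 2000, "MAR": 0, "MBR": 0, "IR": 0}
memory = {2000: "ADD R1, R2", 2004: "MOV R0, (R1)+"}
fetch(regs, memory)
print(regs["IR"], regs["PC"])   # ADD R1, R2 2004
```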
The basic operations performed are:
Register transfer (R1 -> R2): R1out, R2in (only one clock cycle required).
ALU operation (R3 <- R1 + R2):
  R1out, Yin
  R2out, Select = 1, Add, Zin
  Zout, R3in
Minimum number of clocks = 3.
If two data paths are used:
  R1out, Yin and R2out in parallel; Select = 1, Add, Zin
  Zout, R3in
Minimum number of clocks = 2.
Memory Read (R2 <- M[(R1)]):
  R1out, MARin, Read
  Wait for memory function to complete (WMFC)
  MDRout, R2in
Here the number of clocks can be 2 or 3 depending on WMFC.
Memory Write (M[(R1)] <- R2):
  R1out, MARin
  R2out, MDRin, Write
  WMFC
Minimum number of clocks = 3.
Control Unit
[Figure: Control unit driving the common bus - PC and IR connect to the bus; asserting Rin = 1 transfers Bus -> R, and Rout = 1 transfers R -> Bus.]
Example: R1 -> R2 requires R1out, R2in.
The control unit can be microprogrammed using either vertical or horizontal programming.
534 A Handbook for Electronics Engineering
Hardwired Control Unit
The control signals are expressed as Sum-of-Products (SOP) expressions, and they are directly realized on independent hardware.
Example: Yin = T1 + T3·ADD + T5·BR
Here Yin is enabled during T1 for all instructions, during T3 for ADD, and so on. It can be realized directly with gates.
It is the fastest control unit design because of its independent circuits.
For a large number of instructions and control signals, the complexity of the hardware is high.
It is relatively inflexible to changes such as modification or correction; thus it is not used where design changes and testing are frequent.
Used in real-time applications. Example: aircraft simulation, weather forecasting, etc.
Implemented in RISC processors.
Microprogrammed Control Unit
It is flexible for a large number of instructions and control signals.
Control signals are generated more slowly in comparison to the hardwired control unit.
Based on the type of Control Word (CW) stored in the Control Memory (CM), the control unit is classified into two types: horizontal and vertical.
Horizontal Microprogrammed Control Unit (HµPCU) vs. Vertical Microprogrammed Control Unit (VµPCU):
- Horizontal µPCU: the control signals are represented in decoded binary format, i.e., 1 bit per control signal; one instruction per cycle (CPI = 1); allows successful pipeline implementation.
- Vertical µPCU: the control signals are stored in encoded form and must be decoded before use; pipelining is less effective (CPI > 1). Example: Pentium.
Types of Interfaces
Non-programmable: The configuration is predefined at design time and cannot be changed during operation. Example: buffer, latch.
Programmable: The configuration is defined in the Command Word Register (CWR). Based on the value loaded into the CWR, the configuration is changed.
Programmable interfaces are of two kinds:
- Serial interface (bit-by-bit transmission). Example: 8251 USART (Universal Synchronous/Asynchronous Receiver Transmitter).
- Parallel interface (more than one bit transmitted at a time). Example: 8255 PPI (Programmable Peripheral Interface), 8259 (Interrupt Controller), 8237/8257 (DMA controllers).
I/O Transfer Modes
[Figure: classification of I/O transfer modes - programmed I/O (polling, on the order of 50 KBPS at the byte level), interrupt-driven I/O, and DMA (on the order of 10 MBPS at the block level).]
In interrupt-driven I/O, the CPU interprets the device request, invokes the corresponding service routine, and transfers the data between the device and the buffer for the operation.
When the data is available in the buffer, the DMA controller enables the HOLD signal to gain control of the bus and waits for the acknowledgement (HLDA) from the CPU.
After receiving the HLDA signal, the DMA controller transfers the data from the I/O device to main memory until the count becomes zero. After the transfer operation, it re-establishes the connection to the CPU.
[Figure: 8237/8257 DMA components - HOLD/HLDA handshake between the CPU and the DMA control logic, with a buffer between the DMA controller and secondary storage.]
During the DMA operation, the CPU is in one of two states:
- Busy state: while the data is being prepared, the CPU is busy with other executions.
- Block/Hold state: while the data is being transferred, the CPU is in the blocked state.
Let 'X' = preparation time (depends on I/O speed), and 'Y' = transfer time (depends on main memory speed).
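Using X and Y as defined above, one prepare-plus-transfer period lasts X + Y, and the CPU is blocked only during Y. A small Python sketch (the numeric values are illustrative assumptions):

```python
# Sketch: fraction of time the CPU is blocked during DMA transfers,
# using the text's X (preparation time) and Y (transfer time).
# Over one prepare+transfer period, the CPU is blocked only during Y.

def cpu_blocked_fraction(x_prepare, y_transfer):
    return y_transfer / (x_prepare + y_transfer)

# e.g. a slow device: 100 us to prepare a block, 4 us to transfer it
print(cpu_blocked_fraction(100e-6, 4e-6))  # ~0.0385 -> CPU ~96% available
```

The faster the main memory relative to the device (Y << X), the closer the CPU gets to full availability during DMA.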
IV. Instruction Pipelining
The set of phases for an instruction execution cycle:
- Instruction Fetch (IF):
  MAR <- PC [MAR <- 2000]
  PC <- PC + 1 instruction [next instruction: 2004]
  READ control signal is enabled
  IR <- MDR
- Instruction Decode (ID): To know the purpose of the instruction.
- Operand Fetch (OF):
  Memory read: MAR <- LOC X
  Write into any general-purpose register: R1 <- MDR
- Write into memory: MDR <- R1, then the WRITE control signal is enabled.
- Indirect Phase: an extra memory access resolves the effective address (e.g., with a direct address the field itself is the EA; with an indirect address, location 300 holds the EA 4500).
- Interrupt Phase: (2 memory cycles required)
Each instruction execution involves a sequence of microoperations
[Figure: Linear pipeline - the input feeds stage/segment S1, whose output passes through an interface register R to S2 and then S3; all interface registers are driven by a common clock.]
Types of Pipelines
1. Linear pipeline: This pipeline is used to perform only one specific function.
2. Non-linear pipeline: This pipeline is used to perform multiple functionalities. It uses feed-forward and feed-backward connections.
3. Synchronous pipeline: On a common load/clock, all the registers transfer data to the next stages simultaneously.
4. Asynchronous pipeline: The data flow along the pipeline stages is controlled using a handshake protocol.
For a K-stage pipeline processing n tasks with clock period tp:
ET_pipeline = (K + (n - 1)) * tp
The performance gain of the pipeline processor over the non-pipeline processor is:
S = ET_non-pipeline / ET_pipeline = (n * K * tp) / ((K + (n - 1)) * tp) = nK / (K + n - 1); as n grows large, S -> K.

Instruction Pipeline
The instruction pipeline operates on a stream of instructions by overlapping the phases of the instruction cycle, like Instruction Fetch (IF), Instruction Decode (ID), and so on.
Stall Cycle
A stall cycle is one during which no meaningful operation is performed for an instruction. It arises due to:
(i) Uneven clock cycles needed by each stage for instructions
(ii) Increased buffer overhead
(iii) Memory operands
(iv) Pipeline dependencies
With stalls, the speedup becomes:
S = K (number of stages) / (1 + stall frequency * stall cycles)
The maximum speedup that can be achieved by using a pipelined processor equals K, the number of stages:
S_max (ideal) = K
Efficiency: E = S / K = n / (K + (n - 1))
Throughput = number of tasks processed / time to process the tasks
Throughput = n / ((K + (n - 1)) * tp)
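The pipeline formulas above (execution time, speedup, stall-adjusted speedup, efficiency, throughput) can be evaluated with a short Python sketch; the K, n, tp values below are illustrative.

```python
# Sketch of the pipeline formulas above: execution time, speedup,
# efficiency, and throughput for a K-stage pipeline processing n tasks
# with clock period tp (and an optional stall model).

def pipeline_metrics(K, n, tp, stall_freq=0.0, stall_cycles=0):
    et_pipe = (K + (n - 1)) * tp            # ET_pipeline = (K + n - 1) * tp
    et_nonpipe = n * K * tp                 # non-pipelined: K cycles per task
    speedup = et_nonpipe / et_pipe          # S -> K as n grows large
    speedup_stalled = speedup / (1 + stall_freq * stall_cycles)
    efficiency = speedup / K                # E = S / K = n / (K + n - 1)
    throughput = n / et_pipe                # tasks per second
    return speedup, speedup_stalled, efficiency, throughput

s, s_st, e, th = pipeline_metrics(K=4, n=100, tp=10e-9)
print(round(s, 2))   # 3.88  (approaches K = 4 for large n)
```

With no stalls, s_st equals s; adding stall_freq and stall_cycles shows how quickly the real speedup falls below the ideal K.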
Advanced Pipeline (RISC Pipeline)
In the RISC processor, the instruction pipeline is implemented to execute instructions at close to one per cycle.
To reduce the number of stalls in the pipeline due to branch operations, a branch prediction buffer, loop buffer, or branch target buffer is used.
Branch target buffer: It is a high-speed buffer maintained at the instruction fetch stage to store the predicted target addresses.
Out-of-order execution creates two more dependencies in the pipeline.
Hazards
A hazard is a delay. The delay creates extra cycles, and these extra cycles without useful operation are called stalls.
Hazards are classified as:
- Read-After-Write (RAW)
- Write-After-Read (WAR)
- Write-After-Write (WAW)
Consider a program segment where instruction j follows instruction i in program order.
RAW: This hazard is created when instruction j tries to read the data before instruction i writes it (TRUE DEPENDENCY).
WAR: This hazard is created when instruction j writes the data before instruction i reads it (ANTI DEPENDENCY).
WAW: This hazard is created when instruction j writes the data before instruction i writes it (OUTPUT DEPENDENCY).
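The three hazard definitions above reduce to set intersections between the read/write register sets of instructions i and j. A minimal illustrative sketch (the instruction pair below is a made-up example):

```python
# Sketch: classifying the dependency between two instructions i and j
# (j follows i in program order) from their read/write register sets,
# per the RAW/WAR/WAW definitions above.

def classify_hazards(i_reads, i_writes, j_reads, j_writes):
    hazards = []
    if i_writes & j_reads:
        hazards.append("RAW (true dependency)")    # j reads what i writes
    if i_reads & j_writes:
        hazards.append("WAR (anti dependency)")    # j writes what i reads
    if i_writes & j_writes:
        hazards.append("WAW (output dependency)")  # both write the same reg
    return hazards

# i: ADD R1, R2, R3  (reads R2, R3; writes R1)
# j: SUB R4, R1, R5  (reads R1, R5; writes R4)
print(classify_hazards({"R2", "R3"}, {"R1"}, {"R1", "R5"}, {"R4"}))
# ['RAW (true dependency)']
```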
Performance Evaluation of the Pipeline with Stalls
Memory Hierarchy
[Figure: Memory hierarchy - processor registers (1 word) exchange words with the cache, the cache exchanges blocks with main memory, and main memory exchanges pages with secondary storage; the logical address issued by a process is mapped down the hierarchy.]
A smaller cache block size means a larger tag (higher cache tag overhead) and less spatial locality exploited, which increases the cache miss penalty.
The purpose of the hierarchy is to bridge the speed mismatch between the fastest processor and the slowest memories at a reasonable cost.
Going down the hierarchy from registers (flip-flops) to magnetic tapes (sequential access), the access time (T) and size (S) increase while the cost per bit (C) and frequency of access decrease.
Types of Memory Organisation
Based on the order of accessing the memory system, the memory organisation is divided into two types:
Simultaneous Access: the processor can access each level directly.
C_avg/bit = (C1*S1 + C2*S2 + ... + Cn*Sn) / (S1 + S2 + ... + Sn)
Hierarchical Access: the data from the 2nd level is transferred to the 1st level, and the CPU accesses the data from the 1st level.
Consider an n-level memory system.
[Figure: n-level hierarchical memory - the processor accesses L1; on a miss, a block/word moves from L2 to L1, and so on down to Ln.]
Let H1, H2, ..., Hn be the hit ratios and T1, T2, ..., Tn be the access times of the n-level memory system. For hierarchical access:
Tavg = H1*T1 + (1 - H1)*H2*(T1 + T2) + (1 - H1)*(1 - H2)*H3*(T1 + T2 + T3) + ... + (1 - H1)*(1 - H2)*...*(1 - Hn-1)*(T1 + T2 + ... + Tn)
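The hierarchical-access Tavg formula above can be computed level by level in Python. This sketch assumes the last level always hits (its hit ratio is effectively 1); the two-level numbers are illustrative.

```python
# Sketch of the hierarchical-access formula above: H[k] and T[k] are the
# hit ratio and access time of level k+1; the last level is assumed to
# always hit.

def t_avg_hierarchical(H, T):
    total, p_miss, path_time = 0.0, 1.0, 0.0
    for k in range(len(T)):
        path_time += T[k]                       # T1 + T2 + ... + Tk
        hk = H[k] if k < len(T) - 1 else 1.0    # last level always hits
        total += p_miss * hk * path_time        # prob. of reaching level k
        p_miss *= (1 - hk)
    return total

# Two-level example: H1 = 0.9, T1 = 10 ns, T2 = 100 ns
# Tavg = 0.9*10 + 0.1*(10 + 100) = 20 ns
print(t_avg_hierarchical([0.9, 1.0], [10, 100]))  # 20.0
```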
Cache Memory
Cache is used as the intermediate memory between the CPU and main memory. It is a small and fast memory.
By placing the most frequently used data and instructions in the cache, the average access time can be minimized.
When a miss occurs, the processor directly obtains the data from main memory and a copy of it is sent to the cache for future references.
The cache memory works on the principle of Locality of Reference (LOR). LOR states that "the references to memory at any given interval of time tend to be confined to a few localized areas of memory."
The performance of a cache depends on:
- Cache size
- Number of levels of cache
- Cache block size
- Cache mapping technique
- Cache replacement policy
- Cache updation policy
Cache is divided into equal parts called cache blocks/cache lines. Data is transferred from main memory to cache in the form of blocks; thus both are divided into blocks of equal size.
Cache line size = Main memory block size
Mapping Techniques
Direct Mapping
In this technique, a mapping function is used to transfer the data from main memory to cache memory.
The mapping function is K mod N = i, where K = main memory block number and N = number of cache lines.
The physical address is divided into Tag | Line | Word fields.
The tag field is directly connected with the address logic of the main memory; therefore, the corresponding block is enabled. By using the mapping function, the main memory block is transferred to the corresponding cache line along with the complete tag. Later the CPU accesses the data from the cache.
The number of bits in the tag is the tag size or tag length.
The maximum number of tag comparisons = 1.
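The K mod N mapping and the Tag | Line | Word address split above can be sketched in Python; the cache geometry and the sample address below are illustrative assumptions.

```python
# Sketch of direct mapping: block K of main memory goes to cache line
# K mod N, and the physical address splits into tag | line | word fields.

def direct_map(block_number, num_lines):
    return block_number % num_lines       # mapping function: K mod N

def split_address(addr, line_bits, word_bits):
    word = addr & ((1 << word_bits) - 1)                 # offset in block
    line = (addr >> word_bits) & ((1 << line_bits) - 1)  # cache line index
    tag = addr >> (word_bits + line_bits)                # remaining bits
    return tag, line, word

# 128 cache lines: blocks 5, 133, 261 all compete for line 5
print([direct_map(k, 128) for k in (5, 133, 261)])   # [5, 5, 5]
print(split_address(0b101100001010011, line_bits=7, word_bits=4))
# (11, 5, 3) -> tag 11, line 5, word 3
```

The first print shows why direct mapping needs only one tag comparison but suffers conflict misses: many blocks share the same line.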
Associative Mapping
The main memory block can be mapped to any location in the cache. Thus, no cache line address field is used.
The data is transferred from main memory to cache along with the complete tag. The address representation is:
Physical address (in bits) = Tag | Word
The replacement is done only when the cache is full; then a "capacity miss" occurs. The block number is updated in the tag field.
The address tag is compared sequentially with each cache tag; if no match is found, it is a "miss".
The maximum number of tag comparisons = N (the number of cache lines).
The complexity of the tag comparator is higher.
Cache size = Tag memory size + Data memory size
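The "cache size = tag memory + data memory" split above can be made concrete with a small Python sketch for a fully associative cache. The address width, block size, and line count below are illustrative assumptions.

```python
import math

# Sketch: tag-memory overhead of a fully associative cache. Every line
# stores the full block-number tag, so
#   cache size = data memory + tag memory.

def assoc_cache_sizes(pa_bits, block_bytes, num_lines):
    word_bits = int(math.log2(block_bytes))      # offset within a block
    tag_bits = pa_bits - word_bits               # full tag per line
    data_bits = num_lines * block_bytes * 8      # data memory in bits
    tag_mem_bits = num_lines * tag_bits          # tag memory in bits
    return data_bits, tag_mem_bits

# 32-bit physical address, 64-byte blocks, 1024 lines
data_bits, tag_bits = assoc_cache_sizes(32, 64, 1024)
print(data_bits // 8, tag_bits)   # 65536 bytes of data, 26624 tag bits
```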
A conflict miss is a miss that would not have occurred had the cache not evicted an entry earlier.
Doubling the associativity (capacity and line size constant):
- Halves the number of sets.
- Decreases conflict misses; no effect on compulsory and capacity misses.
- Typically, higher associativity reduces conflict misses because there are more places to put the same element.
Halving the line size (associativity and number of sets constant):
- Halves the capacity.
- Increases compulsory and capacity misses but has no effect on conflict misses.
- Shorter lines mean less "pre-fetching", which reduces the cache's ability to exploit spatial locality; in addition, the capacity has been cut in half.
Replacement Algorithms
When the cache is full, an existing block is replaced with the new block. There are two replacement algorithms commonly used:
- FIFO (First-In-First-Out): replaces the cache block that was loaded earliest with the new block.
- LRU (Least Recently Used): replaces the cache block that has not been referenced for the longest time.
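The two policies above differ only in whether a hit refreshes a block's position. A minimal Python simulation on an illustrative reference string (the 3-line fully associative cache is an assumption):

```python
from collections import OrderedDict

# Sketch of the two replacement policies on a small fully associative
# cache: FIFO evicts the oldest-loaded block, LRU the least recently used.

def simulate(refs, capacity, policy):
    cache, misses = OrderedDict(), 0
    for block in refs:
        if block in cache:
            if policy == "LRU":
                cache.move_to_end(block)   # refresh recency on a hit
        else:
            misses += 1
            if len(cache) == capacity:
                cache.popitem(last=False)  # evict from the front
            cache[block] = True
    return misses

refs = [1, 2, 3, 1, 4, 2]
print(simulate(refs, 3, "FIFO"), simulate(refs, 3, "LRU"))  # 4 5
```

On this particular string FIFO happens to beat LRU, which is a useful reminder that neither policy dominates on all reference patterns.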
Updating Techniques
Incoherence: the same address contains different values at different levels (the cache is updated while main memory is not updated). This property is called "incoherence", and it can create loss of data.
There are two updating techniques used: write-through and write-back.
Write Through
In this technique, the CPU performs the simultaneous updation in the cache and in main memory at the same time.
Tavg-read = H*Tc + (1 - H)*(Tm + Tc)
  [read hit: data from the cache; read miss: allocate the block from main memory, then read]
Tavg = f_read * Tavg-read + f_write * Tavg-write
Throughput_write-through = 1 / Tavg words/sec
The hit ratio for the write operation is always 1, so Tavg-write = Tm, where Tm = main memory access time.
Write Back
The CPU performs the write operation only on the cache. Incoherence is still present, but that doesn't create a problem because, before replacing a cache block with a new block, the updations are written back into main memory.
Each cache line contains one extra bit, called the modified bit (update/dirty bit); when the block is updated, the corresponding modified bit is set (1 = dirty, 0 = clean).
[Figure: Write-back cache - the CPU exchanges words with the cache, and the cache exchanges blocks with main memory; each line carries a modified (dirty) bit.]
Hierarchical: Tread = H*Tc + (1 - H)*(Tc + Tmem-block + %dirty * Tmem-block)
Simultaneous: Tread = H*Tc + (1 - H)*(Tmem-block + %dirty * Tmem-block)
Expanded over dirty and clean victims:
Tavg-read = H*Tc + (1 - H)*[%dirty * (Tm + Tm + Tc) + %clean * (Tm + Tc)]
  [read hit: data from the cache; read miss on a dirty block: write back, allocate, read; read miss on a clean block: allocate, read]
Tavg-write = H*Tc + (1 - H)*[%dirty * (Tm + Tm + Tc) + %clean * (Tm + Tc)]
  [write hit: write into the cache; write miss on a dirty block: write back, allocate, write; write miss on a clean block: allocate, write]
Tavg-write-back = f_read * Tavg-read + f_write * Tavg-write
Throughput_write-back = 1 / Tavg-write-back words/sec
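The write-back averages above can be evaluated with a short Python sketch. It uses the simultaneous-access form with the dirty/clean expansion; the H, Tc, Tm, and dirty-fraction values are illustrative assumptions.

```python
# Sketch of the write-back average access time above:
# Tc = cache access time, Tm = main-memory (block) transfer time,
# H = hit ratio, dirty = fraction of replaced blocks that are modified.

def t_avg_write_back(H, Tc, Tm, dirty):
    # miss on a dirty block: write it back (Tm), allocate (Tm), access (Tc)
    # miss on a clean block: allocate (Tm), access (Tc)
    miss_time = dirty * (Tm + Tm + Tc) + (1 - dirty) * (Tm + Tc)
    return H * Tc + (1 - H) * miss_time

# H = 0.95, Tc = 10 ns, Tm = 100 ns, 30% of victims dirty
print(t_avg_write_back(0.95, 10, 100, 0.3))  # 16.5 ns
```

Setting dirty = 0 reduces this to the write-through-style miss cost, which makes the extra write-back penalty easy to isolate.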
Multilevel Caches
To reduce the miss penalty, multilevel caches are used in the system design.
CPU <-> L1 Cache <-> L2 Cache <-> MM
In multilevel caches, two kinds of miss rates are calculated:
(i) Local miss rate: