8.Week

The document provides an overview of advanced computer architecture, focusing on the central processing unit (CPU) structure and function, including topics like processor organization, instruction cycles, and pipelining. It details the roles of various registers, the instruction cycle stages, and the concept of instruction pipelining to enhance execution efficiency. Additionally, it discusses pipeline hazards, including resource, data, and control hazards, and their implications on instruction execution.

Uploaded by

Aiman Al Arab
Copyright
© © All Rights Reserved

(Advanced) Computer Architecture

Prof. Dr. Hasan Hüseyin BALIK


(8th Week)
Outline
4. The central processing unit
—Processor Structure and Function
—Reduced Instruction Set Computers (RISCs)
—Instruction-Level Parallelism and Superscalar Processors
—Control Unit Operation and Microprogrammed Control
4.1 Processor Structure and Function
4.1 Outline
• Processor Organization
• Register Organization
• Instruction Cycle
• Instruction Pipelining
• Processor Organization for Pipelining
• The x86 Processor Family
• The ARM Processor
Processor Organization

Processor Requirements:
• Fetch instruction
– The processor reads an instruction from memory (register, cache, main memory)
• Interpret instruction
– The instruction is decoded to determine what action is required
• Fetch data
– The execution of an instruction may require reading data from memory or an I/O
module
• Process data
– The execution of an instruction may require performing some arithmetic or logical
operation on data
• Write data
– The results of an execution may require writing data to memory or an I/O module
• In order to do these things the processor needs to store some data
temporarily and therefore needs a small internal memory
Figure 16.1: Internal Structure of the CPU
(Blocks shown: arithmetic and logic unit, containing status flags, shifter, complementer, and arithmetic and Boolean logic; registers; control unit; internal CPU bus; control paths)
Register Organization

• Within the processor there is a set of registers that function as a level of memory above main memory and cache in the hierarchy
• The registers in the processor perform two roles:
– User-visible registers: enable the machine or assembly language programmer to minimize main memory references by optimizing use of registers
– Control and status registers: used by the control unit to control the operation of the processor and by privileged operating system programs to control the execution of programs
User-Visible Registers
Referenced by means of the machine language that the processor executes
Categories:
• General purpose
– Can be assigned to a variety of functions by the programmer
• Data
– May be used only to hold data and cannot be employed in the calculation of an operand address
• Address
– May be somewhat general purpose or may be devoted to a particular addressing mode
– Examples: segment pointers, index registers, stack pointer
• Condition codes
– Also referred to as flags
– Bits set by the processor hardware as the result of operations
Control and Status Registers

Four registers are essential to instruction execution:


• Program counter (PC)
– Contains the address of an instruction to be fetched
• Instruction register (IR)
– Contains the instruction most recently fetched
• Memory address register (MAR)
– Contains the address of a location in memory
• Memory buffer register (MBR)
– Contains a word of data to be written to memory or the word most
recently read
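The way these four registers cooperate during a fetch can be sketched as follows. This is a minimal illustration, assuming a word-addressed memory modeled as a Python list and a PC that advances by one per instruction; the `Cpu` class, its `fetch` method, and the sample memory contents are hypothetical names and values, not part of the lecture.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    """Toy processor holding only the four registers named above."""
    memory: list = field(default_factory=list)
    pc: int = 0    # program counter: address of the next instruction
    mar: int = 0   # memory address register
    mbr: int = 0   # memory buffer register
    ir: int = 0    # instruction register

    def fetch(self) -> int:
        self.mar = self.pc                 # address goes out through the MAR
        self.mbr = self.memory[self.mar]   # memory returns the word into the MBR
        self.pc += 1                       # PC now points at the next instruction
        self.ir = self.mbr                 # fetched instruction moves to the IR
        return self.ir


# Assumed sample memory image (arbitrary instruction words).
cpu = Cpu(memory=[0x1940, 0x5941, 0x2941])
first = cpu.fetch()
second = cpu.fetch()
```

After the two calls the PC holds 2 and the IR holds the second word, which is the state expected at the start of the next fetch.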
Program Status Word (PSW)

Register or set of registers that contain status information

Common fields or flags include:
• Sign: Contains the sign bit of the result of the last arithmetic operation
• Zero: Set when the result is 0
• Carry: Set if an operation resulted in a carry
• Equal: Set if a logical compare result is equality
• Overflow: Used to indicate arithmetic overflow
• Interrupt Enable/Disable: Used to enable or disable interrupts
• Supervisor: Indicates whether the processor is executing in supervisor or user mode
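To make the arithmetic flags concrete, here is a small sketch that derives sign, zero, carry, and overflow for an 8-bit addition. The 8-bit width, the function name `add8_flags`, and the dictionary layout are assumptions chosen for illustration; a real PSW packs these bits into a register.

```python
def add8_flags(a: int, b: int) -> dict:
    """Add two 8-bit values and derive PSW-style condition flags."""
    raw = (a & 0xFF) + (b & 0xFF)
    result = raw & 0xFF
    sign_a, sign_b, sign_r = a & 0x80, b & 0x80, result & 0x80
    return {
        "result": result,
        "sign": bool(sign_r),      # top bit of the result
        "zero": result == 0,
        "carry": raw > 0xFF,       # carry out of bit 7
        # Signed overflow: both operands share a sign that the result lacks.
        "overflow": sign_a == sign_b and sign_r != sign_a,
    }


signed_wrap = add8_flags(0x7F, 0x01)    # 127 + 1 in two's complement
unsigned_wrap = add8_flags(0xFF, 0x01)  # 255 + 1 as unsigned values
```

The first case sets sign and overflow but not carry; the second sets zero and carry but not overflow, which shows why carry and overflow are kept as separate flags.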
Figure 16.2: Example Microprocessor Register Organizations

(a) MC68000
• Data registers: D0, D1, D2, D3, D4, D5, D6, D7
• Address registers: A0, A1, A2, A3, A4, A5, A6, A7, A7´
• Program status: Program counter, Status register

(b) 8086
• General registers: AX (Accumulator), BX (Base), CX (Count), DX (Data)
• Pointers & index: SP (Stack ptr), BP (Base ptr), SI (Source index), DI (Dest index)
• Segment: CS (Code), DS (Data), SS (Stack), ES (Extra)
• Program status: Flags, Instr ptr

(c) 80386 - Pentium 4
• General registers: EAX (AX), EBX (BX), ECX (CX), EDX (DX), ESP (SP), EBP (BP), ESI (SI), EDI (DI)
• Program status: FLAGS register, Instruction pointer
Instruction Cycle
Includes the following stages:
• Fetch
– Read the next instruction from memory into the processor
• Execute
– Interpret the opcode and perform the indicated operation
• Interrupt
– If interrupts are enabled and an interrupt has occurred, save the current process state and service the interrupt
Figure: The Instruction Cycle
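The three stages can be sketched as a loop over a toy accumulator machine. Everything here is an illustrative assumption: the opcodes `LOAD`, `ADD`, and `HALT`, the `run` function, and the `pending_interrupt_at` parameter (an interrupt that becomes pending once the PC reaches the given value) are hypothetical, and the "service" step only records that the state was saved.

```python
def run(program, pending_interrupt_at=None, interrupts_enabled=True):
    """Run a toy accumulator machine and return (accumulator, trace)."""
    pc, acc, trace = 0, 0, []
    while pc < len(program):
        # Fetch: read the next instruction and advance the PC.
        opcode, operand = program[pc]
        pc += 1
        # Execute: interpret the opcode and perform the indicated operation.
        if opcode == "LOAD":
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "HALT":
            trace.append("halt")
            break
        trace.append(f"exec {opcode}")
        # Interrupt: if enabled and one is pending, save state and service it.
        if interrupts_enabled and pending_interrupt_at == pc:
            saved_pc = pc                    # state saved before servicing
            trace.append(f"interrupt (saved pc={saved_pc})")
            pending_interrupt_at = None      # serviced; resume the program
    return acc, trace


program = [("LOAD", 3), ("ADD", 4), ("HALT", 0)]
acc, trace = run(program, pending_interrupt_at=1)
acc_masked, trace_masked = run(program, pending_interrupt_at=1,
                               interrupts_enabled=False)
```

With interrupts disabled the same program produces the same result, but the interrupt step never appears in the trace.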
Figure 16.4: Instruction Cycle State Diagram
(States: instruction address calculation, instruction fetch, instruction operation decoding, operand address calculation, operand fetch, data operation, operand address calculation, operand store, interrupt check, interrupt. Annotations: indirection on operand fetch and operand store; multiple operands; multiple results; return for string or vector data; instruction complete, fetch next instruction; no interrupt)
Figure 16.5: Data Flow, Fetch Cycle
(Blocks shown: CPU containing PC, MAR, control unit, IR, and MBR; memory; address, data, and control buses)
MBR = Memory buffer register
MAR = Memory address register
IR = Instruction register
PC = Program counter
Figure 16.6: Data Flow, Indirect Cycle
(Blocks shown: CPU containing MAR, control unit, and MBR; memory; address, data, and control buses)
Figure 16.7: Data Flow, Interrupt Cycle
(Blocks shown: CPU containing PC, MAR, control unit, and MBR; memory; address, data, and control buses)
Pipelining Strategy
• Similar to the use of an assembly line in a manufacturing plant
• New inputs are accepted at one end before previously accepted inputs appear as outputs at the other end
• To apply this concept to instruction execution we must recognize that an instruction has a number of stages
Figure 16.8: Two-Stage Instruction Pipeline
(a) Simplified view: Instruction → Fetch → Instruction → Execute → Result
(b) Expanded view: the same two stages, with added Wait, New address, and Discard paths
Figure: Simplified Pipeline Architecture
Additional Stages

• Fetch instruction (FI)
– Read the next expected instruction into a buffer
• Decode instruction (DI)
– Determine the opcode and the operand specifiers
• Calculate operands (CO)
– Calculate the effective address of each source operand
– This may involve displacement, register indirect, indirect, or other forms of address calculation
• Fetch operands (FO)
– Fetch each operand from memory
– Operands in registers need not be fetched
• Execute instruction (EI)
– Perform the indicated operation and store the result, if any, in the specified destination operand location
• Write operand (WO)
– Store the result in memory
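The address calculation done in the CO stage can be sketched for the three modes named above. This is a simplified model under stated assumptions: registers and memory are plain dictionaries, and the function name `effective_address`, its parameters, and the sample values are hypothetical.

```python
def effective_address(mode, registers, memory, reg=None, address=None,
                      displacement=0):
    """Return the effective address of one operand under a toy model."""
    if mode == "displacement":
        return registers[reg] + displacement   # base register plus offset
    if mode == "register_indirect":
        return registers[reg]                  # register holds the address
    if mode == "indirect":
        return memory[address]                 # memory word holds the address
    raise ValueError(f"unsupported addressing mode: {mode}")


registers = {"R1": 100}   # assumed register contents
memory = {200: 340}       # assumed memory contents
ea_disp = effective_address("displacement", registers, memory,
                            reg="R1", displacement=8)
ea_reg = effective_address("register_indirect", registers, memory, reg="R1")
ea_ind = effective_address("indirect", registers, memory, address=200)
```

Note that the indirect mode needs a memory read just to obtain the address, which is one reason operand address calculation is given its own pipeline stage.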
Time            1   2   3   4   5   6   7   8   9   10  11  12  13  14
Instruction 1   FI  DI  CO  FO  EI  WO
Instruction 2       FI  DI  CO  FO  EI  WO
Instruction 3           FI  DI  CO  FO  EI  WO
Instruction 4               FI  DI  CO  FO  EI  WO
Instruction 5                   FI  DI  CO  FO  EI  WO
Instruction 6                       FI  DI  CO  FO  EI  WO
Instruction 7                           FI  DI  CO  FO  EI  WO
Instruction 8                               FI  DI  CO  FO  EI  WO
Instruction 9                                   FI  DI  CO  FO  EI  WO

Figure 16.10: Timing Diagram for Instruction Pipeline Operation
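The staggered pattern of the timing diagram can be generated programmatically. The sketch below assumes an ideal pipeline (every stage takes one time unit, no branches, no stalls); the function name `timing_diagram` and the row layout are illustrative choices.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def timing_diagram(n_instructions: int, stages=STAGES):
    """Return one row per instruction; row[t] is the stage active in cycle t+1."""
    k = len(stages)
    total_cycles = k + (n_instructions - 1)
    rows = []
    for i in range(n_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name   # instruction i+1 is in stage s during cycle i+s+1
        rows.append(row)
    return rows


rows = timing_diagram(9)
for number, row in enumerate(rows, start=1):
    print(f"Instruction {number}: " + " ".join(f"{cell:2}" for cell in row))
```

For nine instructions and six stages the rows span 6 + (9 - 1) = 14 time units, matching the width of the diagram.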
Time            1   2   3   4   5   6   7   8   9   10  11  12  13  14
Instruction 1   FI  DI  CO  FO  EI  WO
Instruction 2       FI  DI  CO  FO  EI  WO
Instruction 3           FI  DI  CO  FO  EI  WO
Instruction 4               FI  DI  CO  FO
Instruction 5                   FI  DI  CO
Instruction 6                       FI  DI
Instruction 7                           FI
Instruction 15                              FI  DI  CO  FO  EI  WO
Instruction 16                                  FI  DI  CO  FO  EI  WO

Branch penalty: time units 9 through 12, during which no instruction completes

Figure 16.11: The Effect of a Conditional Branch on Instruction Pipeline Operation
Figure 16.12: Six-Stage Instruction Pipeline
(Flowchart: FI Fetch instruction → DI Decode instruction → CO Calculate operands → Unconditional branch? → if no: FO Fetch operands → EI Execute instruction → WO Write operands → Branch or interrupt? A "yes" at either decision leads to Update PC and Empty pipe)
Figure: An Alternative Pipeline Depiction

(a) No branches
Time  FI   DI   CO   FO   EI   WO
1     I1
2     I2   I1
3     I3   I2   I1
4     I4   I3   I2   I1
5     I5   I4   I3   I2   I1
6     I6   I5   I4   I3   I2   I1
7     I7   I6   I5   I4   I3   I2
8     I8   I7   I6   I5   I4   I3
9     I9   I8   I7   I6   I5   I4
10         I9   I8   I7   I6   I5
11              I9   I8   I7   I6
12                   I9   I8   I7
13                        I9   I8
14                             I9

(b) With conditional branch
Time  FI   DI   CO   FO   EI   WO
1     I1
2     I2   I1
3     I3   I2   I1
4     I4   I3   I2   I1
5     I5   I4   I3   I2   I1
6     I6   I5   I4   I3   I2   I1
7     I7   I6   I5   I4   I3   I2
8     I15                      I3
9     I16  I15
10         I16  I15
11              I16  I15
12                   I16  I15
13                        I16  I15
14                             I16
Figure 16.14: Speedup Factors with Instruction Pipelining
(a) Speedup factor versus number of instructions (log scale), for k = 6, 9, and 12 stages
(b) Speedup factor versus number of stages, for n = 10, 20, and 30 instructions
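The curves in this figure follow from the standard textbook speedup model, which is not stated on the slide and is assumed here: with k equal-length stages and no branches or stalls, n instructions take k + (n - 1) time units in the pipeline versus n × k without it, so the speedup is nk / (k + n - 1). The function name `speedup` is an illustrative choice.

```python
def speedup(n: int, k: int) -> float:
    """Ideal speedup of a k-stage pipeline over no pipelining for n instructions."""
    pipelined_time = k + (n - 1)   # first result after k cycles, then one per cycle
    unpipelined_time = n * k       # every instruction takes all k stage times
    return unpipelined_time / pipelined_time


nine_instructions = speedup(9, 6)   # the 9-instruction, 6-stage timing diagram
many_instructions = speedup(128, 6)
single_instruction = speedup(1, 12)
```

As n grows the speedup approaches k, which is why each curve in panel (a) levels off at its stage count; a single instruction gains nothing from pipelining.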
Pipeline Hazards
• Occur when the pipeline, or some portion of the pipeline, must stall because conditions do not permit continued execution
• Also referred to as a pipeline bubble
• There are three types of hazards:
– Resource
– Data
– Control
(a) Five-stage pipeline, ideal case
Clock cycle   1    2    3    4    5    6    7    8    9
I1            FI   DI   FO   EI   WO
I2                 FI   DI   FO   EI   WO
I3                      FI   DI   FO   EI   WO
I4                           FI   DI   FO   EI   WO

(b) I1 source operand in memory
Clock cycle   1    2    3    4    5    6    7    8    9
I1            FI   DI   FO   EI   WO
I2                 FI   DI   FO   EI   WO
I3                      Idle FI   DI   FO   EI   WO
I4                                FI   DI   FO   EI   WO

Figure: Example of Resource Hazard


Clock cycle   1    2    3    4    5    6    7    8    9    10
ADD EAX, EBX  FI   DI   FO   EI   WO
SUB ECX, EAX       FI   DI   Idle Idle FO   EI   WO
I3                      FI             DI   FO   EI   WO
I4                                     FI   DI   FO   EI   WO

Figure 16.16: Example of Data Hazard
Types of Data Hazard
• Read after write (RAW), or true dependency
– An instruction modifies a register or memory location
– Succeeding instruction reads data in memory or register location
– Hazard occurs if the read takes place before write operation is
complete
• Write after read (WAR), or antidependency
– An instruction reads a register or memory location
– Succeeding instruction writes to the location
– Hazard occurs if the write operation completes before the read
operation takes place
• Write after write (WAW), or output dependency
– Two instructions both write to the same location
– Hazard occurs if the write operations take place in the reverse order
of the intended sequence
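The three dependency types can be detected by comparing the locations each instruction reads and writes. The sketch below is a simplification: each instruction is described by hypothetical `reads` and `writes` sets, condition flags are ignored, and the function only reports that a dependency exists. Whether it turns into an actual hazard depends on the pipeline timing.

```python
def classify_dependencies(first, second):
    """Return the dependency kinds of `second` on the earlier `first`.

    Each instruction is a dict with 'reads' and 'writes' sets of locations.
    """
    kinds = set()
    if first["writes"] & second["reads"]:
        kinds.add("RAW")   # true dependency: second reads what first writes
    if first["reads"] & second["writes"]:
        kinds.add("WAR")   # antidependency: second overwrites what first reads
    if first["writes"] & second["writes"]:
        kinds.add("WAW")   # output dependency: both write the same location
    return kinds


add = {"reads": {"EAX", "EBX"}, "writes": {"EAX"}}   # ADD EAX, EBX
sub = {"reads": {"ECX", "EAX"}, "writes": {"ECX"}}   # SUB ECX, EAX
forward = classify_dependencies(add, sub)
reverse = classify_dependencies(sub, add)
```

For the ADD/SUB pair used in the data hazard example, the result is a RAW dependency on EAX; swapping the order of the two instructions turns it into a WAR dependency.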
Control Hazard

• Also known as a branch hazard


• Occurs when the pipeline makes the wrong decision on
a branch prediction
• Brings instructions into the pipeline that must
subsequently be discarded
• Dealing with Branches:
– Multiple streams
– Prefetch branch target
– Loop buffer
– Branch prediction
– Delayed branch
Multiple Streams
A simple pipeline suffers a penalty for a branch instruction because it must choose one of two instructions to fetch next and may make the wrong choice

A brute-force approach is to replicate the initial portions of the pipeline and allow the pipeline to fetch both instructions, making use of two streams

Drawbacks:
• With multiple pipelines there are contention delays for
access to the registers and to memory
• Additional branch instructions may enter the pipeline
before the original branch decision is resolved
Prefetch Branch Target
• When a conditional branch is recognized, the target of the
branch is prefetched, in addition to the instruction following
the branch

• Target is then saved until the branch instruction is executed

• If the branch is taken, the target has already been prefetched

• IBM 360/91 uses this approach


Loop Buffer
• Small, very-high speed memory maintained by the instruction
fetch stage of the pipeline and containing the n most recently
fetched instructions, in sequence
• Benefits:
– Instructions fetched in sequence will be available without the usual
memory access time
– If a branch occurs to a target just a few locations ahead of the address
of the branch instruction, the target will already be in the buffer
– This strategy is particularly well suited to dealing with loops

• Similar in principle to a cache dedicated to instructions


– Differences:
▪ The loop buffer only retains instructions in sequence
▪ Is much smaller in size and hence lower in cost
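The benefit for loops can be illustrated with a rough sketch. It models the buffer as a small first-in, first-out store keyed by instruction address and does not model fetching ahead of the current instruction; the `LoopBuffer` class, its capacity, and the four-instruction loop are all hypothetical.

```python
from collections import OrderedDict


class LoopBuffer:
    """Keeps the most recently fetched instructions, oldest dropped first."""

    def __init__(self, capacity, memory):
        self.capacity = capacity
        self.memory = memory
        self.buffer = OrderedDict()
        self.hits = 0
        self.misses = 0

    def fetch(self, address):
        if address in self.buffer:
            self.hits += 1                    # served without a memory access
            return self.buffer[address]
        self.misses += 1
        instruction = self.memory[address]    # usual memory access
        self.buffer[address] = instruction
        if len(self.buffer) > self.capacity:
            self.buffer.popitem(last=False)   # drop the oldest entry
        return instruction


memory = ["LOAD", "ADD", "STORE", "BRANCH -3"]   # assumed four-instruction loop
loop_buffer = LoopBuffer(capacity=8, memory=memory)
for _ in range(3):                # three iterations of the loop
    for address in range(4):
        loop_buffer.fetch(address)
```

Only the first iteration goes to memory; the later iterations are served entirely from the buffer because the whole loop body fits in it.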
Branch Prediction

• Various techniques can be used to predict whether a branch will be taken:

1. Predict never taken
2. Predict always taken
3. Predict by opcode
• These approaches are static
• They do not depend on the execution history up to the time of the conditional branch instruction

1. Taken/not taken switch
2. Branch history table
• These approaches are dynamic
• They depend on the execution history
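A dynamic predictor can be sketched with a two-bit saturating counter per branch. This is one common scheme and is an assumption here, not necessarily the exact state diagram intended by the lecture; the class name `TwoBitPredictor`, the initial state, and the sample branch outcomes are illustrative.

```python
class TwoBitPredictor:
    """Two-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.counter = 2   # assumed starting state: weakly "taken"

    def predict(self) -> bool:
        return self.counter >= 2

    def update(self, taken: bool) -> None:
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def count_mispredictions(outcomes):
    """Feed a sequence of actual branch outcomes and count wrong predictions."""
    predictor = TwoBitPredictor()
    wrong = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            wrong += 1
        predictor.update(taken)
    return wrong


# A loop-closing branch: taken nine times, then falls through; run three times.
loop_branch = ([True] * 9 + [False]) * 3
loop_misses = count_mispredictions(loop_branch)
```

Because a single wrong prediction does not flip the prediction, the loop exit costs one misprediction per run of the loop and the re-entry costs none.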
Intel 80486 Pipelining
• Fetch
– Objective is to fill the prefetch buffers with new data as soon as the old data have been consumed by the instruction decoder
– Operates independently of the other stages to keep the prefetch buffers full
• Decode stage 1
– All opcode and addressing-mode information is decoded in the D1 stage
– 3 bytes of instruction are passed to the D1 stage from the prefetch buffers
– D1 decoder can then direct the D2 stage to capture the rest of the instruction
• Decode stage 2
– Expands each opcode into control signals for the ALU
– Also controls the computation of the more complex addressing modes
• Execute
– Stage includes ALU operations, cache access, and register update
• Write back
– Updates registers and status flags modified during the preceding execute stage
Figure: Approaches to Pipeline Organization
Figure: Improved Pipeline Organization
Interrupt Processing

Interrupts and Exceptions


• Interrupts
– Generated by a signal from hardware; may occur at random times during the execution of a program
– Maskable
– Nonmaskable
• Exceptions
– Generated by software; provoked by the execution of an instruction
– Processor detected
– Programmed
• Interrupt vector table
– Every type of interrupt is assigned a number
– Number is used to index into the interrupt vector table
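Indexing into the vector table can be sketched as a lookup from interrupt number to handler. The table contents, the handler functions, and the vector numbers below are hypothetical and chosen only for illustration; a real table holds handler addresses or descriptors rather than Python functions.

```python
def handle_divide_error():
    return "divide error handled"


def handle_keyboard():
    return "keyboard serviced"


# Hypothetical vector numbers, for illustration only.
VECTOR_TABLE = {0: handle_divide_error, 33: handle_keyboard}


def dispatch(vector_number: int) -> str:
    """Use the interrupt number as an index to find and run its handler."""
    handler = VECTOR_TABLE.get(vector_number)
    if handler is None:
        return f"no handler for vector {vector_number}"
    return handler()


divide_result = dispatch(0)
keyboard_result = dispatch(33)
missing_result = dispatch(5)
```

The processor does the equivalent lookup in hardware: the number identifies the table entry, and the entry says where the service routine starts.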
The ARM Processor

ARM is primarily a RISC system with the following attributes:
• Moderate array of uniform registers
• A load/store model of data processing in which operations only perform on
operands in registers and not directly in memory
• A uniform fixed-length instruction of 32 bits for the standard set and 16 bits for
the Thumb instruction set
• Separate arithmetic logic unit (ALU) and shifter units
• A small number of addressing modes with all load/store addresses determined
from registers and instruction fields
• Auto-increment and auto-decrement addressing modes are used to improve the
operation of program loops
• Conditional execution of instructions minimizes the need for conditional branch
instructions, thereby improving pipeline efficiency, because pipeline flushing is
reduced
Figure 16.28: Simplified ARM Organization
(Blocks shown: external memory (cache, main memory); memory address register; memory buffer register; incrementer; sign extend; user register file (R0 - R15), including R15 (PC), with ports Rd, Rn, Rm, and Acc; instruction register; instruction decoder; control unit; barrel shifter; ALU; multiply/accumulate; CPSR)
Processor Modes
• ARM architecture supports seven execution modes
• Most application programs execute in user mode
– While the processor is in user mode the program being executed is unable to access protected system resources or to change mode, other than by causing an exception to occur
• Remaining six execution modes are referred to as privileged modes
– These modes are used to run system software
• Advantages to defining so many different privileged modes:
– The OS can tailor the use of system software to a variety of circumstances
– Certain registers are dedicated for use for each of the privileged modes, which allows swifter changes in context
Exception Modes

• Have full access to system resources and can change modes freely
• Entered when specific exceptions occur
• Exception modes:
– Supervisor mode
– Abort mode
– Undefined mode
– Fast interrupt mode
– Interrupt mode
• System mode:
– Not entered by any exception and uses the same registers available in User mode
– Is used for running certain privileged operating system tasks
– May be interrupted by any of the five exception categories
