
Pooja Vashisth

This document discusses hazards in pipelined processors and methods for mitigating them. It covers: 1. Data hazards and forwarding to resolve dependencies between instructions in the pipeline. 2. Scheduling instructions to avoid stalls from load-use hazards when a value isn't ready in the next cycle. 3. Control hazards from branches changing the instruction flow, and how branch prediction helps resolve them by guessing the target early to avoid stalls.

Uploaded by

Apurva Singh

CSC 258

Pooja Vashisth

2
This Week’s Learning Goals

• Identify and describe the hazards that must be mitigated in a pipelined processor.
• Describe and compare different methods for mitigating hazards:
  • Forwarding
  • Scheduling
  • Prediction

3
Today’s Plan
• Data Hazards and forwarding
• Scheduling
• Control Hazards and Prediction


5
RISC-V Instructions

RISC-V instructions can be split into stages.

1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register

6
The Pipelined Processor

7
Hazards

A hazard prevents the next instruction from starting in the next cycle.

Structural hazards
A required resource is busy.
Data hazards
An instruction must wait for a previous instruction to complete its data read/write.
Control hazards
A previous instruction (a branch) determines which instruction executes next.

8
Structural Hazards
Structural hazards are caused by conflicts for a resource.

Example:
Instruction fetch requires access to memory to get an instruction.
A load or store instruction requires access to memory to manipulate data.
If there were a single point of access to memory, it could not handle both simultaneously: a structural hazard.

Good design can avoid structural hazards. But what about control and data hazards?


10
Data Hazards

An instruction needs data from a prior instruction.

add x19, x0, x1
sub x2, x19, x3

Here, sub reads x19 before add has written it back to the register file.

11
Performance Penalty

The simple solution is to stall. This inserts a nop into the pipeline.

Since our speedup depends on the pipeline being full, this can tank performance.

How do we handle this more effectively than just stalling?

12
Forwarding (Bypassing)

No need to wait for values to be written back.

Create additional connections in the datapath so that recently computed values can be used before they reach the register file.

13
What do we need for forwarding?
First, we need to detect the need to forward.

Second, we must send data to the correct stage and have it selected when it’s needed.

14
Detecting the Need to Forward

Recall: register numbers are passed along the pipeline.

Let’s define a notation:
ID/EX.RegisterRs1 = Rs1 in the ID/EX pipeline register

Then, data hazards occur when …
1a. EX/MEM.RegisterRd = ID/EX.RegisterRs1
1b. EX/MEM.RegisterRd = ID/EX.RegisterRs2
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs1
2b. MEM/WB.RegisterRd = ID/EX.RegisterRs2
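The four conditions above can be sketched as a small Python model of the forwarding check. This is only an illustration: the class and function names are made up for this sketch, and it adds two checks that the slide does not list but that the standard textbook treatment includes (the earlier instruction must actually write a register, and that register must not be x0).

```python
from dataclasses import dataclass


@dataclass
class PipelineRegister:
    """Only the fields the forwarding check needs (field names are illustrative)."""
    rd: int = 0              # destination register number carried by the instruction
    reg_write: bool = False  # True if the instruction will write rd


def forward_source(rs: int, ex_mem: PipelineRegister, mem_wb: PipelineRegister) -> str:
    """Pick the source for one ALU operand of the instruction in EX.

    rs is ID/EX.RegisterRs1 or ID/EX.RegisterRs2.
    """
    # Hazard 1a/1b: the instruction one ahead (now in MEM) writes rs.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == rs:
        return "EX/MEM"
    # Hazard 2a/2b: the instruction two ahead (now in WB) writes rs.
    # Checked second so that the most recent result wins.
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == rs:
        return "MEM/WB"
    return "register file"


# add x19, x0, x1 followed by sub x2, x19, x3: sub is in EX, add is in MEM.
add_in_mem = PipelineRegister(rd=19, reg_write=True)
nothing = PipelineRegister()
print(forward_source(19, add_in_mem, nothing))  # Rs1 of sub
print(forward_source(3, add_in_mem, nothing))   # Rs2 of sub
```

The returned string stands in for the control signal that drives the operand multiplexer in a real datapath.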

15
Quick Check
Using this breakdown …
1a. EX/MEM.RegisterRd = ID/EX.RegisterRs1
1b. EX/MEM.RegisterRd = ID/EX.RegisterRs2
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs1
2b. MEM/WB.RegisterRd = ID/EX.RegisterRs2

Which kind of hazard is occurring below?

addi x7, x3, 42
sub x6, x3, x2
add x7, x7, x6

16
Forwarding Paths

The forwarding unit detects a hazard condition.

It emits a control signal to change the value selected by the multiplexer.

17
Activity: Analyzing Hazards

Identify all the data hazards in the following code, categorize each one, and explain how it might be resolved.

add x7, x3, x4
addi x7, x7, 2
sw x7, 0(x2)
lw x7, 4(x2)
addi x7, x7, 1


19
Revisiting the activity…
The last hazard in this sequence is difficult to resolve! We can’t easily forward the value …

add x7, x3, x4
addi x7, x7, 2
sw x7, 0(x2)
lw x7, 4(x2)
addi x7, x7, 1

20
Load-Use Data Hazard

Forwarding cannot resolve all stalls.

A loaded value is not available until the end of the MEM stage, so it cannot be forwarded to an instruction that needs it in EX in the very next cycle.

21
How to Stall the Pipeline

Force control values in ID/EX register to 0
This leads the EX, MEM and WB stages to perform nop (no-operation)

Prevent update of PC and IF/ID register
This results in the instruction in ID being decoded again in the next cycle, and the following instruction being fetched again
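As a rough sketch, the load-use check and the stall actions listed above could be modelled in Python as follows. The function names and the dictionary keys are invented for this illustration; the condition itself is the usual textbook one (a load in EX whose destination matches a source of the instruction in ID).

```python
from typing import Optional


def load_use_hazard(id_ex_mem_read: bool, id_ex_rd: int,
                    if_id_rs1: int, if_id_rs2: Optional[int]) -> bool:
    """True when the instruction in ID needs the result of a load that is in EX.

    if_id_rs2 is None for instructions with only one source register.
    """
    return id_ex_mem_read and id_ex_rd in (if_id_rs1, if_id_rs2)


def stall_signals(hazard: bool) -> dict:
    """Control signals for one cycle (signal names are hypothetical)."""
    return {
        "pc_write": not hazard,         # hold the PC: the same instruction is fetched again
        "if_id_write": not hazard,      # hold IF/ID: the same instruction is decoded again
        "zero_id_ex_controls": hazard,  # turn the EX slot into a nop (bubble)
    }


# lw x7, 4(x2) is in EX while addi x7, x7, 1 is in ID.
hazard = load_use_hazard(id_ex_mem_read=True, id_ex_rd=7, if_id_rs1=7, if_id_rs2=None)
print(hazard, stall_signals(hazard))
```

After the one-cycle bubble, the loaded value can be forwarded from MEM/WB as usual.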

22
Datapath with Hazard Detection

Hazard detection unit is placed in ID stage.

Here, it can easily introduce a bubble by zeroing out the control values in the ID/EX register.

23
How to Stall the Pipeline


Stalling reduces throughput: one stage of the pipeline is now empty, not doing productive work. Is there a better answer?

24
Code Scheduling to Avoid Stalls

Can reorder code to avoid use of load result in the next instruction.
C code for a = b + e; c = b + f;

Original order            Reordered
ld  x1, 0(x0)             ld  x1, 0(x0)
ld  x2, 8(x0)             ld  x2, 8(x0)
(stall)                   ld  x4, 16(x0)
add x3, x1, x2            add x3, x1, x2
sd  x3, 24(x0)            sd  x3, 24(x0)
ld  x4, 16(x0)            add x5, x1, x4
(stall)                   sd  x5, 32(x0)
add x5, x1, x4
sd  x5, 32(x0)
13 cycles                 11 cycles
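The two cycle counts can be checked with a short Python sketch. It assumes a 5-stage pipeline with full forwarding, so the only stalls are one-cycle load-use stalls; the tuple encoding of instructions and the function name are made up for this example.

```python
def count_cycles(program, stages=5):
    """Count cycles for a straight-line program on a pipeline with forwarding.

    Each instruction is (opcode, destination register, source registers).
    Only a load followed immediately by a user of its result costs a stall.
    """
    stalls = 0
    for previous, current in zip(program, program[1:]):
        prev_op, prev_rd, _ = previous
        _, _, cur_sources = current
        if prev_op == "ld" and prev_rd in cur_sources:
            stalls += 1
    # The first instruction needs `stages` cycles; each later one adds a cycle.
    return (stages - 1) + len(program) + stalls


original = [
    ("ld", 1, (0,)), ("ld", 2, (0,)), ("add", 3, (1, 2)), ("sd", None, (3, 0)),
    ("ld", 4, (0,)), ("add", 5, (1, 4)), ("sd", None, (5, 0)),
]
reordered = [
    ("ld", 1, (0,)), ("ld", 2, (0,)), ("ld", 4, (0,)), ("add", 3, (1, 2)),
    ("sd", None, (3, 0)), ("add", 5, (1, 4)), ("sd", None, (5, 0)),
]
print(count_cycles(original), count_cycles(reordered))
```

Moving the third load up removes both load-use pairs, which is where the two-cycle saving comes from.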

25
Revisiting the activity…
Go back to your answer to the previous activity. Can you provide a
different solution to resolving the lw-addi hazard?

add x7, x3, x4


addi x7, x7, 2
sw x7, 0(x2)
lw x7, 4(x2)
addi x7, x7, 1

26
The Bottom Line:
Stalls and Performance

Stalls reduce performance, so we avoid them wherever we can.

The addition of hardware to detect hazards and forward data can help.

In some situations, we rely on the compiler (or even more complex hardware) to rearrange the instruction stream.


28
Control Hazards

Branches change the next instruction to execute.

As a result, the pipeline can’t always fetch the correct instruction
… or even know what the correct instruction to fetch will be.

We could just stall …
… but we can’t determine the next instruction until AFTER the execution stage.
… so it’s better to compute the target as early as possible and to make an educated guess about the next instruction.

29
Reducing Branch Delay

First, move hardware to determine outcome to ID stage
Target address adder and/or a memory to store previously computed targets (the “branch target buffer”)
Register comparator

Second, add hardware to choose whether to load target address or PC + 4

30
Branch Prediction

In a deep pipeline, the stall penalty is too high
Branches are common instructions!

So … let’s predict the outcome of the branch
If the prediction is wrong, then it’s no worse than stalling.
And if the prediction is correct, there is no penalty.

Easiest prediction: not taken
Just fetch instruction after branch, with no delay

31
Can we do better?

The compiler can make not taken work pretty well.
Code can be built to use only unconditional jumps and branches that are likely to not be taken.

But the cost of a missed prediction is very high on a deep pipeline.
Modern pipelines are typically 10-14 stages (with a max of 31!)
So there has been a lot of work on improving branch prediction.
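To get a feel for why pipeline depth matters, the misprediction cost can be folded into an average CPI. The sketch below uses a common simplified model (a base CPI plus a per-branch penalty); the function name and all of the numbers are hypothetical and are not taken from the slides.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: int) -> float:
    """Average cycles per instruction once branch mispredictions are included."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload: 20% branches, 10% of them mispredicted.
shallow = effective_cpi(1.0, 0.20, 0.10, penalty_cycles=2)   # short pipeline
deep = effective_cpi(1.0, 0.20, 0.10, penalty_cycles=15)     # deep pipeline
print(shallow, deep)
```

With the same predictor, the deeper pipeline loses far more cycles per misprediction, so better prediction pays off more as pipelines get deeper.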

32
(More) Advanced Branch Prediction

Static branch prediction
Predict backward branches taken (e.g., the end of a loop body)
Predict forward branches not taken (e.g., if statements)

Dynamic branch prediction
Track historical branch behaviour and predict based on that history
Use counters to track “taken” vs. “not taken”
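One common way to "use counters" is a 2-bit saturating counter per branch. The slide does not say which counter scheme is meant, so treat the following Python sketch as one possible instance; the class and function names are invented for this example.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0 and 1 predict not taken, 2 and 3 predict taken."""

    def __init__(self, state: int = 0):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(outcomes, predictor) -> float:
    """Fraction of branch outcomes (True = taken) that the predictor guesses correctly."""
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# A loop branch: taken on every iteration except the last one.
loop_branch = [True] * 9 + [False]
print(accuracy(loop_branch, TwoBitPredictor(state=3)))
```

Because the counter needs two wrong guesses in a row to flip its prediction, a single loop exit does not disturb the prediction for the next run of the loop.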

33
Branch Prediction Impact
The following two textbook problems are in group homework as well.

4.28: This problem quantifies the impact of a good branch predictor.

4.29: This problem focuses on how various predictors perform on a single pattern.

34
Problem Text

35
Next topic: Interrupts and Exceptions

There’s one additional control issue we need to discuss: interrupts!

How do we get user input?

How do we stop the processor when it’s running a program?

36
CSC 258

37
This Week’s Learning Goals
1. Describe how the RISC-V architecture services an exception/interrupt.
2. Explain how interrupts are used to interact with external devices (typically
I/O).
3. Explain how interrupts are used to implement system calls.

38
Today’s Plan
• Handling Exceptions
• Pipelining and Exceptions
• Exceptions and System Calls, I/O

39
Why?

1. What happens when the processor needs to stop executing a user program to do something else?
2. How does the operating system run code without the user seeing or influencing it?
3. What happens when we get I/O? (Keyboard input? Data from a network?)

So far, we have seen how to execute a single program. But our machines need to do more than one thing at once.

40
Exceptions and Interrupts

Exceptions are “unexpected” (unpredictable) events requiring non-user code to be run.

Different ISAs use the terms exception and interrupt differently
Exception: generally comes from within the CPU (syscall, floating point error, …)
Interrupt: generally generated by an external device

Handling exceptions without sacrificing performance is challenging!

41
How to Handle an Exception

Quick challenge: How would you do this?

42
How to Handle an Exception
1. Pause and save current location in running user process
Also need to make sure user state is not modified
2. Store cause of the exception/interrupt
3. Invoke a handler that will deal with the issue
Quick challenge: How would you do this?

43
How to Handle an Exception (in RISC-V)

1. Pause and save current location in running user process
Like a function call! Store the PC (SEPC register) and save registers that will be modified (stack)
2. Store cause of the exception/interrupt
We use an SCAUSE register
3. Invoke a handler that will deal with the issue
A single handler is used to handle all exceptions

44
One Handler vs. Many

RISC-V uses a single exception handler, but that’s not the only possibility

Other systems use Vectored Interrupts
Here, the handler address is determined by the cause
1. To call the correct function, a vector table is maintained, where a handler is registered for every category of exception.
2. The cause register is used to index into this table to invoke the correct handler.
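The vector-table idea can be sketched in Python as a list of handler functions indexed by the cause. Everything here is illustrative: the cause numbering, the handler names, and the returned strings are hypothetical and do not correspond to any real ISA encoding.

```python
from enum import IntEnum


class Cause(IntEnum):
    """Illustrative cause codes (not a real ISA encoding)."""
    ILLEGAL_INSTRUCTION = 0
    SYSTEM_CALL = 1
    TIMER_INTERRUPT = 2


def on_illegal_instruction() -> str:
    return "terminate program"


def on_system_call() -> str:
    return "run requested OS service"


def on_timer_interrupt() -> str:
    return "switch to another process"


# One registered handler per category of exception, in cause-code order.
VECTOR_TABLE = [on_illegal_instruction, on_system_call, on_timer_interrupt]


def dispatch(cause: Cause) -> str:
    """Use the cause value as an index into the table and invoke that handler."""
    return VECTOR_TABLE[cause]()


print(dispatch(Cause.TIMER_INTERRUPT))
```

In hardware the table holds addresses rather than functions, and the lookup replaces the branching that a single handler would otherwise do in software.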

45
Single Handler Duties

1. Set up any resources required by the handler (or subsequent code)
2. Read the SCAUSE register
3. If the cause allows the user program to restart …
Handle the error or transfer to code that can
Use the SEPC to return to program
4. Else …
Terminate program and report the error
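The duties above can be sketched as one Python function. This is a simplified model, not real handler code: the `TrapState` class, the `RECOVERABLE` table, and its cause codes are all hypothetical. Whether execution resumes at the saved address or just after it depends on the cause, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TrapState:
    sepc: int    # saved PC of the interrupted instruction
    scause: int  # code describing why the trap happened


# Hypothetical cause codes that allow the user program to continue.
RECOVERABLE = {1: "service system call", 2: "service timer interrupt"}


def single_handler(trap: TrapState) -> Tuple[str, Optional[int]]:
    """Return (action taken, address to resume at or None if terminated)."""
    # 1. Setting up resources (for example, saving registers) is omitted here.
    # 2. Read the cause.
    cause = trap.scause
    # 3. Recoverable: handle it, then use SEPC to get back to the program.
    if cause in RECOVERABLE:
        return RECOVERABLE[cause], trap.sepc
    # 4. Otherwise: terminate the program and report the error.
    return f"terminate: unhandled cause {cause}", None


print(single_handler(TrapState(sepc=0x4C, scause=1)))
print(single_handler(TrapState(sepc=0x4C, scause=7)))
```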


47
Exceptions … and Pipelining

The first step becomes more complicated in a pipelined processor!
1. Pause and save current location in running user process
Like a function call! Store the PC (SEPC register) and save registers that will be modified (stack)
2. Store cause of the exception/interrupt
We use an SCAUSE register
3. Invoke a handler that will deal with the issue
A single handler is used to handle all exceptions

48
Exceptions: a Control Hazard
Consider a malfunction on add in the EX stage …
add x1, x2, x1
Must prevent x1 from being clobbered
Must complete previous instructions
To do so, flush add and subsequent instructions – but let previous instructions complete
The steps required are similar to a mispredicted branch

49
Pipeline with Exceptions

50
Exception Example

Exception on add …
40 sub x11, x2, x4
44 and x12, x2, x5
48 or x13, x2, x6
4c add x1, x2, x1
50 sub x15, x6, x7
54 ld x16, 100(x7)

Handler
1c090000 sd x26, 1000(x10)
1c090004 sd x27, 1008(x10)
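The example can be traced with a small Python sketch of the cycle in which the exception on add is detected in EX. It assumes the two older instructions are in MEM and WB and the two younger ones are in ID and IF; the function name and the cause value are hypothetical.

```python
HANDLER_ADDRESS = 0x1C090000  # handler address from the example
ILLUSTRATIVE_CAUSE = 2        # hypothetical cause code for this sketch


def take_exception(pipeline, cause):
    """Model an exception detected in EX.

    pipeline maps stage name -> (pc, instruction text).
    Returns (pipeline after the flush, saved CSRs, next PC to fetch).
    """
    faulting_pc, _ = pipeline["EX"]
    csrs = {"sepc": faulting_pc, "scause": cause}
    after = dict(pipeline)
    # Flush the faulting instruction and everything younger than it.
    for stage in ("IF", "ID", "EX"):
        after[stage] = None  # None stands for a nop (bubble)
    # Older instructions in MEM and WB are left alone so they complete.
    return after, csrs, HANDLER_ADDRESS


pipeline = {
    "IF": (0x54, "ld x16, 100(x7)"),
    "ID": (0x50, "sub x15, x6, x7"),
    "EX": (0x4C, "add x1, x2, x1"),
    "MEM": (0x48, "or x13, x2, x6"),
    "WB": (0x44, "and x12, x2, x5"),
}
after, csrs, next_pc = take_exception(pipeline, ILLUSTRATIVE_CAUSE)
print(hex(csrs["sepc"]), hex(next_pc), after)
```

Because add never reaches WB, x1 keeps its old value, and the handler's first instruction is fetched in the next cycle.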

51
Exception Example

52
Exception Example

53
What about multiple exceptions?

We could have multiple exceptions at once …
A pipelined processor has more than one instruction in flight
Or an external interrupt could occur while an exception is occurring.

Simple approach: have a priority
Deal with external interrupt first. Flush all instructions … re-execution will cause the internal exception to occur again.
Deal with the earliest exception first. Re-execution will cause the second exception.
This has a performance cost!

But in complex pipelines
Multiple instructions are issued per cycle, some out-of-order
Maintaining precise exceptions is difficult!


55
Activity: Exceptions in the Pipeline
We’re going to explore how exceptions are handled by the pipelined processor.

Exercise 4.30 in the text

56
Exercise Text


58
We’ve seen how to support exceptions …

But how do we use them to do useful things?

• System calls

• Handling I/O

59
System Calls

In RVS, you used the ecall instruction to invoke a system call.

A question for you: How does ecall differ from a procedure call?

60
System Calls

In RVS, you used the ecall instruction to invoke a system call.

A question for you: How does ecall differ from call?

ecall invokes the exception handler. It causes an exception.

Think of it as an intentional error that can be discovered at the ID stage.

61
Handling System Calls

When a system call is detected …

Normal exception handling occurs: the PC is saved, the handler is invoked
Within the handler, the cause register (SCAUSE) identifies the event as a system call, and the user has provided key information.
a7 contains the specific system call that is desired.
Arguments are placed in a0 and a1.
The handler invokes the correct code – in the operating system – for handling the specific call that is desired.
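The dispatch step can be sketched in Python with the registers modelled as a dictionary. The system call numbers and the two services below are made up for illustration and are not the numbering of any particular simulator or operating system. Placing the result in a0 follows the usual RISC-V calling convention, which is an assumption of this sketch.

```python
def sys_add(a0: int, a1: int) -> int:
    """Hypothetical service: return the sum of the two arguments."""
    return a0 + a1


def sys_max(a0: int, a1: int) -> int:
    """Hypothetical service: return the larger argument."""
    return max(a0, a1)


# Hypothetical system call table, indexed by the value placed in a7.
SYSCALLS = {1: sys_add, 2: sys_max}


def handle_ecall(regs: dict) -> dict:
    """Dispatch on a7, pass a0 and a1 as arguments, and put the result in a0."""
    service = SYSCALLS.get(regs["a7"])
    if service is None:
        raise ValueError(f"unknown system call {regs['a7']}")
    result = dict(regs)
    result["a0"] = service(regs["a0"], regs["a1"])
    return result


print(handle_ecall({"a7": 1, "a0": 40, "a1": 2}))
```

The user program never names the OS function directly: it only sets a7 and the argument registers, then executes ecall.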

62
Other Examples?
Timers: the OS can set a timer in the hardware which causes an interrupt when the timer expires
Signals: signals (sent from one process to another) are implemented as interrupts
Virtual memory: the OS needs to provide the illusion that every process has access to all of its memory (but also needs to store some of that memory on disk). Interrupts are used when memory that is not loaded is accessed.

63
Coming Up
64
Don’t Forget!

• Term-Test2 (ch3, 4, App-A), this week on Wednesday
• Submit READY? Quizzes before next classes
• Participate in Peer discussion and Q/A every week
• Submit the upcoming Quiz 5 before deadline (Mar. 24)
• Check your labs schedule… (Lab M)

65
Don’t Forget!

• Practice questions for the week (also, part of homework 3)
• #?: 4.7, 4.8, 4.12*, 4.16, 4.17, 4.18, 4.19
• #?: 4.20, 4.22*, 4.23, 4.24, 4.26*, 4.28, 4.30 (till 4.30.3)

66
Next week: Memory and Storage

Two Big Ideas: Locality and the Memory Wall

Introduction to the memory hierarchy

Evaluating the performance of memory systems (latency, bandwidth)

67
See you next week!
68
