Arch5 Precise Exceptions Afterlecture
ETH Zürich
Spring 2023
20 April 2023
Agenda for Today & Next Few Lectures
n Prior to last week: Microarchitecture Fundamentals
q Single-cycle Microarchitectures
q Multi-cycle Microarchitectures
n Last week & today: Pipelining
q Pipelining
q Pipelined Processor Design
n Control & Data Dependence Handling
n Precise Exceptions: State Maintenance & Recovery
n Tomorrow: Out-of-Order Execution
q Out-of-Order Execution
q Issues in OoO Execution: Load-Store Handling, …
[Figure: levels of transformation: Problem, Algorithm, Program/Language, System Software, SW/HW Interface, Micro-architecture, Logic, Devices, Electrons]
Readings
n Past weeks & today
q Pipelining
n H&H, Chapter 7.5
q Pipelining Issues
n H&H, Chapter 7.7, 7.8.1-7.8.3
Issues in Pipelining: Multi-Cycle Execute
n Instructions can take a different number of cycles in the EXECUTE stage
q Integer ADD versus Integer DIVide
DIV R4 ← R1, R2   F D E E E E E E E E W   (exception-causing instruction)
ADD R3 ← R1, R2     F D E W
                      F D E W
                        F D E W
DIV R2 ← R5, R6           F D E E E E E E E E W
ADD R7 ← R5, R6             F D E W
                              F D E W
An Example Exception
[Figure sequence, lecture timestamps 12:55 to 13:00: the exception-causing “instruction” and the “instruction” delayed due to the exception]
Another View
Exception Handled & Resolved…
[Figure, lecture timestamp 13:06: the exception-causing “instruction”]
Exceptions and Interrupts
n “Unplanned” changes or interruptions in program execution
n Interrupt examples
q I/O device needing service (e.g., keyboard input, video input)
q (Periodic) system timer expiration
q Power failure
q Machine check
q …
Exceptions vs. Interrupts
n Cause
q Exceptions: internal to the running thread
q Interrupts: external to the running thread
n When to Handle
q Exceptions: when detected (and known to be non-speculative)
q Interrupts: when convenient
n Except for very high priority ones
q Power failure
q Machine check (error)
[Figure: precise state, i.e., a clean separation of sequential instructions, shown on the sequence below]
DIV R4 ← R1, R2
ADD R3 ← R1, R2
DIV R2 ← R5, R6
ADD R7 ← R5, R6
Checking for and Handling Exceptions in Pipelining
Aside: From the x86-64 ISA Manual
https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
Why Do We Want Precise Exceptions?
n The semantics of the von Neumann model ISA specify it
q Remember von Neumann vs. Dataflow
Ensuring Precise Exceptions
n Easy to do in single-cycle and multi-cycle machines
n Single-cycle
q Instruction boundary == Cycle boundary
q An instruction is guaranteed to be finished in one cycle
→ no possibility of violating sequential execution semantics
n Multi-cycle
q Add special states in the control FSM that lead to the
exception or interrupt handlers
q Switch to the handler only at a precise state
→ before fetching the next instruction
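The multi-cycle case can be sketched as a tiny control loop. This is a minimal sketch under stated assumptions, not the lecture's design: the function `run`, the tuple layout of `program`, and the use of Python's `ZeroDivisionError` to stand in for an exception detected in EXECUTE are all hypothetical. It shows the two ideas from the bullets: a faulting instruction takes a special path that skips its register write, and an interrupt is only taken at the boundary before the next fetch.

```python
def run(program, regs, interrupt_cycle=None):
    """Hypothetical multi-cycle control loop.

    program: list of (dest, op, exec_cycles), where op(regs) computes the result.
    Returns (event, index of the next instruction to fetch).
    """
    cycle = 0
    for pc, (dest, op, exec_cycles) in enumerate(program):
        # Normal FSM states: FETCH, DECODE, then EXECUTE for exec_cycles cycles.
        cycle += 2 + exec_cycles
        try:
            value = op(regs)
        except ZeroDivisionError:
            # Special FSM state: skip WRITEBACK so the faulting instruction
            # leaves no architectural trace, then go to the exception handler.
            return ("exception", pc)
        cycle += 1
        regs[dest] = value  # WRITEBACK
        # Precise point: instruction pc is finished, pc + 1 is not yet fetched.
        if interrupt_cycle is not None and cycle >= interrupt_cycle:
            return ("interrupt", pc + 1)
    return ("done", len(program))


program = [
    ("R3", lambda r: r["R1"] + r["R2"], 1),   # ADD
    ("R4", lambda r: r["R1"] // r["R2"], 8),  # DIV
    ("R7", lambda r: r["R1"] + 1, 1),         # ADD
]
```

An interrupt that arrives in the middle of the DIV is held until the DIV has written back, so the handler always sees a state in which every older instruction is complete and no younger one has started.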
n The mfc0 instruction is used to copy the exception cause into a general-purpose register
Multi-Cycle Execute: More Complications
n Instructions can take a different number of cycles in the EXECUTE stage → This complicates exception/interrupt handling
DIV R4 ← R1, R2   F D E E E E E E E E W   (exception-causing instruction)
ADD R3 ← R1, R2     F D E W
                      F D E W
                        F D E W
DIV R2 ← R5, R6           F D E E E E E E E E W
ADD R7 ← R5, R6             F D E W
                              F D E W
n One approach: make every instruction take the same (worst-case) number of cycles in EXECUTE
DIV R3 ← R1, R2   F D E E E E E E E E W
ADD R4 ← R1, R2     F D E E E E E E E E W
                      F D E E E E E E E E W
                        F D E E E E E E E E W
                          F D E E E E E E E E W
                            F D E E E E E E E E W
                              F D E E E E E E E E W
n Downside
q Worst-case instruction latency determines all instructions’ latency
n What about memory operations?
n Each functional unit takes worst-case number of cycles?
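The downside listed above can be made concrete with a small calculation. This is a minimal sketch under assumptions: the helper `writeback_cycles` is hypothetical, the latencies (ADD: 1 cycle, DIV: 8 cycles in EXECUTE) are read off the pipeline diagrams, the unnamed instructions in the diagrams are treated as ADD-like, and one instruction is fetched per cycle.

```python
# E-stage latencies read off the pipeline diagrams (ADD: 1 cycle, DIV: 8 cycles).
EXEC_CYCLES = {"ADD": 1, "DIV": 8}


def writeback_cycles(ops, pad_to_worst_case):
    """Cycle in which each instruction reaches W (hypothetical helper)."""
    worst = max(EXEC_CYCLES.values())
    cycles = []
    for i, op in enumerate(ops):
        e = worst if pad_to_worst_case else EXEC_CYCLES[op]
        # F in cycle i + 1, then one D cycle, e EXECUTE cycles, and W.
        cycles.append((i + 1) + 1 + e + 1)
    return cycles


ops = ["DIV", "ADD", "ADD", "ADD", "DIV", "ADD", "ADD"]
unpadded = writeback_cycles(ops, pad_to_worst_case=False)
padded = writeback_cycles(ops, pad_to_worst_case=True)
print(unpadded)  # out of program order: younger ADDs write back before the DIV
print(padded)    # in program order, but every ADD now pays the DIV latency
```

With padding, writeback order matches program order, which is what precise exceptions need, but each 1-cycle ADD now spends 8 cycles in EXECUTE.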
Solutions: Supporting Precise Exceptions
n How do we support precise exceptions in the presence of
instructions completing out of program order?
n Reorder buffer
n History buffer
n Checkpointing
[Figure: reorder buffer (ROB) organization: instruction cache, register file, functional units, and reorder buffer]
Reorder Buffer: Independent Operations
n Result first written to ROB on instruction completion
n Result written to register file at commit time
F D E E E E E E E E R W
  F D E R               W
    F D E R               W
      F D E R               W
        F D E E E E E E E E R W
          F D E R               W
            F D E R               W
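The two bullets above (result goes to the ROB on completion, to the register file at commit) can be sketched in a few lines. This is a minimal sketch, not a definitive implementation: the class `ReorderBuffer`, its method names, and the entry fields are hypothetical, and operand reads and forwarding are left out.

```python
from collections import deque


class ReorderBuffer:
    """Hypothetical ROB: results wait here until they can commit in order."""

    def __init__(self):
        self.entries = deque()  # program order, oldest instruction on the left
        self.regfile = {}       # architectural register file

    def allocate(self, dest):
        # Called at decode, in program order.
        entry = {"dest": dest, "value": None, "done": False, "exception": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value=None, exception=False):
        # Called when the instruction finishes executing, in any order.
        entry.update(value=value, done=True, exception=exception)

    def commit(self):
        """Retire finished instructions from the head, in program order."""
        while self.entries and self.entries[0]["done"]:
            head = self.entries.popleft()
            if head["exception"]:
                self.entries.clear()  # flush all younger instructions
                return head           # register file holds only older results
            self.regfile[head["dest"]] = head["value"]
        return None
```

A fast ADD that completes before an older DIV sits in the ROB until the DIV commits. If the DIV raises an exception instead, the ADD's result is discarded without ever reaching the register file.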
Reorder Buffer: How to Access?
n A register value can be in the register file, reorder buffer,
(or bypass/forwarding paths)
[Figure: functional units, register file, and ROB; an entry holds either a value or a pointer to a ROB entry]
A Register Alias Table (RAT) points to where each register’s current value is (or will be)
Intel Pentium Pro (1995)
Output dependence: Write-after-Write (WAW)
r3 ← r1 op r2
r5 ← r3 op r4
r3 ← r6 op r7
Register Renaming Example (On Your Own)
n Assume
q Register file has a pointer to the reorder buffer entry that
contains or will contain the value, if the register is not valid
q Reorder buffer works as described before
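The assumptions above can be tried out on the three-instruction output-dependence sequence (r3 written twice, r5 reading the first r3). This is a minimal sketch: the class `Machine`, its fields, and the use of the ROB index as the tag are hypothetical, and `commit` is assumed to be called in program order.

```python
class Machine:
    """Hypothetical register file whose invalid entries point into the ROB."""

    def __init__(self, regs):
        self.rf = {r: {"valid": True, "value": v, "tag": None}
                   for r, v in regs.items()}
        self.rob = []  # program order; the list index serves as the tag

    def read(self, reg):
        entry = self.rf[reg]
        if entry["valid"]:
            return ("value", entry["value"])
        producer = self.rob[entry["tag"]]
        if producer["done"]:
            return ("value", producer["value"])  # result is waiting in the ROB
        return ("tag", entry["tag"])             # result will appear in the ROB

    def decode(self, dest, src1, src2):
        sources = (self.read(src1), self.read(src2))
        tag = len(self.rob)
        self.rob.append({"dest": dest, "value": None, "done": False})
        self.rf[dest].update(valid=False, tag=tag)  # rename dest to this entry
        return tag, sources

    def complete(self, tag, value):
        self.rob[tag].update(value=value, done=True)

    def commit(self, tag):
        producer = self.rob[tag]
        entry = self.rf[producer["dest"]]
        entry["value"] = producer["value"]
        if entry["tag"] == tag:  # no younger instruction renamed this register
            entry.update(valid=True, tag=None)


m = Machine({f"r{i}": i for i in range(1, 8)})
t0, s0 = m.decode("r3", "r1", "r2")  # r3 <- r1 op r2
t1, s1 = m.decode("r5", "r3", "r4")  # r5 <- r3 op r4, waits on tag t0
t2, s2 = m.decode("r3", "r6", "r7")  # r3 <- r6 op r7, gets its own entry
```

Because the second writer of r3 receives its own ROB entry, it can complete before the first writer without disturbing the consumer, which is still waiting on the first entry's tag.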
Reorder Buffer Example
[Figure: Register File (RF), entries R0 to R7, each with Value, Tag (pointer to ROB entry), and Valid? fields; Reorder Buffer (ROB), Entry 0 to Entry 15, each with Value and Entry Valid? fields, with the oldest and youngest instructions marked]
Initially: all registers are valid in RF & ROB is empty
Reorder Buffer Tradeoffs
n Disadvantages
q Reorder buffer needs to be accessed to get the results that are yet to be written to the register file
n CAM or indirection → increased latency and complexity
More on State Maintenance & Precise Exceptions
https://www.youtube.com/watch?v=nMfbtzWizDA&list=PL5PHm2jkkXmi5CxxI7b3JCL1TWybTDtKq&index=13
https://www.youtube.com/watch?v=upJPVXEuqIQ&list=PL5Q2soXY2Zi-iBn_sw_B63HtdbTNmphLc&index=18
https://www.youtube.com/watch?v=9yo3yhUijQs&list=PL5Q2soXY2Zi8J58xLKBNFQFHRO3GrXxA9&index=17
Lectures on State Maintenance & Recovery
n Computer Architecture, Spring 2015, Lecture 11
q Precise Exceptions, State Maintenance/Recovery (CMU, Spring 2015)
q https://www.youtube.com/watch?v=nMfbtzWizDA&list=PL5PHm2jkkXmi5CxxI7b3JCL1TWybTDtKq&index=13
https://www.youtube.com/onurmutlulectures
Suggested Readings for the Interested
n Smith and Pleszkun, “Implementing Precise Interrupts in Pipelined Processors,” IEEE Transactions on Computers 1988 and ISCA 1985.
n Backup Slides
Digital Design & Computer Arch.
Lecture 14: Precise Exceptions
Backup Slides
on Precise Exceptions
Reorder Buffer Tradeoffs
n Advantages
q Conceptually simple for supporting precise exceptions
q Can eliminate false dependences
n Disadvantages
q Reorder buffer needs to be accessed to get the results that are yet to be written to the register file
n CAM or indirection → increased latency and complexity
Solution II: History Buffer (HB)
n Idea: Update the register file when instruction completes,
but UNDO UPDATES when an exception occurs
History Buffer
[Figure: instruction cache, register file, functional units, and history buffer]
n History buffer
q Optimistic register file update
q Update immediately, but log the old value for recovery
q Leads to complexity/delay in logging old values
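The "update immediately, log the old value" idea can be sketched as an undo log. This is a minimal sketch under assumptions: the class `HistoryBuffer` and its methods are hypothetical, entries are allocated in program order at issue, no two in-flight instructions write the same register, and retirement (freeing entries once they are the oldest and exception-free) is omitted.

```python
class HistoryBuffer:
    """Hypothetical undo log placed next to an optimistically updated register file."""

    def __init__(self, regfile):
        self.regfile = regfile
        self.log = []  # program order, oldest instruction first

    def issue(self, dest):
        entry = {"dest": dest, "old": None, "written": False}
        self.log.append(entry)
        return entry

    def complete(self, entry, value):
        # Log the old value first, then update the register file right away.
        entry["old"] = self.regfile[entry["dest"]]
        entry["written"] = True
        self.regfile[entry["dest"]] = value

    def recover(self, faulting):
        """Rewind from the tail up to and including the faulting entry."""
        while self.log:
            entry = self.log.pop()
            if entry["written"]:
                self.regfile[entry["dest"]] = entry["old"]
            if entry is faulting:
                break
```

Recovery cost grows with the number of younger instructions that already wrote the register file, since old values are restored one at a time.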
[Figure: instruction cache, future file, functional units, ROB, and architectural register file]
n Disadvantage
q Multiple register files
q Need to copy arch. reg. file to future file on an exception
In-Order Pipeline with Future File and Reorder Buffer
n Decode (D): Access future file, allocate entry in ROB, check if instruction
can execute, if so dispatch instruction
n Execute (E): Instructions can complete out-of-order
n Completion (R): Write result to reorder buffer and future file
n Retirement/Commit (W): Check for exceptions; if none, write result to
architectural register file or memory; else, flush pipeline, copy
architectural file to future file, and start from exception handler
n In-order dispatch/execution, out-of-order completion, in-order retirement
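The stage descriptions above can be sketched as follows. This is a minimal sketch under assumptions: the class `FutureFileMachine` and its method names are hypothetical, operand readiness checks at decode are omitted, and no two in-flight instructions write the same register.

```python
class FutureFileMachine:
    """Hypothetical future file + ROB + architectural register file."""

    def __init__(self, regs):
        self.arch = dict(regs)    # architectural register file (precise state)
        self.future = dict(regs)  # future file (speculative, read at decode)
        self.rob = []             # program order, oldest instruction first

    def decode(self, dest):
        # Stage D: allocate a ROB entry in program order.
        entry = {"dest": dest, "value": None, "done": False, "exc": False}
        self.rob.append(entry)
        return entry

    def complete(self, entry, value=None, exc=False):
        # Stage R: write the result to the ROB and to the future file.
        entry.update(value=value, done=True, exc=exc)
        if not exc:
            self.future[entry["dest"]] = value

    def retire(self):
        # Stage W: commit in order; on an exception, restore the future file.
        while self.rob and self.rob[0]["done"]:
            head = self.rob.pop(0)
            if head["exc"]:
                self.rob.clear()               # flush the pipeline
                self.future = dict(self.arch)  # copy arch. file to future file
                return head
            self.arch[head["dest"]] = head["value"]
        return None
```

Younger instructions read the newest values from the future file without searching the ROB, while the architectural file only ever sees in-order results.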
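[Figure: F and D stages feed functional units of different latencies (integer add: 1 E cycle; integer mul: 4 E cycles; FP mul: 8 E cycles; load/store: 8 or more E cycles), followed by the R and W stages]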
Can We Reduce the Overhead of Two Register Files?
n Idea: Use indirection, i.e., pointers to data in frontend and
retirement
q Have a single storage that stores register data values
q Keep two register maps (speculative and architectural); also
called register alias tables (RATs)
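The indirection idea can be sketched with one value array and two maps. This is a minimal sketch under assumptions: the class `RenamedMachine`, the free list, and the method names are hypothetical, and the sketch rebuilds the free list on recovery in the simplest possible way.

```python
class RenamedMachine:
    """Hypothetical single value storage with speculative and architectural maps."""

    def __init__(self, num_arch, num_phys):
        self.phys = [0] * num_phys  # the only place register values live
        self.spec_map = {f"R{i}": i for i in range(num_arch)}  # frontend map
        self.arch_map = dict(self.spec_map)                    # retirement map
        self.free = list(range(num_arch, num_phys))

    def rename(self, dest):
        # Frontend: point dest at a fresh physical register.
        p = self.free.pop(0)
        self.spec_map[dest] = p
        return p

    def write(self, p, value):
        self.phys[p] = value

    def retire(self, dest, p):
        # Retirement: the previous architectural mapping is no longer needed.
        self.free.append(self.arch_map[dest])
        self.arch_map[dest] = p

    def recover(self):
        # Exception: copy pointers (the map), not data values.
        self.spec_map = dict(self.arch_map)
        used = set(self.arch_map.values())
        self.free = [p for p in range(len(self.phys)) if p not in used]
```

Recovery copies a small table of pointers instead of a full register file of data, which is the saving the slide is after.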
Future Map in Intel Pentium 4
Many modern processors are similar: MIPS R10K, Alpha 21264
https://courses.cs.washington.edu/courses/cse378/10au/lectures/Pentium4Arch.pdf
Reorder Buffer vs. Future Map Comparison
https://courses.cs.washington.edu/courses/cse378/10au/lectures/Pentium4Arch.pdf
Before We Get to Checkpointing …
n Let’s cover what happens on exceptions
n And branch mispredictions
Checking for and Handling Exceptions in Pipelining
Pipelining Issues: Branch Mispredictions
n A branch misprediction resembles an “exception”
q Except it is not visible to software (i.e., it is microarchitectural)
How Fast Is State Recovery?
n Latency of state recovery affects
q Exception service latency
q Interrupt service latency
q Latency to supply the correct data to instructions fetched after
a branch misprediction
n History buffer
q Flush instructions in pipeline younger than the branch
q Undo all instructions after the branch by rewinding from the
tail of the history buffer until the branch & restoring old values
one by one into the register file
n Future file
q Wait until branch is the oldest instruction in the machine
q Copy arch. reg. file to future file
q Flush entire pipeline
Can We Do Better?
n Goal: Restore the frontend state (future file) such that the
correct next instruction after the branch can execute right
away after the branch misprediction is resolved
Checkpointing
n When a branch is decoded
q Make a copy of the future file/map and associate it with the
branch
Checkpointing
n Advantages
q Correct frontend register state available right after checkpoint
restoration → Low state recovery latency
q …
n Disadvantages
q Storage overhead
q Complexity in managing checkpoints
q …
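Checkpointing at branch decode can be sketched as follows. This is a minimal sketch under assumptions: the class `CheckpointedFrontend` and its methods are hypothetical, checkpoints are full copies of the future file, and every instruction older than a branch is assumed to have written the future file before the checkpoint is taken.

```python
class CheckpointedFrontend:
    """Hypothetical future file with one checkpoint per in-flight branch."""

    def __init__(self, regs):
        self.future = dict(regs)
        self.checkpoints = []  # (branch id, snapshot), oldest branch first

    def decode_branch(self, branch_id):
        # Copy the future file and associate the copy with the branch.
        self.checkpoints.append((branch_id, dict(self.future)))

    def write(self, dest, value):
        self.future[dest] = value

    def resolve(self, branch_id, mispredicted):
        index = next(i for i, (b, _) in enumerate(self.checkpoints)
                     if b == branch_id)
        _, snapshot = self.checkpoints[index]
        if mispredicted:
            self.future = snapshot        # one-step restore of frontend state
            del self.checkpoints[index:]  # this branch and all younger ones
        else:
            del self.checkpoints[index]   # checkpoint no longer needed
```

Restoration is a single copy regardless of how many wrong-path instructions executed, at the price of storing one snapshot per unresolved branch.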
Many Modern Processors Use Checkpointing
n MIPS R10000
n Alpha 21264
n Pentium 4
n …
n History buffer
n Checkpointing
n Readings
q Smith and Pleszkun, “Implementing Precise Interrupts in Pipelined Processors,” IEEE Transactions on Computers 1988 and ISCA 1985.
q Hwu and Patt, “Checkpoint Repair for Out-of-order Execution
Machines,” ISCA 1987.
Registers versus Memory
n So far, we considered mainly registers as part of state
Maintaining Speculative Memory State: Stores
n Handling out-of-order completion of memory operations
q UNDOing a memory write more difficult than UNDOing a register
write. Why?
q One idea: Keep store address/data in reorder buffer
n How does a load instruction find its data?
q Store/write buffer: Similar to reorder buffer, but used only for
store instructions
n Program-order list of un-committed store operations
n When store is decoded: Allocate a store buffer entry
n When store address and data become available: Record in store
buffer entry
n When the store is the oldest instruction in the pipeline: Update the memory location (i.e., the cache) with the store data
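The store buffer steps above can be sketched as follows. This is a minimal sketch under assumptions: the class `StoreBuffer` and its methods are hypothetical, a dict stands in for the cache, and the `load` method shows one common answer to "how does a load find its data" (search uncommitted stores, youngest first, then fall back to memory), assuming the load is younger than every buffered store and all their addresses are known.

```python
class StoreBuffer:
    """Hypothetical program-order list of uncommitted stores."""

    def __init__(self, memory):
        self.memory = memory  # dict: address -> value, stands in for the cache
        self.entries = []     # program order, oldest store first

    def allocate(self):
        # When a store is decoded.
        entry = {"addr": None, "data": None}
        self.entries.append(entry)
        return entry

    def record(self, entry, addr, data):
        # When the store address and data become available.
        entry["addr"], entry["data"] = addr, data

    def commit_oldest(self):
        # When the store is the oldest instruction: update memory.
        entry = self.entries.pop(0)
        self.memory[entry["addr"]] = entry["data"]

    def load(self, addr):
        # The youngest matching uncommitted store wins; otherwise read memory.
        for entry in reversed(self.entries):
            if entry["addr"] == addr:
                return entry["data"]
        return self.memory.get(addr, 0)

    def flush(self):
        # Exception or misprediction: drop stores that never reached memory.
        self.entries.clear()
```

Because memory is only written at commit, recovering from an exception never requires undoing a memory write; the buffered stores are simply discarded.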