(Table residue: rows "ISA" and "Technology", each marked X in two columns; column headers not recoverable.)
$CPI = CPI_{ideal} + Stalls_{hazard}$
Increasing the length of the pipeline increases the impact of hazards.
Pipelining helps instruction bandwidth, not latency.
MIPS
- RISC (Reduced Instruction Set Computer): executes only simple instructions
- LOAD/STORE Architecture: operands are held in registers, and specific instructions load/store them
- Pipeline Architecture: overlaps the execution of multiple instructions in multiple stages
- Instruction and Data Memory are separated
- Register file with 2 Read Ports and 1 Write Port
- Inter-pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB
- The first half of the clock cycle writes the register, the second half reads the register
PC (Program Counter): address of the next instruction
IR (Instruction Register): instruction currently being executed
ISA (Instruction Set Architecture): operations, instruction formats, data types, etc.
Stage times per instruction (IF = Instruction Fetch, ID = Instruction Decode, E = Execution, MA = Memory Access, WB = Write Back):
Instruction   IF   ID   E   MA   WB   Latency
ALU            2    1   2    0    1      6
Load           2    1   2    2    1      8
Store          2    1   2    2    0      7
Branch         2    1   2    0    0      5
Jump           2    0   0    0    0      2
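The stage-time table can be checked with a few lines of code. This is a minimal sketch, not part of the original notes. The table gives no time unit, so the values are abstract time units. The names `STAGE_TIMES`, `single_cycle_period`, `pipelined_period` and `pipelined_time` are illustrative. It assumes that a single-cycle design is clocked at the slowest instruction and a pipelined design at the slowest stage.

```python
# Stage times per instruction class, copied from the table (time unit not given).
STAGE_TIMES = {
    #          IF ID  E MA WB
    "ALU":    (2, 1, 2, 0, 1),
    "Load":   (2, 1, 2, 2, 1),
    "Store":  (2, 1, 2, 2, 0),
    "Branch": (2, 1, 2, 0, 0),
    "Jump":   (2, 0, 0, 0, 0),
}


def latency(instr):
    """Latency of one instruction: the sum of its stage times."""
    return sum(STAGE_TIMES[instr])


def single_cycle_period():
    """Assumed clock period without pipelining: the slowest instruction."""
    return max(latency(name) for name in STAGE_TIMES)


def pipelined_period():
    """Assumed clock period with pipelining: the slowest single stage."""
    return max(max(times) for times in STAGE_TIMES.values())


def pipelined_time(n_instructions, stages=5):
    """Ideal pipelined time (no hazards): fill the pipeline, then one instruction per cycle."""
    return (n_instructions + stages - 1) * pipelined_period()


if __name__ == "__main__":
    for name in STAGE_TIMES:
        print(name, latency(name))
    print("single-cycle period:", single_cycle_period())
    print("pipelined period:", pipelined_period())
    print("100 instructions, pipelined:", pipelined_time(100))
```

Under these assumptions a single instruction is not faster in the pipeline (5 stages at the slowest stage time), but the time for many instructions drops, which is the bandwidth-versus-latency point made above.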
Structural Hazard
Structural Hazard: different instructions attempt to use the same resource simultaneously (impossible in MIPS)
Data Hazard
Data Hazard: attempt to use a result before it is ready
RAW (Read After Write): the next instruction reads a register before the previous instruction writes it
WAW (Write After Write): the next instruction writes a register before the previous instruction writes it (impossible in MIPS)
WAR (Write After Read): the next instruction writes a register before the previous instruction reads it (impossible in MIPS)
Compilation Solutions: insert "nop" instructions or reorder the instructions to avoid conflicts (Scheduling)
Hardware Solutions: insert stalls, or use forwarding/bypassing (take a temporary result from the inter-pipeline registers so that it can be used as soon as possible)
Forwarding paths: EX/EX, MEM/EX (Load/Ex), EX/ID (Ex/Branch), MEM/ID (Load/Branch), MEM/MEM (Load/Store). Forwarding always resolves the hazard without stalls, except for the load/use, ex/branch and load/branch hazards, where one or more stalls are needed in addition to the forwarding path.
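The stall cases listed for forwarding can be sketched as a small lookup. This is an illustrative sketch under stated assumptions, not the notes' own method. The names `Instr`, `READY_AFTER`, `ADJACENT_STALLS` and `stalls_needed` are hypothetical. It assumes forwarding is enabled and that branches read their operands in the ID stage. It assumes 1 stall for load/use, which is the usual figure for a 5-stage pipeline. It uses 1 stall for ex/branch and 2 stalls for load/branch, as stated elsewhere in these notes. Stores are left out to keep the sketch small.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    kind: str                # "alu", "load" or "branch"
    dest: str                # destination register, "" if the instruction writes none
    srcs: Tuple[str, ...]    # source registers


# Stage at whose end the producer's result can be forwarded.
READY_AFTER = {"alu": "EX", "load": "MEM"}

# Stalls needed when the consumer immediately follows the producer,
# keyed by (stage where the result is ready, stage where it is needed).
ADJACENT_STALLS = {
    ("EX", "EX"): 0,    # EX/EX forwarding
    ("MEM", "EX"): 1,   # load/use, then MEM/EX forwarding
    ("EX", "ID"): 1,    # ex/branch, then EX/ID forwarding
    ("MEM", "ID"): 2,   # load/branch, then MEM/ID forwarding
}


def stalls_needed(producer, consumer, gap=0):
    """Stalls for one producer/consumer pair; gap = independent instructions between them."""
    if producer.kind not in READY_AFTER:
        return 0  # branches produce no register result
    if not producer.dest or producer.dest not in consumer.srcs:
        return 0  # no RAW dependence between the two instructions
    needed_in = "ID" if consumer.kind == "branch" else "EX"
    base = ADJACENT_STALLS[(READY_AFTER[producer.kind], needed_in)]
    return max(0, base - gap)


if __name__ == "__main__":
    lw = Instr("load", "r1", ("r2",))
    add = Instr("alu", "r3", ("r1", "r4"))
    beq = Instr("branch", "", ("r1", "r0"))
    print("lw -> add:", stalls_needed(lw, add))
    print("lw -> beq:", stalls_needed(lw, beq))
    print("lw -> (1 instr) -> beq:", stalls_needed(lw, beq, gap=1))
```

The function looks at one pair at a time. It does not model how a stall inserted for an earlier pair shifts later instructions, so it gives an upper bound per pair rather than a full pipeline simulation.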
Control/Branch Hazard
Control Hazard: attempt to decide on the next instruction to fetch before the branch condition is evaluated
- The branch is taken only if the condition is satisfied.
- If the branch is taken, the Branch Target Address (BTA = PC+4+offset) is stored in the Program Counter (PC) instead of PC+4
- Registers are compared to derive the Branch Outcome (branch taken or branch not taken)
- Branch Outcome and Branch Target Address are ready at the end of the EX stage (3rd stage)
- Conditional branches are resolved when the PC is updated at the end of the MEM stage (4th stage)
Solutions:
- Assume the branch is not taken and flush instructions if the branch is taken
- Insert stalls until resolution at the end of the 4th stage (MEM)
- Insert stalls until resolution at the end of the 3rd stage (EX), then use EX/IF forwarding
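The cost of the stall-based solutions can be estimated with the CPI relation (ideal CPI plus stall cycles per instruction). The sketch below is a rough estimate under stated assumptions. It assumes every branch stalls, with no prediction. It assumes the next instruction is fetched in the cycle after the branch is resolved, so a branch resolved at the end of stage k costs k - 1 stall cycles. The 20% branch fraction is a made-up example value. `branch_stall_cycles` and `cpi` are hypothetical helper names.

```python
def branch_stall_cycles(resolve_stage):
    """Stall cycles per branch if fetch waits until the branch is resolved.

    Assumption: the branch is fetched in cycle 1 and resolved at the end of
    cycle `resolve_stage`, so the next fetch starts in cycle resolve_stage + 1.
    """
    return resolve_stage - 1


def cpi(branch_fraction, resolve_stage, cpi_ideal=1.0):
    """CPI = ideal CPI + average branch stall cycles per instruction."""
    return cpi_ideal + branch_fraction * branch_stall_cycles(resolve_stage)


if __name__ == "__main__":
    for stage in (4, 3):
        print(f"resolved at end of stage {stage}: "
              f"{branch_stall_cycles(stage)} stalls per branch, "
              f"CPI = {cpi(0.2, stage):.2f}")
```

Under these assumptions, moving the resolution point one stage earlier removes one stall cycle per branch.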
Early evaluation: the MIPS processor compares registers, computes the branch target address and updates the PC during the ID stage.
Solutions:
- Assume the branch is not taken and flush instructions if the branch is taken
- Branch always not taken: assume the branch is not taken and flush instructions if the branch is taken
- Branch always taken: assume the branch is taken and flush instructions if the branch is not taken (problem: the BTA must be computed as early as possible)
- Backward Taken Forward Not Taken: the prediction is based on the direction of the branch (backward branches predicted taken, forward branches predicted not taken); a misprediction costs 1 flush
- Profile-Driven Prediction: uses profiling information collected from earlier runs
- Delayed Branch: schedule an independent instruction in the branch delay slot, which is executed whether or not the branch is taken
- Branch Outcome Predictor: predict the outcome of branch (taken or not taken)
- Branch Target Predictor/Buffer: predict the BTA in case of taken branch
- Branch History Table (BHT): a table that contains 1 or 2 bits for each branch (taken/not taken)
- The table is indexed by the lower k bits of the address of the branch. Problem: different branches can map to the same index
- In case of misprediction, the prediction is changed
- Shortcoming: in a 1-bit BHT the prediction changes after a single miss. For loops it is usually better to use a 2-bit BHT, where the prediction must miss twice before it is changed, so the miss at the last loop iteration does not change the prediction.
- Studies on n-bit predictors have shown that 2-bit predictors behave almost as well as predictors with more bits
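The 1-bit versus 2-bit behavior on a loop branch can be reproduced with a small simulation. This is a sketch under assumptions: the notes only say that the prediction must miss once or twice before it changes, and the saturating-counter encoding used here is one common way to get that behavior. The function name `count_mispredictions` and the loop pattern (9 taken iterations followed by one exit, run 3 times) are illustrative.

```python
def count_mispredictions(outcomes, bits, initial=None):
    """Simulate one BHT entry as an n-bit saturating counter.

    The counter ranges over 0 .. 2**bits - 1; the branch is predicted taken
    when the counter is in the upper half of that range.
    """
    max_count = (1 << bits) - 1
    threshold = 1 << (bits - 1)
    counter = max_count if initial is None else initial  # default: strongly taken
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= threshold
        if predicted_taken != taken:
            misses += 1
        # Move towards "taken" or "not taken", saturating at both ends.
        counter = min(counter + 1, max_count) if taken else max(counter - 1, 0)
    return misses


# Loop-closing branch: taken 9 times, not taken at loop exit, loop run 3 times.
loop_branch = ([True] * 9 + [False]) * 3

if __name__ == "__main__":
    print("1-bit mispredictions:", count_mispredictions(loop_branch, bits=1))
    print("2-bit mispredictions:", count_mispredictions(loop_branch, bits=2))
```

With this pattern the 1-bit entry misses 5 times (each loop exit, plus the first iteration of each re-entry), while the 2-bit entry misses only at the 3 loop exits.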
- (m, n) correlating predictor: records the last m branches (global branch history) to choose from 2^m BHTs, each of which is an n-bit predictor
- First level (global history): records the last m branches (Branch History Register, BHR)
- Second level (local history): table of 2-bit entries (Pattern History Table, PHT)
- The BHR is used to index the PHT
- GShare Predictor: the PHT is indexed by the XOR of the BHR and the lower bits of the address of the branch
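A GShare predictor can be sketched in a few lines. This is a minimal illustration, not a definitive implementation. It assumes 4-byte instructions (so the two lowest address bits are dropped), a PHT of 2-bit saturating counters initialized to weakly not taken, and a BHR as wide as the PHT index. The class name `GShare`, the helper `run` and the example address are hypothetical.

```python
class GShare:
    """PHT of 2-bit counters indexed by (branch address XOR global history)."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.bhr = 0                            # Branch History Register
        self.pht = [1] * (1 << index_bits)      # counters start weakly not taken

    def _index(self, pc):
        # Drop the 2 alignment bits, then XOR with the global history.
        return ((pc >> 2) ^ self.bhr) & self.mask

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2   # True means "taken"

    def update(self, pc, taken):
        i = self._index(pc)                     # index uses the history before this branch
        self.pht[i] = min(self.pht[i] + 1, 3) if taken else max(self.pht[i] - 1, 0)
        self.bhr = ((self.bhr << 1) | int(taken)) & self.mask


def run(predictor, pc, outcomes):
    """Feed a sequence of outcomes for one branch; return the misprediction count."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


if __name__ == "__main__":
    alternating = [True, False] * 20
    print("mispredictions:", run(GShare(), 0x400010, alternating))
```

For a branch that alternates taken and not taken, the history makes the two cases land in different PHT entries, so after a short warm-up the sketch predicts every outcome; a single 2-bit counter without history cannot learn this pattern.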
- Asynchronous (external event): it does not depend on the instruction being executed and can be handled after the completion of the current instruction
- Synchronous (internal event): it depends on the instruction being executed
- When the processor decides to process an interrupt:
- It stops the current program at instruction I_i, completing all the instructions up to I_{i-1}
- It saves the PC of I_i in the EPC
- It disables interrupts and transfers control to an interrupt handler in kernel mode (status register PSR)
- It restores the PC, re-enables interrupts and returns to user mode once the interrupt has been handled (jump instruction RFE)
- For a synchronous event, in general, the instruction cannot be completed and needs to be restarted after the exception has been handled
- Precise Interrupt: there is a single instruction such that all instructions before it have committed their state and no following instruction has modified any state. So, we can restart execution from this instruction and get the right result
- In MIPS, when exceptions occur out of order, the exception is not handled immediately: it is handled in the MEM stage, and execution then restarts from the instruction that raised it
Caches
- Caches keep the contents of recently accessed locations and fetch blocks of data around them
- If the requested data is not found in the cache (miss), the lower level of memory must be accessed: stall the CPU, request the data from main memory, copy the data into the cache, then repeat the cache access (hit)
Dependence
- If two instructions are dependent, they cannot execute in parallel: they must be executed in order or only partially overlapped.
- Dependences are a property of the code, while hazards are a property of the pipeline
- Name Dependence: 2 instructions use the same register or memory location, but there is no flow of data between them (no true data dependence), so the name of the register or location can be changed (static or dynamic register renaming)
- Anti-dependence (WAR hazard)
- Output-dependence (WAW hazard)
- True Data Dependence (RAW hazard): 2 instructions use the same register or memory location and there is a flow of data between them
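The three kinds of dependence can be told apart by comparing the destination and source registers of two instructions in program order. The sketch below is illustrative only. It considers register operands, not memory locations. The `Instr` tuple and the `dependences` function are hypothetical names, and the register names are arbitrary.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    text: str                # assembly text, for display only
    dest: str                # destination register, "" if none
    srcs: Tuple[str, ...]    # source registers


def dependences(first: Instr, second: Instr) -> List[str]:
    """Classify the dependences of `second` on `first` (first comes earlier in program order)."""
    found = []
    if first.dest and first.dest in second.srcs:
        found.append("RAW")  # true data dependence: second reads what first writes
    if second.dest and second.dest in first.srcs:
        found.append("WAR")  # anti-dependence: second overwrites what first reads
    if first.dest and first.dest == second.dest:
        found.append("WAW")  # output dependence: both write the same register
    return found


if __name__ == "__main__":
    i1 = Instr("add r1, r2, r3", "r1", ("r2", "r3"))
    i2 = Instr("sub r4, r1, r5", "r4", ("r1", "r5"))
    i3 = Instr("add r2, r6, r7", "r2", ("r6", "r7"))
    i4 = Instr("or  r1, r8, r9", "r1", ("r8", "r9"))
    for later in (i2, i3, i4):
        print(f"{i1.text}  ->  {later.text}: {dependences(i1, later)}")
```

Renaming the destination of `i3` or `i4` to an unused register would remove the WAR and WAW cases, while the RAW case between `i1` and `i2` remains because data actually flows through `r1`.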
- Execution and Memory Stage might require multiple cycles
- Issue in-order or out-of-order
- Functional units (FU) pipelined or not pipelined: with a pipelined FU, multiple instructions can be executed simultaneously in the same FU, in different stages
- Commit in-order or out-of-order: with in-order commit, the next instructions must wait (stall) for the previous ones to complete
- Single Write Port: if multiple instructions are in the WB stage at the same time, one of them will stall, because only one instruction at a time can write the register file
- → Out-of-order commit makes WAR and WAW hazards possible
- Decode stage checks RAW; Issue stage checks WAW, WAR and available FUs
- Issue is a queue
- → Multiple issue at each clock: CPI_ideal < 1, with 2N read ports and N write ports
With early evaluation, the alternative is to insert a stall until the branch is resolved at the end of the 2nd stage (ID).
When an add is followed by a branch, 1 stall must be inserted before the ID stage of the branch to enable EX/ID forwarding.
When a load is followed by a branch, 2 stalls must be inserted before the ID stage of the branch to enable MEM/ID forwarding.
Static Branch Prediction: the prediction for each branch is fixed at compile time by the compiler and does not change during execution
From before: the branch delay slot is scheduled with an independent instruction from before the branch
From target: the branch delay slot is scheduled from the target of the branch (usually the target instruction will need to be copied, because it can be reached by another path)
From fall-through: the branch delay slot is scheduled from the not-taken fall-through path
From after: the branch delay slot is scheduled with an independent instruction from after the branch
Dynamic Branch Prediction: the prediction for each branch can change at run time; hardware blocks in the IF stage use past behavior to predict future behavior
Branch History Table (BHT): uses only the recent behavior of a single branch
Correlating Branch Predictors: use the recent behavior of the last m branches
2-Level Adaptive Branch Predictor (GAs)
Exception/Interrupt: an event that requests the attention of the processor
Scheduling
Dynamic Scheduling: the hardware reorders instruction execution to reduce stalls while maintaining data flow and exception behavior (superscalar processor). Execution begins out of order, as soon as operands are available
Advantages:
o Handles some cases where dependences are unknown at compile time
o Simplifies the compiler
o Allows compiled code to run efficiently on a different pipeline
Disadvantages:
o A significant increase in hardware complexity
o Increased power consumption
▪ Static Scheduling: relies on the compiler to identify potential parallelism; dependencies are avoided by code reordering (a VLIW processor expects dependency-free code)
Disadvantages:
o Unpredictable branches
o Code size explosion
o Compiler complexity
Advantages:
o Low power
o Simple HW
Scoreboard
▪ Out-of-order execution/commit divides the ID stage into two stages:
o Issue Stage: decode instructions and check for structural hazards (FUs and WAW)
o Read Operands (RR): wait until there are no data hazards (RAW), then read operands
All instructions pass through the issue stage in order, but they can be stalled or bypass each other in the read operands stage, which leads to out-of-order execution/commit
No forwarding and no register renaming
Scoreboard: a centralized logic block that manages dynamic scheduling, controls the status of registers and FUs, and tracks dependencies and the state of operations
Stages: Issue, Read Operands, Execution, Write Back
WAR: Stall Write Back until registers have been read
WAW: Stall Issue of new instructions until the other instruction completes
Issue:
- If an FU for the instruction is free and no other active instruction has the same destination register, the scoreboard issues the instruction to the FU
- If the FU is not available or there is another active instruction with the same destination register, then the instruction issue stalls, and no further instructions will issue until these hazards are cleared
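The issue rule can be sketched as a check on two small tables. This is a simplified illustration under assumptions, not a full scoreboard: it models only the Issue check (free FU and no active instruction with the same destination) and the release at Write Back, and it leaves out the Read Operands and WAR handling. The class `Scoreboard` and its method names are hypothetical, as are the FU and register names.

```python
class Scoreboard:
    """Tracks which FUs are busy and which registers have a pending write."""

    def __init__(self, functional_units):
        self.fu_busy = {fu: False for fu in functional_units}
        self.pending_dest = {}  # destination register -> FU that will write it

    def can_issue(self, fu, dest):
        # Structural hazard: the FU is busy. WAW hazard: dest already has a pending write.
        return not self.fu_busy[fu] and dest not in self.pending_dest

    def issue(self, fu, dest):
        """Issue to `fu` if allowed; return False when the issue must stall."""
        if not self.can_issue(fu, dest):
            return False
        self.fu_busy[fu] = True
        self.pending_dest[dest] = fu
        return True

    def write_back(self, fu):
        """The instruction in `fu` completes: free the FU and its destination register."""
        self.fu_busy[fu] = False
        self.pending_dest = {
            reg: unit for reg, unit in self.pending_dest.items() if unit != fu
        }


if __name__ == "__main__":
    sb = Scoreboard(["mult", "add"])
    print(sb.issue("mult", "f0"))  # FU free, no pending write to f0
    print(sb.issue("add", "f0"))   # same destination as an active instruction
    print(sb.issue("mult", "f2"))  # FU still busy
    sb.write_back("mult")
    print(sb.issue("mult", "f2"))  # hazards cleared
```

Because issue is in order, a caller that gets `False` would keep retrying the same instruction in later cycles and hold back every instruction behind it, which matches the stall described in the second rule.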