Advanced computer architecture

Exam: Advanced computer architecture

Faculty: Information Engineering

From the course of Prof. Santambrogio Marco

University: Politecnico di Milano

Notes

Notes on Advanced computer architecture, based on the publisher's personal notes taken during the lectures of Prof. Santambrogio at Politecnico di Milano (Polimi), Faculty of Information Engineering.

Preview: a selection of 3 pages out of 10.

Document excerpt

ISA X X
Technology X X

CPI = CPI_ideal + Stalls_hazard

Increasing the length of the pipeline increases the impact of hazards. Pipelining helps instruction bandwidth, not latency.

MIPS

- RISC (Reduced Instruction Set Computer): executes only simple instructions
- LOAD/STORE Architecture: operands are kept in registers, and specific instructions load/store them from/to memory
- Pipeline Architecture: the execution of multiple instructions is overlapped across multiple stages
- Instruction and Data Memory are separated
- Register file with 2 read ports and 1 write port
- Inter-pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB
- The first half of the clock cycle writes the register file, the second half reads it

PC (Program Counter): address of the next instruction
IR (Instruction Register): instruction currently being executed
ISA (Instruction Set Architecture): operations, instruction format, data types, etc.

Instruction  Instruction Fetch (IF)  Instruction Decode (ID)  Execution (E)  Memory Access (MA)  Write Back (WB)  Latency
ALU          2                       1                        2              0                   1                6
Load         2                       1                        2              2                   1                8
Store        2                       1                        2              2                   0                7
Branch       2                       1                        2              0                   0                5
Jump         2                       0                        0              0                   0                2
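
As a rough check of the latency table above, the sketch below sums the stage latencies of each instruction class and compares a single-cycle clock (set by the slowest instruction) with a pipelined clock (set by the slowest stage). The names `STAGE_LATENCY`, `single_cycle_clock`, `pipelined_clock` and `effective_cpi` are illustrative and not part of the notes; the time unit is left unspecified because the notes do not state it.

```python
# Stage latencies per instruction class, copied from the table above
# in the order (IF, ID, E, MA, WB). The time unit is not given in the notes.
STAGE_LATENCY = {
    "ALU":    (2, 1, 2, 0, 1),
    "Load":   (2, 1, 2, 2, 1),
    "Store":  (2, 1, 2, 2, 0),
    "Branch": (2, 1, 2, 0, 0),
    "Jump":   (2, 0, 0, 0, 0),
}


def total_latency(kind):
    """Latency of one instruction executed alone (sum of its stages)."""
    return sum(STAGE_LATENCY[kind])


def single_cycle_clock():
    """A single-cycle design must fit the slowest instruction in one cycle."""
    return max(total_latency(kind) for kind in STAGE_LATENCY)


def pipelined_clock():
    """A pipelined design only has to fit the slowest stage in one cycle."""
    return max(max(stages) for stages in STAGE_LATENCY.values())


def effective_cpi(cpi_ideal, stalls_per_instruction):
    """CPI = CPI_ideal + Stalls_hazard, with stalls averaged per instruction."""
    return cpi_ideal + stalls_per_instruction


if __name__ == "__main__":
    for kind in STAGE_LATENCY:
        print(kind, total_latency(kind))
    print("single-cycle clock:", single_cycle_clock())
    print("pipelined clock:", pipelined_clock())
```

With these numbers the pipelined clock is set by the 2-unit stages while the single-cycle clock is set by the 8-unit load, which is the sense in which pipelining improves bandwidth rather than the latency of an individual instruction.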

Structural Hazard

Structural Hazard: different instructions attempt to use the same resource simultaneously (impossible in MIPS)

Data Hazard

Data Hazard: use a result before it is ready

RAW (Read After Write): next instruction reads before previous instruction writes

WAW (Write After Write): next instruction writes before previous instruction writes (impossible in MIPS)

WAR (Write After Read): next instruction writes before previous instruction reads (impossible in MIPS)

Compiler Solutions: insert “nop” instructions or reorder the instructions to avoid conflicts (Scheduling)

Hardware Solutions: insert stalls, or use forwarding/bypassing (take a temporary result from the inter-pipeline registers so that it can be used as soon as it is available)

Forwarding paths: EX/EX, MEM/EX (Load/Ex), EX/ID (Ex/Branch), MEM/ID (Load/Branch), MEM/MEM (Load/Store). With forwarding it is always possible to resolve the hazard without introducing stalls, except for the load/use, ex/branch and load/branch hazards, where one or more stalls are needed in addition to the forwarding path.
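
The stall counts implied by the text above can be sketched as follows. This is a minimal model under stated assumptions, not a full hazard unit: it considers one RAW dependence in the 5-stage pipeline, assumes the consumer is an ALU-type instruction that needs its operand at the start of EX, and uses `distance` (a hypothetical parameter) for how many instructions after the producer the consumer appears (1 means immediately after).

```python
def stalls_without_forwarding(distance):
    """Stall cycles for a RAW dependence when no forwarding is available.

    The consumer's ID stage may overlap with the producer's WB stage at the
    earliest, because the register file is written in the first half of the
    cycle and read in the second half.
    """
    return max(0, 3 - distance)


def stalls_with_forwarding(producer_is_load, distance):
    """Stall cycles for a RAW dependence when forwarding paths are available.

    An ALU result is forwarded through EX/EX or MEM/EX with no stall.
    A loaded value exists only after MEM, so an immediately following
    use (load/use hazard) still waits one cycle.
    """
    if producer_is_load:
        return max(0, 2 - distance)
    return 0


if __name__ == "__main__":
    print(stalls_without_forwarding(1))                   # add followed by a use
    print(stalls_with_forwarding(False, 1))               # same pair, forwarded
    print(stalls_with_forwarding(True, 1))                # load/use hazard
```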

Control/Branch Hazard

Control Hazard: attempt to decide on the next instruction to fetch before the branch condition is evaluated

The branch is taken only if the condition is satisfied.
The branch target address (PC+4+offset) (BTA) is stored in the Program Counter (PC) instead of PC+4 if branch is taken
Compare registers to derive Branch Outcome (branch taken or branch not taken)
Branch Outcome and Branch Target Address are ready at the end of the EX-stage (3rd stage)
Conditional branches are resolved when the PC is updated at the end of the MEM stage (4th stage)

Solutions:

Assume the branch is not taken and flush the fetched instructions if the branch is taken
Stall until the branch is resolved at the end of the 4th stage
Stall until the branch is resolved at the end of the 3rd stage, then use the EX/IF forwarding path

Early evaluation: MIPS processor compares registers, computes branch target address and updates PC during ID stage.

Solutions:

Assume the branch is not taken and flush instructions if the branch is taken, and insert a stall until resolution at the end of the 2nd stage

When an add is followed by a branch that uses its result, 1 stall must be introduced before the ID stage of the branch to enable the EX/ID forwarding.

When a load is followed by a branch that uses the loaded value, 2 stalls must be introduced before the ID stage of the branch to enable the MEM/ID forwarding.
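
The two cases above can be sketched as a small function. It assumes early evaluation (the branch needs its operands in ID) and the EX/ID and MEM/ID forwarding paths; `distance` is a hypothetical parameter giving how many instructions separate the producer from the branch (1 means the branch comes immediately after).

```python
def stalls_before_branch(producer_is_load, distance=1):
    """Stall cycles inserted before the ID stage of a dependent branch.

    An ALU result is ready at the end of EX (forwarded through EX/ID),
    a loaded value at the end of MEM (forwarded through MEM/ID), so the
    branch's ID stage must come 2 or 3 cycles after the producer's ID.
    """
    cycles_until_ready = 3 if producer_is_load else 2
    return max(0, cycles_until_ready - distance)


if __name__ == "__main__":
    print(stalls_before_branch(producer_is_load=False))  # add then branch
    print(stalls_before_branch(producer_is_load=True))   # load then branch
```

Placing one independent instruction between the producer and the branch (distance 2) removes one stall in both cases, which is what compiler scheduling tries to achieve.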

Static Branch Prediction: the prediction for each branch is fixed at compile time by the compiler and does not change during execution

Branch always not taken: Assume branch is not taken and flush instructions if branch is taken
Branch always taken: Assume branch is taken and flush instructions if branch is not taken (problem: compute BTA as soon as possible)
Backward Taken Forward Not Taken: the predictions are based on the directions of branches, in case of misprediction we have 1 flush
Profile-Driven Prediction: profiling information from earlier runs
Delayed Branch: schedule an independent instruction in the branch delay slot, which is executed whether or not the branch is taken

From before: slot is scheduled with an independent instruction from before the branch

From target: slot is scheduled from the target of the branch (usually the target instruction will need to be copied because it can be reached by another path)

From fall-through: slot is scheduled from the not-taken fall-through path

From after: slot is scheduled with an independent instruction from after the branch
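
Among the static schemes listed above, Backward Taken Forward Not Taken is simple enough to sketch directly. The function names and the addresses in the usage example are hypothetical; the only rule taken from the notes is that the prediction depends on the direction of the branch.

```python
def predict_btfnt(branch_pc, target_pc):
    """Return True (predict taken) for backward branches, False otherwise."""
    return target_pc < branch_pc


def count_flushes(branch_pc, target_pc, outcomes):
    """Count mispredictions (one flush each) over a list of real outcomes."""
    prediction = predict_btfnt(branch_pc, target_pc)
    return sum(1 for taken in outcomes if taken != prediction)


if __name__ == "__main__":
    # A loop-closing branch at 0x40 jumping back to 0x20, executed 10 times:
    # taken 9 times, then not taken on loop exit.
    loop_outcomes = [True] * 9 + [False]
    print(count_flushes(0x40, 0x20, loop_outcomes))
```

Loop-closing branches are backward and mostly taken, so this rule mispredicts only the loop exit in the example.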

Dynamic Branch Prediction: the prediction for each branch can change at run time; the hardware uses the past behavior of branches to predict their future behavior

Hardware blocks in IF stage:

Branch Outcome Predictor: predict the outcome of branch (taken or not taken)
Branch Target Predictor/Buffer: predict the BTA in case of taken branch

Branch History Table (BHT): use only the recent behavior of a single branch

A table that contains 1 or 2 bits for each branch (taken/not taken)
Table indexed by the lower k bits of the branch address. Problem: different branches can share the same index

In case of misprediction, the prediction is changed
Shortcoming: in a 1-bit BHT the prediction is changed after a single miss. For loops it is usually better to use a 2-bit BHT, where the prediction must miss twice before it is changed: the miss at the last loop iteration does not flip the prediction, so the branch is still predicted taken when the loop is entered again.
Studies on n-bit predictors have shown that 2-bit predictors behave almost as well as predictors with more bits
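
The difference between 1-bit and 2-bit entries on a loop can be checked with a small simulation of a single BHT entry. This sketch assumes an n-bit saturating counter, which is one common way to realize the "must miss twice" behavior, and assumes the entry starts in the strongly-taken state; neither choice is stated in the notes.

```python
def count_mispredictions(outcomes, bits, initial_state=None):
    """Simulate one BHT entry implemented as an n-bit saturating counter."""
    max_state = (1 << bits) - 1
    threshold = 1 << (bits - 1)  # predict taken when state >= threshold
    # Assumption: the entry starts as "strongly taken" unless told otherwise.
    state = max_state if initial_state is None else initial_state
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= threshold
        if predicted_taken != taken:
            misses += 1
        # Move the counter toward the real outcome, saturating at both ends.
        state = min(state + 1, max_state) if taken else max(state - 1, 0)
    return misses


if __name__ == "__main__":
    # Loop branch: taken 3 times, not taken on exit; the loop runs 3 times.
    trace = [True, True, True, False] * 3
    print("1-bit misses:", count_mispredictions(trace, bits=1))
    print("2-bit misses:", count_mispredictions(trace, bits=2))
```

On this trace the 1-bit entry misses both at every loop exit and at every re-entry after the first, while the 2-bit entry misses only at the exits.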

Correlating Branch Predictors: use the recent behavior of the last m branches, not only of the branch being predicted

(m, n) correlating predictor: records the outcome of the last m branches (global branch history) to choose among 2^m BHTs, each of which is an n-bit predictor

2 Level Adaptive Branch/GAs Predictor

First level (global history): records the last m branches (Branch History Register, BHR)
Second level (local history): table of 2-bit counters (Pattern History Table, PHT)
BHR is used to index PHT
GShare Predictor: the PHT is indexed by the XOR of the BHR and the lower bits of the branch address
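
A minimal GShare sketch, under assumptions the notes do not fix: the history length equals the number of index bits, the two low-order bits of the branch address are dropped before the XOR (word-aligned instructions), and every PHT counter starts as weakly not taken. The class and helper names are illustrative.

```python
class GSharePredictor:
    """PHT of 2-bit counters indexed by (branch address XOR global history)."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.bhr = 0  # Branch History Register: outcomes of the last branches
        self.pht = [1] * (1 << index_bits)  # assumed start: weakly not taken

    def _index(self, pc):
        # Drop the two alignment bits of the address, then XOR with the BHR.
        return ((pc >> 2) ^ self.bhr) & self.mask

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        self.pht[i] = min(self.pht[i] + 1, 3) if taken else max(self.pht[i] - 1, 0)
        # Shift the real outcome into the global history.
        self.bhr = ((self.bhr << 1) | int(taken)) & self.mask


def run_trace(predictor, pc, outcomes):
    """Return the number of mispredictions for one branch over a trace."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


if __name__ == "__main__":
    alternating = [True, False] * 10
    print(run_trace(GSharePredictor(index_bits=4), 0x40, alternating))
```

An alternating branch defeats a single 2-bit counter, but here the history selects a different counter after a taken and after a not-taken outcome, so the predictor stops missing once those counters are trained.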

Exception/Interrupt: event that requests the attention of the processor:

Asynchronous (external event): it does not depend on the instruction being executed and can be handled after the completion of the current instruction
Synchronous (internal event): it depends on the instruction being executed
When the processor decides to process an interrupt:
- It stops the current program at instruction I_i, completing all the instructions up to I_(i-1)
- It saves the PC of I_i in the EPC
- It disables interrupts and transfers control to an interrupt handler running in kernel mode (status register PSR)
- It restores the PC, re-enables interrupts and returns to user mode once the interrupt has been handled (jump instruction RFE)
- For a synchronous event, in general, the instruction cannot be completed and needs to be restarted after the exception has been handled
Precise Interrupt: there is a single instruction such that all preceding instructions have committed their state and no following instruction has modified any state, so execution can be restarted correctly from this instruction.

In MIPS, exceptions can be raised out of program order. An exception is not handled immediately when it is detected: it is handled when the instruction reaches the MEM stage, and execution is then restarted from that point
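
The interrupt-handling steps listed above can be sketched as state changes on a toy processor model. `CPUState`, `take_interrupt`, `return_from_exception` and the handler address used in the example are hypothetical names and values chosen for illustration; only the EPC, the enable/disable of interrupts and the kernel/user mode switch come from the notes.

```python
from dataclasses import dataclass


@dataclass
class CPUState:
    pc: int
    epc: int = 0
    interrupts_enabled: bool = True
    kernel_mode: bool = False


def take_interrupt(cpu, handler_address):
    """Enter the handler: save the PC of I_i, mask interrupts, go to kernel mode."""
    cpu.epc = cpu.pc
    cpu.interrupts_enabled = False
    cpu.kernel_mode = True
    cpu.pc = handler_address


def return_from_exception(cpu):
    """Model of the return step (RFE): restore PC, re-enable interrupts, user mode."""
    cpu.pc = cpu.epc
    cpu.interrupts_enabled = True
    cpu.kernel_mode = False


if __name__ == "__main__":
    cpu = CPUState(pc=0x1000)
    take_interrupt(cpu, handler_address=0x8000)  # 0x8000 is an arbitrary example
    print(cpu)
    return_from_exception(cpu)
    print(cpu)
```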
Caches

Caches keep the contents of recently accessed locations and fetch blocks of data around them
If the requested data is not found in the cache (miss), a lower level of the memory hierarchy must be accessed: stall the CPU, request the data from main memory, copy the data into the cache, then repeat the cache access (hit)
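
The hit/miss behavior described above can be illustrated with a toy cache model. The notes do not say how the cache is organized, so the direct-mapped organization, the number of lines and the block size below are all assumptions made for the sake of the example.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that only tracks which block each line holds."""

    def __init__(self, num_lines=4, block_size=16):
        self.num_lines = num_lines
        self.block_size = block_size
        self.tags = [None] * num_lines  # None means the line is empty

    def access(self, address):
        """Return True on a hit; on a miss, load the block and return False."""
        block = address // self.block_size
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            return True
        # Miss: the whole block around the address is copied into the cache,
        # replacing whatever the line held before.
        self.tags[index] = tag
        return False


if __name__ == "__main__":
    cache = DirectMappedCache()
    for address in (0, 4, 8, 64, 0):
        print(address, "hit" if cache.access(address) else "miss")
```

Addresses 4 and 8 hit because they fall in the block fetched for address 0; address 64 maps to the same line and evicts that block, so the final access to 0 misses again.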

Dependence

If two instructions are dependent, they cannot execute in parallel: they must be executed in order or only partially overlapped.
Dependences are a property of the code, while hazards are a property of the pipeline
Name Dependence: 2 instructions use the same register or memory location, but there is no flow of data between them (no true data dependence), so the name of the register or location can be changed (static or dynamic register renaming)
Anti-dependence (WAR hazard)
Output-dependence (WAW hazard)

Hazard

True Data Dependence (RAW Hazard): 2 instructions use the same register or memory location and there is flow of data between the instructions
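
The three kinds of dependence can be detected mechanically from the registers each instruction writes and reads. In this sketch an instruction is represented as a hypothetical `(dest, srcs)` pair, with `dest` set to `None` when the instruction writes no register; the register names in the example are arbitrary.

```python
def classify_dependences(first, second):
    """Return the dependences of `second` on `first` (first comes earlier).

    Each instruction is a (dest, srcs) pair: one destination register
    (or None) and a tuple of source registers.
    """
    first_dest, first_srcs = first
    second_dest, second_srcs = second
    found = set()
    if first_dest is not None and first_dest in second_srcs:
        found.add("RAW")  # true data dependence
    if second_dest is not None and second_dest in first_srcs:
        found.add("WAR")  # anti-dependence (name dependence)
    if first_dest is not None and first_dest == second_dest:
        found.add("WAW")  # output dependence (name dependence)
    return found


if __name__ == "__main__":
    add = ("r1", ("r2", "r3"))  # r1 = r2 + r3
    sub = ("r4", ("r1", "r5"))  # r4 = r1 - r5
    print(classify_dependences(add, sub))
```

Renaming the destination register of the second instruction removes a WAR or WAW result from this function but never a RAW one, which is why only name dependences can be eliminated by register renaming.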

Multi-cycle

Execution and Memory stages might require multiple cycles
Issue in-order or out-of-order
Functional units (FU) pipelined or not pipelined: in a pipelined FU, multiple instructions can be in execution simultaneously in different stages
Commit in-order or out-of-order: with in-order commit, later instructions must wait (stall) for the previous ones to complete
Single Write Port: if two instructions are in the WB stage at the same time, one of them stalls, because only one instruction at a time can write the register file
→ Out-of-order commit makes WAR and WAW hazards possible
Decode stage checks RAW and Issue stage checks WAW, WAR and available FU
Issue is a queue
→ Multiple issue at each clock: ideal CPI < 1, and ideally 2N read ports and N write ports

Scheduling

▪ Dynamic Scheduling: the HW reorders instruction execution to reduce stalls while maintaining data flow and exception behavior (superscalar processor). Execution begins as soon as operands are available, out-of-order

o Handles some cases where dependences are unknown at compile time

o Simplifies the compiler

o Allows compiled code to run efficiently on a different pipeline

o A significant increase in hardware complexity

o Increased power consumption

▪ Static Scheduling: relies on the compiler to identify potential parallelism; dependencies are avoided by code reordering (a VLIW processor expects dependency-free code)

o Unpredictable branches

o Code size explosion

o Compiler complexity

o Low power

o Simple HW

Scoreboard

▪ To allow out-of-order execution/commit, the ID stage is divided into:

o Issue Stage: decode instructions and check for structural hazards (FUs and WAW)

o Read Operands (RR): wait until there are no data hazards (RAW), then read operands

All instructions pass through the issue stage in-order, but they can be stalled or bypass each other in the read operands stage → out-of-order execution/commit

No forwarding and no register renaming

Scoreboard: a centralized logic block that manages the dynamic scheduling, controls the status of registers and FUs, and tracks dependencies and the state of operations

Stages: Issue, Read Operands, Execution, Write Back

WAR: Stall Write Back until registers have been read

WAW: Stall Issue of new instructions until the other instruction completes

Issue:

If a FU for the instruction is free and no other active instruction has the same destination register, the scoreboard issues the instruction to the FU
If the FU is not available or there is another active instruction with the same destination register, the instruction issue stalls, and no further instructions will issue until these hazards are cleared
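
The issue rule above can be sketched as a check on two small tables. This is a simplified model, not the full scoreboard: it tracks only whether each FU is busy and which destination registers are pending, and the class, method and FU names are hypothetical.

```python
class Scoreboard:
    """Tracks busy FUs and pending destination registers for the issue check."""

    def __init__(self, functional_units):
        self.busy = {fu: False for fu in functional_units}
        self.pending_dest = {}  # destination register -> FU that will write it

    def can_issue(self, fu, dest):
        # Structural hazard: the FU is busy. WAW hazard: an active
        # instruction already has the same destination register.
        return not self.busy[fu] and dest not in self.pending_dest

    def issue(self, fu, dest):
        """Issue if allowed and return True; otherwise stall and return False."""
        if not self.can_issue(fu, dest):
            return False
        self.busy[fu] = True
        self.pending_dest[dest] = fu
        return True

    def write_back(self, fu):
        """The instruction in `fu` completes: free the FU and its destination."""
        self.busy[fu] = False
        for reg in [r for r, unit in self.pending_dest.items() if unit == fu]:
            del self.pending_dest[reg]


if __name__ == "__main__":
    sb = Scoreboard(["mult", "add"])
    print(sb.issue("mult", "f0"))  # FU free, no pending write to f0
    print(sb.issue("add", "f0"))   # WAW on f0: issue stalls
    sb.write_back("mult")
    print(sb.issue("add", "f0"))   # hazard cleared
```

Because issue is in-order, a `False` result here would also block every later instruction until the hazard clears, as stated in the text.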

Details

Publisher

simone.sorrenti.21

Academic year 2021-2022

10 pages

1 download

SSD: Mathematical and computer sciences, INF/01 Computer Science

The contents of this page are the personal reworking by the Publisher simone.sorrenti.21 of information learned by attending the lectures of Advanced computer architecture and through independent study of any reference books, in preparation for the final exam or thesis. They are not to be understood as official material of Politecnico di Milano or of Prof. Santambrogio Marco.