Performance metrics
Response time, execution time, and latency
Execution time is measured as the difference time_end − time_start. The shorter the execution time, the higher the performance.
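A minimal sketch of the time_end − time_start measurement in Python. The helper name `measure` and the workload (summing a million integers) are illustrative assumptions, not part of the notes.

```python
import time

def measure(fn, *args):
    """Run fn(*args) once and return (result, elapsed time in seconds)."""
    time_start = time.perf_counter()   # high-resolution clock for short durations
    result = fn(*args)
    time_end = time.perf_counter()
    return result, time_end - time_start

# Hypothetical workload: sum the first million non-negative integers.
result, elapsed = measure(sum, range(1_000_000))
print(f"result = {result}, execution time = {elapsed:.6f} s")
```

A single run is noisy; in practice the measurement is usually repeated and the minimum or median is reported.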
Throughput and bandwidth
Throughput is defined as the total amount of work completed in a given time.
MIPS
The MIPS metric depends on the Instruction Set Architecture (ISA) and compiler, making it unsuitable for comparing architectures with different ISAs.
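The following sketch illustrates why MIPS can mislead across ISAs. It assumes the standard definitions, which the notes do not spell out: execution time = IC × CPI / clock rate, and MIPS = IC / (execution time × 10^6). The two machines and all their numbers are hypothetical.

```python
def execution_time(instruction_count, cpi, clock_hz):
    """Execution time in seconds: IC * CPI / clock rate."""
    return instruction_count * cpi / clock_hz

def mips(instruction_count, exec_time_s):
    """Millions of instructions per second."""
    return instruction_count / (exec_time_s * 1e6)

# Hypothetical machines running the same program, compiled for two different ISAs.
machines = {
    "A": {"instruction_count": 10e9, "cpi": 1.0, "clock_hz": 1e9},  # many simple instructions
    "B": {"instruction_count": 4e9, "cpi": 2.0, "clock_hz": 1e9},   # fewer, slower instructions
}

report = {}
for name, m in machines.items():
    t = execution_time(**m)
    report[name] = (t, mips(m["instruction_count"], t))
    print(f"machine {name}: time = {t:.1f} s, MIPS = {report[name][1]:.0f}")
```

Under these assumed numbers, machine A has the higher MIPS rating but machine B finishes the program sooner, so the rating alone does not rank the two machines correctly.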
MFLOPS
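MFLOPS (millions of floating-point operations per second):

$$\text{MFLOPS} = \frac{\text{floating-point operations}}{\text{execution time} \times 10^{6}}$$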
Factors affecting performance
- Algorithm
- Data set
- Compiler
- Instruction set
- Operating system
- Clock rate
- Memory and I/O system
Amdahl’s Law
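Performance improvement depends on the enhancement E, which accelerates a fraction F of the task by a factor S.

$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - F) + \dfrac{F}{S}}$$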
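As a worked example, here is a small sketch of Amdahl's Law in its standard form, speedup = 1 / ((1 − F) + F / S). The function name and the sample values F = 0.4 and S = 10 are assumptions chosen for illustration.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction F of the task is accelerated by a factor S."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("F must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("S must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Hypothetical case: 40% of the task runs 10x faster.
overall = amdahl_speedup(0.4, 10)
# Upper bound as S grows without limit: 1 / (1 - F).
upper_bound = 1.0 / (1.0 - 0.4)
print(f"overall speedup = {overall:.4f}, upper bound = {upper_bound:.4f}")
```

Even with an arbitrarily large S, the overall speedup cannot exceed 1 / (1 − F), which is why the unenhanced fraction dominates.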
Factors affecting IC, CPI, and CC
- Program: Affects IC, CPI, and CC
- Compiler: Impacts IC and CPI
- ISA: Influences CPI and CC
- Technology: Affects CC
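$$\text{CPI}_{\text{hazard}} = \text{CPI}_{\text{ideal}} + \text{Stalls}$$

where Stalls is the average number of stall cycles per instruction.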
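A short sketch of the CPI relation, reading "Stalls" as the average number of stall cycles per instruction. The function name and the sample counts (1000 instructions, 300 stall cycles) are hypothetical.

```python
def cpi_with_hazards(cpi_ideal, total_stall_cycles, instruction_count):
    """CPI including hazards: ideal CPI plus average stall cycles per instruction."""
    return cpi_ideal + total_stall_cycles / instruction_count

# Hypothetical run: 1000 instructions, 300 stall cycles, ideal CPI of 1.
cpi_hazard = cpi_with_hazards(1.0, 300, 1000)
print(f"CPI with hazards = {cpi_hazard:.2f}")
```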
Pipelining
Increasing the depth of the pipeline increases the impact of hazards. Pipelining improves instruction throughput (bandwidth), not the latency of a single instruction.
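The throughput-versus-latency point can be sketched with a simple cycle count. This assumes one clock cycle per stage, an unpipelined machine that spends all k stage cycles on each instruction before starting the next, and at least one instruction; the function names are illustrative.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction occupies the whole datapath for n_stages cycles."""
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages, stall_cycles=0):
    """First instruction fills the pipeline, then one completes per cycle, plus stalls."""
    return n_stages + (n_instructions - 1) + stall_cycles

n, k = 100, 5
print(unpipelined_cycles(n, k))   # 500
print(pipelined_cycles(n, k))     # 104
# The latency of one instruction is still k = 5 cycles in both cases.
print(pipelined_cycles(1, k))     # 5
```

Stall cycles caused by hazards add directly to the pipelined total, which is how hazards erode the ideal gain.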
MIPS architecture
- RISC (Reduced Instruction Set Computer): Executes only simple instructions
- LOAD/STORE Architecture: ALU operands must be in registers; dedicated instructions load them from memory and store them back
- Pipeline Architecture: Overlaps the execution of multiple instructions in multiple stages
- Instruction and data memory are separated
- Register file with 2 read ports and 1 write port
- Pipeline registers between stages: IF/ID, ID/EX, EX/MEM, MEM/WB
- The register file is written in the first half of the clock cycle and read in the second half
Registers and architecture details
PC (Program Counter): Holds the address of the next instruction.
IR (Instruction Register): Contains the instruction currently being executed.
ISA (Instruction Set Architecture): Defines operations, instruction format, data types, etc.
Instruction stages
| Instruction | Fetch (IF) | Decode (ID) | Execute (EX) | Memory Access (MEM) | Write Back (WB) | Total latency |
|---|---|---|---|---|---|---|
| ALU | 2 | 1 | 2 | 0 | 1 | 6 |
| Load | 2 | 1 | 2 | 2 | 1 | 8 |
| Store | 2 | 1 | 2 | 2 | 0 | 7 |
| Branch | 2 | 1 | 2 | 0 | 0 | 5 |
| Jump | 2 | 0 | 0 | 0 | 0 | 2 |
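The table can be checked with a few lines of Python. The sketch below recomputes each total latency and derives two clock periods under common assumptions that the notes do not state: a single-cycle design must fit the slowest instruction, and a pipelined design must fit the slowest stage. The table gives no unit, so values are in unspecified time units.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")

# Stage times copied from the table above (unit not specified in the notes).
STAGE_TIMES = {
    "ALU":    (2, 1, 2, 0, 1),
    "Load":   (2, 1, 2, 2, 1),
    "Store":  (2, 1, 2, 2, 0),
    "Branch": (2, 1, 2, 0, 0),
    "Jump":   (2, 0, 0, 0, 0),
}

# Total latency of each instruction class is the sum of its stage times.
latency = {name: sum(times) for name, times in STAGE_TIMES.items()}

# Assumption: a single-cycle clock must cover the slowest instruction.
single_cycle_period = max(latency.values())
# Assumption: a pipelined clock must cover the slowest individual stage.
pipelined_period = max(max(times) for times in STAGE_TIMES.values())

for name, total in latency.items():
    print(f"{name:<6} {total}")
print("single-cycle period:", single_cycle_period)   # 8 (set by Load)
print("pipelined period:", pipelined_period)         # 2
```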
Hazards in pipelining
Structural hazard
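Occurs when different instructions try to use the same resource in the same clock cycle. This cannot happen in the MIPS pipeline, because instruction and data memory are separated and the register file is written in the first half of the clock cycle and read in the second half.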
Data hazard
Occurs when a result is used before it is ready:
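- RAW (Read After Write): A later instruction tries to read a register before an earlier instruction has written it.
- WAW (Write After Write): A later instruction writes a register before an earlier instruction writes it (impossible in the MIPS pipeline, where all register writes happen in program order in the WB stage).
- WAR (Write After Read): A later instruction writes a register before an earlier instruction reads it (impossible in the MIPS pipeline, where registers are read in the ID stage, before any later instruction reaches WB).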
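A minimal sketch of RAW detection on a straight-line instruction sequence. It assumes a 5-stage pipeline without forwarding in which the register file is written in the first half of a cycle and read in the second half, so only consumers at distance 1 or 2 from the producer are hazardous. The `Instr` type, the `window` parameter, and the sample program are all hypothetical.

```python
from typing import NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    text: str                 # assembly text, for display only
    dest: Optional[str]       # register written, or None
    srcs: Tuple[str, ...]     # registers read

def raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) for each RAW hazard.

    A hazard is reported when a register written by instruction i is read
    by one of the next `window` instructions.
    """
    hazards = []
    for i, producer in enumerate(program):
        if producer.dest is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            consumer = program[j]
            if producer.dest in consumer.srcs:
                hazards.append((i, j, producer.dest))
            if consumer.dest == producer.dest:
                break  # register redefined: later readers depend on the newer write
    return hazards

program = [
    Instr("add $t0, $t1, $t2", "$t0", ("$t1", "$t2")),
    Instr("sub $t3, $t0, $t4", "$t3", ("$t0", "$t4")),
    Instr("and $t5, $t0, $t3", "$t5", ("$t0", "$t3")),
    Instr("or  $t6, $t0, $t5", "$t6", ("$t0", "$t5")),
]

for i, j, reg in raw_hazards(program):
    print(f"RAW on {reg}: '{program[j].text}' reads what '{program[i].text}' writes")
```

The `or` instruction reads `$t0` three instructions after the `add`, so under the stated assumption it is not reported as a hazard on `$t0`.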
Solutions to data hazards
- Compilation solutions: Insert "nop" or schedule the instructions to avoid conflicts (scheduling).
- Hardware solutions: Insert stalls, or use forwarding/bypassing (take a temporary result from the inter-pipeline registers as soon as it is available instead of waiting for the write back).
Forwarding paths
Examples of forwarding paths include EX/EX, MEM/EX (Load/Ex), EX/ID (Ex/Branch), MEM/ID (Load/Branch), and MEM/MEM (Load/Store). With forwarding, data hazards can be solved without introducing stalls, except for load/use, ex/branch, and load/branch hazards, where one or more stalls are needed in addition to the forwarding path.
Control/branch hazard
Occurs when trying to decide on the next instruction to fetch before the branch condition is evaluated.
- The branch is taken only if the condition is satisfied.
- The Branch Target Address (BTA = PC + 4 + offset) is stored in the Program Counter (PC) instead of PC + 4 if the branch is taken.
- The registers are compared to derive the Branch Outcome (branch taken or branch not taken).
- The Branch Outcome and the Branch Target Address are ready at the end of the EX stage (3rd stage).
- Conditional branches are resolved when the PC is updated at the end of the MEM stage (4th stage).
Solutions to control hazards
- Assume branch is not taken and flush instructions if branch is taken.
- Insert stalls until the branch is resolved at the end of the 4th stage.
- Insert stalls until the branch is resolved at the end of the 3rd stage, then use the EX/IF forwarding path.
Early evaluation in MIPS
The MIPS processor compares registers, computes the branch target address, and updates PC during the ID stage.
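The cost of the three resolution points can be compared with a rough sketch. It assumes the pipeline stalls on every branch, that the penalty equals the index of the stage at whose end the next PC becomes available minus one (3 cycles for MEM, 2 for EX with EX/IF forwarding, 1 for ID), and a hypothetical branch frequency of 20%. None of these numbers come from the notes.

```python
# Stage position in the 5-stage pipeline (IF = 1).
STAGE_INDEX = {"ID": 2, "EX": 3, "MEM": 4}

def branch_penalty(resolution_stage):
    """Stall cycles per branch if fetch waits until the end of the given stage."""
    return STAGE_INDEX[resolution_stage] - 1

def cpi_with_branch_stalls(branch_fraction, resolution_stage, cpi_ideal=1.0):
    """Ideal CPI plus average branch stall cycles per instruction."""
    return cpi_ideal + branch_fraction * branch_penalty(resolution_stage)

branch_fraction = 0.20  # assumed share of branches in the instruction mix
for stage in ("MEM", "EX", "ID"):
    cpi = cpi_with_branch_stalls(branch_fraction, stage)
    print(f"resolved at end of {stage}: penalty = {branch_penalty(stage)}, CPI = {cpi:.2f}")
```

Under these assumptions, moving the branch resolution from MEM to ID cuts the per-branch penalty from 3 cycles to 1.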
Branch prediction techniques
- Static branch prediction: Predictions for each branch are fixed at compile time.
  - Branch always not taken: Assume the branch is not taken and flush instructions if it is taken.
  - Branch always taken: Assume the branch is taken and flush instructions if it is not taken (problem: the BTA must be computed as soon as possible).
  - Backward Taken Forward Not Taken: The prediction is based on the branch direction; in case of misprediction, there is 1 flush.
  - Profile-Driven Prediction: Uses profiling information from earlier runs.
  - Delayed Branch: Schedule an independent instruction in the branch delay slot, which is executed whether or not the branch is taken:
    - From before: The slot is filled with an independent instruction from before the branch.
    - From target: The slot is filled from the target of the branch (usually, the target instruction needs to be copied because it can be reached by another path).
    - From fall-through: The slot is filled from the not-taken fall-through path.
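The Backward Taken Forward Not Taken rule can be sketched as a comparison of addresses: a branch whose target lies at or before its own address (typically a loop-closing branch) is predicted taken, any other branch is predicted not taken. The function names, addresses, and outcome traces below are hypothetical.

```python
def predict_taken_btfnt(branch_pc, target_pc):
    """Static BTFNT prediction: backward branches taken, forward branches not taken."""
    return target_pc <= branch_pc

def count_mispredictions(branch_pc, target_pc, outcomes):
    """Count outcomes (True = taken) that disagree with the fixed static prediction."""
    prediction = predict_taken_btfnt(branch_pc, target_pc)
    return sum(1 for taken in outcomes if taken != prediction)

# Hypothetical loop-closing branch at 0x40 jumping back to 0x20:
# taken for 9 iterations, then falls through on loop exit.
loop_outcomes = [True] * 9 + [False]
print(count_mispredictions(0x40, 0x20, loop_outcomes))      # 1

# Hypothetical forward branch at 0x40 skipping to 0x60, never taken in this trace.
forward_outcomes = [False] * 10
print(count_mispredictions(0x40, 0x60, forward_outcomes))   # 0
```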
With early evaluation of the branch in the ID stage, an add followed by a branch that depends on its result needs 1 stall before the ID stage of the branch to enable the EX/ID forwarding. A load followed by a dependent branch needs 2 stalls before the ID stage of the branch to enable the MEM/ID forwarding.
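The stall counts for back-to-back dependent instructions can be collected in a small lookup table. The ex/branch and load/branch entries follow the paragraph above. The single stall for load/use is the usual figure for a 5-stage pipeline and is an assumption here, since the notes only say "one or more". The load/store entry assumes the loaded value is the data being stored. The kind labels and function name are illustrative.

```python
# (producer kind, consumer kind) -> (stall cycles, forwarding path)
# Valid only when the consumer immediately follows the producer and depends on its result.
STALLS_WITH_FORWARDING = {
    ("alu", "alu"): (0, "EX/EX"),
    ("load", "alu"): (1, "MEM/EX"),      # load/use: 1 stall is an assumed, standard figure
    ("load", "store"): (0, "MEM/MEM"),   # loaded value is the store data
    ("alu", "branch"): (1, "EX/ID"),     # branch evaluated early in ID
    ("load", "branch"): (2, "MEM/ID"),   # branch evaluated early in ID
}

def stalls_needed(producer_kind, consumer_kind):
    """Return (stall cycles, forwarding path) for an adjacent dependent pair."""
    try:
        return STALLS_WITH_FORWARDING[(producer_kind, consumer_kind)]
    except KeyError:
        raise ValueError(f"no entry for {producer_kind} -> {consumer_kind}") from None

for pair in STALLS_WITH_FORWARDING:
    stalls, path = stalls_needed(*pair)
    print(f"{pair[0]:>4} -> {pair[1]:<6}: {stalls} stall(s), forwarding {path}")
```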