## 250P: Computer Systems Architecture

# Lecture 5: Advanced Pipelines

Anton Burtsev, January 2019

#### Hazards

- Structural hazards
- Data hazards
- Control hazards

#### **Control Hazards**

- Simple techniques to handle control hazard stalls:
  - for every branch, introduce a stall cycle (note: every 6<sup>th</sup> instruction is a branch on average!)
  - assume the branch is not taken and start fetching the next instruction; if the branch is taken, hardware is needed to cancel the effect of the wrong-path instructions
  - predict the next PC and fetch that instruction; if the prediction is wrong, cancel the effect of the wrong-path instructions
  - fetch the next instruction (the branch delay slot) and execute it anyway; if the instruction turns out to be on the correct path, useful work was done; if it turns out to be on the wrong path, hopefully program state is not lost
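
To get a feel for the cost of the first two techniques, the following is a minimal sketch that turns them into a cycles-per-instruction (CPI) estimate. The branch frequency of 1 in 6 comes from the note above; the one-cycle penalty, the base CPI of 1, and the taken fraction of 0.6 are assumptions chosen only for illustration, and the function names are hypothetical.

```python
def cpi_stall_on_every_branch(branch_fraction, stall_cycles=1, base_cpi=1.0):
    """CPI when every branch inserts a fixed number of stall cycles."""
    return base_cpi + branch_fraction * stall_cycles


def cpi_predict_not_taken(branch_fraction, taken_fraction, penalty=1, base_cpi=1.0):
    """CPI when only taken branches pay the penalty (wrong-path work is cancelled)."""
    return base_cpi + branch_fraction * taken_fraction * penalty


branch_fraction = 1 / 6   # from the slide: every 6th instruction is a branch
taken_fraction = 0.6      # hypothetical value, not from the lecture

print("stall on every branch:", cpi_stall_on_every_branch(branch_fraction))
print("predict not taken:    ", cpi_predict_not_taken(branch_fraction, taken_fraction))
```

Under these assumptions, predicting not-taken only helps to the extent that branches really are not taken, which is why better next-PC prediction is attractive.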

### Branch delay slot




© 2007 Elsevier, Inc. All rights reserved.

### **Multicycle Instructions**




#### Effects of Multicycle Instructions

- Potentially multiple writes to the register file in a cycle
- Frequent RAW hazards
- WAW hazards (WAR hazards not possible)
- Imprecise exceptions because of out-of-order (o-o-o) instruction completion

Note: Can also increase the "width" of the processor to handle multiple instructions at the same time: for example, fetch two instructions, read registers for both, execute both, etc.

#### **Precise Exceptions**

- On an exception:
  - must save PC of instruction where program must resume
  - All instructions after that PC that might be in the pipeline must be converted to NOPs (other instructions continue to execute and may raise exceptions of their own)
  - temporary program state not in memory (in other words, registers) has to be stored in memory
  - potential problems if a later instruction has already modified memory or registers
- A processor that fulfils all the above conditions is said to provide precise exceptions (useful for debugging and of course, correctness)

### Dealing with these Effects

- Multiple writes to the register file: increase the number of ports, stall one of the writers during ID, stall one of the writers during WB (the stall will propagate)
- WAW hazards: detect the hazard during ID and stall the later instruction
- Imprecise exceptions: buffer the results if they complete early or save more pipeline state so that you can return to exactly the same state that you left at
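
As a rough illustration of what the detection logic has to look for, the sketch below checks a short instruction sequence for two of the effects listed above: several instructions reaching write-back in the same cycle, and a later instruction writing a register before (or together with) an earlier one (WAW). The model is an assumption made for this example: each instruction is described by a hypothetical issue cycle and execution latency, and it writes its destination in cycle `issue + latency`. The latencies are made up.

```python
from collections import defaultdict


def find_write_conflicts(instrs):
    """instrs: list of (name, issue_cycle, latency, dest_reg) in program order.

    Returns (port_conflicts, waw_hazards):
      port_conflicts: {cycle: [names writing the register file in that cycle]}
      waw_hazards:    [(earlier, later, reg)] where the later instruction
                      writes reg no later than the earlier one does.
    """
    # Assumed model: the write happens in cycle issue + latency.
    writes = [(name, issue + latency, dest) for name, issue, latency, dest in instrs]

    by_cycle = defaultdict(list)
    for name, cycle, _ in writes:
        by_cycle[cycle].append(name)
    port_conflicts = {c: names for c, names in by_cycle.items() if len(names) > 1}

    waw_hazards = []
    for i, (early, early_cycle, early_dest) in enumerate(writes):
        for late, late_cycle, late_dest in writes[i + 1:]:
            if late_dest == early_dest and late_cycle <= early_cycle:
                waw_hazards.append((early, late, early_dest))
    return port_conflicts, waw_hazards


# Hypothetical sequence: a long multiply followed by shorter operations.
sequence = [
    ("MUL", 3, 7, "F0"),   # writes F0 in cycle 10
    ("ADD", 6, 4, "F2"),   # writes F2 in cycle 10 (same cycle as MUL)
    ("LD",  7, 1, "F0"),   # writes F0 in cycle 8, before the older MUL
]
ports, waw = find_write_conflicts(sequence)
print("write-port conflicts:", ports)
print("WAW hazards:", waw)
```

A real pipeline would run an equivalent check during ID and stall the later instruction, as the bullets above describe; this sketch only reports the conflicts.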

### **Slowdowns from Stalls**

- Perfect pipelining with no hazards → an instruction completes every cycle (total cycles ~ num instructions)
  → speedup = increase in clock speed = num pipeline stages
- With hazards and stalls, some cycles (= stall time) go by during which no instruction completes, and then the stalled instruction completes
- Total cycles = number of instructions + stall cycles
- Slowdown because of stalls = 1 / (1 + stall cycles per instr)
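
The arithmetic above can be sketched in a few lines. The function names and the example numbers (1000 instructions, 250 stall cycles, 5 stages) are hypothetical; the formulas are the ones from the bullets, combined under the assumption that the ideal speedup equals the number of pipeline stages.

```python
def total_cycles(num_instructions, stall_cycles):
    """Total cycles = number of instructions + stall cycles."""
    return num_instructions + stall_cycles


def stall_slowdown(stall_cycles_per_instr):
    """Factor by which stalls scale down the ideal throughput."""
    return 1 / (1 + stall_cycles_per_instr)


def effective_speedup(num_stages, stall_cycles_per_instr):
    """Ideal speedup (number of stages) scaled by the stall slowdown."""
    return num_stages / (1 + stall_cycles_per_instr)


instructions, stalls, stages = 1000, 250, 5      # made-up example values
per_instr = stalls / instructions

print("total cycles:     ", total_cycles(instructions, stalls))
print("slowdown factor:  ", stall_slowdown(per_instr))
print("effective speedup:", effective_speedup(stages, per_instr))
```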

## **Pipelining Limits**



- Unpipelined (1 stage): gap between indep instrs: T + T<sub>ovh</sub>; gap between dep instrs: T + T<sub>ovh</sub>
- 3 stages: gap between indep instrs: T/3 + T<sub>ovh</sub>; gap between dep instrs: T + 3T<sub>ovh</sub>
- 6 stages: gap between indep instrs: T/6 + T<sub>ovh</sub>; gap between dep instrs: T + 6T<sub>ovh</sub>

Assume that there is a dependence where the final result of the first instruction is required before starting the second instruction
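
The pattern in these gaps generalizes to N stages: independent instructions are separated by T/N + T<sub>ovh</sub>, while dependent instructions (under the assumption just stated) are separated by T + N·T<sub>ovh</sub>. A minimal sketch, with hypothetical values T = 6 and T<sub>ovh</sub> = 0.5 in arbitrary time units:

```python
def instruction_gaps(logic_delay, latch_overhead, num_stages):
    """Return (gap between independent instrs, gap between dependent instrs).

    logic_delay    -- T, total logic delay of the unpipelined datapath
    latch_overhead -- T_ovh, per-stage latch overhead
    num_stages     -- pipeline depth N
    """
    independent = logic_delay / num_stages + latch_overhead
    dependent = logic_delay + num_stages * latch_overhead
    return independent, dependent


T, T_OVH = 6.0, 0.5   # hypothetical values, not from the lecture
for n in (1, 3, 6):
    indep, dep = instruction_gaps(T, T_OVH, n)
    print(f"{n} stage(s): independent gap = {indep}, dependent gap = {dep}")
```

Deeper pipelines shrink the gap between independent instructions but lengthen the gap between dependent ones, which is one reason pipeline depth cannot be increased indefinitely.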

- For the following code sequence, show how the instrs flow through the pipeline:
  - ADD R3 ← R1, R2
  - LD R7 ← 8[R6]
  - ST R9 → 4[R8]
  - BEZ R4, [R5]

Figure: pipeline diagram with time in clock cycles (CC 1 to CC 6) on the horizontal axis; each instruction passes through IM, Reg, ALU, DM, Reg.
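
Since none of these four instructions reads a register written by an earlier one, a hazard-free flow is a reasonable first approximation: each instruction simply starts one cycle after its predecessor. The sketch below prints such a diagram. The stage names IM, Reg, ALU, DM, Reg are assumed to be those of the five-stage pipeline used in this lecture, and the helper names are hypothetical; the control hazard caused by the final branch is ignored because nothing follows it here.

```python
STAGES = ["IM", "Reg", "ALU", "DM", "Reg"]   # assumed five-stage pipeline


def ideal_flow(instrs, stages=STAGES):
    """Return [(instr, [stage occupied in each cycle, or ''])] with no stalls."""
    n_cycles = len(instrs) + len(stages) - 1
    rows = []
    for i, instr in enumerate(instrs):
        row = [""] * n_cycles
        for s, stage in enumerate(stages):
            row[i + s] = stage          # instruction i enters the pipeline in cycle i + 1
        rows.append((instr, row))
    return rows


def render(rows):
    """Format the rows as a text table with one column per clock cycle."""
    n_cycles = len(rows[0][1])
    header = " " * 18 + "".join(("CC " + str(c + 1)).ljust(6) for c in range(n_cycles))
    lines = [header]
    for instr, row in rows:
        lines.append(instr.ljust(18) + "".join(cell.ljust(6) for cell in row))
    return "\n".join(lines)


sequence = ["ADD R3 <- R1, R2", "LD R7 <- 8[R6]", "ST R9 -> 4[R8]", "BEZ R4, [R5]"]
print(render(ideal_flow(sequence)))
```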



#### **Pipeline Summary**

| Instruction | RR | ALU | DM | RW |
|---|---|---|---|---|
| ADD R3 ← R1, R2 | Rd R1, R2 | R1+R2 | -- | Wr R3 |
| BEZ R1, [R5] | Rd R1, R5; Compare, Set PC | -- | -- | -- |
| LD R6 ← 8[R3] | Rd R3 | R3+8 | Get data | Wr R6 |
| ST R6 → 8[R3] | Rd R3, R6 | R3+8 | Wr data | -- |

Convert this C code into equivalent RISC assembly instructions

a[i] = b[i] + c[i];

    LD  R2, [R1]       # R1 has the address for variable i
    MUL R3, R2, 8      # the offset from the start of the array
    ADD R7, R3, R4     # R4 has the address of a[0]
    ADD R8, R3, R5     # R5 has the address of b[0]
    ADD R9, R3, R6     # R6 has the address of c[0]
    LD  R10, [R8]      # Bringing b[i]
    LD  R11, [R9]      # Bringing c[i]
    ADD R12, R11, R10  # Sum is in R12
    ST  R12, [R7]      # Putting result in a[i]
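
One way to convince yourself that this sequence computes `a[i] = b[i] + c[i]` is to run it on a toy register machine. The interpreter below is a sketch written for this example only: it supports just the four opcodes used above, models memory as a dictionary from byte address to value, and uses made-up base addresses (1000, 2000, 3000) and data values. The multiply by 8 implies 8-byte array elements.

```python
def run(program, regs, mem):
    """Execute a list of (opcode, operands...) tuples on a toy register machine."""
    def value(operand):
        # Strings name registers; anything else is an immediate.
        return regs[operand] if isinstance(operand, str) else operand

    for op, *args in program:
        if op == "LD":                      # LD Rd, [Ra]
            dest, addr_reg = args
            regs[dest] = mem[regs[addr_reg]]
        elif op == "ST":                    # ST Rs, [Ra]
            src, addr_reg = args
            mem[regs[addr_reg]] = regs[src]
        elif op == "ADD":                   # ADD Rd, Ra, Rb
            dest, a, b = args
            regs[dest] = value(a) + value(b)
        elif op == "MUL":                   # MUL Rd, Ra, imm
            dest, a, b = args
            regs[dest] = value(a) * value(b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return mem


PROGRAM = [
    ("LD", "R2", "R1"),
    ("MUL", "R3", "R2", 8),
    ("ADD", "R7", "R3", "R4"),
    ("ADD", "R8", "R3", "R5"),
    ("ADD", "R9", "R3", "R6"),
    ("LD", "R10", "R8"),
    ("LD", "R11", "R9"),
    ("ADD", "R12", "R11", "R10"),
    ("ST", "R12", "R7"),
]


def demo(i=2, b_i=5, c_i=7):
    """Run PROGRAM with hypothetical addresses and return the stored a[i]."""
    a_base, b_base, c_base = 1000, 2000, 3000          # made-up addresses
    regs = {"R1": 0, "R4": a_base, "R5": b_base, "R6": c_base}
    mem = {0: i, b_base + 8 * i: b_i, c_base + 8 * i: c_i}
    run(PROGRAM, regs, mem)
    return mem[a_base + 8 * i]


print("a[i] =", demo())
```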

Show the instruction occupying each stage in each cycle (no bypassing) if I1 is R1+R2→R3 and I2 is R3+R4→R5 and I3 is R7+R8→R9



#### **Bypassing: 5-Stage Pipeline**



Source: H&P textbook

Show the instruction occupying each stage in each cycle (with bypassing) if I1 is R1+R2→R3 and I2 is R3+R4→R5 and I3 is R3+R8→R9.
Identify the input latch for each input operand.
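
Exercises of this kind can be checked with a small scheduling model. The sketch below is one possible model, not the lecture's reference solution, and it rests on several assumptions: a five-stage in-order pipeline (IF, D/R, ALU, DM, RW), ALU instructions only, a register file that is written in the first half of a cycle and read in the second half, and stalls that hold an instruction in D/R. The latch names ("ALU/DM latch", "DM/RW latch") are labels invented here and may not match the numbering in the figure. With `bypass=False`, results travel only through the register file.

```python
def schedule(instrs, bypass):
    """instrs: list of (name, dest, (src1, src2)) ALU ops in program order."""
    sched, last_writer = [], {}
    for i, (name, dest, srcs) in enumerate(instrs):
        producers = {r: last_writer[r] for r in srcs if r in last_writer}
        if i == 0:
            if_start, dr_start = 1, 2
        else:
            if_start = sched[i - 1]["dr_start"]   # IF frees up when predecessor enters D/R
            dr_start = sched[i - 1]["alu"]        # D/R frees up when predecessor enters ALU
        dr_done = dr_start
        if not bypass:
            # Without bypassing, wait in D/R until every producer reaches RW.
            for p in producers.values():
                dr_done = max(dr_done, sched[p]["rw"])
        alu = dr_done + 1
        sources = {}
        for r in srcs:
            gap = alu - sched[producers[r]]["alu"] if r in producers else None
            if bypass and gap == 1:
                sources[r] = "ALU/DM latch"       # producer is one stage ahead
            elif bypass and gap == 2:
                sources[r] = "DM/RW latch"        # producer is two stages ahead
            else:
                sources[r] = "register file"
        sched.append({"name": name, "if_start": if_start, "dr_start": dr_start,
                      "dr_done": dr_done, "alu": alu, "dm": alu + 1, "rw": alu + 2,
                      "sources": sources})
        last_writer[dest] = i
    return sched


def occupancy(sched):
    """Return {cycle: {stage: instruction name}}."""
    table = {}
    for s in sched:
        spans = [("IF", s["if_start"], s["dr_start"] - 1),
                 ("D/R", s["dr_start"], s["dr_done"]),
                 ("ALU", s["alu"], s["alu"]),
                 ("DM", s["dm"], s["dm"]),
                 ("RW", s["rw"], s["rw"])]
        for stage, first, last in spans:
            for cycle in range(first, last + 1):
                table.setdefault(cycle, {})[stage] = s["name"]
    return table


with_bypass = schedule([("I1", "R3", ("R1", "R2")),
                        ("I2", "R5", ("R3", "R4")),
                        ("I3", "R9", ("R3", "R8"))], bypass=True)
for cycle, stages in sorted(occupancy(with_bypass).items()):
    print(cycle, stages)
for s in with_bypass:
    print(s["name"], s["sources"])
```

Calling `schedule(..., bypass=False)` on a dependent sequence shows the consumer held in D/R until the producer writes back, with the following instruction held in IF behind it.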






#### Thank you!