## Lecture 20: Branches, OOO

- Today's topics:
  - Branch prediction
  - Out-of-order execution
  - (Also see class notes on pipelining, hazards, etc.)

A 7- or 9-stage pipeline; RR and RW each take an entire stage.

`lw $1, 8($2)`


#### Problem 4

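Without bypassing: 4 stalls

| Cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `lw` | IF | IF | DE | DE | RR | AL | DM | DM | RW | | | |
| Dependent instruction | | IF | IF | DE | DE | DE | DE | DE | DE | RR | AL | RW |

With bypassing: 2 stalls

| Cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| `lw` | IF | IF | DE | DE | RR | AL | DM | DM | RW | |
| Dependent instruction | | IF | IF | DE | DE | DE | DE | RR | AL | RW |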
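The stall counts for Problem 4 can be checked with a small cycle-counting sketch. It assumes the two instructions issue one cycle apart, that a value written in RW can only be read by an RR in a later cycle (since RR and RW each take an entire stage), and that with bypassing the loaded value is forwarded from the end of the second DM stage to the start of AL. The helper name `count_stalls` is hypothetical.

```python
# Stage sequence of the 9-stage pipeline from Problem 4 (0-based stage indices).
PIPE = ["IF", "IF", "DE", "DE", "RR", "AL", "DM", "DM", "RW"]


def count_stalls(ready_stage, need_stage, issue_gap=1):
    """Stalls needed by a dependent instruction issued `issue_gap` cycles after the producer.

    ready_stage: index of the producer stage at whose *end* the value becomes usable.
    need_stage:  index of the consumer stage that needs the value at its *start*.
    The producer's stage i occupies cycle i; without stalls, the consumer's
    stage j would occupy cycle j + issue_gap.
    """
    earliest_cycle = ready_stage + 1          # first cycle after the value is produced
    normal_cycle = need_stage + issue_gap     # when the consumer would reach that stage
    return max(0, earliest_cycle - normal_cycle)


# Without bypassing: the consumer's RR must come after the producer's RW.
no_bypass = count_stalls(PIPE.index("RW"), PIPE.index("RR"))

# With bypassing: forward from the end of the second DM stage into AL.
second_dm = len(PIPE) - 2
with_bypass = count_stalls(second_dm, PIPE.index("AL"))

print(no_bypass, with_bypass)  # 4 2
```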



- Unpipelined design: the entire circuit takes 10 ns to finish
  - Cycle time = 10 ns; clock speed = 1/10 ns = 100 MHz
  - CPI = 1 (assuming no stalls)
  - Throughput in instructions per second = cycles per second × instructions per cycle = 100 M × 1 = 100 M instructions per second = 0.1 BIPS (billion instructions per second)
- 5-stage pipeline: under ideal conditions, each stage takes 2 ns
  - Cycle time = 2 ns; clock speed = 1/2 ns = 500 MHz (5× higher)
  - CPI = 1 (continuing to assume no stalls)
  - Throughput = cycles per second × instructions per cycle = 500 M × 1 = 500 MIPS = 0.5 BIPS
  - Under ideal conditions, a 5-stage pipeline gives a 5× speedup.
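The same throughput arithmetic as a short Python sketch. It assumes CPI stays at 1 and that the 10 ns of logic splits evenly into five 2 ns stages with no latch overhead; `throughput_bips` is an illustrative name.

```python
def throughput_bips(cycle_time_ns, cpi=1.0):
    """Throughput in billions of instructions per second (BIPS)."""
    clock_ghz = 1.0 / cycle_time_ns   # cycles per ns = billions of cycles per second
    ipc = 1.0 / cpi                   # instructions per cycle
    return clock_ghz * ipc


unpipelined = throughput_bips(10.0)     # whole circuit in one 10 ns cycle
pipelined = throughput_bips(10.0 / 5)   # 5 ideal stages of 2 ns each

print(f"unpipelined: {unpipelined:.1f} BIPS")           # 0.1 BIPS
print(f"5-stage:     {pipelined:.1f} BIPS")             # 0.5 BIPS
print(f"speedup:     {pipelined / unpipelined:.1f}x")   # 5.0x
```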

- Simple techniques to handle control hazard stalls:
  - For every branch, introduce a stall cycle (note: roughly every 6<sup>th</sup> instruction is a branch!)
  - Assume the branch is not taken and start fetching the next instruction; if the branch is taken, hardware must cancel the effect of the wrong-path instruction
  - Fetch the next instruction (branch delay slot) and execute it anyway; if the instruction turns out to be on the correct path, useful work was done; if it turns out to be on the wrong path, hopefully no program state is corrupted
  - Make a smarter guess and fetch instructions from the expected target
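To compare the first two options, average CPI can be estimated as 1 plus (branch frequency × fraction of branches that pay a penalty × penalty cycles). This sketch assumes a 1-cycle penalty and a hypothetical 60% taken rate; both numbers are illustrative, not measurements.

```python
BRANCH_FREQ = 1 / 6      # "every 6th instruction is a branch"
STALL_PENALTY = 1        # cycles lost per stalled/wrong-path branch (assumed)
TAKEN_FRACTION = 0.6     # hypothetical share of branches that are taken


def effective_cpi(branch_freq, penalty_fraction, penalty_cycles):
    """Base CPI of 1 plus the average control-hazard stall cycles per instruction."""
    return 1.0 + branch_freq * penalty_fraction * penalty_cycles


# Stall on every branch: every branch pays the penalty.
cpi_stall = effective_cpi(BRANCH_FREQ, 1.0, STALL_PENALTY)
# Predict not taken: only taken branches pay (the wrong-path fetch is cancelled).
cpi_not_taken = effective_cpi(BRANCH_FREQ, TAKEN_FRACTION, STALL_PENALTY)

print(f"stall every branch: CPI = {cpi_stall:.3f}")      # 1.167
print(f"predict not taken:  CPI = {cpi_not_taken:.3f}")  # 1.100
```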

## **Control Hazards**



## **Branch Delay Slots**

*Figure not reproduced; panel (a): the delay slot filled with an instruction from before the branch.*



## **Pipeline without Branch Predictor**



## **Pipeline with Branch Predictor**



### **Bimodal Predictor**



- For each branch, maintain a 2-bit saturating counter (sound familiar?):
  - if the branch is taken: `counter = min(3, counter + 1)`
  - if the branch is not taken: `counter = max(0, counter - 1)`
- If `counter >= 2`, predict taken; else predict not taken
- The counter attempts to capture the common case for each branch
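A minimal sketch of the bimodal predictor as a table of 2-bit counters. Indexing the table by `(pc >> 2)` masked to the low bits, the table size, and the initial counter value of 1 (weakly not taken) are all assumptions for illustration; the class and function names are hypothetical.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by low-order PC bits (assumed indexing)."""

    def __init__(self, index_bits=10, init=1):
        self.mask = (1 << index_bits) - 1
        self.counters = [init] * (1 << index_bits)   # init=1: weakly not taken (assumption)

    def _index(self, pc):
        # Drop the 2-bit byte offset of word-aligned instructions, keep the low bits.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        # Counter values 2 and 3 mean "predict taken".
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(predictor, pc, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A loop-closing branch: taken 9 times, then not taken once; the loop runs 3 times.
loop_outcomes = ([True] * 9 + [False]) * 3
misses = count_mispredictions(BimodalPredictor(), 0x400100, loop_outcomes)
print(misses)  # 4
```

After warm-up, the loop branch mispredicts only once per pass (at loop exit): the single not-taken outcome moves the counter from 3 to 2, which still predicts taken on re-entry. This hysteresis is why 2 bits work better than 1 for loop branches.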

- Indexing functions
- Multiple branch predictors
- History, trade-offs

- Perfect pipelining with no hazards → an instruction completes every cycle (total cycles ~ num instructions)
  → speedup = increase in clock speed = num pipeline stages
- With hazards and stalls, some cycles (= stall time) go by during which no instruction completes, and then the stalled instruction completes
- Total cycles = number of instructions + stall cycles
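A minimal sketch of this accounting, measuring time in pipelined clock cycles. It assumes perfectly balanced stages (so one unpipelined cycle equals `n_stages` pipelined cycles) and ignores pipeline fill; the stall count of 250 in the example is made up for illustration.

```python
def pipelined_cycles(n_instrs, stall_cycles):
    """Total cycles ~ number of instructions + stall cycles (pipeline fill ignored)."""
    return n_instrs + stall_cycles


def speedup_over_unpipelined(n_instrs, n_stages, stall_cycles):
    # Assumption: the unpipelined cycle is exactly n_stages pipelined cycles long.
    unpipelined_time = n_instrs * n_stages
    return unpipelined_time / pipelined_cycles(n_instrs, stall_cycles)


print(speedup_over_unpipelined(1000, 5, 0))    # 5.0 (ideal: speedup = number of stages)
print(speedup_over_unpipelined(1000, 5, 250))  # 4.0 (stalls eat into the gain)
```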

## **Multicycle Instructions**



<sup>© 2003</sup> Elsevier Science (USA). All rights reserved.

- Multiple parallel pipelines: each pipeline can have a different number of stages
- Instructions can now complete out of order, so we must make sure that writes to a register happen in the correct order
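A simplified sketch of one way to keep register writes in order: issue one instruction per cycle in program order, but delay issue whenever a write would land at or before an earlier in-flight write to the same register (a WAW hazard). The latencies (24 cycles for a divide, 4 for an add) are hypothetical, only WAW ordering is modelled (RAW and structural hazards are ignored), and the function name is illustrative.

```python
def schedule_in_order_writes(instrs):
    """Issue in program order; delay issue so same-register writes stay in order.

    instrs: list of (name, dest_reg, latency_in_cycles).
    Returns a list of (name, issue_cycle, write_cycle).
    """
    last_write = {}   # dest_reg -> cycle of the most recent scheduled write
    next_issue = 0
    schedule = []
    for name, dest, latency in instrs:
        issue = next_issue
        prev = last_write.get(dest)
        if prev is not None and issue + latency <= prev:
            # WAW hazard: stall until this write lands strictly after the earlier one.
            issue = prev - latency + 1
        write = issue + latency
        last_write[dest] = write
        schedule.append((name, issue, write))
        next_issue = issue + 1
    return schedule


prog = [
    ("div.d f0", "f0", 24),   # long-latency divide (latency hypothetical)
    ("add.d f2", "f2", 4),    # independent: may finish before the divide
    ("add.d f0", "f0", 4),    # same destination as the divide: must write after it
]
for entry in schedule_in_order_writes(prog):
    print(entry)
```

The independent `add.d` still completes out of order (cycle 5, long before the divide), which is harmless; only the second write to `f0` is held back. Stalling issue is just the simplest policy, chosen here for clarity.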