CS/EE 3810

Assignment 7

Due: 10:45am, Tue Mar 26th, 2024

Note: Make reasonable assumptions where necessary and clearly state them. Feel free to discuss problems with classmates, but the only written material that you may consult while writing your solutions is the textbook and the lecture slides/videos. Solutions should be uploaded on Gradescope. Show your solution steps so that you receive partial credit for incorrect answers and we know you have understood the material. Don't just show us the final answer.

Every homework has an automatic penalty-free 1.5-day extension to accommodate any COVID- or family-related disruptions. In other words, try to finish your homework by Tuesday 10:45am to keep up with the lecture content, but if necessary, you may take until Wednesday 11:59pm.

Consider an unpipelined or single-stage processor design like the one discussed in slide 19 of lecture 15. At the start of a cycle, a new instruction enters the processor and is processed completely within a single cycle. It takes 7,000 ps to navigate all the circuits in a cycle (including latch overheads). Therefore, for this design to work, the cycle time has to be at least 7,000 ps.
1. What is the clock speed of this processor? (5 points)
2. What is the CPI of this processor, assuming that every load/store instruction finds its instruction/data in the instruction or data cache? (5 points)
3. What is the throughput of this processor (in billion instructions per second)? (10 points)
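As a reminder of how these three quantities relate, here is a minimal Python sketch. The function names and the example values (a 2,000 ps cycle and a CPI of 1) are hypothetical and chosen only for illustration; they are not the numbers from this assignment.

```python
def clock_ghz(cycle_time_ps: float) -> float:
    """Clock frequency in GHz for a cycle time given in picoseconds."""
    # A 1,000 ps cycle corresponds to 1 GHz, so f[GHz] = 1000 / T[ps].
    return 1000.0 / cycle_time_ps


def throughput_bips(cycle_time_ps: float, cpi: float) -> float:
    """Throughput in billion instructions per second (clock rate / CPI)."""
    return clock_ghz(cycle_time_ps) / cpi


# Hypothetical example values, not taken from the assignment.
example_cycle_ps = 2000.0
example_cpi = 1.0
print(clock_ghz(example_cycle_ps))                     # 0.5
print(throughput_bips(example_cycle_ps, example_cpi))  # 0.5
```

The sketch assumes the cycle time already includes latch overheads, as stated in the problem.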
The processor in Q1 above is converted into a 14-stage pipeline. The slowest of these 14 stages takes 600 ps (including latch overheads).
1. What is the clock speed of this processor? (5 points)
2. What is the CPI of this processor, assuming that every load/store instruction finds its instruction/data in the instruction or data cache, and there are no stalls from data/control/structural hazards? (5 points)
3. What is the throughput of this processor (in billion instructions per second)? (10 points)
4. What is the speedup, relative to the unpipelined processor in Q1? Why is the speedup less than 14X? (10 points)
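Speedup here is a ratio of average time per instruction. A minimal sketch of that calculation follows; the function name and the example numbers (a 4,000 ps unpipelined cycle versus a pipeline whose slowest stage takes 1,250 ps) are hypothetical and are not the values from this assignment.

```python
def pipeline_speedup(unpipelined_cycle_ps: float,
                     pipelined_cycle_ps: float,
                     unpipelined_cpi: float = 1.0,
                     pipelined_cpi: float = 1.0) -> float:
    """Ratio of average time per instruction: unpipelined over pipelined."""
    unpipelined_time = unpipelined_cycle_ps * unpipelined_cpi
    pipelined_time = pipelined_cycle_ps * pipelined_cpi
    return unpipelined_time / pipelined_time


# Hypothetical example values, not taken from the assignment.
print(pipeline_speedup(4000.0, 1250.0))  # 3.2
```

The default CPI of 1 for both designs reflects the no-stall assumption stated in the questions above; pass different CPI values to model stalls.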
Show how the following three consecutive instructions move through each stage of the five-stage pipeline, similar to the example on slide 10 of lecture 18. This pipeline does not support any bypassing. Make sure the decode stage does not advance an instruction through the pipeline unless all data dependences are correctly resolved. (25 points)
I1: add $s1, $s2, $s3
I2: lw $s4, 4($s1)
I3: add $s5, $s4, $s1
Show how the same three instructions move through each stage of the five-stage pipeline, similar to the example on slide 12 of lecture 18. This pipeline does support bypassing. Make sure the decode stage does not advance an instruction through the pipeline unless all data dependences are correctly resolved. You don't need to show the latch involved in every bypass (but feel free to ponder this question for your own understanding). (25 points)
I1: add $s1, $s2, $s3
I2: lw $s4, 4($s1)
I3: add $s5, $s4, $s1
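For checking a hand-drawn pipeline diagram, the following Python sketch computes the cycle in which each instruction leaves the decode stage. It is only a sketch under stated assumptions, which may differ from the conventions on the lecture slides: stages are IF, ID, EX, MEM, WB; the register file is written in the first half of a cycle and read in the second half; every source operand is needed at the start of EX; and with bypassing, an ALU result is available at the end of EX and a load result at the end of MEM. The `Instr` class, the `decode_cycles` function, and the example instruction sequence are all hypothetical and are not the sequence from this assignment.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Instr:
    text: str
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction
    is_load: bool = False


def decode_cycles(instrs: List[Instr], bypassing: bool) -> List[int]:
    """Cycle (1-indexed) in which each instruction leaves the ID stage."""
    done: List[int] = []
    for i, ins in enumerate(instrs):
        # In-order pipeline: the first instruction decodes in cycle 2;
        # each later one decodes no earlier than the cycle after its predecessor.
        earliest = 2 if i == 0 else done[-1] + 1
        for j in range(i):
            producer = instrs[j]
            if producer.dest in ins.srcs:
                if bypassing:
                    # Assumed: consumer's EX must follow the producer's EX
                    # (ALU op) or the producer's MEM (load).
                    gap = 2 if producer.is_load else 1
                else:
                    # Assumed: consumer's ID may share a cycle with the
                    # producer's WB (write first half, read second half).
                    gap = 3
                earliest = max(earliest, done[j] + gap)
        done.append(earliest)
    return done


# Hypothetical example sequence, not the one from the assignment.
program = [
    Instr("lw  $t0, 0($t1)",   "$t0", ("$t1",), is_load=True),
    Instr("add $t2, $t0, $t3", "$t2", ("$t0", "$t3")),
    Instr("sub $t4, $t2, $t0", "$t4", ("$t2", "$t0")),
]
print(decode_cycles(program, bypassing=False))  # [2, 5, 8]
print(decode_cycles(program, bypassing=True))   # [2, 4, 5]
```

Under these assumptions, each instruction's EX, MEM, and WB stages occupy the three cycles immediately after the decode cycle reported here.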