The Optimal Logic Depth Per Pipeline Stage is 6-8 FO4 Inverter Delays

Q1. The execution time of a program depends on the time taken to execute the chain of dependences that forms the critical path. This is roughly the sum of all dependent instruction latencies and branch-mispredict penalties. In going from a 16 FO4 clock to an 8 FO4 clock, the latency (in ps) of an add and of branch-mispredict detection goes up, because of the increased pipelining overhead. Yet the total number of seconds to execute the program goes down. Why? (An illustrative numeric sketch follows Q2.)

Q2. What are the weaknesses of the paper?
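
The sketch below is a minimal illustration of the first half of Q1 only: why the absolute latency (in ps) of a fixed amount of logic rises as the clock period shrinks. The specific numbers (FO4_PS, OVERHEAD_FO4, ADD_LOGIC_FO4) are assumptions chosen for readability, not values taken from the paper; the point is just that each extra stage pays the latch-plus-skew overhead again and rounds up to whole cycles.

```python
import math

# Assumed, illustrative numbers (not from the paper):
FO4_PS = 25.0          # one FO4 inverter delay, in picoseconds
OVERHEAD_FO4 = 2.0     # latch + clock-skew overhead per stage, in FO4
ADD_LOGIC_FO4 = 14.0   # total useful logic for an integer add, in FO4

def add_latency_ps(clock_fo4: float) -> float:
    """Latency of the add, in ps, when the clock period is clock_fo4 FO4."""
    useful_per_stage = clock_fo4 - OVERHEAD_FO4          # logic that fits in one stage
    stages = math.ceil(ADD_LOGIC_FO4 / useful_per_stage) # must round up to whole stages
    return stages * clock_fo4 * FO4_PS                   # latency is whole cycles

for clk in (16, 8):
    print(f"{clk:2d} FO4 clock: add latency = {add_latency_ps(clk):4.0f} ps")

# Output with these assumed numbers:
#   16 FO4 clock: add latency =  400 ps   (1 stage)
#    8 FO4 clock: add latency =  600 ps   (3 stages)
# The add's latency in ps goes up, even though the clock period is halved.
```

Answering the rest of Q1 means explaining why, despite this per-operation latency increase, the faster clock still reduces total execution time for the program as a whole.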