Consider a 5-stage pipelined processor with the stages Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory Access (MA), and Write Back (WB). Every stage except Memory Access takes 1 clock cycle for all instructions. Memory Access takes 3 clock cycles for the LOAD instruction. How many clock cycles are needed to execute the following sequence of instructions with optimization?
I0 : LOAD R0, 3(R1) ; R0 ← [3 + [R1]]
I1 : ADD R2, R0, R1 ; R2 ← R0 + R1
I2 : LOAD R3, 4(R4) ; R3 ← [4 + [R4]]
I3 : SUB R5, R3, R4 ; R5 ← R3 – R4
The answer is given as 14, but how is it derived?
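One way to check the count is a small timing model. This is only a sketch under assumptions the question does not state: "optimization" is taken to mean operand forwarding (a LOAD result can be used in EX the cycle after its MA finishes, an ALU result the cycle after its EX), issue is in order, and a stalled instruction keeps its stage occupied. The names `total_cycles` and `PROGRAM` are made up for this illustration.

```python
STAGES = ["IF", "ID", "EX", "MA", "WB"]

# (opcode, destination register, source registers) for I0..I3
PROGRAM = [
    ("LOAD", "R0", ["R1"]),
    ("ADD",  "R2", ["R0", "R1"]),
    ("LOAD", "R3", ["R4"]),
    ("SUB",  "R5", ["R3", "R4"]),
]

def total_cycles(program, load_ma_cycles=3):
    """Return the cycle in which the last instruction finishes WB.

    Assumes operand forwarding and an in-order pipeline where a
    stalled instruction blocks the instructions behind it.
    """
    ready = {}   # register -> last cycle of the stage that produces it
    prev = None  # entry cycle per stage for the previous instruction
    finish = 0
    for op, dest, srcs in program:
        dur = {s: 1 for s in STAGES}
        if op == "LOAD":
            dur["MA"] = load_ma_cycles
        enter = {}
        earliest = 1
        for k, stage in enumerate(STAGES):
            t = earliest
            if prev is not None:
                # A stage is free once the previous instruction has
                # entered the following stage (or left the pipeline).
                nxt = STAGES[k + 1] if k + 1 < len(STAGES) else "DONE"
                t = max(t, prev[nxt])
            if stage == "EX":
                # Forwarding: operands usable the cycle after they are produced.
                for r in srcs:
                    t = max(t, ready.get(r, 0) + 1)
            enter[stage] = t
            earliest = t + dur[stage]
        enter["DONE"] = earliest
        producer = "MA" if op == "LOAD" else "EX"
        ready[dest] = enter[producer] + dur[producer] - 1
        finish = enter["DONE"] - 1
        prev = enter
    return finish

print(total_cycles(PROGRAM))  # 14 under the assumptions above
```

Under these assumptions I0 occupies MA in cycles 4 to 6 and writes back in cycle 7, I1 executes in cycle 7 and writes back in cycle 9, I2 occupies MA in cycles 9 to 11 and writes back in cycle 12, and I3 executes in cycle 12 and writes back in cycle 14.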