in CO and Architecture edited by
34,196 views
52 votes
52 votes

Consider a $4$ stage pipeline processor. The number of cycles needed by the four instructions $I1, I2, I3, I4$ in stages $S1, S2, S3, S4$ is shown below:

$$\begin{array}{|c|c|c|c|c|} \hline \textbf{} & \textbf {S1} &\textbf {S2} & \textbf {S3} &  \textbf{S4 } \\\hline \textbf{I1}& 2 & 1  & 1 & 1 \\\hline \textbf{I2} & 1 & 3 & 2  & 2\\\hline  \textbf{I3}& 2 & 1  & 1 & 3 \\\hline \textbf{I4} & 1 & 2 & 2  & 2 \\\hline  \end{array}$$

 

What is the number of cycles needed to execute the following loop?

For (i=1 to 2) {I1; I2; I3; I4;}

  1. $16$
  2. $23$
  3. $28$
  4. $30$
in CO and Architecture edited by
34.2k views

4 Comments

nice explanation @rish-18

0
0
  S1 S2 S3 S4
I1  2 3 4 5
I2 2 + 1= 3 max(3,3)+3 = 6 max(4,6)+2 = 8 max(5,8) + 2 = 10
I3 3+ 2 = 5 max(5,6)+1 = 7 max(7,8)+1 = 9 max(9,10)+3 = 13
I4 5+1 = 6  max(6,7)+2 = 9 11 max(11,13)+2 = 15

 

for 2 iteration

I1  6+2 = 8 max ( 8,9) +1 = 10 max(10,11)+1 = 12 max(12,15)+1 = 16
I2 8 + 1= 9 max(9,10)+3 = 13 max(12,13)+2 = 15 max(15,16) + 2 = 18
I3 9+ 2 = 11 max(11,13)+1 = 14 max(14,15)+1 = 16 max(16,18)+3 = 21
I4 11+1 = 12  max(12,14)+2 = 16 18 max(18,21)+2 = 23

 

In every row (except first ) the logic will be check  for diagonal entries , find out which is maximum and add time

so ans is 23.

7
7
edited by

In case it helps anybody, consider the following analogy –

Imagine you were going to a polling station to cast your vote. Inside the polling station, one person is assigned the job of checking your voter ID card, and registering your name. Probably, on the adjacent bench, a person puts the ink-mark on your index finger. There might be many more such people in charge (each doing a certain work), before you ultimately stand before the EVM to cast your vote. All these people (and the EVM) can be considered as stages of a (voting) pipeline.

Now, let say you were to go to the polling station at 3pm in a Sunday afternoon. Chances are you would encounter a sea of people there. Now, imagine what would happen if the people in charge refused any queues inside the polling station. In that case, there would a treacherously long queue of agitated people waiting outside the entry gate. Instead, if the people in charge did allow queues inside the polling station, then the queue outside it would be shorter, and people would be less agitated seeing the crowd move more quickly (through the stages of the voting pipeline).

Drawing comparison from the above analogy, we can imagine what would happen when the stages of the hardware pipeline do not have buffers. The processes/instructions start getting blocked just to enter the pipeline! When multiple stall cycles from previous instructions are involved, the later instructions can’t even enter the pipeline! Given that pipelines are used by a large number of instructions of various kinds, and for a long period of time, such kind of instructions with multiple stall cycles are indeed quite frequently encountered. If this happens, then what remains of the benefits that a pipeline promises? To keep everything moving, a practical hardware pipeline must implement stage buffers. Yes, this does increases the delay in the system. Yes, processes/instructions can still get blocked before entering the pipeline. But, it’s still far better than the former case of not using stage buffers!

0
0

10 Answers

69 votes
69 votes
Best answer

Here bound of the loop are constants, therefore compiler will do the loop unrolling(If compiler won't then prefetcher will do) to increase the instruction level parallelism. And after loop unrolling $23$ cycles are required for execution. Therefore, correct answer would be (B).

PS: We assume the buffers between the pipeline stages can store multiple results in the form of a queue.

$$\tiny \begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|} \hline &C_1&C_2&C_3&C_4&C_5&C_6&C_7&C_8&C_9&C_{10}&C_{11}&C_{12}&C_{13}&C_{14}&C_{15}&C_{16}&C_{17}&C_{18}&C_{19}&C_{20}
&C_{21}&C_{22}&C_{23}\\\hline
\bf{I_1}&S_1&S_1&S_2&S_3&S_4\\\hline
\bf{I_2}&&&S_1&S_2&S_2&S_2&S_3& S_3&S_4&S_4\\\hline
\bf{I_3}&&&&S_1&S_1&\color{red}{-}&S_2&\color{red}{-}&S_3&\color{red}{-}&S_4& S_4&S_4\\\hline
\bf{I_4}&&&&&&S_1&\color{red}{-}&S_2&S_2&S_3&S_3&\color{red}{-}&\color{red}{-}&S_4&S_4\\\hline
\bf{I_1}&&&&&&&S_1&S_1&\color{red}{-}&S_2&\color{red}{-}&S_3&\color{red}{-}&\color{red}{-}&\color{red}{-}&S_4\\\hline
\bf{I_2}&&&&&&&&&S_1&\color{red}{-}&S_2&S_2&S_2&S_3& S_3&\color{red}{-}&S_4&S_4\\\hline
\bf{I_3}&&&&&&&&&&S_1&S_1&\color{red}{-}&\color{red}{-}&S_2&\color{red}{-}&S_3&\color{red}{-}&\color{red}{-}&S_4& S_4&S_4\\\hline
\bf{I_4} &&&&&&&&&&&&S_1&\color{red}{-}&\color{red}{-}&S_2&S_2&S_3&S_3&\color{red}{-}&\color{red}{-}&\color{red}{-}&S_4&S_4\\\hline

\end{array} $$

edited by
by

4 Comments

@Vickyrix

in the second iteration ..in I2  the s2 can get started at c13 itself because the s2 of I1 is completed by then. So this way the answer we are getting is 23 cycles only and not 25.

anything wrong?

0
0
edited by

@Pranavpurkar how do we know when to assume multi stage buffer or single stage buffer?

0
0

@JAINchiNMay

i think it will be specified in the question.

1
1
22 votes
22 votes
this is the loop level level paralellism question but when we apply the loop level parallelism then we get 25 cycles so dis is not in option so we have to do it without loop level parallelism and the frst time loop output the result at 15 cc so total 30 cc for 2 iterations

1 comment

how is this a loop level parallelism question?
0
0
18 votes
18 votes

When using the following timeline we get 23 cycles:

 

In the 6th cycle, instruction 4 is entering stage 1 while instruction 3 has not started stage 2. This requires additional output buffer for stage 1 and we can assume this is available. (This additional buffer is not required if all stage delays are the same).  

Hence , option (b) is true.

reshown by

4 Comments

Ok,  in above diag,  in 6th cycle we are doing it introducing i4 while i3 hasn't  gone to next stage

In this ques if we take that approach,  we will get diff ans i think i4 will be introduced in 7th clock. 

  but why cant we use that stage if its free?cant we just use intermediate buffers to store prev instruction output. We will require those buffers only at end of that stage for new instruction. 

Ok so i hv read discussion regarding this doubt in below ans.. So in which cases we can use that particular stage.. As we are using here

2
2
Process-2 can't get into slot-9 untill it completes first time!
0
0

@Arjun sir,

Why loop enrolling is not used in https://gateoverflow.in/3690/gate2004-it-47

1
1
8 votes
8 votes

Concept for solving this question:

Stage Si is allocated to instruction Ii iff

1. Eligibility: Ii must have completed Si-1 ... X time

2. Availability: Ii-1 must have released Si ... Y time

Time of allocation =  Max(X,Y)

In first iteration:

I1 completes after 5th cycle

I2 completes after 10th cycle

I3 completes after 13th cycle

I4 completes after 15th cycle

In second iteration

I1 completes after 16th cycle

I2 completes after 18th cycle

I3 completes after 21th cycle

I4 completes after 23th cycle

Answer : B

2 Comments

In second iteration I2 completes in 20th cycle and not 18th cycle. So we have to go with 30 only as 25 is not present in options.
0
0
23 is not correct u will get 25 but as this is not in the options loop level parallelism wont work here and u have to go drctly by 2*15=30
0
0
Answer:

Related questions