in CO and Architecture edited by
22,754 views
56 votes
56 votes

Following table indicates the latencies of operations between the instruction producing the result and instruction using the result.
$$\begin{array}{|l|l|c|} \hline \textbf {Instruction producing the result} &  \textbf{Instruction using the result }& \textbf{Latency} \\\hline \text{ALU Operation} & \text{ALU Operation} & 2 \\\hline \text{ALU Operation} & \text{Store} & \text{2}\\\hline \text{Load} & \text{ALU Operation} & \text{1}\\\hline \text{Load} & \text{Store} & \text{0} \\\hline \end{array}$$

Consider the following code segment:

 Load R1, Loc 1; Load R1 from memory location Loc1
 Load R2, Loc 2; Load R2 from memory location Loc 2
 Add R1, R2, R1; Add R1 and R2 and save result in R1
 Dec R2;         Decrement R2
 Dec R1;         Decrement R1
 Mpy R1, R2, R3; Multiply R1 and R2 and save result in R3
 Store R3, Loc 3; Store R3 in memory location Loc 3


What is the number of cycles needed to execute the above code segment assuming each instruction takes one cycle to execute?

  1. $7$
  2. $10$
  3. $13$
  4. $14$
in CO and Architecture edited by
22.8k views

3 Comments

What is the answer ? 13 or 14?
0
0
instruction i4 is decrementing i.e arithmatic operation it is betn LOAD and ALU operation so latency is 1

you havnt included that ...
0
0

No.The explanation given above is correct and answer is 13 only.

See-->

I3 is ALU operation which uses result of LOAD in I2 , so latency is of 1 cycle (after I2)

I4 is ALU operation which uses result of LOAD in I2 , so latency is of 1 cycle (after I2) but I3 is executing at cycle 4. Therefore. I4 will execute at cycle no. 5.

I5 is ALU operation using result of ALU in I3 therefore has to wait for 2 cycles after I3

I6 is ALU and uses result of ALU in I5 ,therefore waits 2 cycles

4
4

5 Answers

184 votes
184 votes
Best answer

In the given question there are $7$ instructions each of which takes $1$ clock cycle to complete. (Pipelining may be used)
If an instruction is in execution phase and any other instructions can not be in the execution phase. So, at least $7$ clock cycles will be taken.
Now, it is given that between two instructions latency or delay should be there based on their operation. Ex- $1^{st}$ line of the table says that between two operations in which first is producing the result of an ALU operation and the $2^{nd}$ is using the result there should be a delay of $2$ clock cycles.

$$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c||} \hline \text {Clock cycle} & \text{T1} & \text{T2}&  \text{T3 }& \text{T4} & \text{T5} & \text{T6} & \text{T7}&  \text{T8 }& \text{T9} & \text{T10} & \text{T11} & \text{T12}&  \text{T13 }\\\hline & \text{I1} & \text{I2} &  & \text{I3}&  \text{I4 } & & \text{I5} & & & \text{I6} & & & \text{I7}\\\hline \end{array}$$

 

  1. Load R1, Loc 1; Load R1 from memory location Loc1
    Takes 1 clock cycle, simply loading R1 on loc1.
     
  2. Load R2, Loc 2; Load R2 from memory location Loc2
    Takes 1 clock cycle, simply loading r2 on loc2.
     
  3. (Add R1, R2, R1; Add R1 and R2 and save result in R1
    R1=R1+R2;
    Hence, this instruction is using the result of R1 and R2, i.e. result of Instruction 1 and Instruction 2.
    As instruction 1 is load operation and instruction 3 is ALU operation. So, there should be a delay of 1 clock cycle between instruction 1 and instruction 3, which is already there due to I2.
    As instruction 2 is load operation and instruction 3 is ALU operation. So, there should be a delay of 1 clock cycle between instruction 2 and instruction 3.
     
  4. Dec R2; Decrement R2
    This instruction is dependent on instruction 2 and there should be a delay of one clock cycle between Instruction 2 and
    Instruction 4. As instruction 2 is load and 4 is ALU, which is already there due to Instruction 3.
     
  5. Dec R1 Decrement R1
    This instruction is dependent on Instruction 3
    As Instruction I3 is ALU and I5 is also ALU so a delay of 2 clock cycles should be there between them of which 1 clock cycle delay is already there due to I4 so one clock cycle delay between I4 and I5.
     
  6. MPY R1, R2, R3; Multiply R1 and R2 and save result in R3
    R3=R1*R2;
    This instruction uses the result of Instruction 5, as both instruction 5 and 6 are ALU so there should be a delay of 2 clock cycles.
  7. Store R3, Loc 3 Store R3 in memory location Loc3
    This instruction is dependent on instruction 6 which is ALU and instruction 7 is store so there should be a delay of 2 clock cycles between them.

Hence, a total of 13 clock cycles will be there.

Correct Answer: $C$

edited by

4 Comments

I am having the same doubt ....we don't consider it because of parallel execution ?? Means both the operands are fetched parallely so the maximum is 2 cc ?? @Arjun Sir please clarify ...

0
0
Simply superb
0
0

Very well explained ! Thanks @Madhab.

0
0
19 votes
19 votes

Answer is (C)

Here each instruction takes $1$ cycle but apart from that we have to consider latencies b/w
instruction: If there are two ALU operations by $I1$ and $I2$ such that $I2$ uses the value
produced by $I1$ in some register then $I2$ will be executed ONLY after waiting TWO more
cycles after $I1$ has executed because latency b/w two ALU operations is $2$

See here:

Clock         1       2         3         4         5            6           7           8         9          10           11          12           13             14

Inst.           I1       I2       -           I3       I4           -           I5           -          -             I6            -             -              I7

$I3$ is ALU operation which uses the result of LOAD in $I2,$ so latency is of $1$ cycle.

$I5$ is ALU operation using result of ALU in I3, therefore, has to wait for $2$ cycles after $I3$

$I6$ is ALU and uses result of ALU in I5, therefore waits for $2$ cycles

edited by

4 Comments

Increase font size of ur answer , it will look better..
0
0
calculation not clear.
1
1
I3 is ALU operation which uses result of load in I2 hence latency should be two cycles

I4 is also ALU opn which uses result of load in I2 hence latency 2

I5 is ALU - ALU hence 2

I6 latency 2

I7 latency 1

In this way answer comes to be 16.

Can you please clear my doubt sir?
0
0
instruction i4 is decrementing i.e arithmatic operation it is betn LOAD and ALU operation so latency is 1

you havnt included that .
0
0
5 votes
5 votes

cool answer...>>

2 Comments

totally not cool. For such a good answer to be at bottom!

1
1
beautifully explained :)
0
0
2 votes
2 votes

As per the given table and the assumption that 
The instruction of Type R1 <- R1 + R2 requires 
2 x (load - alu op) type

1 x (alu op - store) type

Then the answer should be 14

Analysis:
Instr | Cost
==== ====
1. 0
2. 0
3. 1+1+2 = 4
4. 1+2 = 3
5. 1+2 = 3
6. 1+1+2 = 4
7. 0

AND 
if, R1 <- R1+R2 
is only an (alu-store) type.. then intrs. 3 and 6 take 2 time units each
resulting in the answer as 10..

Unless, somebody presents a different interpretation.. 
smiley

2 Comments

I m not sure between (A) and (B).. If go with table.. it will come out to be 10 clocks, assuming 1 latency is 1 clock.

Next it is clearly given that "Assuming each instruction takes one cycle to execute".. here there are 7 instructions.. So 7 cycle.. isn't it?.. So which should we follow the table or the statement that given..?
0
0
made easy soluion
1
1
Answer:

Related questions