in CO and Architecture retagged by
1,814 views
0 votes
0 votes

 A program consists of two nested loops—a small inner loop and a much larger outer loop.The decimal memory addresses
shown delineate the location of the two loops and the beginning and end of the total program. All memory locations in the various sections of the program, 17-22, 23-164, 165-239, and so on, contain instructions to be executed in straight-line sequencing. The program is to be run on a computer that has an instruction cache organized in the direct-mapped manner with the following parameters:

Main memory size 64K words
Cache size 1K words
Block size 128 words
The cycle time of the main memory is 10τ s, and the cycle time of the cache is 1τ s.
Compute the total time needed for instruction fetching during execution of the program.

in CO and Architecture retagged by
1.8k views

2 Answers

2 votes
2 votes
Please bear with me for this detailed answer but this question baffled me for few days and even I could not understand the solution itself. I hope my answer will help clear your doubts.

As you can see its mentioned the Main memory Size is 64K words and block size is 128 words.

So the total blocks in the memory would be 2^16 / 2^7 = 2^9 (512 blocks , each block will be containing 128 words) Now coming up to the cache which is 1K means 2^10.
So the number of cache lines will be 2^10 / 2^7 = 2^3 = 8 lines or blocks and each block will be 128 words each. Both blocks in cache and memory should be same as
that will be justify direct mapping principle.

Let me plot the cache lines for you to have a better visualization . Since each block has 128 words so the Block 0 will contain the words from 0 - 127, Block 1 will
contain the words from 128 - 255 and so on . Below ranges are appended for each block. Consider the same ranges mapping in the main memory as the number of blocks
are same. Since in our question the last instruction ends at 1500 so the ranges are shown upto that point only


#########
#       #   --> Block 0 (0 - 127) / (1024 - 1151)
#########

#########
#       #  --> Block 1  (128 - 255) /  (1152 - 1279)
#########

#########
#       #  --> Block 2 (256 - 383) / (1280 - 1407)
#########

#########
#       #  --> Block 3 (384 - 511) / (1408 - 1535)
#########

#########
#       #  --> Block 4 (512 - 639)
#########

#########
#       #   --> Block 5 (640 - 767)
#########

#########
#       #  --> Block 6 (768 - 895)
#########

#########
#       #  --> Block 7  (896 - 1023)
#########


#####################
# 6  #  3   #   7   #  --> Here you can easily deduce that words are of 7 bits (128 words per block) , 3 bits for block (8 blocks for the cache) , 6 bits for the tag
#    #      #       #      (64 tags as can easily be computed - Number of blocks in memory / Number of blocks in cache = 512 / 8 = 64 )
#####################

Now we have figured out the basics , now lets jump back to the question. Its cleary mentioned that there are two nested loops and direct mapping is used. So , lets
analyse the from the beginning.

The beginning address is 17 and it falls within the first block in memory (0 - 127) and gets mapped to Block 0 (0 - 127) in cache. Now as mentioned in the question itself -
All memory locations in the various sections of the program, 17-22, 23-164, 165-239, and so on, contain instructions to be executed in straight-line sequencing. So the
following instructions 18 - 22 are also mapped to Block 0. Now the first outer loop is located at address 23 , and the first iteration is started. From addresses
123 - 127 will be mapped in Block 0 and rest of the instructions will be mapped to (128 - 164) Block 1. Below is the sequence of the first iteration of the outer loop

17 - 22 --> Block 0
23 - 127 --> Block 0
128 - 165 --> Block 1
165 - 239 --> Block 1
240 - 255 --> Block 1
256 - 383 --> Block 2
384 - 511 --> Block 3
512 - 639 --> Block 4
640 - 767 --> Block 5
768 - 895 --> Block 6
896 - 1023 --> Block 7
1024 - 1151 --> Block 0 - (The old block contents will be removed and will be replaced by the contents of addresses 1024 - 1151)
1152 - 1279 --> Block 1 -  (The old block contents will be removed and will be replaced by the contents of addresses 1152 - 1279)

Since the first loop ends at location 1200 , hence the first iteration of the loop will have the sequence of reads from memory as follows -
0,1,2,3,4,5,6,7,0,1

Now in the second iteration , block 0 and block 1 will be replaced again as the instruction addresses (23 belongs to block 0 and inner loop addresses 165 - 239 belong
to block 1) , rest all the instructions (block 2 to block 7) that are resident in cache will remain same until the last iteration of the outer loop.One thing to note
here is block 0 and block 1 will be replaced twice as you can see in the first iteration.

Hence the sequence
of memory read for all the 10 iterations of the outer loop is as follows

0,1,2,3,4,5,6,7,0,1  0,1,0,1  0,1,0,1  0,1,0,1  0,1,0,1  0,1,0,1  0,1,0,1  0,1,0,1  0,1,0,1  0,1,0,1,2,3  (The instruction range from 1200 - 1500 in the last iteration

will replace block 2 and block 3).

Now lets just analyse the first iteration sequence :
10 blocks are there that were read from memory , hence total number of cycles would be (10 memory cycles as given in question) = 128 * 10 * 10 = 12800 cycles

Then there are two loops - outer and inner. Firstly lets look at the inner loop.

Inner loop = Last address - Beginning address + 1 = 239 - 165 + 1 = 239 - 164 = 75

This inner loop will be executed 20 times in the one iteration of the outer loop , since we already computed the time taken from memory to cache for the blocks and now
the inner loop instruction addresses are already in cache. Hence,

Total time for one iteration of outer loop , the inner loop time will be = 75 * 20 * 1 = 1500 cycles

Now there are instructions in between outer loop and inner loop that will be in cache that can be computed by subtraction as follows :

Outer loop = Last address - Beginning address + 1 = 1200 - 23 + 1 = 1200 - 22 = 1178
Inner loop = Last address - Beginning address + 1 = 239 - 165 + 1 = 239 - 164 = 75

Hence for one iteration of outer loop , the differential instruction in between loops is = 1178 - 75 = 1103

Now we know that there are 10 iterations of the outer loop and 20 iterations each for inner loop. Now from the above sequence that we generated we can see there are
only block 0 and block 1 that are replaced and rest blocks remained in cache only. Hence the computation is as follows

Now the total time of reading the above sequence of blocks from main memory to cache is = Total number of blocks * 128(words per block) * 10 (memory cycle time)

==> 48 * 128 * 10 = 61440 memory cycles

Differential instruction in between loops for 10 iterations = 1103 * 10 * 1 = 11030 (these are the instructions of block 0 and block 1 when in cache)

Total time for ten iterations of outer loop , the inner loop time will be = 1500 * 10 * 1 = 15000 cycles (when in cache as we have already calculated the memory reads)

Once both the loop are executed , the end section of program is executed in cache = 1500 - 1200 = 300 * 1 = 300 cycles (block 2 and block 3 that are replaced in the last iteration)

Therefore the total time needed for instruction fetching during execution of the program is = 61440 + 11030 + 300 + 15,000 = 87770 cycles
0 votes
0 votes
As Cache has 8 blocks.(0 - 7)
so for 1 pass first all the 8 block got filled to satisfy the 0 - 1023 need and then again block 0 and block 1 for 1024 to 1200 .
for 2 pass as the we are moving only between block 0 - 1 till 9 times as 1 we already count in first pass (block 0 and block 1 get keep on exchanging)

and final pass we are move 0  - 1 - 2 - 3 (3 contains address 1500 so from 1200 to 1500 we need to fetch 2 and 3 again).
so total time moving block from main memory to   cache is.

(10 + 4*9 + 2 ) * 128 *10τ s = 61440 τ

again time needed to execute the program

let remove the inner loop out form outer loop and count the number of step in program
[(1200 − 22) − (239 − 164)]10 × 1τ = 11030 τ (as the remaining section will execute in outer loop 10 times )
inner loop (239 -164)10*20*1 τ = 15000 τ

and end section 1500-1200 = 300*1 τ  = 300τ

 

so total is 11030 +15000+300+61440 = 87770 τ

1 comment

why have u not included the time for execution of 17-22 line no?
0
0
Answer:

Related questions