in CO and Architecture edited by
10,399 views
34 votes
34 votes

A CPU has a $32$ $KB$ direct mapped cache with $128$ byte-block size. Suppose $A$ is two dimensional array of size $512 \times512$ with elements that occupy $8-bytes$ each. Consider the following two $C$ code segments, $P1$ and $P2$.

$P1$: 

for (i=0; i<512; i++)
{ 
    for (j=0; j<512; j++)
    { 
        x +=A[i] [j]; 
    } 
}

P2: 

for (i=0; i<512; i++)
{ 
    for (j=0; j<512; j++)
    { 
        x +=A[j] [i]; 
    } 
}

$P1$ and $P2$ are executed independently with the same initial state, namely, the array $A$ is not in the cache and $i$, $j$, $x$ are in registers. Let the number of cache misses experienced by $P1$ be $M1$ and that for $P2$ be $M2$.

The value of the ratio $\frac{M_{1}}{M_{2}}$:

  1. $0$ 
  2. $\frac{1}{16}$
  3. $\frac{1}{8}$
  4. $16$ 
in CO and Architecture edited by
10.4k views

3 Comments

edited by

another good source- 

14
14
Nice:)
1
1

We can think 512* 512 2d array like 512 1d array each of size 512 element

It is given that element size is 8 byte so each 1d array size is 512 * 8 byte so 4KB

cache size given 32kb and 128 byte block size so,

total line in cache is 2^15/2^7 is 2^8

so we find out that their are 2^8 line in cache so at a time max 2^8 cache misses can happen then cache will filled up

we already find out that one 1d array size is 4KB and we know cache size is 32kb so number of 1d array we can load into cache at a time is 32kb/4kb = 8.

so 8 1d array at a time then total how many round needed to finish up all the 1d array ?

so their are 512 1d array and we can load 8 at a time so total round is 512/8 = 64 round

lets count now cache miss for this 64 round

ok so how many misses in 1 round?

so in our case in 1st  round we are feeling the cache total then in 2nd round we are overriding all the cache data. so for every round 2^8 misses so

64 round 64*2^8 = 2^14 misses so M1 = 16384


 

Lets move to column major access( i.e. M2) and data store in raw major order by default is C language

so for M2 we have to know when we store the 1d array how many element store per block and how many block it can take to store only one 1d array and their are 512 such 1d array keeping in mind.


Mistake i did when i was solving the question


 

i think like their are 2^14 misses for 64 round means for here we done with 2^14 cause what we load we access them all and we got to next

 

but due to column major order we unnecessarily load a whole row and access one element and load one more raw. so their are 512 element are their in a raw so we have to do 512*2^14 that is M2 but this is wrong


Their are 512 element in one 1d array it is correct but at a time all 512 not loaded because we load one block and 512 element we can’t fit into one block so how many element we can fit into one block?

element size is 8 byte block size is 128 byte so number of element is 2^7/2^3 = 2^4

so only 16 element we will load up for misses that is not 512* 2^14 that is 2^4*2^14

so M2 is 2^18

 

then M1/M2 = 2^14/2^18 = 1/2^4 = 1/16

 

so answer is option B

 

 

 

 

 

1
1

6 Answers

50 votes
50 votes
Best answer

$\text{Number of Cache Lines}= \dfrac{2^{15}B}{128B}= 256$

$\text{In 1 Cache Line} =\dfrac{128B}{8B} = 16\ elements$

$P_1=\dfrac{\text{total elements in array}}{\text{elements in a cache line}}$

$\quad=\dfrac{512 \times 512}{16}= 2^{14}= 16384.$

$P_2= 512 \times 512=2^{18}$

$\dfrac{P_1}{P_2}=\dfrac{16384}{512 \times 512}$

$\quad = 2^{14-18}= 2^{-4}=\dfrac{1}{16}$

It is so, because for $P_1$ for every line there is a miss, and once a miss is processed we get $16$ elements in memory.
So, another miss happens after $16$ elements.
For $P_2$ for every element there is a miss because storage is row major order(by default) and we are accessing column wise.

Hence, answer is option B.

edited by

4 Comments

Please explain no. of cache lines=2^15/128B . How 2^15 ?
0
0
it means whenever storage will be in  row major order(by default) and we will try to  access in  column wise. then for every element there will be miss
0
0

For p2 for every element there is a miss because storage is row major order(by default) and we are accessing column wise.

I think This line is not always true, consider the following variation


A[8][512], rest info same, column wise access


$for(j=0;j<512;j++)${
 $for(i=0;i<8;i++)${
   $ x+=A[i][j]$
 }
}
then there will 8 be misses for every 128 access


Refer the attached pic


 

0
0
68 votes
68 votes

Code being C implies array layout is row-major.

http://en.wikipedia.org/wiki/Row-major_order

When A[0][0] is fetched, 128 consecutive bytes are moved to cache. So, for the next 128/8 -1= 15 memory references there won't be a cache miss. For the next iteration of i loop also the same thing happens as there is no temporal locality in the code. So, number of cache misses for P1

$= \frac{512}{16} \times 512$

$ = 32 \times 512 $

$=2^{14} = 16384$

 

In the case of P2, the memory references are not consecutive. After A[0][0], the next access is A[1][0] which is after 512 * 8 memory locations. Since our cache block can hold only 128 contiguous memory locations, A[1][0] won't be in cache after a[0][0] is accessed. Now, the next location after A[0][0] is A[0][1] which will be accessed only after 512 iterations of the inner loop- after 512 distinct memory  block accesses. In our cache we have only space for 32 KB/128 B = 256 memory blocks. So, by the time A[0][1] is accessed, its cache block would be replaced. So, each of the memory access in P2 results in a cache miss. Total number of cache miss

$ = 512\times 512$

So, $\frac{M_1}{M_2} = \frac{32 \times 512}{512 \times 512} = \frac{1}{16}$

by

4 Comments

Yes, you are right, I missed that.
0
0

In our cache we have only space for 32 KB/128 B = 256 memory blocks. So, by the time A[0][1] is accessed, its cache block would be replaced.

The quote above is not entirely obvious to me.

Please elaborate about this; how can we be sure that 512 iterations of the inner loop will definitely remove the block containing A[0][0]. Even if there are 512 accesses to distinct bocks for every iteration of the outer loop, maybe not all blocks are replaced and instead some cache lines are replaced multiple times?

For example, it could be the case that in each of the outer loop, every odd cache line is replaced twice and the even lines are left untouched after the initial compulsory miss. This means that when time comes to visit A[0][1], it will in fact be in the first cache block.

 

0
0
@Arjun sir, What would be the miss ratio in P2 if we were using Optimal Block replacement policy instead of LRU?
0
0
7 votes
7 votes

No. of elements/block = 128/8=16

one row contains 512 elements 

No. of blocks/row= 512/16 = 32

No. of cache lines = 32KB/128B= 256

 FOR M1

Now to access one row 32 miss operations will be occurred

to access whole array it requires = 512 * 32 (miss operations)

                                                    = 16384

FOR M2 

Now to access one column 512 miss operations will occur(as it is column major)

to access whole array it requires = 512 * 512 (miss operations)

so  M1/M2=1/16 

ANSWER IS (B)

3 votes
3 votes

Check previous question..in that question i calculated cache lines and MM block..Hence here Option B is correct.

Answer:

Related questions