GATE CSE 2006 | Question: 81

Question

Dark Mode

GATE CSE 2006 | Question: 81

go_editor asked in CO and Architecture Apr 23, 2016 edited Jun 26, 2018 by Shikha Mallick

10,399 views

A CPU has a $32$ $KB$ direct mapped cache with $128$ byte-block size. Suppose $A$ is two dimensional array of size $512 \times512$ with elements that occupy $8-bytes$ each. Consider the following two $C$ code segments, $P1$ and $P2$.

$P1$:

for (i=0; i<512; i++)
{ 
    for (j=0; j<512; j++)
    { 
        x +=A[i] [j]; 
    } 
}

P2:

for (i=0; i<512; i++)
{ 
    for (j=0; j<512; j++)
    { 
        x +=A[j] [i]; 
    } 
}

$P1$ and $P2$ are executed independently with the same initial state, namely, the array $A$ is not in the cache and $i$, $j$, $x$ are in registers. Let the number of cache misses experienced by $P1$ be $M1$ and that for $P2$ be $M2$.

The value of the ratio $\frac{M_{1}}{M_{2}}$:

$0$
$\frac{1}{16}$
$\frac{1}{8}$
$16$

go_editor asked in CO and Architecture Apr 23, 2016 edited Jun 26, 2018 by Shikha Mallick

by go_editor

10.4k views

3 Comments

by Nirmalya Pratap 1

commented May 30, 2021

We can think 512* 512 2d array like 512 1d array each of size 512 element

It is given that element size is 8 byte so each 1d array size is 512 * 8 byte so 4KB

cache size given 32kb and 128 byte block size so,

total line in cache is 2^15/2^7 is 2^8

so we find out that their are 2^8 line in cache so at a time max 2^8 cache misses can happen then cache will filled up

we already find out that one 1d array size is 4KB and we know cache size is 32kb so number of 1d array we can load into cache at a time is 32kb/4kb = 8.

so 8 1d array at a time then total how many round needed to finish up all the 1d array ?

so their are 512 1d array and we can load 8 at a time so total round is 512/8 = 64 round

lets count now cache miss for this 64 round

ok so how many misses in 1 round?

so in our case in 1st round we are feeling the cache total then in 2nd round we are overriding all the cache data. so for every round 2^8 misses so

64 round 64*2^8 = 2^14 misses so M1 = 16384

Lets move to column major access( i.e. M2) and data store in raw major order by default is C language

so for M2 we have to know when we store the 1d array how many element store per block and how many block it can take to store only one 1d array and their are 512 such 1d array keeping in mind.

Mistake i did when i was solving the question

i think like their are 2^14 misses for 64 round means for here we done with 2^14 cause what we load we access them all and we got to next

but due to column major order we unnecessarily load a whole row and access one element and load one more raw. so their are 512 element are their in a raw so we have to do 512*2^14 that is M2 but this is wrong

Their are 512 element in one 1d array it is correct but at a time all 512 not loaded because we load one block and 512 element we can’t fit into one block so how many element we can fit into one block?

element size is 8 byte block size is 128 byte so number of element is 2^7/2^3 = 2^4

so only 16 element we will load up for misses that is not 512* 2^14 that is 2^4*2^14

so M2 is 2^18

then M1/M2 = 2^14/2^18 = 1/2^4 = 1/16

so answer is option B

6 Answers

Best answer

$\text{Number of Cache Lines}= \dfrac{2^{15}B}{128B}= 256$

$\text{In 1 Cache Line} =\dfrac{128B}{8B} = 16\ elements$

$P_1=\dfrac{\text{total elements in array}}{\text{elements in a cache line}}$

$\quad=\dfrac{512 \times 512}{16}= 2^{14}= 16384.$

$P_2= 512 \times 512=2^{18}$

$\dfrac{P_1}{P_2}=\dfrac{16384}{512 \times 512}$

$\quad = 2^{14-18}= 2^{-4}=\dfrac{1}{16}$

It is so, because for $P_1$ for every line there is a miss, and once a miss is processed we get $16$ elements in memory.
So, another miss happens after $16$ elements.
For $P_2$ for every element there is a miss because storage is row major order(by default) and we are accessing column wise.

Hence, answer is option B.

amarVashishth answered Apr 29, 2016 edited Dec 25, 2017 by pavan singh

by amarVashishth

4 Comments

Page:

Arjun · Answer 1 · 2016-04-29T17:53:41+0000

Code being C implies array layout is row-major.

http://en.wikipedia.org/wiki/Row-major_order

When A[0][0] is fetched, 128 consecutive bytes are moved to cache. So, for the next 128/8 -1= 15 memory references there won't be a cache miss. For the next iteration of i loop also the same thing happens as there is no temporal locality in the code. So, number of cache misses for P1

$= \frac{512}{16} \times 512$

$ = 32 \times 512 $

$=2^{14} = 16384$

In the case of P2, the memory references are not consecutive. After A[0][0], the next access is A[1][0] which is after 512 * 8 memory locations. Since our cache block can hold only 128 contiguous memory locations, A[1][0] won't be in cache after a[0][0] is accessed. Now, the next location after A[0][0] is A[0][1] which will be accessed only after 512 iterations of the inner loop- after 512 distinct memory block accesses. In our cache we have only space for 32 KB/128 B = 256 memory blocks. So, by the time A[0][1] is accessed, its cache block would be replaced. So, each of the memory access in P2 results in a cache miss. Total number of cache miss

$ = 512\times 512$

So, $\frac{M_1}{M_2} = \frac{32 \times 512}{512 \times 512} = \frac{1}{16}$

Tauhin Gangwar · Answer 2 · 2016-04-24T02:38:30+0000

No. of elements/block = 128/8=16

one row contains 512 elements

No. of blocks/row= 512/16 = 32

No. of cache lines = 32KB/128B= 256

FOR M1

Now to access one row 32 miss operations will be occurred

to access whole array it requires = 512 * 32 (miss operations)

= 16384

FOR M2

Now to access one column 512 miss operations will occur(as it is column major)

to access whole array it requires = 512 * 512 (miss operations)

so M1/M2=1/16

ANSWER IS (B)

Navneet Singh Tomar · Answer 3 · 2019-11-07T18:20:14+0000

tags	tag:apple
author	user:martin
title	title:apple
content	content:apple
exclude	-tag:apple
force match	+apple
views	views:100
score	score:10
answers	answers:2
is accepted	isaccepted:true
is closed	isclosed:true

GATE CSE 2006 | Question: 81

3 Comments

Please log in or register to add a comment.

Please log in or register to answer this question.

6 Answers

4 Comments

Please log in or register to add a comment.

4 Comments

Please log in or register to add a comment.

0 Comments

Please log in or register to add a comment.

Check previous question..in that question i calculated cache lines and MM block..Hence here Option B is correct.

0 Comments

Please log in or register to add a comment.

Related questions

Subscribe to GATE CSE 2024 Test Series

Subscribe to GO Classes for GATE CSE 2024

Subjects

GATE CSE 2006 | Question: 81

3 Comments

Please log in or register to add a comment.

Please log in or register to answer this question.

6 Answers

4 Comments

Please log in or register to add a comment.

4 Comments

Please log in or register to add a comment.

0 Comments

Please log in or register to add a comment.

Check previous question..in that question i calculated cache lines and MM block..Hence here Option B is correct.

0 Comments

Please log in or register to add a comment.

Related questions

Subscribe to GATE CSE 2024 Test Series

Subscribe to GO Classes for GATE CSE 2024

Recent Posts

Subjects

Recent Blog Comments