in Databases recategorized by
12,639 views
43 votes
43 votes

Consider a selection of the form $\sigma_{A\leq 100} (r)$, where $r$ is a relation with $1000$ tuples. Assume that the attribute values for $A$ among the tuples are uniformly distributed in the interval $[0, 500].$ Which one of the following options is the best estimate of the number of tuples returned by the given selection query ?

  1. $50$
  2. $100$
  3. $150$
  4. $200$
in Databases recategorized by
12.6k views

4 Comments

"Relational Algebra doesn't allow repetition"

Well, it's selection query, not Projection. Selection by default will never have duplicates, since all tuples have to be different.
0
0
If whole tuple will repeat then it will ignore repeating tuple. But Here may be more attributes apart from A...And here A is repeating but combination of other attribute with A may not repeat...bcz of this 200 is ans instead of 100.

 

If projection(A) has also given with this question then No doubt 100 would be answer.
0
0
edited by
Another way to solve this question

Probability density function $F(x) \  =  \frac{1}{500\  – \ 0} = \frac{1}{500}$

$\therefore P(x \le 100) = \int_{0}^{100} F(x) dx$

$P = \int_{0}^{100} \frac{1}{500} dx$

$P = \frac{1}{500}  \ [x]_{0}^{100}$

$P = \frac{100}{500} = \frac{1}{5}$

So, the total number of tuples $= NP$

$= 1000 \  \times  \ \frac{1}{5}$

$= 200$
11
11

5 Answers

0 votes
0 votes
My thought process:-

Since nothing is given about the schema, assume the schema is R(a,b,c,d),

now possible values of ‘a’:- 0,0,1,1,2,2,3,3,4,4,5,5………..499,499 ( not sure why 500 is included in the question if the distribution is uniform, but if we try to bring the distribution to almost be uniform, then i guess we can assume possible values of ‘a’ is 0,1,2,3,4,5….498,499,500,0,1,2,3,4,5,6,……..498.  Either approach should give me the correct answer since we’re dealing with values of ‘a’ less than 200 )

now the query only applies a condition on ‘a’, it does not project the ‘a’ field, therefore thinking about unique ‘a’ values would be wrong as the result will give us 202 tuples, this is where our assumption of the schema being not R(a) only comes into picture.

Hopefully, this should summarise the entire scenario, we have played it safe by assuming the schema to be general.
Answer:

Related questions