Given a discrete $\text{K}$-class dataset containing $\text{N}$ points, where sample points are described using $\text{D}$ features with each feature capable of taking $\text{V}$ values, how many parameters need to be estimated for a Naïve Bayes classifier?

  1. $V^{D} K$
  2. $K^{V^{D}}$
  3. $\text{VDK}$
  4. $\text{K(V + D)}$

 

3 Answers

1 vote
Option 3: VDK

1 comment

How??
1 vote

C. VDK is the closest answer, but the actual answer should be VDK + K.

Naive Bayes estimates $P(v \mid k)$ for each value $v$ of each feature, for each class $k$, which gives $VDK$ parameters. NB also has to estimate the prior $P(k)$ for each class. Therefore, the total number of parameters is $VDK + K$; see the sketch below.
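A minimal sketch that makes this counting explicit (the function name and the small numbers are mine, purely for illustration):

```python
# One conditional P(X_d = v | Y = k) per (feature, value, class) triple,
# plus one prior P(Y = k) per class.
def naive_bayes_param_count(V: int, D: int, K: int) -> int:
    conditionals = V * D * K   # full conditional probability tables
    priors = K                 # class priors
    return conditionals + priors

print(naive_bayes_param_count(V=3, D=2, K=2))  # 3*2*2 + 2 = 14
```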

3 Comments

Yeah. But I was thinking: what about degrees of freedom (DOF)? Let's say for a feature $D_1$ (taking $V$ values, say $1$ to $V$) and $K$ classes: for $k=1$ we need to estimate $P(v=1 \mid k=1), P(v=2 \mid k=1), \dots, P(v=V \mid k=1)$. Similarly for each of the $K$ classes we get $V$ values, and thus $V \cdot K$ for the single feature $D_1$. For $D$ features, it'll be $K \cdot V \cdot D$. For the prior probabilities of the classes, say $1$ to $K$, we need $P(k=1), P(k=2), \dots, P(k=K)$ (total $K$ values), summing the answer to $K \cdot V \cdot D + K$.

But for the above feature $D_1$, $P(v=1 \mid k=1) + P(v=2 \mid k=1) + \dots + P(v=V \mid k=1)$ sums to $1$, so we need only $V-1$ values (the last value can be $1$ minus all the others, losing a single dof). For one feature we need $K(V-1)$, for all features $KD(V-1)$, and similarly only $K-1$ values for the priors. So I thought the answer might be $KD(V-1) + (K-1)$ instead of $KVD + K$. But the options have $KVD + K$ in the paper. Sorry if I'm wrong, and hoping someone would help me before the exam. Best of luck to y'all.
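A quick numeric check of the dof argument above, as a sketch with arbitrary small values (function name is mine):

```python
# Each of the D*K conditional distributions over V values must sum to 1,
# losing one dof apiece, and the prior over K classes loses one more.
def naive_bayes_free_params(V: int, D: int, K: int) -> int:
    return K * D * (V - 1) + (K - 1)

print(naive_bayes_free_params(V=3, D=2, K=2))  # 2*2*2 + 1 = 9 (vs. 14 stored)
```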


ML models like Naive Bayes assume they are operating on the whole population, i.e. there is no data beyond what the dataset provides. Every parameter the model computes is therefore treated as exact, so no degrees of freedom come into play. DOF mostly matters when we are inferring properties of a population from a small sample.

Also, you are correct that only $KD(V-1) + (K-1)$ parameters are strictly needed. But the model calculates all of them anyway, to save the cost of recomputing the rest each time during inference. Internally it is possible that NB fills in the last probability as $1$ minus the sum of the others, but it still counts as a parameter.
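For a concrete look at what a real implementation stores, here is a minimal sketch assuming scikit-learn's CategoricalNB (my choice; the thread doesn't name a library). The fitted model keeps a full $(K \times V)$ conditional log-probability table per feature, plus the $K$ class log-priors:

```python
# Empirical check: CategoricalNB stores every table entry, dof or not.
import numpy as np
from sklearn.naive_bayes import CategoricalNB

rng = np.random.default_rng(0)
V, D, K, N = 3, 2, 2, 1000
X = rng.integers(0, V, size=(N, D))   # D categorical features, values 0..V-1
y = rng.integers(0, K, size=N)        # K classes

clf = CategoricalNB().fit(X, y)
stored = sum(t.size for t in clf.feature_log_prob_) + clf.class_log_prior_.size
print(stored)  # V*D*K + K = 3*2*2 + 2 = 14
```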


Yeah, you're right, it still counts as a parameter. But the question says 'need to be estimated', so I was thinking of $KD(V-1) + (K-1)$, because... Let's say in a multiple linear regression, $y = a + bx_1 + cx_2$. To calculate the sample variance, $S^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$ ($n-1$ in the denominator since we estimated the mean, losing one dof); but in this regression, since we're estimating $a$, $b$ and $c$, we lose three dof and the formula becomes $S^2 = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-3}$. In a similar way, since we're estimating, even though the last value is a parameter, I thought it is not actually estimated.
P.S.: I've used some terms without knowing them completely, please correct them.
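A quick way to see the regression dof bookkeeping concretely; this is just a sketch, assuming statsmodels (not mentioned above), whose OLS result reports the residual degrees of freedom:

```python
# With an intercept and two regressors, residual dof = n - 3:
# one dof lost per estimated coefficient.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50
X = sm.add_constant(rng.normal(size=(n, 2)))            # columns: 1, x1, x2
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=n)  # y = a + b*x1 + c*x2 + noise

print(sm.OLS(y, X).fit().df_resid)  # 47.0 == n - 3
```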

0 votes

Naive Bayes: how many parameters need to be estimated?

Given a dataset of $N$ points in which each point has $D$ features, where each feature can take $V$ values: $X = \langle X_1, X_2, \dots, X_D \rangle$, with each $X_i$ taking one of $V$ discrete values, is to be classified into one of $K$ discrete classes $Y$.

By the Naive Bayes conditional-independence assumption,

$P(Y \mid X_1, X_2, \dots, X_D) \propto P(Y) \prod_{i=1}^{D} P(X_i \mid Y)$

$P(Y)$ needs $K - 1$ parameters.
$P(X_i \mid Y)$ over all $D$ features needs $DVK - DK = DK(V-1)$ parameters.

In total, $(K-1) + DK(V-1)$ parameters are needed.
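As a sanity check, plugging in small numbers of my own choosing, $V=3$, $D=2$, $K=2$:

$$P(Y):\; K-1 = 1, \qquad P(X_i \mid Y):\; DK(V-1) = 2 \cdot 2 \cdot 2 = 8,$$

for a total of $9$ free parameters, versus the full $VDK + K = 12 + 2 = 14$ table entries counted in the $VDK + K$ answer above.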

 
