Given a dataset with $K$ binary-valued attributes (where $K>2$) for a two-class classification task, the number of parameters to be estimated for learning a naïve Bayes classifier is

  1. $2^{K}+1$
  2. $2 K+1$
  3. $2^{K+1}+1$
  4. $K^{2}+1$


1 Answer


The naive Bayes classifier assumes that the attributes are conditionally independent given the class label.

Suppose the dataset has $K$ features $x_1,x_2,\ldots,x_K$ and the class label is $y$.

Here all of the $x_i$ and $y$ are Boolean variables, i.e. $x_i=0$ or $x_i=1$ for $1\leq i \leq K$, and $y=0$ or $y=1$.

Now, according to the "Independent" assumption of Naive Bayes Classifier,

  

$\mathbb{P}(x_1,x_2,x_3,...,x_K \ | y)=\mathbb{P}(x_1 \ | \ y)\mathbb{P}(x_2 \ | \ y)\mathbb{P}(x_3 \ | \ y)...\mathbb{P}(x_K \ | \ y)$ $\; \; \; (1)$

   

Now, since, Bayesian Classifiers use the Bayes theorem i.e. $P(y|x)=\frac{P(x|y)P(y)}{P(x)}$ but we ignore the $P(x)$ here because it is same for all the classes "$y$" and so it is irrelavant here. 

So,

$\mathbb{P}(y \mid x_1,x_2,x_3,\ldots,x_K)\propto\mathbb{P}(x_1,x_2,x_3,\ldots,x_K \mid y)\,\mathbb{P}(y)$

Now, substituting equation $(1)$:

$\mathbb{P}(y \mid x_1,x_2,x_3,\ldots,x_K)\propto\mathbb{P}(x_1 \mid y)\,\mathbb{P}(x_2 \mid y)\,\mathbb{P}(x_3 \mid y)\cdots\mathbb{P}(x_K \mid y)\,\mathbb{P}(y)$
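
To make this scoring rule concrete, here is a minimal Python sketch. The tables `p` and `prior` hold made-up numbers purely for illustration, not estimates from any real dataset:

```python
import numpy as np

K = 3  # number of binary features (illustrative)

# Hypothetical estimated parameters: p[c, i] = P(x_i = 1 | y = c)
p = np.array([[0.2, 0.7, 0.5],    # class y = 0
              [0.9, 0.4, 0.1]])   # class y = 1
prior = np.array([0.6, 0.4])      # P(y = 0), P(y = 1)

x = np.array([1, 0, 1])           # observed feature vector

# P(y | x) is proportional to P(y) * prod_i P(x_i | y)
likelihood = np.where(x == 1, p, 1 - p).prod(axis=1)
score = prior * likelihood
posterior = score / score.sum()   # normalizing plays the role of dividing by P(x)

print(posterior, posterior.argmax())  # posterior over classes and the prediction
```

Note that `p` holds $2K=6$ free numbers and `prior` contributes only one more (the two priors sum to $1$): exactly the count derived below.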

Now, to estimate each $\mathbb{P}(x_i \mid y),$ we have to find $2$ values: one for $y=0$ and one for $y=1$.

For example, once we have estimated $\mathbb{P}(x_i=1 \mid y=0),$ we immediately get $\mathbb{P}(x_i=0 \mid y=0)= 1- \mathbb{P}(x_i=1 \mid y=0),$ and vice versa.

The reason is that these two probabilities must sum to $1$.

Same goes for $y=1.$

Hence, we need $2$ values to estimate each $\mathbb{P}(x_i \mid y)$, giving $2+2+\cdots+2$ ($K$ times) $=2K$ values in total, plus one value to estimate $\mathbb{P}(y)$ (one suffices, since $\mathbb{P}(y=1)=1-\mathbb{P}(y=0)$).

So, we need $2K+1$ parameters in total, which is option 2.
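
As a sanity check, this is also what scikit-learn's `BernoulliNB` stores after fitting: `feature_log_prob_` has shape $(2, K)$ and `class_log_prior_` has shape $(2,)$, of which one prior is redundant. A small sketch, assuming scikit-learn is installed (the data is random and purely illustrative):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)
K = 5
X = rng.integers(0, 2, size=(200, K))   # 200 samples, K binary attributes
y = rng.integers(0, 2, size=200)        # binary class label

clf = BernoulliNB().fit(X, y)

# feature_log_prob_[c, i] = log P(x_i = 1 | y = c): 2K numbers in total.
# class_log_prior_ holds both priors, but they sum to 1,
# so only one of them is a free parameter.
n_free = clf.feature_log_prob_.size + (clf.class_log_prior_.size - 1)
print(n_free, 2 * K + 1)                # both are 11
```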

   

$\textbf{Note:}$

If the features are not conditionally independent, then we need $2^{K+1}-2$ parameters in total to estimate $\mathbb{P}(x_1,x_2,x_3,\ldots,x_K \mid y)$: for $y=0$ we need $2^K-1$ values (the remaining one can be computed because the probabilities sum to $1$), and similarly for $y=1$ we need another $2^K-1$ values.

Hence, in total we need $2(2^K-1)=2^{K+1}-2$ values to estimate $\mathbb{P}(x_1,x_2,x_3,\ldots,x_K \mid y).$
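
To see how quickly the two counts diverge, here is the arithmetic for a few values of $K$ (adding $1$ in both cases for the prior $\mathbb{P}(y)$):

```python
# Naive Bayes: 2K + 1 parameters; full joint model: (2^(K+1) - 2) + 1.
for K in (3, 5, 10, 20):
    naive = 2 * K + 1
    full = (2 ** (K + 1) - 2) + 1
    print(f"K={K:2d}: naive Bayes = {naive:3d}, full joint = {full}")
```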

2 Comments

Doubt: We only have to count how many parameters are required, so is it $1$ for the class plus $2K$ for the binary-valued attributes?
For a naïve Bayes classifier with K binary-valued attributes, the number of parameters to be estimated includes:

1. Prior probability of the class: \( 1 \) parameter (the prior of one class; the other follows, since the two priors sum to \( 1 \)).

2. Conditional probabilities of each attribute given each class: each attribute is binary, so one probability per (attribute, class) pair suffices; for \( K \) attributes and \( 2 \) classes that is \( K \times 2 = 2K \) conditional probabilities.

Therefore, the total number of parameters to be estimated is \( 1 + 2K \).

So, the correct answer is:

Option 2: \( 2K + 1 \)

@NarutoUzumaki sorry, I am not getting your doubt. Could you please explain?

