When choosing one feature from \(X_1, \ldots, X_n\) while building a Decision Tree, which of the following criteria is the most appropriate to maximize? (Here, \(H()\) means entropy, and \(P()\) means probability)

(a) \(P(Y | X_j)\)

(b) \(P(Y) - P(Y | X_j)\)

(c) \(H(Y) - H(Y | X_j)\)

(d) \(H(Y | X_j)\)

(e) \(H(Y) - P(Y)\)

1 Answer

Best answer
The most appropriate criterion to maximize when choosing a feature in a decision tree is \((c) \ H(Y) - H(Y | X_j)\).

Explanation:

\(H(Y)\) represents the entropy of the target variable \(Y\), measuring its uncertainty or randomness. \(H(Y | X_j)\) represents the conditional entropy of \(Y\) given a specific feature \(X_j\), indicating how much uncertainty remains about \(Y\) after knowing the value of \(X_j\).

Information Gain:

The difference between these two entropies, \(H(Y) - H(Y | X_j)\), is called the information gain associated with feature \(X_j\). It quantifies the reduction in uncertainty about \(Y\) achieved by knowing the value of \(X_j\).

Goal of Decision Trees:

Decision trees aim to create splits that reduce uncertainty about the target variable as much as possible. Therefore, maximizing the information gain, which means maximizing \(H(Y) - H(Y | X_j)\), is the most appropriate criterion for feature selection.
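The definitions above can be checked directly with a small sketch. The helper names (`entropy`, `conditional_entropy`, `information_gain`) and the toy dataset are illustrative, not from the question; the gain values follow from the formulas \(H(Y)\), \(H(Y|X_j)\), and \(H(Y) - H(Y|X_j)\):

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy H(Y) of a list of labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(feature, labels):
    """H(Y | X_j): entropy of Y within each value of X_j, weighted by frequency."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    return sum((len(ys) / n) * entropy(ys) for ys in groups.values())

def information_gain(feature, labels):
    """H(Y) - H(Y | X_j): reduction in uncertainty about Y from knowing X_j."""
    return entropy(labels) - conditional_entropy(feature, labels)

# Toy example: a perfectly informative feature vs. an uninformative one.
y  = ['yes', 'yes', 'no', 'no']
x1 = ['a', 'a', 'b', 'b']   # splits Y perfectly: each branch is pure
x2 = ['a', 'b', 'a', 'b']   # each branch is still a 50/50 mix of labels

print(information_gain(x1, y))  # 1.0 bit -- the whole H(Y) is removed
print(information_gain(x2, y))  # 0.0 -- knowing X_j tells us nothing about Y
```

A tree learner choosing between \(X_1\) and \(X_2\) here would pick \(X_1\), since it maximizes \(H(Y) - H(Y | X_j)\).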
