A parameter estimation technique for the Probabilistic Generative Model
This approach makes the naive assumption that the input features (the components of the input vector $\mathbf{x}$) of each class are conditionally independent, resulting in a diagonal covariance matrix, $\boldsymbol{\Sigma}_k$, for each class, $C_k$.
Conditional Independence
We know from stats that random variables $a$ and $b$ are conditionally independent given a third variable $c$ if the following holds:

$$p(a, b \mid c) = p(a \mid c)\, p(b \mid c)$$
Therefore, given the feature vector $\mathbf{x} = (x_1, \ldots, x_D)^\top$, with any two different features $x_i$ and $x_j$ being conditionally independent given the class, we get the class conditional as the product of separate per-feature class conditionals:

$$p(\mathbf{x} \mid C_k) = \prod_{i=1}^{D} p(x_i \mid C_k)$$
This is a special case of the autoregressive (chain-rule) decomposition. Don't need to remember that though :-P
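To make the factorisation concrete, here is a minimal numeric sketch (my own illustration, not from the slides; the probability values are made up). It builds a discrete joint $p(x_1, x_2 \mid C)$ that is conditionally independent by construction and checks that it factorises into the per-feature class conditionals:

```python
import numpy as np

# p(x1 | C) over two values of x1, and p(x2 | C) over three values of x2 (made-up numbers)
p_x1_given_c = np.array([0.7, 0.3])
p_x2_given_c = np.array([0.1, 0.5, 0.4])

# Under conditional independence, the joint class conditional is the outer product
joint = np.outer(p_x1_given_c, p_x2_given_c)   # p(x1, x2 | C)

# Marginalising the joint recovers each per-feature class conditional,
# and the joint equals the product of those marginals.
assert np.allclose(joint.sum(axis=1), p_x1_given_c)
assert np.allclose(joint.sum(axis=0), p_x2_given_c)
assert np.allclose(joint, joint.sum(axis=1, keepdims=True) * joint.sum(axis=0, keepdims=True))
```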
Since we assumed a Gaussian distribution before, we can do so here as well for a Gaussian Naive Bayes estimation:

$$p(x_i \mid C_k) = \mathcal{N}(x_i \mid \mu_{ki}, \sigma_{ki}^2) = \frac{1}{\sqrt{2\pi\sigma_{ki}^2}} \exp\!\left(-\frac{(x_i - \mu_{ki})^2}{2\sigma_{ki}^2}\right)$$

$$p(\mathbf{x} \mid C_k) = \prod_{i=1}^{D} \mathcal{N}(x_i \mid \mu_{ki}, \sigma_{ki}^2)$$
With $\sigma_{ki}^2$ being the variance of the $i$-th feature for class $C_k$ and $\mu_{ki}$ likewise the mean. We can write it in matrix form as $p(\mathbf{x} \mid C_k) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$, with $\boldsymbol{\Sigma}_k$ being diagonal, with the variances $\sigma_{ki}^2$ on the diagonal.
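A quick sketch of this equivalence in code (my own illustration; the class parameters and the input point are made up): the product of per-feature 1-D Gaussians matches a multivariate Gaussian whose covariance is the diagonal matrix of the per-feature variances.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

mu_k = np.array([0.5, -1.0, 2.0])     # per-feature means for class C_k (made up)
var_k = np.array([0.8, 1.5, 0.3])     # per-feature variances for class C_k (made up)
x = np.array([0.2, -0.7, 1.9])        # a single feature vector

# Naive Bayes form: product over features of N(x_i | mu_ki, sigma_ki^2)
p_product = np.prod(norm.pdf(x, loc=mu_k, scale=np.sqrt(var_k)))

# Matrix form: N(x | mu_k, Sigma_k) with Sigma_k = diag(var_k)
p_matrix = multivariate_normal.pdf(x, mean=mu_k, cov=np.diag(var_k))

assert np.isclose(p_product, p_matrix)
```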
From Bayes' Theorem, we can find the posterior:

$$p(C_k \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid C_k)\, p(C_k)}{p(\mathbf{x})}$$
With the bottom term $p(\mathbf{x})$ being constant across classes, the most probable class is the max of the top term:

$$\hat{k} = \arg\max_{k}\; p(C_k) \prod_{i=1}^{D} p(x_i \mid C_k)$$
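As a minimal sketch of this decision rule (my own illustration, not from the slides; the function name and argument layout are assumptions), it is convenient to work in log space so the product of many small densities does not underflow:

```python
import numpy as np
from scipy.stats import norm

def predict_class(x, priors, means, variances):
    """Return the index of the most probable class under Gaussian Naive Bayes.

    priors:    shape (K,)   - class priors p(C_k)
    means:     shape (K, D) - per-class, per-feature means mu_ki
    variances: shape (K, D) - per-class, per-feature variances sigma_ki^2
    """
    # log p(C_k) + sum_i log N(x_i | mu_ki, sigma_ki^2) for every class k
    log_posteriors = np.log(priors) + norm.logpdf(
        x, loc=means, scale=np.sqrt(variances)
    ).sum(axis=1)
    return int(np.argmax(log_posteriors))
```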
Naive Bayes Parameter Estimates
From above, the parameters we need to estimate are the class priors $\pi_k = p(C_k)$ and, for each class $C_k$, the per-feature means $\mu_{ki}$ and variances $\sigma_{ki}^2$. Maximising the likelihood of a training set of $N$ labelled examples, with $N_k$ of them belonging to class $C_k$, gives the maximum likelihood estimates:

$$\hat{\pi}_k = \frac{N_k}{N}, \qquad \hat{\mu}_{ki} = \frac{1}{N_k} \sum_{n \in C_k} x_{ni}, \qquad \hat{\sigma}_{ki}^2 = \frac{1}{N_k} \sum_{n \in C_k} \left(x_{ni} - \hat{\mu}_{ki}\right)^2$$
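A short sketch of these estimates in code (my own illustration under the assumptions above; the function name and shapes are not from the slides). Note that the variance estimate divides by $N_k$, i.e. it is the biased maximum likelihood estimate rather than the unbiased sample variance:

```python
import numpy as np

def fit_gaussian_nb(X, t, n_classes):
    """Maximum likelihood estimates for Gaussian Naive Bayes.

    X: shape (N, D) feature matrix; t: shape (N,) integer class labels in {0..K-1}.
    Returns priors (K,), means (K, D), variances (K, D).
    """
    N, D = X.shape
    priors = np.zeros(n_classes)
    means = np.zeros((n_classes, D))
    variances = np.zeros((n_classes, D))
    for k in range(n_classes):
        X_k = X[t == k]                  # all examples labelled as class k
        priors[k] = len(X_k) / N         # pi_k = N_k / N
        means[k] = X_k.mean(axis=0)      # mu_ki
        variances[k] = X_k.var(axis=0)   # sigma_ki^2 (MLE, divides by N_k)
    return priors, means, variances
```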
TODO:
continue from slide 19 and finish this section off.