A parameter estimation technique for Probabilistic Generative Models

This approach makes the naive assumption that the input features (the components of the input vector $\mathbf{x}$) of each class are conditionally independent, resulting in a diagonal covariance matrix $\Sigma_k$ for each class $C_k$.

Conditional Independence

We know from stats that random variables $a$ and $b$ are conditionally independent given $c$ if the following holds:

$$p(a, b \mid c) = p(a \mid c)\, p(b \mid c)$$

Therefore, given the feature vector $\mathbf{x} = (x_1, \dots, x_D)^T$, the class conditional is:

$$p(\mathbf{x} \mid C_k) = p(x_1, \dots, x_D \mid C_k)$$

With two different features $x_i$ and $x_j$ being conditionally independent given the class, we get the class conditional as the product of the separate per-feature class conditionals:

$$p(\mathbf{x} \mid C_k) = \prod_{i=1}^{D} p(x_i \mid C_k)$$

(The general chain-rule factorisation $p(\mathbf{x} \mid C_k) = \prod_{i=1}^{D} p(x_i \mid x_1, \dots, x_{i-1}, C_k)$ is known as the autoregressive decomposition; the naive assumption drops the dependence on the other features.) You don’t need to remember that though :-P
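To make the factorisation concrete, here is a minimal sketch (the per-feature probabilities are made up purely for illustration) of how the joint class conditional becomes a simple product:

import numpy as np

# Hypothetical per-feature class conditionals p(x_i | C_k) for one sample;
# the values are made up purely to illustrate the product rule.
p_xi_given_Ck = np.array([0.7, 0.2, 0.9])

# Under the naive (conditional independence) assumption, the joint class
# conditional is the product of the per-feature terms.
p_x_given_Ck = np.prod(p_xi_given_Ck)
print(p_x_given_Ck)  # roughly 0.126 (0.7 * 0.2 * 0.9)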

Since we assumed a Gaussian distribution before, we can do so here as well for a Gaussian Naive Bayes estimation:

$$p(x_i \mid C_k) = \mathcal{N}(x_i \mid \mu_{ki}, \sigma_{ki}^2) = \frac{1}{\sqrt{2\pi\sigma_{ki}^2}} \exp\!\left(-\frac{(x_i - \mu_{ki})^2}{2\sigma_{ki}^2}\right)$$

With $\sigma_{ki}^2$ being the variance of the $i$-th feature for class $C_k$ and $\mu_{ki}$ likewise the mean. We can write it in matrix form as $p(\mathbf{x} \mid C_k) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \Sigma_k)$, with $\Sigma_k = \operatorname{diag}(\sigma_{k1}^2, \dots, \sigma_{kD}^2)$ being diagonal with the variances on the diagonal.
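As a rough sketch of the per-feature Gaussian class conditional (the function name and array shapes here are my own, not from the slides):

import numpy as np

def gaussian_class_conditional(x, mu_k, var_k):
    # x, mu_k, var_k are length-D arrays (one entry per feature); treating the
    # features independently is equivalent to a diagonal Sigma_k = diag(var_k).
    per_feature = np.exp(-(x - mu_k) ** 2 / (2 * var_k)) / np.sqrt(2 * np.pi * var_k)
    # Product over features gives p(x | C_k) under the naive assumption.
    return np.prod(per_feature)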

From Bayes’ theorem, we can find

$$p(C_k \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid C_k)\, p(C_k)}{p(\mathbf{x})} = \frac{p(C_k) \prod_{i=1}^{D} p(x_i \mid C_k)}{p(\mathbf{x})}$$

With the bottom term $p(\mathbf{x})$ being constant across classes, the most probable class is the one that maximises the top term:

$$\hat{y} = \arg\max_{k} \; p(C_k) \prod_{i=1}^{D} p(x_i \mid C_k)$$
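A minimal prediction sketch, assuming the priors, means and variances have already been estimated (the function name and array shapes are my own); working in log space avoids underflow from multiplying many small densities:

import numpy as np

def predict_class(x, priors, means, variances):
    # priors: shape (K,); means, variances: shape (K, D); x: shape (D,).
    # log N(x_i | mu_ki, sigma_ki^2) for every class and feature.
    log_like = -0.5 * (np.log(2 * np.pi * variances) + (x - means) ** 2 / variances)
    # log p(C_k) + sum_i log p(x_i | C_k), then pick the most probable class.
    scores = np.log(priors) + log_like.sum(axis=1)
    return np.argmax(scores)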

Naive Bayes Parameter Estimates

From the above, the parameters we need to estimate for each class $C_k$ are the class prior $p(C_k) = \pi_k$, the per-feature means $\mu_{ki}$ and the per-feature variances $\sigma_{ki}^2$.

The maximum likelihood estimates are thus the class frequencies and the per-class sample means and variances:

$$\hat{\pi}_k = \frac{N_k}{N}, \qquad \hat{\mu}_{ki} = \frac{1}{N_k} \sum_{n:\, y_n = k} x_{ni}, \qquad \hat{\sigma}_{ki}^2 = \frac{1}{N_k} \sum_{n:\, y_n = k} (x_{ni} - \hat{\mu}_{ki})^2$$

where $N_k$ is the number of training examples in class $C_k$ and $N$ is the total number of examples.
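A minimal sketch of these estimates in code (the function name is my own; it just computes class frequencies and per-class sample means and variances):

import numpy as np

def fit_gaussian_nb(X, y):
    # X: shape (N, D) feature matrix, y: shape (N,) class labels.
    classes = np.unique(y)
    priors = np.array([np.mean(y == k) for k in classes])           # N_k / N
    means = np.array([X[y == k].mean(axis=0) for k in classes])     # mu_ki
    variances = np.array([X[y == k].var(axis=0) for k in classes])  # sigma_ki^2
    return classes, priors, means, variances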

TODO:

continue from slide 19 and finish this section off.

How we’d do it in Python

from sklearn.naive_bayes import GaussianNB

# Fit a Gaussian Naive Bayes classifier: estimates the class priors and the
# per-class feature means and variances from the training data X, y.
clf = GaussianNB()
clf.fit(X, y)

# Predict the most probable class for each input.
y_pred = clf.predict(X)
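For a self-contained run, here is an example using the Iris dataset and a train/test split (my own choice of example data, not from the slides):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GaussianNB()
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on the held-out split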