A parameter estimation technique for the Probabilistic Generative Model
This approach makes the naive assumption that the input features (the components of the input vector $\mathbf{x}$) of each class are conditionally independent, resulting in a diagonal covariance matrix, $\boldsymbol{\Sigma}_k$, for each class, $C_k$.
Conditional Independence
We know from stats that random variables $a$ and $b$ are conditionally independent given a third variable $c$ if the following holds:

$$p(a, b \mid c) = p(a \mid c)\, p(b \mid c)$$
Therefore, given the feature vector $\mathbf{x} = (x_1, \ldots, x_D)^\top$, with any two different features $x_i$ and $x_j$ being conditionally independent given the class, we get the class conditional as the product of separate per-feature class conditionals:

$$p(\mathbf{x} \mid C_k) = \prod_{i=1}^{D} p(x_i \mid C_k)$$
This is a special case of the autoregressive (chain-rule) decomposition. Don't need to remember that though :-P
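To make the factorisation concrete, here is a minimal numeric sketch (my own illustration, not from the slides; the probability values are made up). It builds a discrete joint $p(x_1, x_2 \mid C)$ that is conditionally independent by construction and checks that it factorises into the per-feature class conditionals:

```python
import numpy as np

# p(x1 | C) over two values of x1, and p(x2 | C) over three values of x2 (made-up numbers)
p_x1_given_c = np.array([0.7, 0.3])
p_x2_given_c = np.array([0.1, 0.5, 0.4])

# Under conditional independence, the joint class conditional is the outer product
joint = np.outer(p_x1_given_c, p_x2_given_c)   # p(x1, x2 | C)

# Marginalising the joint recovers each per-feature class conditional,
# and the joint equals the product of those marginals.
assert np.allclose(joint.sum(axis=1), p_x1_given_c)
assert np.allclose(joint.sum(axis=0), p_x2_given_c)
assert np.allclose(joint, joint.sum(axis=1, keepdims=True) * joint.sum(axis=0, keepdims=True))
```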
Since we assumed a Gaussian distribution before, we can do so here as well for a Gaussian Naive Bayes estimation:

$$p(x_i \mid C_k) = \mathcal{N}(x_i \mid \mu_{ki}, \sigma_{ki}^2) = \frac{1}{\sqrt{2\pi\sigma_{ki}^2}} \exp\!\left(-\frac{(x_i - \mu_{ki})^2}{2\sigma_{ki}^2}\right)$$

$$p(\mathbf{x} \mid C_k) = \prod_{i=1}^{D} \mathcal{N}(x_i \mid \mu_{ki}, \sigma_{ki}^2)$$
With $\sigma_{ki}^2$ being the variance of the $i$-th feature for class $C_k$ and $\mu_{ki}$ likewise the mean. We can write it in matrix form as $p(\mathbf{x} \mid C_k) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$, with $\boldsymbol{\Sigma}_k$ being diagonal, with the variances $\sigma_{ki}^2$ on the diagonal.
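A quick sketch of this equivalence in code (my own illustration; the class parameters and the input point are made up): the product of per-feature 1-D Gaussians matches a multivariate Gaussian whose covariance is the diagonal matrix of the per-feature variances.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

mu_k = np.array([0.5, -1.0, 2.0])     # per-feature means for class C_k (made up)
var_k = np.array([0.8, 1.5, 0.3])     # per-feature variances for class C_k (made up)
x = np.array([0.2, -0.7, 1.9])        # a single feature vector

# Naive Bayes form: product over features of N(x_i | mu_ki, sigma_ki^2)
p_product = np.prod(norm.pdf(x, loc=mu_k, scale=np.sqrt(var_k)))

# Matrix form: N(x | mu_k, Sigma_k) with Sigma_k = diag(var_k)
p_matrix = multivariate_normal.pdf(x, mean=mu_k, cov=np.diag(var_k))

assert np.isclose(p_product, p_matrix)
```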
From Bayes' Theorem, we can find the posterior:

$$p(C_k \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid C_k)\, p(C_k)}{p(\mathbf{x})}$$
With the bottom term $p(\mathbf{x})$ being constant across classes, the most probable class is the max of the top term:

$$\hat{k} = \arg\max_{k}\; p(C_k) \prod_{i=1}^{D} p(x_i \mid C_k)$$
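As a minimal sketch of this decision rule (my own illustration, not from the slides; the function name and argument layout are assumptions), it is convenient to work in log space so the product of many small densities does not underflow:

```python
import numpy as np
from scipy.stats import norm

def predict_class(x, priors, means, variances):
    """Return the index of the most probable class under Gaussian Naive Bayes.

    priors:    shape (K,)   - class priors p(C_k)
    means:     shape (K, D) - per-class, per-feature means mu_ki
    variances: shape (K, D) - per-class, per-feature variances sigma_ki^2
    """
    # log p(C_k) + sum_i log N(x_i | mu_ki, sigma_ki^2) for every class k
    log_posteriors = np.log(priors) + norm.logpdf(
        x, loc=means, scale=np.sqrt(variances)
    ).sum(axis=1)
    return int(np.argmax(log_posteriors))
```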
Naive Bayes Parameter Estimates
From above, the parameters we need to estimate are the class priors $\pi_k = p(C_k)$ and, for each class $C_k$, the per-feature means $\mu_{ki}$ and variances $\sigma_{ki}^2$. Maximising the likelihood of a training set of $N$ labelled examples, with $N_k$ of them belonging to class $C_k$, gives the maximum likelihood estimates:

$$\hat{\pi}_k = \frac{N_k}{N}, \qquad \hat{\mu}_{ki} = \frac{1}{N_k} \sum_{n \in C_k} x_{ni}, \qquad \hat{\sigma}_{ki}^2 = \frac{1}{N_k} \sum_{n \in C_k} \left(x_{ni} - \hat{\mu}_{ki}\right)^2$$
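A short sketch of these estimates in code (my own illustration under the assumptions above; the function name and shapes are not from the slides). Note that the variance estimate divides by $N_k$, i.e. it is the biased maximum likelihood estimate rather than the unbiased sample variance:

```python
import numpy as np

def fit_gaussian_nb(X, t, n_classes):
    """Maximum likelihood estimates for Gaussian Naive Bayes.

    X: shape (N, D) feature matrix; t: shape (N,) integer class labels in {0..K-1}.
    Returns priors (K,), means (K, D), variances (K, D).
    """
    N, D = X.shape
    priors = np.zeros(n_classes)
    means = np.zeros((n_classes, D))
    variances = np.zeros((n_classes, D))
    for k in range(n_classes):
        X_k = X[t == k]                  # all examples labelled as class k
        priors[k] = len(X_k) / N         # pi_k = N_k / N
        means[k] = X_k.mean(axis=0)      # mu_ki
        variances[k] = X_k.var(axis=0)   # sigma_ki^2 (MLE, divides by N_k)
    return priors, means, variances
```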
TODO:
continue from slide 19 and finish this section off.