Given a set of labelled data $\mathcal{D} = \{(\mathbf{x}_n, t_n)\}_{n=1}^N$ (a data set of $N$ observations $\mathbf{x}_n$, each with a class label $t_n \in \{1, \dots, K\}$).

We wish to construct the posterior probabilities, or class probabilities, $p(C_k \mid \mathbf{x})$, from the given data $\mathcal{D}$. There are two methods to do this:
- Generative - by means of Naive Bayes
  - Estimate the class-conditional probabilities, $p(\mathbf{x} \mid C_k)$, and the class priors, $p(C_k)$ - these can be used to generate new data points, hence the name 'generative model'
  - Calculate the posterior probability using Bayes' theorem:

$$p(C_k \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid C_k)\, p(C_k)}{p(\mathbf{x})} = \frac{p(\mathbf{x} \mid C_k)\, p(C_k)}{\sum_{j} p(\mathbf{x} \mid C_j)\, p(C_j)}$$
- Discriminative - by means of Logistic Regression
  - Directly compute $p(C_k \mid \mathbf{x})$ without first calculating the class-conditionals
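The generative recipe above can be sketched with a tiny Gaussian Naive Bayes on 1-D toy data. Everything here is illustrative (the two Gaussian classes, their means, and the equal priors are all made-up assumptions): each class-conditional $p(x \mid C_k)$ is fitted from that class's samples alone, then Bayes' theorem combines it with the prior to give the posterior.

```python
import numpy as np

# Toy 1-D data set: two classes drawn from hypothetical Gaussians
rng = np.random.default_rng(0)
X0 = rng.normal(-2.0, 1.0, size=50)   # samples with label C_0
X1 = rng.normal(+2.0, 1.0, size=50)   # samples with label C_1

# Generative step: fit p(x | C_k) per class, ignoring the other class
mu = np.array([X0.mean(), X1.mean()])
var = np.array([X0.var(), X1.var()])
prior = np.array([0.5, 0.5])          # assumed equal class priors

def posterior(x):
    """Bayes' theorem: p(C_k | x) = p(x | C_k) p(C_k) / sum_j p(x | C_j) p(C_j)."""
    lik = np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    joint = lik * prior
    return joint / joint.sum()
```

Because the fitted Gaussians can be sampled from, the same `mu` and `var` could also generate new data points, which is exactly the 'generative' property described above.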
So which one do I choose?

Well, here's a quick comparison:
| Generative | Discriminative |
|---|---|
| More flexible | Less flexible |
| Less efficient for classification | More efficient for classification |
| Simpler training (per class) | Harder training |
| Trained on per-class data | Trained on all data |
| Models each class | Focuses on class differences |
In other words: The generative model is trained per class and ignores the properties of the other classes, while the discriminative model considers all data during training.
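The discriminative side of that contrast can be sketched the same way: a minimal logistic regression, $p(C_1 \mid x) = \sigma(wx + b)$, trained by gradient descent on the cross-entropy loss. The data, learning rate, and iteration count are all hypothetical; the point is that every update uses *all* points from *both* classes at once, unlike the per-class fit of the generative model.

```python
import numpy as np

# Same style of toy 1-D data: class 0 around -2, class 1 around +2
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 50), rng.normal(2.0, 1.0, 50)])
t = np.concatenate([np.zeros(50), np.ones(50)])   # class labels

# Discriminative step: learn p(C_1 | x) directly, no class-conditionals
w, b = 0.0, 0.0
for _ in range(2000):
    y = 1.0 / (1.0 + np.exp(-(w * x + b)))  # sigmoid posterior
    w -= 0.1 * ((y - t) * x).mean()         # cross-entropy gradient in w
    b -= 0.1 * (y - t).mean()               # cross-entropy gradient in b
```

Note that each gradient step averages over the whole data set, so the learned boundary depends on where the classes differ rather than on how each class is distributed on its own.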