Given a set of labelled data $\mathcal{D} = \{(\mathbf{x}_n, t_n)\}_{n=1}^{N}$ (a data set of observations $\mathbf{x}_n$, each with a class label $t_n = k$ if $\mathbf{x}_n$ belongs to class $\mathcal{C}_k$).

We wish to construct the posterior probabilities, or class probabilities, $p(\mathcal{C}_k \mid \mathbf{x})$, from the given data $\mathcal{D}$. There are two methods to do this:

  • Generative - by means of Naive Bayes
    1. Estimate the class-conditional probabilities $p(\mathbf{x} \mid \mathcal{C}_k)$ (together with the class priors $p(\mathcal{C}_k)$) - these can be used to generate new data points, hence the name ‘generative model’
    2. Calculate the posterior probability using Bayes’ Theorem (a sketch of this route follows the list):
       $$p(\mathcal{C}_k \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \mathcal{C}_k)\, p(\mathcal{C}_k)}{p(\mathbf{x})}$$
  • Discriminative - by means of Logistic Regression
    1. Directly compute $p(\mathcal{C}_k \mid \mathbf{x})$ without first calculating the class-conditionals
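
To make the generative route concrete, here is a minimal sketch. It assumes Gaussian class-conditionals with a diagonal covariance (i.e. Gaussian Naive Bayes) and a toy 1-D data set; each class's mean, standard deviation and prior are estimated from that class's data alone, and Bayes' Theorem then turns the class-conditionals into posteriors.

```python
# A minimal sketch of the generative route: fit a Gaussian with diagonal
# covariance to each class independently (Gaussian Naive Bayes), then turn
# the class-conditionals into posteriors via Bayes' Theorem.
import numpy as np
from scipy.stats import norm

def fit_generative(X, t):
    """Estimate p(x | C_k) and p(C_k) separately for each class k."""
    params = {}
    for k in np.unique(t):
        X_k = X[t == k]                     # only this class's data is used
        params[k] = {
            "prior": len(X_k) / len(X),     # p(C_k)
            "mean": X_k.mean(axis=0),       # per-feature Gaussian mean
            "std": X_k.std(axis=0) + 1e-9,  # per-feature Gaussian std
        }
    return params

def posterior(params, x):
    """Apply Bayes' Theorem: p(C_k | x) is proportional to p(x | C_k) p(C_k)."""
    joint = np.array([
        p["prior"] * np.prod(norm.pdf(x, p["mean"], p["std"]))
        for p in params.values()
    ])
    return joint / joint.sum()              # normalise by p(x)

# Toy 1-D example: two classes with different means.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, (50, 1)), rng.normal(3, 1, (50, 1))])
t = np.array([0] * 50 + [1] * 50)
print(posterior(fit_generative(X, t), np.array([2.0])))
```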

So which one do I choose?

Well, here’s a quick comparison:

| Generative | Discriminative |
| --- | --- |
| More flexible | Less flexible |
| Less efficient for classification | More efficient for classification |
| Simpler training (per class) | Harder training |
| Uses each class’s data | Uses all data |
| Models each class | Focusses on class differences |

In other words: The generative model is trained per class and ignores the properties of the other classes, while the discriminative model considers all data during training.
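
As a quick usage-level contrast, the sketch below trains both kinds of model on the same toy data, using scikit-learn's GaussianNB and LogisticRegression purely as convenient stand-ins for the two families: GaussianNB fits each class separately and applies Bayes' Theorem, while LogisticRegression maximises the likelihood of the labels over all of the training data at once. Both end up exposing posteriors through predict_proba.

```python
# Side-by-side of the two routes on the same toy data, with scikit-learn's
# GaussianNB (generative) and LogisticRegression (discriminative) as examples.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
t = np.array([0] * 50 + [1] * 50)

# Generative: fits a Gaussian to each class separately, posteriors via Bayes.
gen = GaussianNB().fit(X, t)
# Discriminative: fits p(C_k | x) directly from all of the training data.
dis = LogisticRegression().fit(X, t)

x_new = np.array([[2.0, 2.0]])
print("generative posterior:    ", gen.predict_proba(x_new))
print("discriminative posterior:", dis.predict_proba(x_new))
```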