First see Singular Value Decomposition for background on how this all works.

We apply SVD to the centered data matrix $X$ ($n$ samples as rows, $d$ features as columns, each column mean-subtracted):

$$X = U \Sigma V^T$$

Solving for $S$ (the covariance matrix) as derived in PCA:

$$S = \frac{1}{n-1} X^T X = \frac{1}{n-1} V \Sigma^T U^T U \Sigma V^T = V \frac{\Sigma^2}{n-1} V^T$$

Therefore the principal directions (the eigenvectors of $S$) are the columns of $V$, and the eigenvalues are $\lambda_i = \frac{\sigma_i^2}{n-1}$, so we can use the SVD to calculate the variance along each PC.

Using SVD over eigendecomposition results in better numerical stability, since explicitly forming the covariance matrix (which squares the condition number of the data) is avoided.
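As a quick sanity check, here is a minimal NumPy sketch of this equivalence under the row-samples convention above (the toy data and variable names are my own, not part of the original derivation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))  # toy data: 200 samples x 3 features
X = X - X.mean(axis=0)                                    # center the data
n = X.shape[0]

# SVD of the centered data (no covariance matrix needed)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
var_svd = s**2 / (n - 1)          # variance along each PC, largest first

# Same variances via eigendecomposition of the covariance matrix S
S = X.T @ X / (n - 1)
eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]

print(np.allclose(var_svd, eigvals))  # True (up to floating-point error)
```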

The cool thing about the compact SVD is that we can reduce the data to $r = \operatorname{rank}(X)$ dimensions without data loss: keeping only the $r$ non-zero singular values, $X = U_r \Sigma_r V_r^T$ still reconstructs $X$ exactly.

Nothing prevents us from removing even more of the singular values, keeping only the largest $k < r$, to reduce the data to $k$ dimensions:

$$X_k = U_k \Sigma_k V_k^T$$

where

- $U_k$ is $n \times k$
- $\Sigma_k$ is $k \times k$
- $V_k$ is $d \times k$

**Note that $X_k$ is now an approximation of $X$.**
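A small sketch of the truncated SVD under the same assumptions ($k$, the toy data, and the variable names are only illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # toy data: 200 samples x 5 features
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]   # n x k, k x k, k x d
X_k = U_k @ S_k @ Vt_k                                  # rank-k approximation of X

# The k-dimensional representation of each sample
Z = X @ Vt_k.T                    # equivalently U_k @ S_k
print(X_k.shape, Z.shape)         # (200, 5) (200, 2)
print(np.linalg.norm(X - X_k) / np.linalg.norm(X))  # relative approximation error
```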

Another cool trick we can do is multiply the data matrix $X$ on the right by $V$:

$$XV = U \Sigma V^T V = U \Sigma$$

This causes the new data matrix $XV = U\Sigma$ to be a rotation (and possible reflection) of the data matrix $X$. Note that the right hand side, $U\Sigma$, is itself an SVD of $XV$ whose $V$ matrix is the identity, meaning the principal components of $XV$ are the original coordinate axes (no more rotation of the axes; the data now lies along the original component axes).
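A short sketch of this rotation, again with illustrative toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))  # toy data
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

XV = X @ V                          # rotate (and possibly reflect) the data
print(np.allclose(XV, U * s))       # XV = U @ diag(s)

# The principal directions of the rotated data are the coordinate axes
_, _, Vt_rot = np.linalg.svd(XV, full_matrices=False)
print(np.allclose(np.abs(Vt_rot), np.eye(3), atol=1e-6))  # identity up to sign flips
```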

Since the variance along PC $i$ is $\frac{\sigma_i^2}{n-1}$, we can also see that the standard deviation is $\frac{\sigma_i}{\sqrt{n-1}}$.

Assuming $\Sigma$ contains only non-zero singular values (so it is invertible), we can multiply by the inverse to get

$$X V \Sigma^{-1} = U$$
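A sketch of this rescaling; the expected column standard deviation $\frac{1}{\sqrt{n-1}}$ follows because the columns of $U$ are zero-mean (the data is centered) and unit-norm (toy data and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))  # toy data
X = X - X.mean(axis=0)
n = X.shape[0]

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Undo the per-axis scaling: X V Sigma^{-1} = U
XVSinv = X @ Vt.T @ np.diag(1.0 / s)
print(np.allclose(XVSinv, U))        # True

# Each column of U has standard deviation 1/sqrt(n-1)
print(U.std(axis=0, ddof=1))         # all approximately 1/sqrt(n-1)
print(1 / np.sqrt(n - 1))
```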

For this data ($U$), both $\Sigma$ and $V$ in its SVD are identity matrices ($U = U \, I \, I^T$). This means:

- The principal directions are aligned with the coordinate axes
- All the singular values are 1, which means the data has been reshaped to be spherical. To make the standard deviations unity, we scale by an additional factor of $\sqrt{n-1}$. **The result, $\sqrt{n-1}\,U = \sqrt{n-1}\,X V \Sigma^{-1}$, is the whitened data** ($U$ itself is the whitened data up to a scalar multiplicative factor).
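A sketch of the whitening step, checking that the covariance of $\sqrt{n-1}\,U$ is the identity (same illustrative toy data as above):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))  # toy data
X = X - X.mean(axis=0)
n = X.shape[0]

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Whitened data: rotate onto the PCs, undo the per-axis scaling, then restore unit std
X_white = np.sqrt(n - 1) * (X @ Vt.T @ np.diag(1.0 / s))   # = sqrt(n-1) * U

# Its covariance matrix is the identity: uncorrelated features with unit variance
cov = X_white.T @ X_white / (n - 1)
print(np.allclose(cov, np.eye(3)))   # True
```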