First see Singular Value Decomposition for background on how this all works.

We apply SVD to the centered data matrix $X$ ($n$ samples as rows, $d$ features as columns, each column mean-subtracted):

$$X = U \Sigma V^T$$

Solving for $S$ (the covariance matrix) as derived in PCA:

$$S = \frac{1}{n-1} X^T X = \frac{1}{n-1} V \Sigma^T U^T U \Sigma V^T = V \frac{\Sigma^2}{n-1} V^T$$

Therefore the principal directions (the eigenvectors of $S$) are the columns of $V$, and the eigenvalues are $\lambda_i = \frac{\sigma_i^2}{n-1}$, so we can use the SVD to calculate the variance along each PC.

Using SVD over eigendecomposition results in better numerical stability, since explicitly forming the covariance matrix (which squares the condition number of the data) is avoided.
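As a quick sanity check, here is a minimal NumPy sketch of this equivalence under the row-samples convention above (the toy data and variable names are my own, not part of the original derivation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))  # toy data: 200 samples x 3 features
X = X - X.mean(axis=0)                                    # center the data
n = X.shape[0]

# SVD of the centered data (no covariance matrix needed)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
var_svd = s**2 / (n - 1)          # variance along each PC, largest first

# Same variances via eigendecomposition of the covariance matrix S
S = X.T @ X / (n - 1)
eigvals = np.sort(np.linalg.eigvalsh(S))[::-1]

print(np.allclose(var_svd, eigvals))  # True (up to floating-point error)
```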

The cool thing about the compact SVD is that we can reduce the data to $r = \operatorname{rank}(X)$ dimensions without data loss: keeping only the $r$ non-zero singular values, $X = U_r \Sigma_r V_r^T$ still reconstructs $X$ exactly.

Nothing prevents us from removing even more of the singular values, keeping only the largest $k < r$, to reduce the data to $k$ dimensions:

$$X_k = U_k \Sigma_k V_k^T$$

where

- $U_k$ is $n \times k$
- $\Sigma_k$ is $k \times k$
- $V_k$ is $d \times k$

**Note that $X_k$ is now an approximation of $X$.**
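A small sketch of the truncated SVD under the same assumptions ($k$, the toy data, and the variable names are only illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # toy data: 200 samples x 5 features
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]   # n x k, k x k, k x d
X_k = U_k @ S_k @ Vt_k                                  # rank-k approximation of X

# The k-dimensional representation of each sample
Z = X @ Vt_k.T                    # equivalently U_k @ S_k
print(X_k.shape, Z.shape)         # (200, 5) (200, 2)
print(np.linalg.norm(X - X_k) / np.linalg.norm(X))  # relative approximation error
```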

Another cool trick we can do is multiply the data matrix $X$ on the right by $V$:

$$XV = U \Sigma V^T V = U \Sigma$$

This causes the new data matrix $XV = U\Sigma$ to be a rotation (and possible reflection) of the data matrix $X$. Note that the right hand side, $U\Sigma$, is itself an SVD of $XV$ whose $V$ matrix is the identity, meaning the principal components of $XV$ are the original coordinate axes (no more rotation of the axes; the data now lies along the original component axes).
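A short sketch of this rotation, again with illustrative toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))  # toy data
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

XV = X @ V                          # rotate (and possibly reflect) the data
print(np.allclose(XV, U * s))       # XV = U @ diag(s)

# The principal directions of the rotated data are the coordinate axes
_, _, Vt_rot = np.linalg.svd(XV, full_matrices=False)
print(np.allclose(np.abs(Vt_rot), np.eye(3), atol=1e-6))  # identity up to sign flips
```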

Since the variance along PC $i$ is $\frac{\sigma_i^2}{n-1}$, we can also see that the standard deviation is $\frac{\sigma_i}{\sqrt{n-1}}$.

Assuming $\Sigma$ contains only non-zero singular values (so it is invertible), we can multiply by the inverse to get

$$X V \Sigma^{-1} = U$$
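A sketch of this rescaling; the expected column standard deviation $\frac{1}{\sqrt{n-1}}$ follows because the columns of $U$ are zero-mean (the data is centered) and unit-norm (toy data and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))  # toy data
X = X - X.mean(axis=0)
n = X.shape[0]

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Undo the per-axis scaling: X V Sigma^{-1} = U
XVSinv = X @ Vt.T @ np.diag(1.0 / s)
print(np.allclose(XVSinv, U))        # True

# Each column of U has standard deviation 1/sqrt(n-1)
print(U.std(axis=0, ddof=1))         # all approximately 1/sqrt(n-1)
print(1 / np.sqrt(n - 1))
```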

For this data ($U$), both $\Sigma$ and $V$ in its SVD are identity matrices ($U = U \, I \, I^T$). This means:

- The principal directions are aligned with the coordinate axes
- All the singular values are 1, which means the data has been reshaped to be spherical. To make the standard deviations unity, we scale by an additional factor of $\sqrt{n-1}$. **The result, $\sqrt{n-1}\,U = \sqrt{n-1}\,X V \Sigma^{-1}$, is the whitened data** ($U$ itself is the whitened data up to a scalar multiplicative factor).
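A sketch of the whitening step, checking that the covariance of $\sqrt{n-1}\,U$ is the identity (same illustrative toy data as above):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))  # toy data
X = X - X.mean(axis=0)
n = X.shape[0]

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Whitened data: rotate onto the PCs, undo the per-axis scaling, then restore unit std
X_white = np.sqrt(n - 1) * (X @ Vt.T @ np.diag(1.0 / s))   # = sqrt(n-1) * U

# Its covariance matrix is the identity: uncorrelated features with unit variance
cov = X_white.T @ X_white / (n - 1)
print(np.allclose(cov, np.eye(3)))   # True
```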