Singular Value Decomposition

We want to find an orthogonal set of vectors ( $V$ ) which, when transformed by our matrix ( $X$ ), remain orthogonal ( $U$ ): $X V = U Σ$. That is, transforming the orthogonal matrix $V$ with $X$ produces the orthogonal matrix $U$ scaled by $Σ$, which is diagonal and therefore only scales.

We can consider the 2D case of a sphere (a circle) transformed by the matrix $X$, which stretches and rotates it:

The n-dimensional sphere has orthogonal vectors ( $v_{1}, ..., v_{d}$ ) which are transformed to orthonormal unit vectors $u_{1}, ..., u_{r}$ scaled by the stretch amounts $σ_{1}, ..., σ_{r}$ (with $r$ being the matrix rank, typically $r = d$ )

$u_{i}$ are the principal axes
$σ_{i}$ are called the singular values
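
The circle picture can be checked numerically. The sketch below uses a hypothetical 2×2 matrix chosen only for illustration: it maps points on the unit circle through $X$ and compares the longest and shortest radii of the resulting ellipse with the singular values reported by NumPy.

```python
import numpy as np

# Hypothetical 2x2 matrix, chosen only for illustration.
X = np.array([[3.0, 1.0],
              [1.0, 2.0]])

# Sample points on the unit circle (each column is a 2D unit vector).
theta = np.linspace(0.0, 2.0 * np.pi, 100_000, endpoint=False)
circle = np.vstack([np.cos(theta), np.sin(theta)])

# Transforming the circle with X gives an ellipse.
ellipse = X @ circle
radii = np.linalg.norm(ellipse, axis=0)

# The longest and shortest radii approximate sigma_1 and sigma_2.
s = np.linalg.svd(X, compute_uv=False)
print(radii.max(), radii.min())
print(s)
```

The two printed lines should agree up to the sampling resolution of the circle.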

This leads to the following mathematical expression:

$X v_{i} = σ_{i} u_{i}$

This is similar to the eigenvalue problem $A v = λ v$, except that it does not require the same vector $v$ on both sides.

We can write this in matrix form:

$X V = U Σ$

See the video I got this from
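
As a quick sanity check of $X v_{i} = σ_{i} u_{i}$ and its matrix form, here is a minimal sketch on a random square matrix (the matrix and seed are arbitrary assumptions, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))  # hypothetical example matrix

# NumPy returns V transposed, so transpose it back to get V.
U, s, Vt = np.linalg.svd(X)
V = Vt.T

# Column by column: X v_i = sigma_i u_i
per_vector_ok = all(
    np.allclose(X @ V[:, i], s[i] * U[:, i]) for i in range(3)
)

# Matrix form: X V = U Sigma
matrix_form_ok = np.allclose(X @ V, U @ np.diag(s))

print(per_vector_ok, matrix_form_ok)
```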

Terminology: An orthogonal matrix is a square matrix $A$ such that $A^{T} A = A A^{T} = I$

In other words, any real $d \times N$ matrix can be decomposed as:

$X = U Σ V^{T}$
- $U$ is a $d \times d$ orthogonal matrix (this $U$ is not the same as the optimised $A$ transformation matrix used in PCA)
- $V$ is an $N \times N$ orthogonal matrix
- $Σ$ is a $d \times N$ diagonal matrix with non-negative entries in non-increasing order down the diagonal ( $σ_{1} \geq σ_{2} \geq ... \geq σ_{r} \geq 0$ )

This is called the SVD of $X$

The non-zero entries of $Σ$ are called the singular values of $X$, and the number of non-zero entries in $Σ$ is $rank (X)$
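
The shapes and properties listed above can be checked with a small sketch. The sizes `d = 3`, `N = 5`, the random data, and the tolerance `1e-10` are assumptions made for illustration. Note that NumPy returns the singular values as a 1D array, so the $d \times N$ matrix $Σ$ has to be built by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 3, 5
X = rng.standard_normal((d, N))  # hypothetical d x N data matrix

# Full SVD (full_matrices=True is NumPy's default).
U, s, Vt = np.linalg.svd(X)

# Build the d x N "diagonal" Sigma, padded with zero columns.
Sigma = np.zeros((d, N))
Sigma[:len(s), :len(s)] = np.diag(s)

print(U.shape, Sigma.shape, Vt.shape)  # (3, 3) (3, 5) (5, 5)

# X is recovered from the three factors.
reconstruction_ok = np.allclose(U @ Sigma @ Vt, X)

# Counting non-zero singular values (above an assumed tolerance) gives the rank.
rank_from_s = int(np.sum(s > 1e-10))
print(reconstruction_ok, rank_from_s)
```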

Solving these matrices

$X^{T} X = (U Σ V^{T})^{T} (U Σ V^{T}) = V Σ^{T} U^{T} U Σ V^{T} = V Σ^{T} Σ V^{T} = V Σ^{2} V^{T}$

Here $Σ^{2}$ is shorthand for the $N \times N$ diagonal matrix $Σ^{T} Σ$. Right-multiplying both sides by $V$ and using $V^{T} V = I$ gives $X^{T} X V = V Σ^{2}$. This is an eigenvalue problem ( $A x = λ x$ ) with $A = X^{T} X$, $x = V$, and $λ = Σ^{2}$, so an eigendecomposition of $X^{T} X$ is used to compute $V$

Likewise for $U$ :

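$X X^{T} = (U Σ V^{T}) (U Σ V^{T})^{T} = U Σ V^{T} V Σ^{T} U^{T} = U Σ Σ^{T} U^{T} = U Σ^{2} U^{T}$

Right-multiplying both sides by $U$ gives $X X^{T} U = U Σ^{2}$ (with $Σ^{2}$ now standing for the $d \times d$ matrix $Σ Σ^{T}$ ), which is another eigenvalue problem you can solve with an eigendecomposition, this time of $X X^{T}$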
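
The two eigenvalue problems can be sketched with `numpy.linalg.eigh`, which is intended for symmetric matrices. This is an illustration under stated assumptions (random full-rank data, `d = 3`, `N = 5`), not how SVD routines are implemented in practice. One caveat: eigenvector signs are arbitrary, so $U$ and $V$ found independently may not pair up; the sketch therefore derives each $u_{i}$ from $v_{i}$ using $X v_{i} = σ_{i} u_{i}$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 3, 5
X = rng.standard_normal((d, N))  # hypothetical full-rank example data

# eigh returns eigenvalues in ascending order, so reverse them to match
# the descending order used for singular values.
evals_V, V = np.linalg.eigh(X.T @ X)   # N x N problem, gives V
evals_U, U = np.linalg.eigh(X @ X.T)   # d x d problem, gives U
evals_V, V = evals_V[::-1], V[:, ::-1]
evals_U, U = evals_U[::-1], U[:, ::-1]

# Singular values are the square roots of the d largest eigenvalues.
sigma = np.sqrt(evals_U)

# Derive u_i = X v_i / sigma_i so that the signs of U and V are consistent.
U_from_V = (X @ V[:, :d]) / sigma
X_rebuilt = U_from_V @ np.diag(sigma) @ V[:, :d].T

print(np.allclose(sigma, np.linalg.svd(X, compute_uv=False)))
print(np.allclose(X_rebuilt, X))
```

Dividing by `sigma` assumes every singular value is non-zero, which holds for this full-rank example but not in general.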

Columns of $U$ and $V$ (not $V^{T}$ ) are called the left and right singular vectors of $X$ respectively, and each is associated with the singular value in the same column of $Σ$

When $N ≫ d$ (many more samples than input dimensions), we may trim the many zero columns of $Σ$ and the corresponding columns of $V$ to get the thin SVD

$X = U Σ_{d} V_{d}^{T} = U Σ_{+} V_{+}^{T}$ (as in the course notes)

$Σ_{d}$ is now $d \times d$ (as opposed to $d \times N$ ), a square diagonal matrix without zero padding
$V_{d}^{T}$ is now $d \times N$ (as opposed to $N \times N$ )

Furthermore, if $r = rank (X) < d$ , we get the compact SVD, which trims more zeros from the matrices (since $Σ$ only has $r$ non-zero rows)

$X = U_{r} Σ_{r} V_{r}^{T}$

$U_{r}$ is now $d \times r$
$Σ_{r}$ is now $r \times r$
$V_{r}^{T}$ is now $r \times N$

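You can also compute these matrices in Python

import numpy as np

u, s, vt = np.linalg.svd(X)
# s is a 1D array of singular values (largest first),
# and needs to be converted to a diagonal matrix for Sigma.
# np.diag(s) is the square Sigma of the thin SVD;
# pass full_matrices=False to get u and vt with matching shapes.
Sigma = np.diag(s)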
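
The thin and compact forms described earlier can be obtained from the same routine. The sketch below is a minimal illustration: the sizes, the rank-2 test matrix, and the tolerance `tol` used to decide which singular values count as zero are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, r = 4, 10, 2

# Hypothetical rank-r matrix, built as a (d x r) times (r x N) product.
X = rng.standard_normal((d, r)) @ rng.standard_normal((r, N))

# Thin SVD: full_matrices=False drops the zero-padded part.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(U.shape, s.shape, Vt.shape)  # (4, 4) (4,) (4, 10)

# Compact SVD: keep only singular values above an assumed tolerance.
tol = 1e-10
k = int(np.sum(s > tol))
U_r, Sigma_r, Vt_r = U[:, :k], np.diag(s[:k]), Vt[:k, :]
print(U_r.shape, Sigma_r.shape, Vt_r.shape)  # (4, 2) (2, 2) (2, 10)

# Both forms still reconstruct X.
thin_ok = np.allclose(U @ np.diag(s) @ Vt, X)
compact_ok = np.allclose(U_r @ Sigma_r @ Vt_r, X)
print(thin_ok, compact_ok)
```

In floating point the trailing singular values are rarely exactly zero, which is why a tolerance is used instead of a comparison with `0`.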

