Image credit: Made with Midjourney
4. Singular value decomposition
In this chapter, we introduce the singular value decomposition (SVD), a fundamental concept in linear algebra. It is related to the spectral decomposition, but exists for any matrix, including rectangular ones. The SVD has many applications in data science, including principal components analysis, low-rank approximation, pseudoinverses, and more. Here is a more detailed overview of the main sections of the chapter.
“Background: review of matrix rank and spectral decomposition” This section covers two key linear algebra concepts: matrix rank and the spectral theorem. The rank of a matrix is defined as the common dimension of its column space and row space. Properties of the rank are discussed, including the Rank-Nullity Theorem, which relates the rank to the dimension of the null space. The spectral theorem states that a real symmetric matrix has an orthonormal basis of eigenvectors with real eigenvalues, which yields a spectral decomposition of the matrix. The section also characterizes positive semidefinite and positive definite matrices in terms of their eigenvalues.
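As a quick numerical illustration of these notions (the matrix below is a made-up example, not one from the text), one might check the rank and a spectral decomposition with NumPy:

```python
import numpy as np

# A made-up 3x3 matrix (not an example from the text)
A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.]])

# Rank: the common dimension of the column space and the row space
print(np.linalg.matrix_rank(A))  # 2, since the second row is twice the first

# Spectral decomposition of a symmetric matrix: B = Q diag(w) Q^T
B = A.T @ A  # symmetric, in fact positive semidefinite
w, Q = np.linalg.eigh(B)  # real eigenvalues w, orthonormal eigenvectors in the columns of Q
print(np.allclose(Q @ np.diag(w) @ Q.T, B))  # True
print(np.all(w >= -1e-10))  # eigenvalues are nonnegative, as expected for a PSD matrix
```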
“Approximating subspaces and the SVD” This section introduces the SVD as a matrix factorization that can be used to find the best low-dimensional approximating subspace to a set of data points. It shows that this problem can be solved greedily: first find the best one-dimensional subspace, then the best one-dimensional subspace orthogonal to the first, and so on. It then formally defines the SVD, proves its existence for any matrix, and discusses the relationship between the SVD and the spectral decomposition. The section also covers some important properties and relations satisfied by the SVD.
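To make the factorization and its link to the spectral decomposition concrete, here is a small NumPy sketch on an arbitrary random matrix (not an example from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # an arbitrary rectangular matrix

# Thin SVD: A = U diag(s) V^T with orthonormal columns in U and V, s >= 0 in decreasing order
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.allclose(U @ np.diag(s) @ Vt, A))  # the factorization reconstructs A

# Link to the spectral decomposition: the squared singular values of A
# are the eigenvalues of the symmetric matrix A^T A
w = np.linalg.eigvalsh(A.T @ A)       # eigenvalues in ascending order
print(np.allclose(np.sort(s**2), w))  # True
```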
“Power iteration” This section discusses the power iteration method for computing a (partial) SVD of a matrix. The key lemma states, in essence, that repeatedly applying \(A^T A\) to a random vector and renormalizing converges, in direction, to the top right singular vector \(v_1\) of \(A\). The section also covers how to compute the corresponding singular value and left singular vector from the converged \(v_1\).
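A minimal sketch of such an iteration, assuming NumPy (the helper name `top_singular_triplet` and the iteration count are illustrative choices, not from the text), might look as follows:

```python
import numpy as np

def top_singular_triplet(A, num_iters=1000, rng=None):
    """Approximate (sigma_1, u_1, v_1) by power iteration on A^T A (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    v = rng.standard_normal(A.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        v = A.T @ (A @ v)          # apply A^T A
        v /= np.linalg.norm(v)     # renormalize so the iterates stay bounded
    sigma = np.linalg.norm(A @ v)  # corresponding singular value
    u = (A @ v) / sigma            # corresponding left singular vector
    return sigma, u, v

A = np.random.default_rng(1).standard_normal((6, 4))
sigma, u, v = top_singular_triplet(A)
print(np.isclose(sigma, np.linalg.svd(A, compute_uv=False)[0]))  # True, up to convergence error
```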
“Application: principal components analysis” This section discusses principal components analysis (PCA) as a dimensionality reduction technique. It explains that PCA finds linear combinations of the features, called principal components, that capture the maximum variance in the data. The scores of the first principal component, \(t_{i1} = \sum_{j=1}^p \phi_{j1} x_{ij}\), are determined by the loading vector \(\phi_1\) obtained by solving \(\max \left\{ \frac{1}{n-1} \|X\phi_1\|^2 : \|\phi_1\|^2 = 1\right\}\), where \(X\) is the centered data matrix. Subsequent principal components are obtained by imposing additional constraints, in particular uncorrelatedness with the previous components. The section establishes a connection between PCA and the SVD.
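A small sketch of this PCA/SVD connection on synthetic data (the mixing matrix below is arbitrary): the loadings can be read off the right singular vectors of the centered data matrix, and the scores off \(X\phi_j\).

```python
import numpy as np

rng = np.random.default_rng(2)
X_raw = rng.standard_normal((100, 3)) @ np.array([[3., 0., 0. ],
                                                  [1., 1., 0. ],
                                                  [0., 0., 0.1]])
X = X_raw - X_raw.mean(axis=0)   # center each feature, as PCA requires

U, s, Vt = np.linalg.svd(X, full_matrices=False)
loadings = Vt.T                  # columns are the loading vectors phi_1, phi_2, ...
scores = X @ loadings            # row i holds the scores t_{i1}, t_{i2}, ...

# The sample variance of the first principal component equals s_1^2 / (n-1)
n = X.shape[0]
print(np.isclose(scores[:, 0].var(ddof=1), s[0]**2 / (n - 1)))  # True
```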
“Further applications of the SVD: low-rank approximations and ridge regression” This section first defines matrix norms, specifically the Frobenius norm and the induced 2-norm, and relates them to the singular values. The section then considers low-rank matrix approximations, highlighting the Eckart-Young theorem, which states that the best low-rank approximation of a matrix in both Frobenius and induced 2-norms is achieved by truncating the SVD. Finally, the section discusses ridge regression, a regularization technique that addresses multicollinearity in overdetermined systems by balancing data fitting with minimizing the solution’s norm.
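As a rough illustration of both ideas on random data (the rank \(k\) and regularization parameter \(\lambda\) below are arbitrary choices for the sketch), one might write:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Best rank-k approximation (Eckart-Young): keep the k largest singular values
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))  # induced 2-norm error is sigma_{k+1}

# Ridge regression: minimize ||Ax - b||^2 + lam ||x||^2, solved through the SVD
b = rng.standard_normal(8)
lam = 0.1
x_ridge = Vt.T @ ((s / (s**2 + lam)) * (U.T @ b))
x_normal = np.linalg.solve(A.T @ A + lam * np.eye(5), A.T @ b)  # normal-equations check
print(np.allclose(x_ridge, x_normal))  # True
```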