2. Least squares: geometric, algebraic, and numerical aspects
In this chapter, we introduce the linear least squares problem, an optimization problem that aims to find the best-fitting linear model for a given set of data points. It is closely connected to regression analysis, one of the most fundamental and widely used statistical techniques. We develop the mathematical concepts at its foundation, emphasizing in particular the various facets of the problem: the algebra, the geometry, and the numerics. Here is a more detailed overview of the main sections of the chapter.
“Background: review of vector spaces and matrix inverses” This section reviews fundamental concepts in linear algebra that are essential for data science. It begins by defining linear subspaces and related notions like span, linear independence, and bases. The dimension theorem is stated, establishing that the dimension of a subspace is well-defined. Finally, the section discusses inverses of square matrices, proving that a square matrix is invertible if and only if it is nonsingular, and that the solution to the equation \(Ax=b\) is unique and given by \(x=A^{-1}b\) when \(A\) is nonsingular.
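As a quick illustration of that last point, here is a minimal NumPy sketch (the matrix and right-hand side are made up for the example) comparing the explicit formula \(x=A^{-1}b\) with a direct solve:

```python
import numpy as np

# A small nonsingular matrix and right-hand side (made-up data for illustration).
A = np.array([[2., 1.],
              [1., 3.]])
b = np.array([1., 2.])

# Solve Ax = b directly ...
x_solve = np.linalg.solve(A, b)

# ... and via the explicit formula x = A^{-1} b from the text.
x_inv = np.linalg.inv(A) @ b

print(x_solve)                      # [0.2 0.6]
print(np.allclose(x_solve, x_inv))  # True: both recover the unique solution
```

In practice, calling `np.linalg.solve` is preferred over forming the inverse explicitly, but both return the same unique solution when \(A\) is nonsingular.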
“The geometry of least squares” This section discusses the geometry of least squares. The important concept of orthogonality is reviewed, along with key results like Pythagoras’ theorem, the Cauchy-Schwarz inequality, and the Gram-Schmidt Theorem regarding the existence of orthonormal bases. It then introduces the concept of orthogonal projection, the process of finding the vector in a linear subspace that is closest to a given vector. Finally, the section applies these concepts to solve overdetermined systems using the least squares method, showing that the solution satisfies the normal equations \(A^T A x = A^T b\).
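To make this concrete, here is a small NumPy sketch, with made-up data, that solves an overdetermined system through the normal equations and checks that the residual is orthogonal to the column space of \(A\):

```python
import numpy as np

# An overdetermined system: 3 equations, 2 unknowns (made-up data).
A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.]])
b = np.array([1., 0., 2.])

# Least squares solution via the normal equations A^T A x = A^T b.
x = np.linalg.solve(A.T @ A, A.T @ b)

# The orthogonal projection of b onto the column space of A is A x,
# and the residual b - A x is orthogonal to every column of A.
proj = A @ x
residual = b - proj
print(x)               # least squares coefficients, here [0.5 0.5]
print(A.T @ residual)  # ~[0. 0.]: residual is orthogonal to col(A)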
“QR decomposition and Householder transformations” In this section, the Gram-Schmidt algorithm is introduced as a way to obtain an orthonormal basis from a set of linearly independent vectors, and its matrix factorization perspective is presented as the QR decomposition. The section then shows how the QR decomposition can be used to solve linear least squares problems. While the Gram-Schmidt algorithm is geometrically intuitive, it can suffer from a loss of orthogonality in floating-point arithmetic. Householder reflections, on the other hand, provide a more numerically stable method for computing QR decompositions. These reflections are orthogonal transformations that can be used to introduce zeros below the diagonal of a matrix, ultimately leading to its triangularization.
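The sketch below implements classical Gram-Schmidt to compute a reduced QR decomposition and uses it to solve the same kind of least squares problem (the matrix and right-hand side are again made up). It is meant only to illustrate the idea; numerically robust Householder-based routines such as `numpy.linalg.qr` are preferred in practice.

```python
import numpy as np

def gram_schmidt_qr(A):
    """Reduced QR of a matrix with linearly independent columns,
    via classical Gram-Schmidt. Illustrative sketch only: it can
    lose orthogonality in floating-point arithmetic."""
    n, m = A.shape
    Q = np.zeros((n, m))
    R = np.zeros((m, m))
    for j in range(m):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # projection coefficient
            v -= R[i, j] * Q[:, i]        # subtract the projection
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]             # normalize
    return Q, R

A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.]])
b = np.array([1., 0., 2.])

Q, R = gram_schmidt_qr(A)
# Least squares via QR: solve the triangular system R x = Q^T b.
x = np.linalg.solve(R, Q.T @ b)
print(np.allclose(A, Q @ R))  # True: A = QR
print(x)                      # matches the normal-equations solution [0.5 0.5]
```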
“Application to regression analysis” This section applies the least squares method to regression analysis. It begins by using the method to find the coefficients that best fit a linear model to a given dataset. It then extends this approach to polynomial regression, showing how to fit higher-degree polynomials by adding columns to the data matrix. The section also discusses the concept of overfitting, where a model with too many parameters may fit the noise in the data rather than the underlying relationship.
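As a preview, here is a short NumPy sketch of polynomial regression by least squares on synthetic data: each higher power of \(x\) is an extra column of the data matrix, and a high-degree fit hints at how extra parameters start chasing the noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (made up for illustration): y = 1 + 2x + noise.
x = np.linspace(0, 1, 20)
y = 1 + 2 * x + 0.1 * rng.standard_normal(x.size)

def poly_fit(x, y, degree):
    """Fit a polynomial of the given degree by least squares:
    each added column of the data matrix is a higher power of x."""
    A = np.vander(x, degree + 1, increasing=True)   # columns 1, x, x^2, ...
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

print(poly_fit(x, y, 1))   # close to [1, 2]: the underlying linear model
print(poly_fit(x, y, 10))  # many parameters: the fit starts tracking the noise
```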