
\(\newcommand{\horz}{\rule[.5ex]{2.5ex}{0.5pt}}\)

3.2. Background: matrix rank; review of eigenvalues and eigenvectors#

We will need two additional concepts from linear algebra: the rank of a matrix and the spectral theorem.

3.2.1. Rank of a matrix#

Recall the definition of the column space of a (not necessarily square) matrix.

DEFINITION (Column Space) Let \(A \in \mathbb{R}^{n\times m}\) be an \(n\times m\) matrix with columns \(\mathbf{a}_1,\ldots, \mathbf{a}_m \in \mathbb{R}^n\). The column space of \(A\), denoted \(\mathrm{col}(A)\), is the span of the columns of \(A\), that is, \(\mathrm{col}(A) = \mathrm{span}(\mathbf{a}_1,\ldots, \mathbf{a}_m)\). \(\natural\)

When thinking of \(A\) as a linear map, the column space is often referred to as the range or image.

The dimension of the column space of \(A\) is called the column rank of \(A\). Similarly, the row rank of \(A\) is the dimension of its row space.

DEFINITION (Row Space) The row space of \(A \in \mathbb{R}^{n \times m}\), denoted \(\mathrm{row}(A)\), is the span of the rows of \(A\) as vectors in \(\mathbb{R}^m\). \(\natural\)

Observe that the row space of \(A\) is equal to the column space of its transpose \(A^T\).

Definition of the rank As it turns out, these two notions of rank are the same. From now on, we refer to the row rank and column rank of \(A\) simply as the rank, which we denote by \(\mathrm{rk}(A)\).

THEOREM (Row Rank Equals Column Rank) For any \(A \in \mathbb{R}^{n \times m}\), the row rank of \(A\) equals the column rank of \(A\). Moreover, \(\mathrm{rk}(A) \leq \min\{n,m\}\). \(\sharp\)

Proof idea: Write \(A\) as a matrix factorization \(BC\) where the columns of \(B\) form a basis of \(\mathrm{col}(A)\). Then the rows of \(C\) necessarily form a spanning set of \(\mathrm{row}(A)\). Because the number of columns of \(B\) equals the number of rows of \(C\), we conclude that the row rank is less than or equal to the column rank. Applying the same argument to the transpose gives the claim.

Recall the following observation from a previous chapter.

Observation D1: Any linear subspace \(U\) of \(\mathbb{R}^n\) has a basis.

Observation D2: The dimension of any linear subspace \(U\) of \(\mathbb{R}^n\) is at most \(n\). More generally, if \(U\) and \(V\) are linear subspaces such that \(U \subseteq V\), then \(\mathrm{dim}(U) \leq \mathrm{dim}(V)\).

Observation D3: The dimension of \(\mathrm{span}(\mathbf{u}_1,\ldots,\mathbf{u}_m)\) is at most \(m\).

Proof: Assume that \(A\) has column rank \(r\). Then there exists a basis \(\mathbf{b}_1,\ldots, \mathbf{b}_r \in \mathbb{R}^n\) of \(\mathrm{col}(A)\) by Observation D1 above, and we know that \(r \leq n\) by Observation D2. In particular, since the \(\mathbf{b}_\ell\)’s span \(\mathrm{col}(A)\), for each \(j\), letting \(\mathbf{a}_{j} = A_{\cdot,j}\) be the \(j\)-th column of \(A\), we can write

\[ \mathbf{a}_{j} = \sum_{\ell=1}^r \mathbf{b}_\ell c_{\ell j} \]

for some \(c_{\ell j}\)’s. Let \(B\) be the matrix whose columns are \(\mathbf{b}_1, \ldots, \mathbf{b}_r\) and let \(C\) be the matrix with entries \((C)_{\ell j} = c_{\ell j}\), \(\ell=1,\ldots,r\), \(j=1,\ldots,m\). Then the equation above can be re-written as the matrix factorization \(A = BC\). Indeed, by our previous observations about matrix-matrix products, the columns of \(A\) are linear combinations of the columns of \(B\) with coefficients taken from the corresponding column of \(C\).

The key point is the following: \(C\) necessarily has \(r\) rows. Let \(\boldsymbol{\alpha}_{i}^T = A_{i,\cdot}\) be the \(i\)-th row of \(A\) and \(\mathbf{c}_{\ell}^T = C_{\ell,\cdot}\) be the \(\ell\)-th row of \(C\). Using our alternative representation of matrix-matrix product in terms of rows, the decomposition is equivalent to

\[ \boldsymbol{\alpha}_{i}^T = \sum_{\ell=1}^r b_{i\ell} \mathbf{c}_\ell^T, \quad i=1,\ldots, n, \]

where \(b_{i\ell} = (\mathbf{b}_\ell)_i = (B)_{i\ell}\) is the \(i\)-th entry of the \(\ell\)-th column of \(B\). In words, the rows of \(A\) are linear combinations of the rows of \(C\) with coefficients taken from the corresponding row of \(B\). In particular, \(\mathcal{C} = \{\mathbf{c}_{\ell}: \ell=1,\ldots,r\}\) is a spanning list of the row space of \(A\), that is, each row of \(A\) can be written as a linear combination of the vectors in \(\mathcal{C}\). Put differently, \(\mathrm{row}(A) \subseteq \mathrm{span}(\mathcal{C})\).

So the row rank of \(A\) is at most \(r\), the column rank of \(A\), by Observation D2.

Applying the same argument to \(A^T\), which switches the role of the columns and the rows, gives that the column rank of \(A\) (i.e. the row rank of \(A^T\)) is at most the row rank of \(A\) (i.e. the column rank of \(A^T\)). Hence the two notions of rank must be equal. (We also deduce \(r \leq m\) by Observation D2 again.) \(\square\)

EXAMPLE: (continued) We illustrate the proof of the theorem. Continuing a previous example, let \(A\) be the matrix with columns \(\mathbf{w}_1 = (1,0,1)\), \(\mathbf{w}_2 = (0,1,1)\), and \(\mathbf{w}_3 = (1,-1,0)\)

\[\begin{split} A = \begin{pmatrix} 1 & 0 & 1\\ 0 & 1 & -1\\ 1 & 1 & 0 \end{pmatrix}. \end{split}\]

We know that \(\mathbf{w}_1\) and \(\mathbf{w}_2\) form a basis of \(\mathrm{col}(A)\). We use them to construct our matrix \(B\)

\[\begin{split} B = \begin{pmatrix} 1 & 0\\ 0 & 1\\ 1 & 1 \end{pmatrix}. \end{split}\]

Recall that \(\mathbf{w}_3 = \mathbf{w}_1 - \mathbf{w}_2\). Hence the matrix \(C\) is

\[\begin{split} C = \begin{pmatrix} 1 & 0 & 1\\ 0 & 1 & -1 \end{pmatrix}. \end{split}\]

Indeed, column \(j\) of \(C\) gives the coefficients in the linear combination of the columns of \(B\) that produces column \(j\) of \(A\). Check that \(A = B C\). \(\lhd\)

NUMERICAL CORNER: In NumPy, one can compute the rank of a matrix using the function numpy.linalg.matrix_rank. We will see later in the course how to compute it using the singular value decomposition (which is how LA.matrix_rank does it).

Let’s try the example above.

import numpy as np
from numpy import linalg as LA

w1 = np.array([1., 0., 1.])
w2 = np.array([0., 1., 1.])
w3 = np.array([1., -1., 0.])
A = np.stack((w1, w2, w3), axis=-1)
print(A)
[[ 1.  0.  1.]
 [ 0.  1. -1.]
 [ 1.  1.  0.]]

We compute the rank of A.

LA.matrix_rank(A)
2

We take only the first two columns of A this time to form B.

B = np.stack((w1, w2),axis=-1)
print(B)
[[1. 0.]
 [0. 1.]
 [1. 1.]]
LA.matrix_rank(B)
2

Recall that, in NumPy, @ is used for matrix product.

C = np.array([[1., 0., 1.],[0., 1., -1.]])
print(C)
[[ 1.  0.  1.]
 [ 0.  1. -1.]]
LA.matrix_rank(C)
2
print(B @ C)
[[ 1.  0.  1.]
 [ 0.  1. -1.]
 [ 1.  1.  0.]]

\(\unlhd\)

EXAMPLE: Let \(A \in \mathbb{R}^{n \times k}\) and \(B \in \mathbb{R}^{k \times m}\). Then we claim that

\[ \mathrm{rk}(AB) \leq \mathrm{rk}(A). \]

Indeed, the columns of \(AB\) are linear combinations of the columns of \(A\). Hence \(\mathrm{col}(AB) \subseteq \mathrm{col}(A)\), and the claim follows by Observation D2. A similar argument applied to the rows (the rows of \(AB\) are linear combinations of the rows of \(B\)) shows that \(\mathrm{rk}(AB) \leq \mathrm{rk}(B)\) as well, so that in fact \(\mathrm{rk}(AB) \leq \min\{\mathrm{rk}(A), \mathrm{rk}(B)\}\). \(\lhd\)
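As a quick numerical sanity check (a minimal sketch; the particular matrices below are arbitrary and chosen only for illustration):

A = np.array([[1., 2.], [2., 4.], [0., 1.]])    # 3 x 2
B = np.array([[1., 0., 1.], [1., 1., 0.]])      # 2 x 3
print(LA.matrix_rank(A @ B), LA.matrix_rank(A), LA.matrix_rank(B))
2 2 2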

EXAMPLE: Let \(A \in \mathbb{R}^{n \times k}\) and \(B \in \mathbb{R}^{n \times m}\). Then we claim that

\[ \mathrm{rk}(A + B) \leq \mathrm{rk}(A) + \mathrm{rk}(B). \]

Indeed, the columns of \(A + B\) are linear combinations of the columns of \(A\) and of \(B\). Let \(\mathbf{u}_1,\ldots,\mathbf{u}_h\) be a basis of \(\mathrm{col}(A)\) and let \(\mathbf{v}_1,\ldots,\mathbf{v}_{\ell}\) be a basis of \(\mathrm{col}(B)\) by Observation D1. Then, we deduce

\[ \mathrm{col}(A + B) \subseteq \mathrm{span}(\mathbf{u}_1,\ldots,\mathbf{u}_h,\mathbf{v}_1,\ldots,\mathbf{v}_{\ell}). \]

By Observation D2, it follows that

\[ \mathrm{rk}(A + B) \leq \mathrm{dim}(\mathrm{span}(\mathbf{u}_1,\ldots,\mathbf{u}_h,\mathbf{v}_1,\ldots,\mathbf{v}_{\ell})). \]

By Observation D3, the right hand side is at most the length of the spanning list, i.e., \(h+\ell\). But by construction \(\mathrm{rk}(A) = h\) and \(\mathrm{rk}(B) = \ell\), so we are done. \(\lhd\)
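As a quick numerical illustration (a minimal sketch with two arbitrary rank-one matrices; here the sum attains the bound):

A = np.array([[1., 2.], [2., 4.]])   # rank 1
B = np.array([[1., 1.], [0., 0.]])   # rank 1
print(LA.matrix_rank(A), LA.matrix_rank(B), LA.matrix_rank(A + B))
1 1 2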

EXAMPLE: (A Proof of the Rank-Nullity Theorem) Let \(A \in \mathbb{R}^{n \times m}\). Recall that the column space of \(A\), \(\mathrm{col}(A) \subseteq \mathbb{R}^n\), is the span of all its columns. We compute its orthogonal complement. By definition, the columns of \(A\), which we denote by \(\mathbf{a}_1,\ldots,\mathbf{a}_m\), form a spanning list of \(\mathrm{col}(A)\). So \(\mathbf{u} \in \mathrm{col}(A)^\perp \subseteq \mathbb{R}^n\) if and only if

\[ \mathbf{a}_i^T\mathbf{u} = \langle \mathbf{u}, \mathbf{a}_i \rangle = 0, \quad \forall i=1,\ldots,m. \]

Indeed, these conditions imply that for any \(\mathbf{v} \in \mathrm{col}(A)\), say \(\mathbf{v} = \beta_1 \mathbf{a}_1 + \cdots + \beta_m \mathbf{a}_m\), we have

\[ \left\langle \mathbf{u}, \sum_{i=1}^m \beta_i \mathbf{a}_i \right\rangle = \sum_{i=1}^m \beta_i \langle \mathbf{u}, \mathbf{a}_i \rangle = 0. \]

The \(m\) conditions above can be written in matrix form as

\[ A^T \mathbf{u} = \mathbf{0}. \]

That is, the orthogonal complement of the column space of \(A\) is the null space of \(A^T\)

\[ \mathrm{col}(A)^\perp = \mathrm{null}(A^T). \]

Applying the same argument to the column space of \(A^T\), it follows that

\[ \mathrm{col}(A^T)^\perp = \mathrm{null}(A), \]

where note that \(\mathrm{null}(A) \subseteq \mathbb{R}^m\). The four linear subspaces \(\mathrm{col}(A)\), \(\mathrm{col}(A^T)\), \(\mathrm{null}(A)\) and \(\mathrm{null}(A^T)\) are referred to as the fundamental subspaces of \(A\). We have shown

\[ \mathrm{col}(A) \oplus \mathrm{null}(A^T) = \mathbb{R}^n \quad \text{and} \quad \mathrm{col}(A^T) \oplus \mathrm{null}(A) = \mathbb{R}^m. \]

By the Row Rank Equals Column Rank Theorem, \(\mathrm{dim}(\mathrm{col}(A)) = \mathrm{dim}(\mathrm{col}(A^T))\). Moreover, by our previous observation about the dimensions of direct sums, we have

\[ n = \mathrm{dim}(\mathrm{col}(A)) + \mathrm{dim}(\mathrm{null}(A^T)) = \mathrm{dim}(\mathrm{col}(A^T)) + \mathrm{dim}(\mathrm{null}(A^T)) \]

and

\[ m = \mathrm{dim}(\mathrm{col}(A^T)) + \mathrm{dim}(\mathrm{null}(A)) = \mathrm{dim}(\mathrm{col}(A)) + \mathrm{dim}(\mathrm{null}(A)). \]

So we deduce that

\[ \mathrm{dim}(\mathrm{null}(A)) = m - \mathrm{rk}(A) \]

and

\[ \mathrm{dim}(\mathrm{null}(A^T)) = n - \mathrm{rk}(A). \]

These formulas are referred to as the Rank-Nullity Theorem. The dimension of the null space is sometimes called the nullity. \(\lhd\)
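As an illustration (a minimal sketch reusing the \(3 \times 3\) matrix \(A\) from the earlier example, which has rank 2): since \(\mathbf{w}_3 = \mathbf{w}_1 - \mathbf{w}_2\), the vector \((1,-1,-1)^T\) lies in \(\mathrm{null}(A)\), and the Rank-Nullity Theorem gives \(\mathrm{dim}(\mathrm{null}(A)) = 3 - 2 = 1\). The check below uses the standard NumPy helper numpy.allclose.

A = np.array([[1., 0., 1.], [0., 1., -1.], [1., 1., 0.]])
v = np.array([1., -1., -1.])
print(np.allclose(A @ v, 0.))            # v is in null(A)
print(A.shape[1] - LA.matrix_rank(A))    # nullity = m - rk(A)
True
1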

Outer products and rank-one matrices Let \(\mathbf{u} = (u_1,\ldots,u_n)^T \in \mathbb{R}^n\) and \(\mathbf{v} = (v_1,\ldots,v_m)^T \in \mathbb{R}^m\) be two nonzero column vectors. Their outer product is defined as the matrix

\[\begin{split} \mathbf{u} \mathbf{v}^T = \begin{pmatrix} u_1 v_1 & u_1 v_2 & \cdots & u_1 v_m\\ u_2 v_1 & u_2 v_2 & \cdots & u_2 v_m\\ \vdots & \vdots & \ddots & \vdots\\ u_n v_1 & u_n v_2 & \cdots & u_n v_m \end{pmatrix} = \begin{pmatrix} | & & | \\ v_{1} \mathbf{u} & \ldots & v_{m} \mathbf{u} \\ | & & | \end{pmatrix}. \end{split}\]

This is not to be confused with the inner product \(\mathbf{u}^T \mathbf{v}\), which requires \(n = m\) and produces a scalar.

If \(\mathbf{u}\) and \(\mathbf{v}\) are nonzero, the matrix \(\mathbf{u} \mathbf{v}^T\) has rank one. Indeed, its columns are all multiples of the same vector \(\mathbf{u}\), so the column space of \(\mathbf{u} \mathbf{v}^T\) is one-dimensional. Conversely, any rank-one matrix can be written in this form: its column space is spanned by a single nonzero vector \(\mathbf{u}\), so its \(j\)-th column is of the form \(v_j \mathbf{u}\) for some scalar \(v_j\), and the matrix equals \(\mathbf{u} \mathbf{v}^T\) with \(\mathbf{v} = (v_1,\ldots,v_m)^T\).

We have seen many different interpretations of matrix-matrix products. Here is yet another one. Let \(A = (a_{ij})_{i,j} \in \mathbb{R}^{n \times k}\) and \(B = (b_{ij})_{i,j} \in \mathbb{R}^{k \times m}\). Denote by \(\mathbf{a}_1,\ldots,\mathbf{a}_k\) the columns of \(A\) and denote by \(\mathbf{b}_1^T,\ldots,\mathbf{b}_k^T\) the rows of \(B\).

Then

\[\begin{align*} A B &= \begin{pmatrix} \sum_{j=1}^k a_{1j} b_{j1} & \sum_{j=1}^k a_{1j} b_{j2} & \cdots & \sum_{j=1}^k a_{1j} b_{jm}\\ \sum_{j=1}^k a_{2j} b_{j1} & \sum_{j=1}^k a_{2j} b_{j2} & \cdots & \sum_{j=1}^k a_{2j} b_{jm}\\ \vdots & \vdots & \ddots & \vdots\\ \sum_{j=1}^k a_{nj} b_{j1} & \sum_{j=1}^k a_{nj} b_{j2} & \cdots & \sum_{j=1}^k a_{nj} b_{jm} \end{pmatrix}\\ &= \sum_{j=1}^k \begin{pmatrix} a_{1j} b_{j1} & a_{1j} b_{j2} & \cdots & a_{1j} b_{jm}\\ a_{2j} b_{j1} & a_{2j} b_{j2} & \cdots & a_{2j} b_{jm}\\ \vdots & \vdots & \ddots & \vdots\\ a_{nj} b_{j1} & a_{nj} b_{j2} & \cdots & a_{nj} b_{jm} \end{pmatrix}\\ &= \sum_{j=1}^k \mathbf{a}_j \mathbf{b}_j^T. \end{align*}\]

In words, the matrix product \(AB\) can be interpreted as a sum of \(k\) rank-one matrices, each of which is the outer product of a column of \(A\) with the corresponding row of \(B\).

Because the rank of a sum is at most the sum of the ranks (as shown in a previous example), it follows that the rank of \(AB\) is at most \(k\). This is consistent with the fact that the rank of a product is at most the minimum of the ranks (also shown in a previous example).
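The following snippet checks this decomposition on a small example (a minimal sketch with arbitrary matrices; it uses numpy.outer, which is introduced in the numerical corner below):

A = np.array([[1., 2.], [3., 4.], [5., 6.]])   # 3 x 2
B = np.array([[1., 0., 1.], [2., 1., 0.]])     # 2 x 3
S = sum(np.outer(A[:, j], B[j, :]) for j in range(A.shape[1]))
print(np.allclose(S, A @ B))
True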

NUMERICAL CORNER: In NumPy, the outer product is computed using numpy.outer.

u = np.array([0., 2., -1.])
v = np.array([3., -2.])
Z = np.outer(u, v)
print(Z)
[[ 0. -0.]
 [ 6. -4.]
 [-3.  2.]]
print(LA.matrix_rank(Z))
1

\(\unlhd\)

3.2.2. Eigenvalues and eigenvectors#

Recall the concepts of eigenvalues and eigenvectors. We work on \(\mathbb{R}^d\).

DEFINITION (Eigenvalues and Eigenvectors) Let \(A \in \mathbb{R}^{d \times d}\) be a square matrix. Then \(\lambda \in \mathbb{R}\) is an eigenvalue of \(A\) if there exists a nonzero vector \(\mathbf{x} \neq \mathbf{0}\) such that

\[ A \mathbf{x} = \lambda \mathbf{x}. \]

The vector \(\mathbf{x}\) is referred to as an eigenvector. \(\natural\)

As the next example shows, not every matrix has a (real) eigenvalue.

EXAMPLE: (No Real Eigenvalues) Set \(d = 2\) and let

\[\begin{split} A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}. \end{split}\]

For \(\lambda\) to be an eigenvalue, there must be a nonzero eigenvector \(\mathbf{x} = (x_1, x_2)^T\) such that

\[ A \mathbf{x} = \lambda \mathbf{x} \]

or put differently

\[ - x_2 = \lambda x_1 \quad\text{and}\quad x_1 = \lambda x_2. \]

Substituting these equations into each other, it must be that

\[ - x_2 = \lambda^2 x_2 \quad\text{and}\quad x_1 = - \lambda^2 x_1. \]

Because \(x_1, x_2\) cannot both be \(0\), \(\lambda\) must satisfy the equation

\[ \lambda^2 = -1 \]

for which there is no real solution. \(\lhd\)
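Numerically, this shows up as complex output: the eigenvalues of this matrix are the complex conjugate pair \(\pm i\), so numpy.linalg.eig returns a complex array (a quick check; we only inspect the type of the result here, using the standard helper numpy.iscomplexobj).

A = np.array([[0., -1.], [1., 0.]])
w, v = LA.eig(A)
print(np.iscomplexobj(w))    # the eigenvalues are the complex pair +i, -i
True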

In general, \(A \in \mathbb{R}^{d \times d}\) has at most \(d\) distinct eigenvalues. The proof of the next lemma is in an optional section.

LEMMA (Number of Eigenvalues) Let \(A \in \mathbb{R}^{d \times d}\) and let \(\lambda_1, \ldots, \lambda_m\) be distinct eigenvalues of \(A\) with corresponding eigenvectors \(\mathbf{x}_1, \ldots, \mathbf{x}_m\). Then \(\mathbf{x}_1, \ldots, \mathbf{x}_m\) are linearly independent. In particular, \(m \leq d\). \(\flat\)

EXAMPLE: (Diagonal (and Similar) Matrices) We use the notation \(\mathrm{diag}(\lambda_1,\ldots,\lambda_d)\) for the diagonal matrix with diagonal entries \(\lambda_1,\ldots,\lambda_d\). The upper bound in the Number of Eigenvalues Lemma can be achieved, for instance, by diagonal matrices with distinct diagonal entries \(A = \mathrm{diag}(\lambda_1, \ldots, \lambda_d)\). Each standard basis vector \(\mathbf{e}_i\) is then an eigenvector

\[ A \mathbf{e}_i = \lambda_i \mathbf{e}_i. \]

More generally, let \(A\) be similar to a matrix \(D = \mathrm{diag}(\lambda_1, \ldots, \lambda_d)\) with distinct diagonal entries, that is, there exists a nonsingular matrix \(P\) such that

\[ A = P D P^{-1}. \]

Let \(\mathbf{p}_1, \ldots, \mathbf{p}_d\) be the columns of \(P\). Note that, because the columns of \(P\) form a basis of \(\mathbb{R}^d\), the entries of the vector \(\mathbf{c} = P^{-1} \mathbf{x}\) are the coefficients of the unique linear combination of the \(\mathbf{p}_i\)’s equal to \(\mathbf{x}\). Indeed, \(P \mathbf{c} = \mathbf{x}\). Hence, \(A \mathbf{x} = P D P^{-1} \mathbf{x}\) can be thought of as: (1) expressing \(\mathbf{x}\) in the basis \(\mathbf{p}_1, \ldots, \mathbf{p}_d\), (2) scaling the resulting coefficients by the corresponding \(\lambda_i\)’s, and (3) recombining the \(\mathbf{p}_i\)’s with these scaled coefficients. In particular, the \(\mathbf{p}_i\)’s are eigenvectors of \(A\) since, by the above, \(P^{-1} \mathbf{p}_i = \mathbf{e}_i\) and so

\[ A \mathbf{p}_i = P D P^{-1} \mathbf{p}_i = P D \mathbf{e}_i = P (\lambda_i \mathbf{e}_i) = \lambda_i \mathbf{p}_i. \]

\(\lhd\)
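A quick numerical illustration of this example (a minimal sketch; the particular \(P\) and \(D\) below are arbitrary choices, and the check uses the standard NumPy helpers numpy.diag, numpy.linalg.inv and numpy.allclose):

P = np.array([[1., 1.], [0., 1.]])    # nonsingular: its columns form a basis of R^2
D = np.diag([3., 2.])                 # distinct diagonal entries
A = P @ D @ LA.inv(P)
print(np.allclose(A @ P[:, 0], 3. * P[:, 0]))   # p_1 is an eigenvector with eigenvalue 3
print(np.allclose(A @ P[:, 1], 2. * P[:, 1]))   # p_2 is an eigenvector with eigenvalue 2
True
True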

NUMERICAL CORNER: In NumPy, the eigenvalues and eigenvectors of a matrix can be computed using numpy.linalg.eig.

A = np.array([[2.5, -0.5], [-0.5, 2.5]])
w, v = LA.eig(A)
print(w)
print(v)
[3. 2.]
[[ 0.70710678  0.70710678]
 [-0.70710678  0.70710678]]

Above, w is an array containing the eigenvalues, and the columns of v are the corresponding eigenvectors.

\(\unlhd\)

Some matrix algebra We will need a few useful observations about matrices.

First recall that a matrix \(D \in \mathbb{R}^{k \times r}\) is diagonal if its non-diagonal entries are zero. That is, \(i \neq j\) implies that \(D_{ij} =0\). Note that a diagonal matrix is not necessarily square and that the diagonal elements are allowed to be zero.

Multiplying a matrix by a diagonal one has a very specific effect. Let \(A \in \mathbb{R}^{n \times k}\) and \(B \in \mathbb{R}^{r \times m}\). We focus here on the case \(k \geq r\). The matrix product \(A D\) produces a matrix whose columns are linear combinations of the columns of \(A\), where the coefficients are taken from the corresponding column of \(D\). But each column of \(D\) has at most one nonzero element, the diagonal one. So the columns of \(AD\) are in fact multiples of the columns of \(A\)

\[\begin{split} AD = \begin{pmatrix} | & & | \\ d_{11} \mathbf{a}_1 & \ldots & d_{rr} \mathbf{a}_r \\ | & & | \end{pmatrix} \end{split}\]

where \(\mathbf{a}_1,\ldots,\mathbf{a}_k\) are the columns of \(A\) and \(d_{11},\ldots,d_{rr}\) are the diagonal elements of \(D\).

Similarly, the rows of \(D B\) are linear combinations of the rows of \(B\), where the coefficients are taken from the corresponding row of \(D\). Each row of \(D\) has at most one nonzero element, the diagonal one. In the case \(k \geq r\), rows \(r+1,\ldots,k\) of \(D\) necessarily have only zero entries since there is no diagonal entry there. Hence the first \(r\) rows of \(DB\) are multiples of the rows of \(B\) and the remaining \(k-r\) rows are \(\mathbf{0}\)

\[\begin{split} DB = \begin{pmatrix} \horz & d_{11} \mathbf{b}_1^T & \horz \\ & \vdots &\\ \horz & d_{rr} \mathbf{b}_r^T & \horz\\ \horz & \mathbf{0} & \horz\\ & \vdots &\\ \horz & \mathbf{0} & \horz \end{pmatrix} \end{split}\]

where \(\mathbf{b}_1^T,\ldots,\mathbf{b}_r^T\) are the rows of \(B\).

EXAMPLE: The following special case will be useful later in this chapter. Suppose \(D, F \in \mathbb{R}^{n \times n}\) are both square diagonal matrices. Then \(D F\) is the diagonal matrix whose diagonal elements are \(d_{11} f_{11}, \ldots,d_{nn} f_{nn}\). \(\lhd\)
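These effects are easy to see numerically (a minimal sketch with \(n = 2\), \(k = 3\), \(r = 2\), \(m = 2\); the entries are arbitrary):

A = np.array([[1., 2., 3.], [4., 5., 6.]])      # n x k with k = 3
B = np.array([[1., 1.], [2., 2.]])              # r x m with r = 2
D = np.array([[10., 0.], [0., 20.], [0., 0.]])  # k x r diagonal, k >= r
print(A @ D)    # columns: 10 * a_1 and 20 * a_2
print(D @ B)    # rows: 10 * b_1^T, 20 * b_2^T, then a zero row
print(np.diag([1., 2.]) @ np.diag([3., 4.]))    # square case: diagonal entries multiply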

Spectral theorem When \(A\) is symmetric, that is, \(a_{ij} = a_{ji}\) for all \(i,j\), a remarkable result is that there exists an orthonormal basis of \(\mathbb{R}^d\) made of eigenvectors of \(A\). We will prove this result in a subsequent chapter.

THINK-PAIR-SHARE: Let \(A \in \mathbb{R}^{d \times d}\) be symmetric. Show that for any \(\mathbf{u}\) and \(\mathbf{v}\) in \(\mathbb{R}^d\) it holds that

\[ \langle \mathbf{u}, A \mathbf{v} \rangle = \langle A \mathbf{u}, \mathbf{v} \rangle. \]

\(\ddagger\)

Before stating the result formally, we make a few observations. Let \(A \in \mathbb{R}^{d \times d}\) be symmetric. Suppose that \(\mathbf{v}_i\) and \(\mathbf{v}_j\) are eigenvectors corresponding respectively to distinct eigenvalues \(\lambda_i\) and \(\lambda_j\). Then the following quantity can be written in two ways

\[ \langle \mathbf{v}_i, A \mathbf{v}_j \rangle = \langle \mathbf{v}_i, \lambda_j \mathbf{v}_j \rangle = \lambda_j \langle \mathbf{v}_i, \mathbf{v}_j \rangle \]

and, by symmetry of \(A\),

\[ \langle \mathbf{v}_i, A \mathbf{v}_j \rangle = \langle A \mathbf{v}_i, \mathbf{v}_j \rangle = \langle \lambda_i \mathbf{v}_i, \mathbf{v}_j \rangle = \lambda_i \langle \mathbf{v}_i, \mathbf{v}_j \rangle. \]

Subtracting the two,

\[ (\lambda_j - \lambda_i) \langle \mathbf{v}_i, \mathbf{v}_j \rangle = 0 \]

and using that \(\lambda_i \neq \lambda_j\)

\[ \langle \mathbf{v}_i, \mathbf{v}_j \rangle = 0. \]

That is, \(\mathbf{v}_i\) and \(\mathbf{v}_j\) are necessarily orthogonal.

We proved:

LEMMA Let \(A \in \mathbb{R}^{d \times d}\) be symmetric. Suppose that \(\mathbf{v}_i\) and \(\mathbf{v}_j\) are eigenvectors corresponding to distinct eigenvalues. Then \(\mathbf{v}_i\) and \(\mathbf{v}_j\) are orthogonal. \(\flat\)

This lemma gives a different proof – in the symmetric case – that the number of distinct eigenvalues is at most \(d\), since a list of pairwise orthogonal nonzero vectors is linearly independent.

In fact:

THEOREM (Spectral Theorem) Let \(A \in \mathbb{R}^{d \times d}\) be a symmetric matrix, that is, \(A^T = A\). Then \(A\) has \(d\) orthonormal eigenvectors \(\mathbf{q}_1, \ldots, \mathbf{q}_d\) with corresponding (not necessarily distinct) real eigenvalues \(\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d\). Moreover, \(A\) can be written as the matrix factorization

\[ A = Q \Lambda Q^T = \sum_{i=1}^d \lambda_i \mathbf{q}_i \mathbf{q}_i^T \]

where \(Q\) has columns \(\mathbf{q}_1, \ldots, \mathbf{q}_d\) and \(\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_d)\). \(\sharp\)

We refer to this factorization as a spectral decomposition of \(A\).

Note that this decomposition indeed produces the eigenvectors of \(A\). For any \(j\), we have

\[ A \mathbf{q}_j = \sum_{i=1}^d \lambda_i \mathbf{q}_i \mathbf{q}_i^T \mathbf{q}_j = \lambda_j \mathbf{q}_j, \]

where we used that, by orthonormality, \(\mathbf{q}_i^T \mathbf{q}_j = 0\) if \(i \neq j\) and \(\mathbf{q}_i^T \mathbf{q}_j = 1\) if \(i = j\). The equation above says precisely that \(\mathbf{q}_j\) is an eigenvector of \(A\) with corresponding eigenvalue \(\lambda_j\). Since we have found \(d\) eigenvalues (possibly with repetition), we have found all of them.
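We can verify the factorization numerically; here is a minimal sketch reusing the symmetric matrix from the earlier numerical corner (for that matrix, numpy.linalg.eig happens to return orthonormal eigenvectors since the eigenvalues are distinct; the check uses the standard helpers numpy.diag, numpy.outer and numpy.allclose).

A = np.array([[2.5, -0.5], [-0.5, 2.5]])
w, Q = LA.eig(A)
Lam = np.diag(w)
print(np.allclose(A, Q @ Lam @ Q.T))     # A = Q Lambda Q^T
print(np.allclose(A, sum(w[i] * np.outer(Q[:, i], Q[:, i]) for i in range(2))))
True
True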

Let \(\lambda_1, \lambda_2, \ldots, \lambda_d\) be the eigenvalues whose existence is guaranteed by the Spectral Theorem. We first argue that there is no other eigenvalue. Indeed, assume \(\mu \neq \lambda_1, \lambda_2, \ldots, \lambda_d\) is an eigenvalue with corresponding eigenvector \(\mathbf{p}\). By the lemma above, \(\mathbf{p}\) is orthogonal to the eigenvectors \(\mathbf{q}_1, \ldots, \mathbf{q}_d\). Since the latter list forms an orthonormal basis of \(\mathbb{R}^d\), this would force \(\mathbf{p} = \mathbf{0}\), a contradiction since eigenvectors are nonzero by definition.

Some of the eigenvalues \(\lambda_1, \lambda_2, \ldots, \lambda_d\) can be repeated however, that is, there can be \(i, j\) such that \(\lambda_{i} = \lambda_{j}\). For instance, if \(A = I_{d \times d}\) is the identity matrix, then the eigenvalues are \(\lambda_i = 1\) for all \(i \in [d]\).

For a fixed eigenvalue \(\lambda\) of \(A\), every eigenvector \(\mathbf{v}\) with eigenvalue \(\lambda\) satisfies

\[ A \mathbf{v} = \lambda \mathbf{v} = \lambda I_{d \times d} \mathbf{v} \]

or, rearranging,

\[ (A - \lambda I_{d \times d})\mathbf{v} = \mathbf{0}. \]

Put differently, the eigenvectors with eigenvalue \(\lambda\), together with the zero vector, form the linear subspace \(\mathrm{null}(A - \lambda I_{d \times d})\).

The eigenvectors in the Spectral Theorem corresponding to the same eigenvalue \(\lambda\) can be replaced by any orthonormal basis of the subspace \(\mathrm{null}(A - \lambda I_{d \times d})\). But, beyond this freedom, we have the following characterization: letting \(\mu_1,\ldots,\mu_f\) be the distinct values among \(\lambda_1, \lambda_2, \ldots, \lambda_d\), we have

\[ \mathbb{R}^d = \mathrm{null}(A - \mu_1 I_{d \times d}) \oplus \mathrm{null}(A - \mu_2 I_{d \times d}) \oplus \cdots \oplus \mathrm{null}(A - \mu_f I_{d \times d}), \]

where we used the fact that for any \(\mathbf{u} \in \mathrm{null}(A - \mu_i I_{d \times d})\) and \(\mathbf{v} \in \mathrm{null}(A - \mu_j I_{d \times d})\) with \(i \neq j\), we have that \(\mathbf{u}\) is orthogonal to \(\mathbf{v}\).

We have shown in particular that the sequence of eigenvalues in the Spectral Theorem is unique (counting repeats).

The case of positive semidefinite matrices The eigenvalues of a symmetric matrix – while real – may be negative. There is, however, an important special case where the eigenvalues are nonnegative: positive semidefinite matrices. Recall that a symmetric matrix \(A \in \mathbb{R}^{d \times d}\) is positive semidefinite, denoted \(A \succeq 0\), if \(\langle \mathbf{x}, A \mathbf{x} \rangle \geq 0\) for all \(\mathbf{x} \in \mathbb{R}^d\).

THEOREM (Characterization of Positive Semidefiniteness) Let \(A \in \mathbb{R}^{d \times d}\) be a symmetric matrix and let \(A = Q \Lambda Q^T\) be a spectral decomposition of \(A\) with \(\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_d)\). Then \(A \succeq 0\) if and only if its eigenvalues \(\lambda_1, \ldots, \lambda_d\) are nonnegative. \(\sharp\)

Proof: Assume \(A \succeq 0\). Let \(\mathbf{q}_i\) be an eigenvector of \(A\) with corresponding eigenvalue \(\lambda_i\). Then

\[ \langle \mathbf{q}_i, A \mathbf{q}_i \rangle = \langle \mathbf{q}_i, \lambda_i \mathbf{q}_i \rangle = \lambda_i \|\mathbf{q}_i\|^2 = \lambda_i \]

which must be nonnegative by definition of a positive semidefinite matrix.

In the other direction, assume \(\lambda_1, \ldots, \lambda_d \geq 0\). Then, by the spectral decomposition in outer-product form

\[ \langle \mathbf{x}, A \mathbf{x} \rangle = \mathbf{x}^T \left(\sum_{i=1}^d \lambda_i \mathbf{q}_i \mathbf{q}_i^T\right) \mathbf{x} = \sum_{i=1}^d \lambda_i \mathbf{x}^T\mathbf{q}_i \mathbf{q}_i^T\mathbf{x} = \sum_{i=1}^d \lambda_i \langle \mathbf{q}_i, \mathbf{x}\rangle^2 \]

which is necessarily nonnegative. \(\square\)
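A quick numerical check of the identity used in the second part of the proof (a minimal sketch; the vector \(\mathbf{x}\) is arbitrary, and the comparison uses the standard helper numpy.isclose):

A = np.array([[2.5, -0.5], [-0.5, 2.5]])
w, Q = LA.eig(A)
x = np.array([1., -2.])
lhs = x @ A @ x                                       # <x, Ax>
rhs = sum(w[i] * (Q[:, i] @ x)**2 for i in range(2))  # sum_i lambda_i <q_i, x>^2
print(np.isclose(lhs, rhs), lhs >= 0)
True True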

Similarly, a symmetric matrix is positive definite if and only if all its eigenvalues are strictly positive. The proof is essentially the same.

THINK-PAIR-SHARE: Prove this last claim by modifying the proof above. \(\ddagger\)

NUMERICAL CORNER: Hence, we can check whether a symmetric matrix is positive semidefinite by computing its eigenvalues using numpy.linalg.eig.

A = np.array([[1, -1], [-1, 1]])
w, v = LA.eig(A)
print(w)
[2. 0.]
B = np.array([[1, -2], [-2, 1]])
z, u = LA.eig(B)
print(z)
[ 3. -1.]

MULTIPLE CHOICE: Which one(s) of these matrices is positive semidefinite?

\[\begin{split} A = \begin{pmatrix} 1 & -1\\ -1 & 1 \end{pmatrix} \qquad B = \begin{pmatrix} 1 & -2\\ -2 & 1 \end{pmatrix} \end{split}\]

a) Both

b) \(A\)

c) \(B\)

d) Neither

\(\ddagger\)

\(\unlhd\)