1.5. More advanced material: clustering in high dimension#
In this section, we give a more rigorous analysis of the high-dimensional phenomena discussed in the previous section.
1.5.1. A more rigorous analysis of clustering in high dimension#
In this optional section, we give one formal statement of the phenomenon described in the previous subsection.
THEOREM Let \(\mathbf{X}_1, \mathbf{X}_2, \mathbf{Y}_1\) be independent spherical \(d\)-dimensional Gaussians with mean \(-w_d \mathbf{e}_1\) and variance \(1\), where \(\{w_d\}\) is a monotone sequence in \(d\). Let \(\mathbf{Y}_2\) be an independent spherical \(d\)-dimensional Gaussian with mean \(w_d \mathbf{e}_1\) and variance \(1\). Then, letting \(\Delta_d = \|\mathbf{Y}_1 - \mathbf{Y}_2\|^2 - \|\mathbf{X}_1 - \mathbf{X}_2\|^2\), as \(d \to +\infty\)

\[
\frac{\mathbb{E}[\Delta_d]}{\sqrt{\mathrm{Var}[\Delta_d]}} \to
\begin{cases}
+\infty, & \text{if } w_d \gg d^{1/4},\\
0, & \text{if } w_d \ll d^{1/4},
\end{cases}
\]

where \(w_d \gg d^{1/4}\) means \(w_d/d^{1/4} \to +\infty\) and \(w_d \ll d^{1/4}\) means \(w_d/d^{1/4} \to 0\). \(\sharp\)
The ratio in the statement is referred to as the signal-to-noise ratio.
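The theorem can be illustrated numerically. Below is a minimal Monte Carlo sketch (the function `snr_estimate` and all parameter values are ours, for illustration only) that estimates the signal-to-noise ratio for given \(d\) and \(w\); for fixed \(w\), the ratio decays as \(d\) grows, as the \(w_d \ll d^{1/4}\) regime predicts.

```python
import numpy as np

def snr_estimate(d, w, n=5_000, seed=0):
    """Monte Carlo estimate of E[Delta_d] / sqrt(Var[Delta_d])."""
    rng = np.random.default_rng(seed)
    # X1, X2, Y1 are centered at -w e_1 and Y2 at +w e_1; only the
    # difference of means matters, so center everything at 0 and shift
    # the first coordinate of Y2 by 2w.
    X1 = rng.standard_normal((n, d))
    X2 = rng.standard_normal((n, d))
    Y1 = rng.standard_normal((n, d))
    Y2 = rng.standard_normal((n, d))
    Y2[:, 0] += 2 * w
    delta = ((Y1 - Y2) ** 2).sum(axis=1) - ((X1 - X2) ** 2).sum(axis=1)
    return delta.mean() / delta.std()
```

For instance, with \(w = 2\) fixed, `snr_estimate(16, 2.0)` is noticeably larger than `snr_estimate(256, 2.0)`: the separation between the two clusters is progressively drowned out by the noise as the dimension grows.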
To prove the theorem, we will need the following property.
LEMMA Let \(W\) be a real-valued random variable symmetric about zero, that is, such that \(W\) and \(-W\) are identically distributed. Then for all odd \(k\), \(\mathbb{E}[W^k] = 0\). \(\flat\)
Proof: By the symmetry, for odd \(k\),

\[
\mathbb{E}[W^k] = \mathbb{E}[(-W)^k] = (-1)^k\, \mathbb{E}[W^k] = -\mathbb{E}[W^k].
\]

The only way to satisfy this equation is to have \(\mathbb{E}[W^k] = 0\). \(\square\)
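As a quick sanity check of the lemma (the sampling setup below is our own illustration), the difference of two independent standard Gaussians is symmetric about zero, so its empirical odd moments should vanish up to Monte Carlo error:

```python
import numpy as np

rng = np.random.default_rng(0)
# W = Z1 - Z2 is symmetric about zero when Z1, Z2 are i.i.d. N(0, 1).
w_samples = rng.standard_normal(1_000_000) - rng.standard_normal(1_000_000)
# Empirical odd moments; the exact values are all 0 by the lemma.
odd_moments = {k: float(np.mean(w_samples ** k)) for k in (1, 3, 5)}
```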
Returning to the proof of the theorem:
Proof idea: (Theorem) The only coordinate contributing to \(\mathbb{E}[\Delta_d]\) is the first one by linearity of expectation, while all coordinates contribute to \(\mathrm{Var}[\Delta_d]\). More specifically, a calculation shows that the former is \(c_0 w^2\) while the latter is \(c_1 w^2 + c_2 d\), where \(c_0, c_1, c_2\) are constants.
Proof: (Theorem) Write \(w := w_d\) and \(\Delta := \Delta_d\) to simplify the notation. There are two steps:
(1) Expectation of \(\Delta\): By definition, the random variables \(X_{1,i} - X_{2,i}\), \(i = 1,\ldots, d\), and \(Y_{1,i} - Y_{2,i}\), \(i = 2,\ldots, d\), are identically distributed. So, by linearity of expectation,

\[
\mathbb{E}[\Delta] = \mathbb{E}[(Y_{1,1} - Y_{2,1})^2] - \mathbb{E}[(X_{1,1} - X_{2,1})^2].
\]
Further, we can write \(Y_{1,1} - Y_{2,1} \sim (Z_1 - w) - (Z_2 + w)\) where \(Z_1, Z_2 \sim N(0,1)\) are independent, where here \(\sim\) indicates equality in distribution. Hence, we have

\[
\mathbb{E}[(Y_{1,1} - Y_{2,1})^2]
= \mathbb{E}[(Z_1 - Z_2 - 2w)^2]
= \mathrm{Var}[Z_1 - Z_2 - 2w] + \left(\mathbb{E}[Z_1 - Z_2 - 2w]\right)^2
= 2 + 4w^2.
\]
Similarly, \(X_{1,1} - X_{2,1} \sim Z_1 - Z_2\) so \(\mathbb{E}[(X_{1,1} - X_{2,1})^2] = \mathbb{E}[(Z_1 - Z_2)^2]\). Since \(\mathbb{E}[Z_1 - Z_2] = 0\), this is \(\mathrm{Var}[Z_1 - Z_2] = 2\), and we finally get \(\mathbb{E}[\Delta] = (2 + 4w^2) - 2 = 4 w^2\).
(2) Variance of \(\Delta\): Using the observations from (1) and the independence of the coordinates (so that \(\Delta\) is a sum of \(2d\) independent terms, all but one distributed as \(\pm(Z_1 - Z_2)^2\)), we get

\[
\mathrm{Var}[\Delta] = \mathrm{Var}[(Z_1 - Z_2 - 2w)^2] + (2d - 1)\,\mathrm{Var}[(Z_1 - Z_2)^2].
\]
By the Variance of a Sum, expanding \((Z_1 - Z_2 - 2w)^2 = (Z_1 - Z_2)^2 - 4w(Z_1 - Z_2) + 4w^2\),

\[
\mathrm{Var}[(Z_1 - Z_2 - 2w)^2]
= \mathrm{Var}[(Z_1 - Z_2)^2] + 16 w^2\, \mathrm{Var}[Z_1 - Z_2]
- 8w\, \mathrm{Cov}[(Z_1 - Z_2)^2, Z_1 - Z_2].
\]
Because \(Z_1\) and \(Z_2\) are independent, \(\mathrm{Var}[Z_1 - Z_2] = \mathrm{Var}[Z_1] + \mathrm{Var}[Z_2] = 2\). Moreover, the random variable \((Z_1 - Z_2)\) is symmetric, so by the lemma

\[
\mathrm{Cov}[(Z_1 - Z_2)^2, Z_1 - Z_2]
= \mathbb{E}[(Z_1 - Z_2)^3] - \mathbb{E}[(Z_1 - Z_2)^2]\,\mathbb{E}[Z_1 - Z_2]
= 0.
\]
Finally, since \(Z_1 - Z_2 \sim N(0, 2)\) and the fourth moment of an \(N(0, \sigma^2)\) variable is \(3\sigma^4\),

\[
\mathrm{Var}[(Z_1 - Z_2)^2]
= \mathbb{E}[(Z_1 - Z_2)^4] - \left(\mathbb{E}[(Z_1 - Z_2)^2]\right)^2
= 3 \cdot 2^2 - 2^2 = 8.
\]
Putting everything together:

\[
\mathrm{Var}[\Delta] = (8 + 32 w^2) + (2d - 1) \cdot 8 = 32 w^2 + 16 d,
\]

so that

\[
\frac{\mathbb{E}[\Delta]}{\sqrt{\mathrm{Var}[\Delta]}}
= \frac{4 w^2}{\sqrt{32 w^2 + 16 d}}.
\]
Taking \(d \to +\infty\) gives the theorem. \(\square\)
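The two computations in the proof can be checked by simulation. The sketch below (parameter choices are ours) samples the coordinate differences directly, each being \(N(\mu, 2)\) with \(\mu = -2w\) in the first \(\mathbf{Y}\)-coordinate and \(\mu = 0\) elsewhere, and compares the empirical mean and variance of \(\Delta\) with the values \(4w^2\) and \(32w^2 + 16d\) obtained by carrying out the proof's calculation.

```python
import numpy as np

rng = np.random.default_rng(1)
d, w, n = 10, 3.0, 400_000
# Each coordinate difference is N(mu, 2); only the first coordinate of
# Y1 - Y2 has a nonzero mean, namely -2w.
y_diff = np.sqrt(2) * rng.standard_normal((n, d))
y_diff[:, 0] -= 2 * w
x_diff = np.sqrt(2) * rng.standard_normal((n, d))
delta = (y_diff ** 2).sum(axis=1) - (x_diff ** 2).sum(axis=1)
mean_est, var_est = float(delta.mean()), float(delta.var())
# The proof predicts E[Delta] = 4 w^2 = 36 and Var[Delta] = 32 w^2 + 16 d = 448.
```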