
\(\newcommand{\bSigma}{\boldsymbol{\Sigma}}\) \(\newcommand{\bmu}{\boldsymbol{\mu}}\) \(\newcommand{\blambda}{\boldsymbol{\lambda}}\)

6.6. More advanced material

In this section, we cover some more advanced material.

6.6.1. Additional proofs

**Proof of Second-Order Sufficient Conditions** We prove the Second-Order Sufficient Conditions.

We will need a simple lemma.

LEMMA Let \(A = (a_{i,j})_{i,j}\) and \(B = (b_{i,j})_{i,j}\) be matrices in \(\mathbb{R}^{n \times m}\). For any unit vectors \(\mathbf{u} \in \mathbb{R}^n\) and \(\mathbf{v} \in \mathbb{R}^m\)

\[ \left| \mathbf{u}^T A \,\mathbf{v} - \mathbf{u}^T B \,\mathbf{v} \right| \leq \|A - B\|_F. \]

\(\flat\)

Proof: By Cauchy-Schwarz,

\[\begin{align*} \left| \mathbf{u}^T A \,\mathbf{v} - \mathbf{u}^T B \,\mathbf{v} \right| &= \left| \sum_{i=1}^n \sum_{j=1}^m u_i v_j (a_{i,j} - b_{i,j}) \right|\\ &\leq \sqrt{\sum_{i=1}^n \sum_{j=1}^m u_i^2 v_j^2} \sqrt{\sum_{i=1}^n \sum_{j=1}^m (a_{i,j} - b_{i,j})^2}\\ &= \|A - B\|_F, \end{align*}\]

where we used that \(\mathbf{u}\) and \(\mathbf{v}\) have unit norm on the last line. \(\square\)
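As a quick numerical sanity check (an illustration, not part of the proof), the inequality can be tested on random matrices and random unit vectors using NumPy:

```python
# Sanity check of the lemma: |u^T A v - u^T B v| <= ||A - B||_F
# for random matrices and random unit vectors.
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3
A = rng.normal(size=(n, m))
B = rng.normal(size=(n, m))
rhs = np.linalg.norm(A - B, 'fro')  # Frobenius norm of the difference

for _ in range(1000):
    u = rng.normal(size=n)
    u /= np.linalg.norm(u)  # unit vector in R^n
    v = rng.normal(size=m)
    v /= np.linalg.norm(v)  # unit vector in R^m
    assert abs(u @ A @ v - u @ B @ v) <= rhs + 1e-12
```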

Proof: (Second-Order Sufficient Conditions) By Taylor’s Theorem, for all unit vectors \(\mathbf{v} \in \mathbb{R}^d\) and \(\alpha \in \mathbb{R}\), there is \(\xi_{\alpha} \in (0,1)\) such that

\[\begin{align*} f(\mathbf{x}_0 + \alpha \mathbf{v}) &= f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0)^T(\alpha \mathbf{v}) + \frac{1}{2}(\alpha \mathbf{v})^T \mathbf{H}_f(\mathbf{x}_0 + \xi_\alpha \alpha \mathbf{v}) (\alpha \mathbf{v})\\ &= f(\mathbf{x}_0) + \frac{1}{2}\alpha^2 \mathbf{v}^T \mathbf{H}_f(\mathbf{x}_0 + \xi_\alpha \alpha \mathbf{v})\,\mathbf{v}, \end{align*}\]

where we used that \(\nabla f(\mathbf{x}_0) = \mathbf{0}\). The second term on the last line vanishes at \(\alpha = 0\). Our goal is to show that it is strictly positive for all sufficiently small \(\alpha \neq 0\), uniformly in the direction \(\mathbf{v}\).

The set \(\mathbb{S}^{d-1}\) of unit vectors in \(\mathbb{R}^d\) is closed and bounded. The expression \(\mathbf{v}^T \mathbf{H}_f(\mathbf{x}_0) \,\mathbf{v}\), viewed as a function of \(\mathbf{v}\), is continuous since it is a polynomial. Hence, by the Extreme Value Theorem, it attains its minimum on \(\mathbb{S}^{d-1}\). By our assumption that \(\mathbf{H}_f(\mathbf{x}_0)\) is positive definite, that minimum must be strictly positive, say \(\mu > 0\).

By the previous lemma (ignoring the absolute value),

\[ \mathbf{v}^T \mathbf{H}_f(\mathbf{x}_0) \,\mathbf{v} - \mathbf{v}^T \mathbf{H}_f(\mathbf{x}_0 + \mathbf{w})\,\mathbf{v} \leq \|\mathbf{H}_f(\mathbf{x}_0) - \mathbf{H}_f(\mathbf{x}_0 + \mathbf{w})\|_F. \]

The Frobenius norm above is continuous in \(\mathbf{w}\) as a composition of continuous functions. Moreover, at \(\mathbf{w} = \mathbf{0}\) this Frobenius norm is \(0\). Hence, by definition of continuity, for any \(\epsilon > 0\) – say \(\epsilon := \mu/2\) – there is \(\delta > 0\) such that \(\mathbf{w} \in B_{\delta}(\mathbf{0})\) implies \(\|\mathbf{H}_f(\mathbf{x}_0) - \mathbf{H}_f(\mathbf{x}_0 + \mathbf{w})\|_F < \epsilon = \mu/2\).

Since \(\mathbf{v}^T \mathbf{H}_f(\mathbf{x}_0) \,\mathbf{v} \geq \mu\), the inequality in the previous display implies that

\[ \mathbf{v}^T \mathbf{H}_f(\mathbf{x}_0 + \mathbf{w})\,\mathbf{v} \geq \mathbf{v}^T \mathbf{H}_f(\mathbf{x}_0) \,\mathbf{v} - \|\mathbf{H}_f(\mathbf{x}_0) - \mathbf{H}_f(\mathbf{x}_0 + \mathbf{w})\|_F > \frac{\mu}{2}. \]

This holds for any unit vector \(\mathbf{v}\) and any \(\mathbf{w} \in B_{\delta}(\mathbf{0})\).

Going back to our Taylor expansion, for \(\alpha > 0\) small enough (not depending on \(\mathbf{v}\)), it holds that \(\mathbf{w} = \xi_\alpha \alpha \mathbf{v} \in B_{\delta}(\mathbf{0})\) so that we get from the previous inequality

\[\begin{align*} f(\mathbf{x}_0 + \alpha \mathbf{v}) &= f(\mathbf{x}_0) + \frac{1}{2}\alpha^2 \mathbf{v}^T \mathbf{H}_f(\mathbf{x}_0 + \xi_\alpha \alpha \mathbf{v})\,\mathbf{v}\\ &> f(\mathbf{x}_0) + \frac{1}{4} \alpha^2 \mu \\ &> f(\mathbf{x}_0). \end{align*}\]

Therefore \(\mathbf{x}_0\) is a strict local minimizer. \(\square\)
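As a concrete illustration (with a made-up function, not one from the text), the conditions can be checked numerically at a critical point: a vanishing gradient together with strictly positive Hessian eigenvalues certifies a strict local minimizer.

```python
# Checking the Second-Order Sufficient Conditions numerically for
# f(x, y) = x^2 + x*y + 2*y^2, which has a critical point at the origin.
import numpy as np

def grad_f(x):
    return np.array([2 * x[0] + x[1], x[0] + 4 * x[1]])

def hess_f(x):
    return np.array([[2.0, 1.0], [1.0, 4.0]])  # constant Hessian

x0 = np.zeros(2)
print(grad_f(x0))                      # [0. 0.]: first-order condition
print(np.linalg.eigvalsh(hess_f(x0)))  # ~[1.59, 4.41], all > 0: H_f(x0) is
                                       # positive definite, so x0 is a
                                       # strict local minimizer
```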

**Proofs of Lagrange Multipliers Conditions** We first prove the Lagrange Multipliers: First-Order Necessary Conditions. We follow the excellent textbook [Ber, Section 4.1].

Proof: (Lagrange Multipliers: First-Order Necessary Conditions) We reduce the problem to an unconstrained optimization problem by penalizing the constraint in the objective function. We also add a regularization term to ensure that \(\mathbf{x}^*\) is the unique local minimizer in a neighborhood. Specifically, for each non-negative integer \(k\), consider the objective function

\[ F^k(\mathbf{x}) = f(\mathbf{x}) + \frac{k}{2} \|\mathbf{h}(\mathbf{x})\|^2 + \frac{\alpha}{2} \|\mathbf{x} - \mathbf{x}^*\|^2 \]

for some positive constant \(\alpha > 0\). Note that as \(k\) gets larger, the penalty becomes more significant and, therefore, enforcing the constraint becomes more desirable. The proof proceeds in several steps.

We first consider a version of the problem where \(F^k\) is minimized over a closed neighborhood of \(\mathbf{x}^*\). Because \(\mathbf{x}^*\) is a local minimizer of \(f\) subject to \(\mathbf{h}(\mathbf{x}) = \mathbf{0}\), there is \(\delta > 0\) such that \(f(\mathbf{x}^*)\leq f(\mathbf{x})\) for all feasible \(\mathbf{x}\) within

\[ \mathscr{X} = B_{\delta}(\mathbf{x}^*) = \{\mathbf{x}:\|\mathbf{x} - \mathbf{x}^*\| \leq \delta\}. \]

LEMMA (Step I: Solving the Penalized Problem in a Neighborhood of \(\mathbf{x}^*\)) For \(k \geq 1\), let \(\mathbf{x}^k\) be a global minimizer of the minimization problem

\[ \min_{\mathbf{x} \in \mathscr{X}} F^k(\mathbf{x}). \]

a) The sequence \(\{\mathbf{x}^k\}_{k=1}^{+\infty}\) converges to \(\mathbf{x}^*\).

b) For \(k\) sufficiently large, \(\mathbf{x}^k\) is a local minimizer of the objective function \(F^k\) without any constraint.

\(\flat\)

Proof: The set \(\mathscr{X}\) is closed and bounded, and \(F^k\) is continuous. Hence, the sequence \(\{\mathbf{x}^k\}_{k=1}^{+\infty}\) is well-defined by the Extreme Value Theorem. Let \(\bar{\mathbf{x}}\) be any limit point of \(\{\mathbf{x}^k\}_{k=1}^{+\infty}\). We show that \(\bar{\mathbf{x}} = \mathbf{x}^*\). That will imply a). It also implies b) since then, for large enough \(k\), \(\mathbf{x}^k\) must be an interior point of \(\mathscr{X}\), and an interior minimizer over \(\mathscr{X}\) is also an unconstrained local minimizer of \(F^k\).

Let \(-\infty < m \leq M < + \infty\) be the smallest and largest values of \(f\) on \(\mathscr{X}\), which exist by the Extreme Value Theorem. Then, for all \(k\), by definition of \(\mathbf{x}^k\) and the fact that \(\mathbf{x}^*\) is feasible,

\[\begin{align*} (*) \qquad f(\mathbf{x}^k) &+ \frac{k}{2} \|\mathbf{h}(\mathbf{x}^k)\|^2 + \frac{\alpha}{2} \|\mathbf{x}^k - \mathbf{x}^*\|^2\\ &\leq f(\mathbf{x}^*) + \frac{k}{2} \|\mathbf{h}(\mathbf{x}^*)\|^2 + \frac{\alpha}{2} \|\mathbf{x}^* - \mathbf{x}^*\|^2 = f(\mathbf{x}^*). \end{align*}\]

Rearranging gives

\[ \|\mathbf{h}(\mathbf{x}^k)\|^2 \leq \frac{2}{k} \left[f(\mathbf{x}^*) - f(\mathbf{x}^k) - \frac{\alpha}{2} \|\mathbf{x}^k - \mathbf{x}^*\|^2\right] \leq \frac{2}{k} \left[ f(\mathbf{x}^*) - m\right]. \]

So \(\lim_{k \to \infty} \|\mathbf{h}(\mathbf{x}^k)\|^2 = 0\), which, by the continuity of \(\mathbf{h}\) and of the norm, implies that \(\|\mathbf{h}(\bar{\mathbf{x}})\|^2 = 0\), that is, \(\mathbf{h}(\bar{\mathbf{x}}) = \mathbf{0}\). In other words, any limit point \(\bar{\mathbf{x}}\) is feasible.

In addition to being feasible, \(\bar{\mathbf{x}} \in \mathscr{X}\) because that constraint set is closed. So, by the choice of \(\mathscr{X}\), we have \(f(\mathbf{x}^*) \leq f(\bar{\mathbf{x}})\). Furthermore, by \((*)\), we get

\[ f(\mathbf{x}^*) \leq f(\bar{\mathbf{x}}) + \frac{\alpha}{2} \|\bar{\mathbf{x}} - \mathbf{x}^*\|^2 \leq f(\mathbf{x}^*). \]

This is only possible if \(\|\bar{\mathbf{x}} - \mathbf{x}^*\|^2 = 0\) or, put differently, \(\bar{\mathbf{x}} = \mathbf{x}^*\). That proves the lemma. \(\square\)
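Step I can be observed numerically. The sketch below (a toy example of our own, with scipy.optimize.minimize as the inner solver) penalizes the constraint \(h(\mathbf{x}) = x_1^2 + x_2^2 - 1 = 0\) when minimizing \(f(\mathbf{x}) = x_1 + x_2\), whose constrained minimizer is \(\mathbf{x}^* = (-1/\sqrt{2}, -1/\sqrt{2})\); the minimizers \(\mathbf{x}^k\) of \(F^k\) approach \(\mathbf{x}^*\) as \(k\) grows.

```python
# Quadratic penalty method mirroring the proof: F^k(x) = f(x)
# + (k/2) ||h(x)||^2 + (alpha/2) ||x - x*||^2, minimized for growing k.
import numpy as np
from scipy.optimize import minimize

x_star = -np.ones(2) / np.sqrt(2)  # known constrained minimizer
alpha = 1.0

def F(x, k):
    f = x[0] + x[1]
    h = x[0]**2 + x[1]**2 - 1
    return f + (k / 2) * h**2 + (alpha / 2) * np.sum((x - x_star)**2)

xk = np.zeros(2)  # warm start for the next value of k
for k in [1, 10, 100, 1000, 10000]:
    xk = minimize(lambda x, k=k: F(x, k), xk).x
    print(k, xk, np.linalg.norm(xk - x_star))  # distance to x* shrinks
```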

LEMMA (Step II: Applying the Unconstrained Necessary Conditions) Let \(\{\mathbf{x}^k\}_{k=1}^{+\infty}\) be the sequence in the previous lemma.

a) For sufficiently large \(k\), the vectors \(\nabla h_i(\mathbf{x}^k)\), \(i=1,\ldots,\ell\), are linearly independent.

b) Let \(\mathbf{J}_{\mathbf{h}}(\mathbf{x})\) be the Jacobian matrix of \(\mathbf{h}\), that is, the matrix whose rows are the (row) vectors \(\nabla h_i(\mathbf{x})^T\), \(i=1,\ldots,\ell\). Then

\[ \nabla f(\mathbf{x}^*) + \mathbf{J}_{\mathbf{h}}(\mathbf{x}^*)^T \blambda^* = \mathbf{0} \]

where

\[ \blambda^* = - (\mathbf{J}_{\mathbf{h}}(\mathbf{x}^*) \, \mathbf{J}_{\mathbf{h}}(\mathbf{x}^*)^T)^{-1} \mathbf{J}_{\mathbf{h}}(\mathbf{x}^*) \nabla f(\mathbf{x}^*). \]

\(\flat\)

Proof: By the previous lemma, for \(k\) large enough, \(\mathbf{x}^k\) is an unconstrained local minimizer of \(F^k\). So by the (unconstrained) First-Order Necessary Conditions, it holds that

\[ \nabla F^k(\mathbf{x}^k) = \mathbf{0}. \]

To compute the gradient of \(F^k\) we note that

\[ \|\mathbf{h}(\mathbf{x})\|^2 = \sum_{i=1}^\ell (h_i(\mathbf{x}))^2. \]

The partial derivatives are

\[ \frac{\partial}{\partial x_j} \|\mathbf{h}(\mathbf{x})\|^2 = \sum_{i=1}^\ell \frac{\partial}{\partial x_j} (h_i(\mathbf{x}))^2 = \sum_{i=1}^\ell 2 h_i(\mathbf{x}) \frac{\partial h_i(\mathbf{x})}{\partial x_j}, \]

by the Chain Rule. So, in vector form,

\[ \nabla \|\mathbf{h}(\mathbf{x})\|^2 = 2 \mathbf{J}_{\mathbf{h}}(\mathbf{x})^T \mathbf{h}(\mathbf{x}). \]
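As an aside, this identity is easy to verify by finite differences; in the sketch below, \(\mathbf{h}\) is an arbitrary test map of our own choosing.

```python
# Finite-difference check of grad ||h(x)||^2 = 2 J_h(x)^T h(x)
# for an arbitrary test map h : R^3 -> R^2.
import numpy as np

def h(x):
    return np.array([x[0]**2 + x[1] - 1, x[1] * x[2]])

def J_h(x):  # Jacobian of h, shape (2, 3)
    return np.array([[2 * x[0], 1.0, 0.0],
                     [0.0, x[2], x[1]]])

x = np.array([0.3, -1.2, 0.7])
eps = 1e-6
g_fd = np.array([(np.sum(h(x + eps * e)**2) - np.sum(h(x - eps * e)**2))
                 / (2 * eps) for e in np.eye(3)])
print(g_fd)                 # finite-difference gradient
print(2 * J_h(x).T @ h(x))  # closed form; the two agree
```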

The term \(\|\mathbf{x} - \mathbf{x}^*\|^2\) can be rewritten as the quadratic function

\[ \|\mathbf{x} - \mathbf{x}^*\|^2 = \frac{1}{2}\mathbf{x}^T (2 I_{d \times d}) \mathbf{x} - 2 (\mathbf{x}^*)^T \mathbf{x} + (\mathbf{x}^*)^T \mathbf{x}^*. \]

Using a previous formula with \(P = 2 I_{d \times d}\) (which is symmetric), \(\mathbf{q} = -2 \mathbf{x}^*\) and \(r = (\mathbf{x}^*)^T \mathbf{x}^*\), we get

\[ \nabla \|\mathbf{x} - \mathbf{x}^*\|^2 = 2\mathbf{x} -2 \mathbf{x}^*. \]

So, putting everything together,

\[ (**) \qquad \mathbf{0} = \nabla F^k(\mathbf{x}^k) = \nabla f(\mathbf{x}^k) + \mathbf{J}_{\mathbf{h}}(\mathbf{x}^k)^T (k \mathbf{h}(\mathbf{x}^k)) + \alpha(\mathbf{x}^k - \mathbf{x}^*). \]

By the previous lemma, \(\mathbf{x}^k \to \mathbf{x}^*\), \(\nabla f(\mathbf{x}^k) \to \nabla f(\mathbf{x}^*)\), and \(\mathbf{J}_{\mathbf{h}}(\mathbf{x}^k) \to \mathbf{J}_{\mathbf{h}}(\mathbf{x}^*)\) as \(k \to +\infty\).

So it remains to derive the limit of \(k \mathbf{h}(\mathbf{x}^k)\). By assumption, the columns of \(\mathbf{J}_{\mathbf{h}}(\mathbf{x}^*)^T\) are linearly independent. That implies that for any unit vector \(\mathbf{z} \in \mathbb{R}^\ell\)

\[ \mathbf{z}^T \mathbf{J}_{\mathbf{h}}(\mathbf{x}^*) \,\mathbf{J}_{\mathbf{h}}(\mathbf{x}^*)^T \mathbf{z} = \|\mathbf{J}_{\mathbf{h}}(\mathbf{x}^*)^T \mathbf{z}\|^2 > 0 \]

otherwise we would have \(\mathbf{J}_{\mathbf{h}}(\mathbf{x}^*)^T \mathbf{z} = \mathbf{0}\), contradicting the linear independence assumption. By the Extreme Value Theorem, there is \(\beta > 0\) such that

\[ \mathbf{z}^T \mathbf{J}_{\mathbf{h}}(\mathbf{x}^*) \,\mathbf{J}_{\mathbf{h}}(\mathbf{x}^*)^T \mathbf{z} \geq \beta \]

for all unit vectors \(\mathbf{z} \in \mathbb{R}^\ell\). Since \(\mathbf{J}_{\mathbf{h}}(\mathbf{x}^k) \to \mathbf{J}_{\mathbf{h}}(\mathbf{x}^*)\), it follows from a previous lemma that, for \(k\) large enough and any unit vector \(\mathbf{z} \in \mathbb{R}^\ell\),

\[ |\mathbf{z}^T \mathbf{J}_{\mathbf{h}}(\mathbf{x}^*)\, \mathbf{J}_{\mathbf{h}}(\mathbf{x}^*)^T \mathbf{z} - \mathbf{z}^T \mathbf{J}_{\mathbf{h}}(\mathbf{x}^k) \,\mathbf{J}_{\mathbf{h}}(\mathbf{x}^k)^T \mathbf{z}| \leq \| \mathbf{J}_{\mathbf{h}}(\mathbf{x}^*) \,\mathbf{J}_{\mathbf{h}}(\mathbf{x}^*)^T - \mathbf{J}_{\mathbf{h}}(\mathbf{x}^k) \,\mathbf{J}_{\mathbf{h}}(\mathbf{x}^k)^T \|_F \leq \frac{\beta}{2}. \]

That implies

\[ \mathbf{z}^T \mathbf{J}_{\mathbf{h}}(\mathbf{x}^k) \,\mathbf{J}_{\mathbf{h}}(\mathbf{x}^k)^T \mathbf{z} \geq \frac{\beta}{2}, \]

so by the same argument as above the columns of \(\mathbf{J}_{\mathbf{h}}(\mathbf{x}^k)^T\) are linearly independent for \(k\) large enough and \(\mathbf{J}_{\mathbf{h}}(\mathbf{x}^k) \,\mathbf{J}_{\mathbf{h}}(\mathbf{x}^k)^T\) is invertible. That proves a).

Going back to \((**)\), multiplying both sides by \((\mathbf{J}_{\mathbf{h}}(\mathbf{x}^k)\, \mathbf{J}_{\mathbf{h}}(\mathbf{x}^k)^T)^{-1} \mathbf{J}_{\mathbf{h}}(\mathbf{x}^k)\), and taking a limit \(k \to +\infty\), we get after rearranging that

\[ k \mathbf{h}(\mathbf{x}^k) \to - (\mathbf{J}_{\mathbf{h}}(\mathbf{x}^*) \, \mathbf{J}_{\mathbf{h}}(\mathbf{x}^*)^T)^{-1} \mathbf{J}_{\mathbf{h}}(\mathbf{x}^*) \nabla f(\mathbf{x}^*) = \blambda^*. \]

Plugging back we get

\[ \nabla f(\mathbf{x}^*) + \mathbf{J}_{\mathbf{h}}(\mathbf{x}^*)^T \blambda^* = \mathbf{0} \]

as claimed. That proves b). \(\square\)

Combining the lemmas establishes the theorem. \(\square\)
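The formula for \(\blambda^*\) can be evaluated directly. Continuing the toy example from Step I (\(f(\mathbf{x}) = x_1 + x_2\) subject to \(h(\mathbf{x}) = x_1^2 + x_2^2 - 1 = 0\)), the sketch below computes \(\blambda^*\) and confirms the first-order condition.

```python
# lambda* = -(J_h J_h^T)^{-1} J_h grad_f at x* = (-1/sqrt(2), -1/sqrt(2)).
import numpy as np

x_star = -np.ones(2) / np.sqrt(2)
grad_f = np.array([1.0, 1.0])     # gradient of f(x) = x1 + x2
J_h = 2 * x_star.reshape(1, 2)    # Jacobian of h at x*, shape (l, d) = (1, 2)

lam_star = -np.linalg.solve(J_h @ J_h.T, J_h @ grad_f)
print(lam_star)                   # [0.70710678], i.e. 1/sqrt(2)
print(grad_f + J_h.T @ lam_star)  # ~[0. 0.]: grad f + J_h^T lambda* = 0
```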

Next, we prove the Lagrange Multipliers: Second-Order Sufficient Conditions. Again, we follow [Ber, Section 4.2]. We will need the following lemma. The proof can be skipped.

LEMMA Let \(P\) and \(Q\) be symmetric matrices in \(\mathbb{R}^{n \times n}\). Assume that \(Q\) is positive semidefinite and that \(P\) is positive definite on the null space of \(Q\), that is, \(\mathbf{w}^T P \mathbf{w} > 0\) for all \(\mathbf{w} \neq \mathbf{0}\) such that \(\mathbf{w}^T Q \mathbf{w} = \mathbf{0}\). Then there is a scalar \(\bar{c} \geq 0\) such that \(P + c Q\) is positive definite for all \(c > \bar{c}\). \(\flat\)

Proof: (lemma) We argue by contradiction. Suppose there is a non-negative, increasing, diverging sequence \(\{c_k\}_{k=1}^{+\infty}\) and a sequence of unit vectors \(\{\mathbf{x}^k\}_{k=1}^{+\infty}\) such that

\[ (\mathbf{x}^k)^T (P + c_k Q) \mathbf{x}^k \leq 0 \]

for all \(k\). Because the sequence of unit vectors is bounded, it has a limit point \(\bar{\mathbf{x}}\). Assume without loss of generality that \(\mathbf{x}^k \to \bar{\mathbf{x}}\) as \(k \to \infty\). Since \(c_k \to +\infty\) and \((\mathbf{x}^k)^T Q \mathbf{x}^k \geq 0\) by assumption, we must have \((\bar{\mathbf{x}})^T Q \bar{\mathbf{x}} = 0\), otherwise \((\mathbf{x}^k)^T (P + c_k Q) \mathbf{x}^k\) would diverge to \(+\infty\). Hence, by the assumption in the statement, it must be that \((\bar{\mathbf{x}})^T P \bar{\mathbf{x}} > 0\). But then, since \(c_k (\mathbf{x}^k)^T Q \mathbf{x}^k \geq 0\), we get \((\mathbf{x}^k)^T (P + c_k Q) \mathbf{x}^k \geq (\mathbf{x}^k)^T P \mathbf{x}^k \to (\bar{\mathbf{x}})^T P \bar{\mathbf{x}} > 0\), contradicting the inequality in the display above. \(\square\)
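Here is a small numerical illustration of the lemma (the matrices are our own example): \(P\) is indefinite, yet positive definite on the null space of \(Q\), and \(P + cQ\) becomes positive definite once \(c\) exceeds a threshold.

```python
# P is indefinite, but w^T P w > 0 on the null space of Q (here, span{(1,0)});
# P + c Q becomes positive definite for c > 1.
import numpy as np

P = np.array([[1.0, 0.0], [0.0, -1.0]])
Q = np.array([[0.0, 0.0], [0.0, 1.0]])  # positive semidefinite

for c in [0.5, 1.0, 2.0, 10.0]:
    eigs = np.linalg.eigvalsh(P + c * Q)
    print(c, eigs, bool(np.all(eigs > 0)))  # True once c > 1
```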

Proof: (Lagrange Multipliers: Second-Order Sufficient Conditions) We consider the modified problem

\[\begin{align*} &\min\ f(\mathbf{x}) + \frac{c}{2} \|\mathbf{h}(\mathbf{x})\|^2\\ &\text{s.t.}\ \mathbf{h}(\mathbf{x}) = \mathbf{0}. \end{align*}\]

It has the same local minimizers as the original problem since the additional term in the objective is zero for feasible vectors. That extra term will allow us to use the previous lemma. For notational convenience, we define

\[ g_c(\mathbf{x}) = f(\mathbf{x}) + \frac{c}{2} \|\mathbf{h}(\mathbf{x})\|^2. \]

The Lagrangian of the modified problem is

\[ L_c(\mathbf{x}, \blambda) = g_c(\mathbf{x}) + \mathbf{h}(\mathbf{x})^T \blambda. \]

We will apply the Second-Order Sufficient Conditions to the problem of minimizing \(L_c\) over \(\mathbf{x}\). We denote the Hessian with respect to the variables \(\mathbf{x}\) only by \(\nabla^2_{\mathbf{x},\mathbf{x}}\).

Recall from the proof of the Lagrange Multipliers: First-Order Necessary Conditions that

\[ \nabla \|\mathbf{h}(\mathbf{x})\|^2 = 2 \mathbf{J}_{\mathbf{h}}(\mathbf{x})^T \mathbf{h}(\mathbf{x}). \]

To compute the Hessian of that function, we note

\[\begin{align*} \frac{\partial}{\partial x_i}\left( \frac{\partial}{\partial x_j} \|\mathbf{h}(\mathbf{x})\|^2\right) &= \frac{\partial}{\partial x_i}\left( \sum_{k=1}^\ell 2 h_k(\mathbf{x}) \frac{\partial h_k(\mathbf{x})}{\partial x_j}\right)\\ &= 2 \sum_{k=1}^\ell\left( \frac{\partial h_k(\mathbf{x})}{\partial x_i} \frac{\partial h_k(\mathbf{x})}{\partial x_j} + h_k(\mathbf{x}) \frac{\partial^2 h_k(\mathbf{x})}{\partial x_i \partial x_j} \right)\\ &= 2 \left(\mathbf{J}_{\mathbf{h}}(\mathbf{x})^T \,\mathbf{J}_{\mathbf{h}}(\mathbf{x}) + \sum_{k=1}^\ell h_k(\mathbf{x}) \, \mathbf{H}_{h_k}(\mathbf{x})\right)_{i,j}. \end{align*}\]

Hence

\[ \nabla_{\mathbf{x}} L_c(\mathbf{x}^*, \blambda^*) = \nabla f(\mathbf{x}^*) + c \mathbf{J}_{\mathbf{h}}(\mathbf{x}^*)^T \mathbf{h}(\mathbf{x}^*) + \mathbf{J}_{\mathbf{h}}(\mathbf{x}^*)^T \blambda^* \]

and

\[ \nabla^2_{\mathbf{x},\mathbf{x}} L_c(\mathbf{x}^*, \blambda^*) = \mathbf{H}_{f}(\mathbf{x}^*) + c \left(\mathbf{J}_{\mathbf{h}}(\mathbf{x}^*)^T \,\mathbf{J}_{\mathbf{h}}(\mathbf{x}^*) + \sum_{k=1}^\ell h_k(\mathbf{x}^*) \, \mathbf{H}_{h_k}(\mathbf{x}^*)\right) + \sum_{k=1}^\ell \lambda^*_k \, \mathbf{H}_{h_k}(\mathbf{x}^*). \]

By the assumptions of the theorem, this simplifies to

\[ \nabla_{\mathbf{x}} L_c(\mathbf{x}^*, \blambda^*) = \mathbf{0} \]

and

\[ \nabla^2_{\mathbf{x},\mathbf{x}} L_c(\mathbf{x}^*, \blambda^*) =\underbrace{\left\{ \mathbf{H}_{f}(\mathbf{x}^*) + \sum_{k=1}^\ell \lambda^*_k \, \mathbf{H}_{h_k}(\mathbf{x}^*) \right\}}_{P} + c \underbrace{\left\{\mathbf{J}_{\mathbf{h}}(\mathbf{x}^*)^T \,\mathbf{J}_{\mathbf{h}}(\mathbf{x}^*) \right\}}_{Q}, \]

where further \(\mathbf{v}^T P \mathbf{v} > 0\) for any \(\mathbf{v} \neq \mathbf{0}\) such that \(\mathbf{J}_{\mathbf{h}}(\mathbf{x}^*) \,\mathbf{v} = \mathbf{0}\) (which itself implies \(\mathbf{v}^T Q \mathbf{v} = \mathbf{0}\)). Furthermore, \(Q \succeq \mathbf{0}\) since

\[ \mathbf{w}^T Q \mathbf{w} = \mathbf{w}^T \mathbf{J}_{\mathbf{h}}(\mathbf{x}^*)^T \,\mathbf{J}_{\mathbf{h}}(\mathbf{x}^*) \mathbf{w} = \left\|\mathbf{J}_{\mathbf{h}}(\mathbf{x}^*) \mathbf{w}\right\|^2 \geq 0 \]

for any \(\mathbf{w} \in \mathbb{R}^d\). The previous lemma allows us to take \(c\) large enough that \(\nabla^2_{\mathbf{x},\mathbf{x}} L_c(\mathbf{x}^*, \blambda^*) \succ \mathbf{0}\).

As a result, the unconstrained Second-Order Sufficient Conditions are satisfied at \(\mathbf{x}^*\) for the problem

\[ \min_{\mathbf{x} \in \mathbb{R}^d} L_c(\mathbf{x}, \blambda^*). \]

That is, there is \(\delta > 0\) such that

\[ L_c(\mathbf{x}^*, \blambda^*) < L_c(\mathbf{x}, \blambda^*), \qquad \forall \mathbf{x} \in B_{\delta}(\mathbf{x}^*) \setminus \{\mathbf{x}^*\}. \]

Restricting this to the feasible vectors of the modified constrained problem (i.e., those such that \(\mathbf{h}(\mathbf{x}) = \mathbf{0}\), for which \(L_c(\mathbf{x}, \blambda^*) = f(\mathbf{x})\)) implies

\[ f(\mathbf{x}^*) < f(\mathbf{x}), \qquad \forall \mathbf{x} \in \{\mathbf{x} : \mathbf{h}(\mathbf{x}) = \mathbf{0}\} \cap (B_{\delta}(\mathbf{x}^*) \setminus \{\mathbf{x}^*\}). \]

Therefore, \(\mathbf{x}^*\) is a strict local minimizer of the modified constrained problem (and, in turn, of the original constrained problem). That concludes the proof. \(\square\)
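As a final numerical check (again on our toy example \(f(\mathbf{x}) = x_1 + x_2\), \(h(\mathbf{x}) = x_1^2 + x_2^2 - 1\)), the sketch below verifies that the Hessian of the Lagrangian at \((\mathbf{x}^*, \blambda^*)\) is positive definite on the null space of \(\mathbf{J}_{\mathbf{h}}(\mathbf{x}^*)\), so the second-order sufficient conditions indeed hold there.

```python
# Second-order check: v^T P v > 0 for nonzero v with J_h(x*) v = 0, where
# P = H_f(x*) + lambda* H_h(x*) is the Hessian of the Lagrangian.
import numpy as np

x_star = -np.ones(2) / np.sqrt(2)
lam_star = 1 / np.sqrt(2)

H_f = np.zeros((2, 2))          # f is linear
H_h = 2 * np.eye(2)             # Hessian of h(x) = x1^2 + x2^2 - 1
J_h = 2 * x_star.reshape(1, 2)  # Jacobian of h at x*

P = H_f + lam_star * H_h        # Hessian of the Lagrangian

v = np.array([1.0, -1.0]) / np.sqrt(2)  # spans the null space of J_h
print(J_h @ v)                  # [0.]: v is tangent to the constraint
print(v @ P @ v)                # ~1.414 > 0: conditions hold
```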