3.7.1. Warm-up worksheets
(with help from Claude, Gemini, and ChatGPT)
Section 3.2
E3.2.1 Calculate the gradient of the function at the point .
E3.2.2 Find the Hessian matrix of the function .
E3.2.3 Compute the partial derivatives and for the function at the point .
E3.2.4 Let and . Calculate using the Chain Rule.
E3.2.5 Verify the Symmetry of the Hessian for the function at the point .
E3.2.6 Find the gradient of the function at the point .
E3.2.7 Calculate the second-order partial derivatives of the function .
E3.2.8 Compute the gradient of the quadratic function at the point .
E3.2.9 Find the Hessian matrix of the function .
E3.2.10 Apply the multivariable Mean Value Theorem to the function on the line segment joining the points and .
E3.2.11 Find the partial derivatives and of the function .
E3.2.12 Compute the gradient of the function at the point .
E3.2.13 Find the Hessian matrix of the function .
E3.2.14 If , find all second partial derivatives of .
E3.2.15 Verify that the function satisfies Laplace’s equation: .
E3.2.16 Consider the function . Use the Mean Value Theorem to approximate using the point .
E3.2.17 A particle moves along a path given by . The temperature at a point is given by . Find the rate of change of the temperature experienced by the particle at time .
E3.2.18 Apply the Chain Rule to find for and .
E3.2.19 Use the Chain Rule to find for and .
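A quick way to check the hand computations in these warm-ups is to compare against finite differences. The sketch below is generic: `f` and `x0` are placeholder choices, not the (omitted) function and point of any particular exercise, and should be replaced accordingly.

```python
import numpy as np

def num_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def num_hess(f, x, h=1e-4):
    """Finite-difference approximation of the Hessian of f at x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        H[:, j] = (num_grad(f, x + e) - num_grad(f, x - e)) / (2 * h)
    return H

# Placeholder function and point; substitute the ones from the exercise.
f = lambda x: x[0] ** 2 + 3 * x[0] * x[1] + x[1] ** 2
x0 = np.array([1.0, 2.0])
print(num_grad(f, x0))   # compare with the hand-computed gradient
print(num_hess(f, x0))   # compare with the hand-computed (symmetric) Hessian
```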
Section 3.3
E3.3.1 Consider the function . Find all points where .
E3.3.2 Let . Find the directional derivative of at the point in the direction .
E3.3.3 Consider the function . Find the second directional derivative of at the point in the direction .
E3.3.4 Consider the optimization problem subject to . Write down the Lagrangian function .
E3.3.5 For the optimization problem in E3.3.4, find all points satisfying the first-order necessary conditions.
E3.3.6 For the optimization problem in E3.3.4, check the second-order sufficient conditions at the point .
E3.3.7 Consider the function . Find all points satisfying the first-order necessary conditions for the optimization problem subject to .
E3.3.8 For the optimization problem in E3.3.7, check the second-order sufficient conditions at the point .
E3.3.9 Let be defined by . Determine whether the vector is a descent direction for at the point .
E3.3.10 Let be defined by . Find the Hessian matrix of .
E3.3.11 Let be defined by . Compute the second directional derivative of at the point in the direction .
E3.3.12 Calculate the directional derivative of at the point in the direction of .
E3.3.13 Determine the Hessian matrix of at the point .
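For the directional-derivative and descent-direction exercises, the following sketch can serve as a sanity check. The function, gradient, Hessian, point `x0`, and direction `v` are placeholders to be swapped for the data of the exercise being checked.

```python
import numpy as np

# Placeholder function, gradient, and Hessian; substitute those from the exercise.
f    = lambda x: x[0] ** 2 + x[0] * x[1] + 2 * x[1] ** 2
grad = lambda x: np.array([2 * x[0] + x[1], x[0] + 4 * x[1]])
hess = lambda x: np.array([[2.0, 1.0], [1.0, 4.0]])

x0 = np.array([1.0, -1.0])
v  = np.array([1.0, 1.0])
v  = v / np.linalg.norm(v)           # normalize to a unit direction

dir_deriv  = grad(x0) @ v            # first directional derivative
dir_deriv2 = v @ hess(x0) @ v        # second directional derivative
print(dir_deriv, dir_deriv2)
print("descent direction" if dir_deriv < 0 else "not a descent direction")
```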
Section 3.4
E3.4.1 Find the convex combination of points and with .
E3.4.2 Determine whether the following set is convex:
E3.4.3 Let and . Show that is a convex set.
E3.4.4 Verify that the function is convex using the Hessian matrix.
E3.4.5 Find the global minimizer of the function .
E3.4.6 Consider the function . Show that is convex but not differentiable at .
E3.4.7 Verify that the function is strongly convex with .
E3.4.8 Find the projection of the point onto the set .
E3.4.9 Let . Determine whether is convex.
E3.4.10 Let . Determine whether is log-concave.
E3.4.11 Let . Determine whether is strongly convex.
E3.4.12 Let and . Find the vector that minimizes .
E3.4.13 Given the set , determine if is convex.
E3.4.14 Determine if the set is convex.
E3.4.15 Determine if the set is convex.
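A numerical sanity check for the convexity exercises is to inspect the eigenvalues of the Hessian, and a projection computation can be checked against a closed form. In the sketch below, the Hessian `H`, the radius `r`, and the projected point are placeholders; projection onto a Euclidean ball is shown only as one representative example.

```python
import numpy as np

# A twice-differentiable function is convex iff its Hessian is positive
# semidefinite; it is strongly convex if the smallest eigenvalue is bounded
# away from zero. Placeholder (constant) Hessian:
H = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigs = np.linalg.eigvalsh(H)
print("eigenvalues:", eigs)
print("convex:", np.all(eigs >= -1e-12), " strong convexity constant:", eigs.min())

# Projection onto the closed Euclidean ball {z : ||z|| <= r}: scale the point
# back to the boundary if it lies outside the ball.
def proj_ball(x, r=1.0):
    x = np.asarray(x, dtype=float)
    nrm = np.linalg.norm(x)
    return x if nrm <= r else (r / nrm) * x

print(proj_ball(np.array([3.0, 4.0]), r=1.0))   # -> [0.6, 0.8]
```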
Section 3.5
E3.5.1 Given the function , find the direction of steepest descent at the point .
E3.5.2 Consider the function . Compute the gradient and find the direction of steepest descent at .
E3.5.3 Given the function , perform two iterations of gradient descent starting from with a step size of .
E3.5.4 Let . Show that is -smooth for some .
E3.5.5 Let . Perform one step of gradient descent starting from with step size .
E3.5.6 Consider the -smooth function . Starting from , compute the point obtained after one iteration of gradient descent with step size .
E3.5.7 Verify that the function is -strongly convex.
E3.5.8 For the -strongly convex function with global minimizer , compute and compare it with .
E3.5.9 Let . Is -smooth for some ? Justify your answer.
E3.5.10 Let . Is -strongly convex for some ? Justify your answer.
E3.5.11 Let . Show that is -strongly convex for some .
E3.5.12 Let . Perform one step of gradient descent starting from with step size .
E3.5.13 Given the function , perform two steps of gradient descent starting at with a step size .
E3.5.14 For a quadratic function , derive the value of the step size under which gradient descent converges to the minimum in one step.
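Hand-computed gradient descent iterates can be verified with a short loop such as the one below; the quadratic objective, starting point, step size, and iteration count are illustrative placeholders.

```python
import numpy as np

# Placeholder smooth, strongly convex quadratic f(x) = 1/2 x^T A x - b^T x;
# substitute the objective, starting point, and step size from the exercise.
A = np.array([[3.0, 0.0],
              [0.0, 1.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b

x = np.array([0.0, 0.0])   # starting point
alpha = 0.1                # step size
for k in range(2):         # two iterations, as in E3.5.3 / E3.5.13
    x = x - alpha * grad(x)
    print(f"iterate {k + 1}:", x)

# For this quadratic, the strong convexity and smoothness constants are the
# extreme eigenvalues of A.
eigs = np.linalg.eigvalsh(A)
print("m =", eigs.min(), " L =", eigs.max())
```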
Section 3.6
E3.6.1 Compute the log-odds of an event with probability .
E3.6.2 Given the features and the coefficients , compute the linear combination .
E3.6.3 For the data point with the label and the coefficients , calculate the probability using the sigmoid function.
E3.6.4 Compute the cross-entropy loss for a single data point with label , given .
E3.6.5 Prove that the logistic function satisfies .
E3.6.6 Given the feature vectors , , , and the corresponding labels , , , compute the logistic regression objective function for .
E3.6.7 For the same data as in E3.6.6, compute the gradient of the logistic regression objective function at .
E3.6.8 For the same data as in E3.6.6, compute the Hessian of the logistic regression objective function at .
E3.6.9 For the same data as in E3.6.6, perform one step of gradient descent with step size starting from .
E3.6.10 For the same data as in E3.6.6, compute the smoothness parameter of the logistic regression objective function.
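The quantities asked for in E3.6.6 through E3.6.10 can be checked numerically with a sketch along the following lines. It assumes 0/1 labels and a cross-entropy objective averaged over the n data points (adjust if Section 3.6 uses a different convention); the feature matrix `A`, labels `b`, and step size are placeholders rather than the data of E3.6.6.

```python
import numpy as np

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Placeholder data: one feature vector per row of A, with 0/1 labels b.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 0.0, 1.0])
n = A.shape[0]

def loss(x):
    p = sigmoid(A @ x)
    return -np.mean(b * np.log(p) + (1 - b) * np.log(1 - p))

def grad(x):
    return A.T @ (sigmoid(A @ x) - b) / n

def hess(x):
    p = sigmoid(A @ x)
    return A.T @ np.diag(p * (1 - p)) @ A / n

x0 = np.zeros(2)
print("objective :", loss(x0))
print("gradient  :", grad(x0))
print("Hessian   :\n", hess(x0))
print("one GD step (step size 0.1):", x0 - 0.1 * grad(x0))
# Since sigma(t)(1 - sigma(t)) <= 1/4, the Hessian is bounded above by
# A^T A / (4n); its largest eigenvalue is a valid smoothness constant.
print("L =", np.linalg.eigvalsh(A.T @ A / (4 * n)).max())
```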
3.7.2. Problems
3.1 Let be twice continuously differentiable, let be vectors in , and let . Consider the following real-valued function of :
a) Compute the gradient of in vector form in terms of the derivative of . (By vector form, we mean that it is not enough to write down each element of separately.)
b) Compute the Hessian of in matrix form in terms of the first derivative and second derivative of . (By matrix form, we mean that it is not enough to write down each element of or each of its columns separately.)
3.2 Give an alternative proof of the Descent Direction and Directional Derivative Lemma using the definition of the directional derivative. [Hint: Mimic the proof of the single-variable case.]
3.3 (Adapted from [CVX]) Show that if and are convex sets in , then so is their partial sum
3.4 A convex combination of is a linear combination of the form
where for all and . Show that a set is convex if and only if it contains all convex combinations of its elements. [Hint: Use induction on .]
3.5 A set is a cone if, for all and all , . A conic combination of is a linear combination of the form
where for all . Show that a set is a convex cone if and only if it contains all conic combinations of its elements.
3.6 Show that any linear subspace of is convex.
3.7 Show that the set
is convex.
3.8 Let be a strictly convex function. Assume further that there is a global minimizer . Show that it is unique.
3.9 Show that is a convex cone.
3.10 Prove (a)-(e) from the Operations that Preserve Convexity Lemma.
3.11 Let be a function. The epigraph of is the set
Show that is convex if and only if is a convex set.
3.12 Let be convex functions and let . Show that
is convex.
3.13 Let be convex functions. Show that the pointwise maximum function
is convex. [Hint: First show that .]
3.14 Prove the following composition theorem: if is convex and is convex and nondecreasing, then the composition is convex.
3.15 Show that , where , is convex.
3.16 Let be twice continuously differentiable. Show that is -smooth if for all .
3.17 For and , let where
be a classifier parametrized by . For a dataset and , , let the cross-entropy loss be
where
a) Show that for all .
b) Use (a) to show that
c) Use (b) to show that
d) Use (c) to show that, for any dataset , is convex and -smooth as a function of .
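Before using the identity from part (a) in the later parts, a quick numerical check of it can be reassuring; the grid of test points below is arbitrary.

```python
import numpy as np

# Numerical check that the derivative of the sigmoid equals sigma(t)(1 - sigma(t)).
sigma = lambda t: 1.0 / (1.0 + np.exp(-t))
t = np.linspace(-5, 5, 11)
h = 1e-6
num_deriv = (sigma(t + h) - sigma(t - h)) / (2 * h)
print(np.max(np.abs(num_deriv - sigma(t) * (1 - sigma(t)))))   # close to zero
```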
3.18 Prove the Quadratic Bound for Strongly Convex Functions. [Hint: Adapt the proof of the Quadratic Bound for Smooth Functions.]
3.19 Show that for all . [Hint: Compute the derivative of and the value .]
3.20 Consider the affine function
where and . Fix a value . Assume that
Show that
3.21 Let , where , and let , where . Assume that is continuously differentiable at , an interior point of , and that is continuously differentiable at , an interior point of . Assume further that there is a value such that
Show that is orthogonal to .
3.22 Fix a partition of . Under the -means objective, its cost is
where
with , the center of cluster .
a) Suppose is a global minimizer of
for each . Show that is a global minimizer of .
b) Compute the gradient of .
c) Find the stationary points of .
d) Compute the Hessian of .
e) Show that is convex.
f) Compute the global minimizer of . Justify your answer.
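As a sanity check on parts (a) and (f), the sketch below evaluates the cost of a fixed partition as a function of the cluster representatives and compares the cluster means against a perturbation of them; the data points and partition are illustrative placeholders, and the general claim still requires proof.

```python
import numpy as np

# Placeholder data and a fixed partition into k = 2 clusters.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [4.0, 4.0], [5.0, 4.0]])
partition = [np.array([0, 1, 2]), np.array([3, 4])]

def cost(mus):
    """k-means cost of the fixed partition with representatives mus[0], ..., mus[k-1]."""
    return sum(np.sum((X[idx] - mu) ** 2) for idx, mu in zip(partition, mus))

# Cluster means (the candidate global minimizer) ...
means = [X[idx].mean(axis=0) for idx in partition]
# ... compared with an arbitrary perturbation of the representatives.
perturbed = [mu + 0.3 for mu in means]
print(cost(means), "<=", cost(perturbed))
```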
3.23 Consider the quadratic function
where is symmetric. Show that, if is positive definite, then is strongly convex.
3.24 A closed halfspace is a set of the form
for some and .
a) Show that is convex.
b) Show that there exists such that
3.25 A hyperplane is a set of the form
for some and .
a) Show that is convex.
b) Assume . Compute and justify your answer.
3.26 Recall that the -norm is defined as
a) Show that the -norm satisfies the triangle inequality.
b) Show that the closed ball
for some , is convex.
3.27 The Lorentz cone is the set
Show that is convex.
3.28 A polyhedron is a set of the form
for some , , and
, , . Show that is convex.
3.29 The probability simplex in is the set
Show that
is convex.
3.30 Consider the function
a) Show that if then .
b) Show that if then .
3.31 Let for some and let be an interior point of . Assume and are continuously differentiable at . Let for all in . Compute . [Hint: You can make use of the corresponding single variable result.]
3.32 Let for some and let be an interior point of where . Assume is continuously differentiable at . Compute . [Hint: You can make use of the corresponding single variable result without proof.]
3.33 Let for some and let be an interior point of . Assume and are twice continuously differentiable at . Let for all in . Compute .
3.34 (Adapted from [Rud]) Let for some convex open set . Assume that is continuously differentiable everywhere on and that further for all . Show that depends only on .
3.35 (Adapted from [Rud]) Let for some open set . Assume that is continuously differentiable everywhere on and that further
Show that for all . [Hint: Use composition.]
3.36 (Adapted from [Khu]) Let for some open set . Assume that is continuously differentiable everywhere on and that further
for any and any scalar such that . In that case, is said to be homogeneous of degree . Prove that for all we have
3.37 (Adapted from [Khu]) Apply Taylor’s Theorem in the neighborhood of to the function
[Hint: You can use standard formulas for the derivatives of single-variable functions without proof.]
3.38 (Adapted from [Khu]) Apply Taylor’s Theorem in the neighborhood of to the function
[Hint: You can use standard formulas for the derivatives of single-variable functions without proof.]
3.39 (Adapted from [Khu]) Apply Taylor’s Theorem in the neighborhood of to the function
[Hint: You can use standard formulas for the derivatives of single-variable functions without proof.]
3.40 (Adapted from [Khu]) Consider the function
a) Compute the gradient and Hessian of at .
b) Show that is not a strict local minimizer of . [Hint: Consider a parametric curve of the form and .]
c) Let for some nonzero vector . Show that is a strict local minimizer of .
3.41 Suppose is convex. Let and . Show that
is convex over .
3.42 (Adapted from [CVX]) For , let
be the coordinates of in non-increasing order. Let be an integer and . Show that
is convex.
3.43 For a fixed , let
a) Compute the gradient and Hessian of .
b) Show that is strongly convex.
c) For a finite set , let
Use Problem 3.13 to show that is convex.
3.44 Let , , be a collection of convex subsets of . Here can be infinite. Show that
is convex.
3.45 a) Let , , be a collection of convex real-valued functions over . Use Problems 3.11 and 3.44 to show that
is convex.
b) Let be a convex function and let be a (not necessarily convex) set. Use (a) to show that the function
is convex. You can assume that for all .
3.46 Use Problem 3.45 to show that the largest eigenvalue of a matrix is a convex function over the set of all symmetric matrices. [Hint: First show that is a linear function of .]
3.47 Let be a convex function. Show that the set of global minimizers is convex.
3.48 Consider the set of symmetric matrices
a) Show that is a linear subspace of the vector space of matrices in the sense that for all and
and
b) Show that there exists a basis of of size , that is, a collection of symmetric matrices such that any matrix can be written as a linear combination
for some .
c) Show that the matrices you constructed in (b) are linearly independent in the sense that
3.49 Let be a convex function over the set
a) Show that is convex.
b) Show that the First-Order Optimality Conditions for Convex Functions on Convex Sets reduce to
and
[Hint: Choose the right in the condition of the theorem.]
3.50 Let be twice continuously differentiable and -strongly convex with .
a) Show that
along any sequence such that as .
b) Show that, for any , the set
is closed and bounded.
c) Show that there exists a unique global minimizer of .
3.51 Suppose that is -smooth and -strongly convex with a global minimizer at . Apply gradient descent with step size started from an arbitrary point .
a) Show that as . [Hint: Use the Quadratic Bound for Strongly Convex Functions at .]
b) Give a bound on depending on and .
3.52 Let be twice continuously differentiable.
a) Show that
where . Here the integral of a vector-valued function is performed entry-wise. [Hint: Let . The fundamental theorem of calculus implies that ]
b) Assume is -smooth. Use (a) to show that the gradient of is -Lipschitz in the sense that
3.53 Consider the quadratic function
where is a symmetric matrix. Use the First-Order Convexity Condition to show that is convex if and only if is positive semidefinite.
3.54 Let
be the loss function in logistic regression, where has rows . Show that
[Hint: Read the proof of the Smoothness of logistic regression lemma.]
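An analytic Hessian formula of this kind can be checked against finite differences. The sketch below assumes 0/1 labels and a loss averaged over the n data points, which may need adapting to the notation of the problem; the random data is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))                    # rows are the data points
b = rng.integers(0, 2, size=20).astype(float)   # 0/1 labels
n = A.shape[0]
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

loss = lambda x: -np.mean(b * np.log(sigmoid(A @ x)) + (1 - b) * np.log(1 - sigmoid(A @ x)))
p = lambda x: sigmoid(A @ x)
hess_formula = lambda x: A.T @ np.diag(p(x) * (1 - p(x))) @ A / n

def num_hess(f, x, h=1e-4):
    """Central-difference approximation of the Hessian of f at x."""
    d = x.size
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei, ej = np.eye(d)[i] * h, np.eye(d)[j] * h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h ** 2)
    return H

x = rng.normal(size=3)
# The maximum entrywise discrepancy should be small (finite-difference error only).
print(np.max(np.abs(hess_formula(x) - num_hess(loss, x))))
```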