3.7. Exercises

3.7.1. Warm-up worksheets

(with help from Claude, Gemini, and ChatGPT)

Section 3.2

E3.2.1 Calculate the gradient of the function $f(x_1,x_2)=3x_1^2-2x_1x_2+4x_2^2-5x_1+2x_2$ at the point $(1,1)$.

E3.2.2 Find the Hessian matrix of the function $f(x_1,x_2,x_3)=2x_1x_2+3x_2x_3-x_1^2-4x_3^2$.

E3.2.3 Compute the partial derivatives $\frac{\partial f}{\partial x_1}$ and $\frac{\partial f}{\partial x_2}$ for the function $f(x_1,x_2)=\sin(x_1)\cos(x_2)$ at the point $(\pi/4,\pi/3)$.

E3.2.4 Let $f(x_1,x_2)=e^{x_1}+x_1x_2-x_2^2$ and $g(t)=(t^2,t)$. Calculate $(f\circ g)'(1)$ using the Chain Rule.

E3.2.5 Verify the Symmetry of the Hessian for the function $f(x_1,x_2)=x_1^3+3x_1x_2^2-2x_2^3$ at the point $(1,2)$.

E3.2.6 Find the gradient of the function $f(x_1,x_2,x_3)=2x_1^3x_2+4x_3$ at the point $(1,2,3)$.

E3.2.7 Calculate the second-order partial derivatives of the function $f(x_1,x_2)=x_1^2\sin(x_2)$.

E3.2.8 Compute the gradient of the quadratic function $f(x_1,x_2)=3x_1^2+2x_1x_2-x_2^2+4x_1-2x_2$ at the point $(1,1)$.

E3.2.9 Find the Hessian matrix of the function $f(x_1,x_2,x_3)=x_1^2+2x_2^2+3x_3^2-2x_1x_2+4x_1x_3-6x_2x_3$.

E3.2.10 Apply the multivariable Mean Value Theorem to the function $f(x_1,x_2)=x_1^2+x_2^2$ on the line segment joining the points $(0,0)$ and $(1,2)$.

E3.2.11 Find the partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ of the function $f(x,y)=x^3y^2-2xy^3+y^4$.

E3.2.12 Compute the gradient $\nabla f(x,y)$ of the function $f(x,y)=\ln(x^2+2y^2)$ at the point $(1,2)$.

E3.2.13 Find the Hessian matrix of the function $g(x,y)=\sin(x)\cos(y)$.

E3.2.14 If $p(x,y,z)=x^2yz+3xyz^2$, find all second partial derivatives of $p$.

E3.2.15 Verify that the function $q(x,y)=x^3-3xy^2$ satisfies Laplace's equation: $\frac{\partial^2 q}{\partial x^2}+\frac{\partial^2 q}{\partial y^2}=0$.

E3.2.16 Consider the function $s(x,y)=x^2+4xy+5y^2$. Use the Mean Value Theorem to approximate $s(1.1,0.9)$ using the point $(1,1)$.

E3.2.17 A particle moves along a path given by $c(t)=(t^2,t^3)$. The temperature at a point $(x,y)$ is given by $u(x,y)=e^{-x^2-y^2}$. Find the rate of change of the temperature experienced by the particle at time $t=1$.

E3.2.18 Apply the Chain Rule to find $\frac{d}{dt}f(g(t))$ for $f(x,y)=x^2+y^2$ and $g(t)=(t,\sin t)$.

E3.2.19 Use the Chain Rule to find $\frac{d}{dt}f(g(t))$ for $f(x,y)=xy$ and $g(t)=(t^2,\cos t)$.
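
Many of the hand computations in this worksheet can be spot-checked numerically. Below is a minimal sketch (assuming NumPy is available; the function and signs are the ones reconstructed in E3.2.1) that approximates a gradient and a Hessian by central finite differences. The same helper functions can be pointed at any of the other functions above.

```python
import numpy as np

def f(x):
    # Function from E3.2.1 (with the signs as reconstructed above)
    x1, x2 = x
    return 3*x1**2 - 2*x1*x2 + 4*x2**2 - 5*x1 + 2*x2

def num_grad(f, x, h=1e-6):
    # Central-difference approximation of the gradient
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2*h)
    return g

def num_hess(f, x, h=1e-4):
    # Finite-difference Hessian built column by column from the numerical gradient
    x = np.asarray(x, dtype=float)
    d = x.size
    H = np.zeros((d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = h
        H[:, j] = (num_grad(f, x + e) - num_grad(f, x - e)) / (2*h)
    return H

x0 = np.array([1.0, 1.0])
print(num_grad(f, x0))   # compare with the hand-computed gradient
print(num_hess(f, x0))   # compare with the hand-computed Hessian
```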

Section 3.3

E3.3.1 Consider the function $f(x_1,x_2)=x_1^2+2x_2^2$. Find all points $(x_1,x_2)$ where $\nabla f(x_1,x_2)=0$.

E3.3.2 Let $f(x_1,x_2)=x_1^2-x_2^2$. Find the directional derivative of $f$ at the point $(1,1)$ in the direction $v=\left(\frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}}\right)$.

E3.3.3 Consider the function $f(x_1,x_2)=x_1^2+2x_1x_2+x_2^2$. Find the second directional derivative of $f$ at the point $(1,1)$ in the direction $v=\left(\frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}}\right)$.

E3.3.4 Consider the optimization problem $\min_{x_1,x_2}\ x_1^2+x_2^2$ subject to $x_1+x_2=1$. Write down the Lagrangian function $L(x_1,x_2,\lambda)$.

E3.3.5 For the optimization problem in E3.3.4, find all points $(x_1,x_2,\lambda)$ satisfying the first-order necessary conditions.

E3.3.6 For the optimization problem in E3.3.4, check the second-order sufficient conditions at the point $\left(\frac{1}{2},\frac{1}{2},-1\right)$.

E3.3.7 Consider the function $f(x_1,x_2,x_3)=x_1^2+x_2^2+x_3^2$. Find all points $(x_1,x_2,x_3)$ satisfying the first-order necessary conditions for the optimization problem $\min_{x_1,x_2,x_3} f(x_1,x_2,x_3)$ subject to $x_1+2x_2+3x_3=6$.

E3.3.8 For the optimization problem in E3.3.7, check the second-order sufficient conditions at the point $\left(\frac{1}{2},1,\frac{3}{2},-1\right)$.

E3.3.9 Let $f:\mathbb{R}^2\to\mathbb{R}$ be defined by $f(x_1,x_2)=x_1^3-3x_1x_2^2$. Determine whether the vector $v=(1,1)$ is a descent direction for $f$ at the point $(1,0)$.

E3.3.10 Let $f:\mathbb{R}^2\to\mathbb{R}$ be defined by $f(x_1,x_2)=x_1^2+x_2^2-2x_1-4x_2+5$. Find the Hessian matrix of $f$.

E3.3.11 Let $f:\mathbb{R}^2\to\mathbb{R}$ be defined by $f(x_1,x_2)=x_1^2-x_2^2$. Compute the second directional derivative of $f$ at the point $(0,0)$ in the direction $v=(1,0)$.

E3.3.12 Calculate the directional derivative of $f(x,y)=x^2+y^2$ at the point $(1,1)$ in the direction of $v=(1,1)$.

E3.3.13 Determine the Hessian matrix of $f(x,y)=x^3+y^3-3xy$ at the point $(1,1)$.
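
For the directional-derivative exercises, recall that the first directional derivative is $\nabla f(x)^T v$ and the second is $v^T H_f(x)\, v$. A minimal numerical sketch (assuming NumPy; the function, point, and unit direction are the ones reconstructed in E3.3.2):

```python
import numpy as np

def f(x):
    # Function from E3.3.2 (reconstructed as x1^2 - x2^2)
    return x[0]**2 - x[1]**2

def grad_f(x):
    return np.array([2*x[0], -2*x[1]])

def hess_f(x):
    return np.array([[2.0, 0.0], [0.0, -2.0]])

x0 = np.array([1.0, 1.0])
v = np.array([1.0, 1.0]) / np.sqrt(2)    # unit direction

dir_deriv = grad_f(x0) @ v               # first directional derivative
second_dir_deriv = v @ hess_f(x0) @ v    # second directional derivative
print(dir_deriv, second_dir_deriv)

# Finite-difference check along the line x0 + t v
h = 1e-5
fd = (f(x0 + h*v) - f(x0 - h*v)) / (2*h)
print(fd)                                 # should match dir_deriv
```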

Section 3.4

E3.4.1 Find the convex combination of points (2,3) and (4,5) with α=0.3.

E3.4.2 Determine whether the following set is convex:

$$S=\{(x_1,x_2)\in\mathbb{R}^2 : x_1^2+x_2^2\le 1\}.$$

E3.4.3 Let $S_1=\{x\in\mathbb{R}^2 : x_1+x_2\le 1\}$ and $S_2=\{x\in\mathbb{R}^2 : x_1-x_2\le 1\}$. Show that $S_1\cap S_2$ is a convex set.

E3.4.4 Verify that the function $f(x)=\log(e^{x_1}+e^{x_2})$ is convex using the Hessian matrix.

E3.4.5 Find the global minimizer of the function $f(x)=x^2+2x+1$.

E3.4.6 Consider the function f(x)=|x|. Show that f is convex but not differentiable at x=0.

E3.4.7 Verify that the function $f(x,y)=x^2+2y^2$ is strongly convex with $m=2$.

E3.4.8 Find the projection of the point $x=(1,2)$ onto the set $D=\{(x_1,x_2)\in\mathbb{R}^2 : x_1+x_2=1\}$.

E3.4.9 Let $f(x)=x^4-2x^2+1$. Determine whether $f$ is convex.

E3.4.10 Let $f(x)=e^x$. Determine whether $f$ is log-concave.

E3.4.11 Let $f(x)=x^2-2x+2$. Determine whether $f$ is strongly convex.

E3.4.12 Let $A=\begin{pmatrix}1 & 2\\ 2 & 5\end{pmatrix}$ and $b=\begin{pmatrix}1\\ 0\end{pmatrix}$. Find the vector $x$ that minimizes $\|Ax-b\|_2^2$.

E3.4.13 Given the set $D=\{(x,y)\in\mathbb{R}^2 : x^2+y^2<4\}$, determine if $D$ is convex.

E3.4.14 Determine if the set $D=\{(x,y)\in\mathbb{R}^2 : x+y\ge 1\}$ is convex.

E3.4.15 Determine if the set $D=\{(x,y)\in\mathbb{R}^2 : x^2-y^2\le 1\}$ is convex.
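
Two of these computations lend themselves to a quick numerical check: a twice-differentiable function is convex on a region exactly when its Hessian is positive semidefinite there, and projecting onto a hyperplane has a closed form. A minimal sketch (assuming NumPy; the log-sum-exp function is from E3.4.4 and the projection data from E3.4.8):

```python
import numpy as np

# Hessian of f(x) = log(exp(x1) + exp(x2)) at an arbitrary point (E3.4.4)
def lse_hessian(x):
    p = np.exp(x) / np.exp(x).sum()          # softmax probabilities
    return np.diag(p) - np.outer(p, p)       # standard log-sum-exp Hessian

H = lse_hessian(np.array([0.3, -1.2]))
print(np.linalg.eigvalsh(H))                  # eigenvalues >= 0, so PSD

# Projection of x = (1, 2) onto {x : x1 + x2 = 1} (E3.4.8):
# proj(x) = x - ((a^T x - b) / ||a||^2) a for the hyperplane a^T x = b
a, b = np.array([1.0, 1.0]), 1.0
x = np.array([1.0, 2.0])
proj = x - ((a @ x - b) / (a @ a)) * a
print(proj)                                   # compare with the hand computation
```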

Section 3.5

E3.5.1 Given the function $f(x,y)=x^2+4y^2$, find the direction of steepest descent at the point $(1,1)$.

E3.5.2 Consider the function $f(x)=x^2+4x+5$. Compute the gradient $\nabla f(x)$ and find the direction of steepest descent at $x_0=1$.

E3.5.3 Given the function $f(x)=x^3-6x^2+9x+2$, perform two iterations of gradient descent starting from $x_0=0$ with a step size of $\alpha=0.1$.

E3.5.4 Let $f(x)=x^2$. Show that $f$ is $L$-smooth for some $L>0$.

E3.5.5 Let $f(x)=x^2$. Perform one step of gradient descent starting from $x_0=2$ with step size $\alpha=0.1$.

E3.5.6 Consider the $2$-smooth function $f(x)=x^2$. Starting from $x_0=3$, compute the point $x_1$ obtained after one iteration of gradient descent with step size $\alpha=\frac{1}{2}$.

E3.5.7 Verify that the function $f(x)=2x^2+1$ is $4$-strongly convex.

E3.5.8 For the $4$-strongly convex function $f(x)=2x^2+1$ with global minimizer $x^*=0$, compute $\frac{1}{2m}(f'(1))^2$ and compare it with $f(1)-f(x^*)$.

E3.5.9 Let $f(x)=x^4$. Is $f$ $L$-smooth for some $L>0$? Justify your answer.

E3.5.10 Let $f(x)=x^3$. Is $f$ $m$-strongly convex for some $m>0$? Justify your answer.

E3.5.11 Let $f(x)=x^2+2x+1$. Show that $f$ is $m$-strongly convex for some $m>0$.

E3.5.12 Let $f(x)=x^2+2x+1$. Perform one step of gradient descent starting from $x_0=3$ with step size $\alpha=0.2$.

E3.5.13 Given the function $f(x,y)=x^2+y^2$, perform two steps of gradient descent starting at $(1,1)$ with a step size $\alpha=0.1$.

E3.5.14 For a quadratic function $f(x)=ax^2+bx+c$ with $a>0$, derive the value of $\alpha$ for which gradient descent converges to the minimum in a single step.
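
All of the gradient-descent exercises above use the same update $x_{t+1}=x_t-\alpha f'(x_t)$. A minimal sketch in plain Python, using the cubic and settings from E3.5.3 (as reconstructed above); changing `fprime`, the starting point, or the step size adapts it to the other exercises:

```python
# Plain one-dimensional gradient descent, settings from E3.5.3
# (f reconstructed as x^3 - 6x^2 + 9x + 2, so f'(x) = 3x^2 - 12x + 9)
def fprime(x):
    return 3*x**2 - 12*x + 9

x, alpha = 0.0, 0.1
for t in range(2):                      # two iterations
    x = x - alpha * fprime(x)
    print(f"iteration {t+1}: x = {x:.4f}")
```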

Section 3.6

E3.6.1 Compute the log-odds of an event with probability p=0.25.

E3.6.2 Given the features $x=(2,1)$ and the coefficients $\alpha=(0.5,0.3)$, compute the linear combination $\alpha^Tx$.

E3.6.3 For the data point $x=(1,3)$ with the label $b=1$ and the coefficients $\alpha=(0.2,0.4)$, calculate the probability $p(x;\alpha)$ using the sigmoid function.

E3.6.4 Compute the cross-entropy loss for a single data point $x=(1,2)$ with label $b=0$, given $\alpha=(0.3,0.4)$.

E3.6.5 Prove that the logistic function $\sigma(z)=\frac{1}{1+e^{-z}}$ satisfies $\sigma'(z)=\sigma(z)(1-\sigma(z))$.

E3.6.6 Given the feature vectors $\alpha_1=(1,2)$, $\alpha_2=(1,1)$, $\alpha_3=(0,1)$, and the corresponding labels $b_1=1$, $b_2=0$, $b_3=1$, compute the logistic regression objective function $\ell(x;A,b)$ for $x=(1,1)$.

E3.6.7 For the same data as in E3.6.6, compute the gradient of the logistic regression objective function $\ell(x;A,b)$ at $x=(1,1)$.

E3.6.8 For the same data as in E3.6.6, compute the Hessian $H_{\ell}(x;A,b)$ of the logistic regression objective function at $x=(1,1)$.

E3.6.9 For the same data as in E3.6.6, perform one step of gradient descent with step size $\beta=0.1$ starting from $x^0=(0,0)$.

E3.6.10 For the same data as in E3.6.6, compute the smoothness parameter $L$ of the logistic regression objective function.
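
E3.6.6 through E3.6.10 all use the same small dataset. A short sketch (assuming NumPy, and assuming the feature vectors and labels are as reconstructed above) that evaluates the cross-entropy objective and its gradient; the last two lines reuse the gradient for the descent step in E3.6.9:

```python
import numpy as np

A = np.array([[1.0, 2.0],    # alpha_1
              [1.0, 1.0],    # alpha_2
              [0.0, 1.0]])   # alpha_3
b = np.array([1.0, 0.0, 1.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(x):
    # Cross-entropy objective: (1/n) sum of -b log p - (1 - b) log(1 - p)
    p = sigmoid(A @ x)
    return -np.mean(b * np.log(p) + (1 - b) * np.log(1 - p))

def grad(x):
    # Gradient of the objective: (1/n) A^T (p - b)
    p = sigmoid(A @ x)
    return A.T @ (p - b) / len(b)

x = np.array([1.0, 1.0])
print(loss(x), grad(x))

# One gradient-descent step from (0, 0) with step size 0.1 (E3.6.9)
x0 = np.zeros(2)
print(x0 - 0.1 * grad(x0))
```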

3.7.2. Problems

3.1 Let $f:\mathbb{R}\to\mathbb{R}$ be twice continuously differentiable, let $a_1,a_2$ be vectors in $\mathbb{R}^d$, and let $b_1,b_2\in\mathbb{R}$. Consider the following real-valued function of $x\in\mathbb{R}^d$:

$$g(x)=\frac{1}{2}f(a_1^Tx+b_1)+\frac{1}{2}f(a_2^Tx+b_2).$$

a) Compute the gradient of $g$ in vector form in terms of the derivative $f'$ of $f$. (By vector form, we mean that it is not enough to write down each element of $\nabla g(x)$ separately.)

b) Compute the Hessian of $g$ in matrix form in terms of the first derivative $f'$ and second derivative $f''$ of $f$. (By matrix form, we mean that it is not enough to write down each element of $H_g(x)$ or each of its columns separately.)

3.2 Give an alternative proof of the Descent Direction and Directional Derivative Lemma using the definition of the directional derivative. [Hint: Mimic the proof of the single-variable case.]

3.3 (Adapted from [CVX]) Show that if $S_1$ and $S_2$ are convex sets in $\mathbb{R}^{m+n}$, then so is their partial sum

$$S=\{(x,y_1+y_2)\,|\,x\in\mathbb{R}^m,\ y_1,y_2\in\mathbb{R}^n,\ (x,y_1)\in S_1,\ (x,y_2)\in S_2\}.$$

3.4 A convex combination of $z_1,\ldots,z_m\in\mathbb{R}^d$ is a linear combination of the form

$$w=\sum_{i=1}^m\alpha_iz_i$$

where $\alpha_i\ge 0$ for all $i$ and $\sum_{i=1}^m\alpha_i=1$. Show that a set is convex if and only if it contains all convex combinations of its elements. [Hint: Use induction on $m$.]

3.5 A set $C\subseteq\mathbb{R}^d$ is a cone if, for all $x\in C$ and all $\alpha\ge 0$, $\alpha x\in C$. A conic combination of $z_1,\ldots,z_m\in\mathbb{R}^d$ is a linear combination of the form

$$w=\sum_{i=1}^m\alpha_iz_i$$

where $\alpha_i\ge 0$ for all $i$. Show that a set $C$ is a convex cone if and only if it contains all conic combinations of its elements.

3.6 Show that any linear subspace of $\mathbb{R}^d$ is convex.

3.7 Show that the set

$$\mathbb{R}^d_+=\{x\in\mathbb{R}^d : x\ge 0\}$$

is convex.

3.8 Let $f:\mathbb{R}^d\to\mathbb{R}$ be a strictly convex function. Assume further that there is a global minimizer $x^*$. Show that it is unique.

3.9 Show that $S^n_+$ is a convex cone.

3.10 Prove (a)-(e) from the Operations that Preserve Convexity Lemma.

3.11 Let $f:\mathbb{R}^d\to\mathbb{R}$ be a function. The epigraph of $f$ is the set

$$\mathrm{epi}\,f=\{(x,y) : x\in\mathbb{R}^d,\ y\ge f(x)\}.$$

Show that $f$ is convex if and only if $\mathrm{epi}\,f$ is a convex set.

3.12 Let $f_1,\ldots,f_m:\mathbb{R}^d\to\mathbb{R}$ be convex functions and let $\beta_1,\ldots,\beta_m\ge 0$. Show that

$$f(x)=\sum_{i=1}^m\beta_if_i(x)$$

is convex.

3.13 Let $f_1,f_2:\mathbb{R}^d\to\mathbb{R}$ be convex functions. Show that the pointwise maximum function

$$f(x)=\max\{f_1(x),f_2(x)\}$$

is convex. [Hint: First show that $\max\{\alpha+\beta,\eta+\phi\}\le\max\{\alpha,\eta\}+\max\{\beta,\phi\}$.]

3.14 Prove the following composition theorem: if $f:\mathbb{R}^d\to\mathbb{R}$ is convex and $g:\mathbb{R}\to\mathbb{R}$ is convex and nondecreasing, then the composition $h=g\circ f$ is convex.

3.15 Show that $f(x)=e^{\beta x}$, where $x,\beta\in\mathbb{R}$, is convex.

3.16 Let $f:\mathbb{R}\to\mathbb{R}$ be twice continuously differentiable. Show that $f$ is $L$-smooth if $|f''(x)|\le L$ for all $x\in\mathbb{R}$.

3.17 For $a\in[-1,1]$ and $b\in\{0,1\}$, let $\hat{f}(x,a)=\sigma(xa)$ where

$$\sigma(t)=\frac{1}{1+e^{-t}}$$

be a classifier parametrized by $x\in\mathbb{R}$. For a dataset $a_i\in[-1,1]$ and $b_i\in\{0,1\}$, $i=1,\ldots,n$, let the cross-entropy loss be

$$L(x,\{(a_i,b_i)\}_{i=1}^n)=\frac{1}{n}\sum_{i=1}^n\ell(x,a_i,b_i)$$

where

$$\ell(x,a,b)=-b\log(\hat{f}(x,a))-(1-b)\log(1-\hat{f}(x,a)).$$

a) Show that $\sigma'(t)=\sigma(t)(1-\sigma(t))$ for all $t\in\mathbb{R}$.

b) Use (a) to show that

$$\frac{\partial}{\partial x}L(x,\{(a_i,b_i)\}_{i=1}^n)=-\frac{1}{n}\sum_{i=1}^n(b_i-\hat{f}(x,a_i))\,a_i.$$

c) Use (b) to show that

$$\frac{\partial^2}{\partial x^2}L(x,\{(a_i,b_i)\}_{i=1}^n)=\frac{1}{n}\sum_{i=1}^n\hat{f}(x,a_i)(1-\hat{f}(x,a_i))\,a_i^2.$$

d) Use (c) to show that, for any dataset $\{(a_i,b_i)\}_{i=1}^n$, $L$ is convex and $1$-smooth as a function of $x$.
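
The identities in parts (a) and (b) are easy to sanity-check numerically before proving them. A minimal sketch (assuming NumPy; the dataset is a made-up one within the stated ranges, and the derivative formula checked is the one reconstructed in part (b)):

```python
import numpy as np

def sigma(t):
    return 1.0 / (1.0 + np.exp(-t))

# (a) sigma'(t) = sigma(t) (1 - sigma(t)), checked by central differences
t, h = 0.7, 1e-6
fd = (sigma(t + h) - sigma(t - h)) / (2 * h)
print(fd, sigma(t) * (1 - sigma(t)))

# (b) derivative of the cross-entropy loss L at a point x, on toy data
a = np.array([0.5, -0.3, 0.8])       # a_i in [-1, 1] (made-up values)
b = np.array([1.0, 0.0, 1.0])        # b_i in {0, 1}

def L(x):
    p = sigma(x * a)
    return -np.mean(b * np.log(p) + (1 - b) * np.log(1 - p))

x = 0.4
fd = (L(x + h) - L(x - h)) / (2 * h)
claimed = -np.mean((b - sigma(x * a)) * a)   # formula from part (b)
print(fd, claimed)
```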

3.18 Prove the Quadratic Bound for Strongly Convex Functions. [Hint: Adapt the proof of the Quadratic Bound for Smooth Functions.]

3.19 Show that $\log x\le x-1$ for all $x>0$. [Hint: Compute the derivative of $s(x)=x-1-\log x$ and the value $s(1)$.]

3.20 Consider the affine function

$$f(x)=q^Tx+r$$

where $x=(x_1,\ldots,x_d)^T$, $q=(q_1,\ldots,q_d)^T\in\mathbb{R}^d$ and $r\in\mathbb{R}$. Fix a value $c\in\mathbb{R}$. Assume that

$$f(x^0)=f(x^1)=c.$$

Show that

$$\nabla f(x^0)^T(x^1-x^0)=0.$$

3.21 Let $f:D_1\to\mathbb{R}$, where $D_1\subseteq\mathbb{R}^d$, and let $g:D_2\to\mathbb{R}^d$, where $D_2\subseteq\mathbb{R}$. Assume that $f$ is continuously differentiable at $g(t_0)$, an interior point of $D_1$, and that $g$ is continuously differentiable at $t_0$, an interior point of $D_2$. Assume further that there is a value $c\in\mathbb{R}$ such that

$$f\circ g(t)=c,\qquad\forall t\in D_2.$$

Show that $g'(t_0)$ is orthogonal to $\nabla f(g(t_0))$.

3.22 Fix a partition $C_1,\ldots,C_k$ of $[n]$. Under the $k$-means objective, its cost is

$$G(C_1,\ldots,C_k)=\min_{\mu_1,\ldots,\mu_k\in\mathbb{R}^d}G(C_1,\ldots,C_k;\mu_1,\ldots,\mu_k)$$

where

$$G(C_1,\ldots,C_k;\mu_1,\ldots,\mu_k)=\sum_{i=1}^k\sum_{j\in C_i}\|x_j-\mu_i\|^2=\sum_{j=1}^n\|x_j-\mu_{c(j)}\|^2,$$

with $\mu_i\in\mathbb{R}^d$ the center of cluster $C_i$.

a) Suppose $\mu_i^*$ is a global minimizer of

$$F_i(\mu_i)=\sum_{j\in C_i}\|x_j-\mu_i\|^2,$$

for each $i$. Show that $\mu_1^*,\ldots,\mu_k^*$ is a global minimizer of $G(C_1,\ldots,C_k;\mu_1,\ldots,\mu_k)$.

b) Compute the gradient of $F_i(\mu_i)$.

c) Find the stationary points of $F_i(\mu_i)$.

d) Compute the Hessian of $F_i(\mu_i)$.

e) Show that $F_i(\mu_i)$ is convex.

f) Compute the global minimizer of $G(C_1,\ldots,C_k;\mu_1,\ldots,\mu_k)$. Justify your answer.
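
Parts (c) and (f) can be illustrated numerically on toy data: the stationary point of $F_i$ is the mean of the cluster's points, and that mean is never beaten by any other choice of center. A minimal sketch (assuming NumPy and a made-up cluster of points):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))          # made-up points x_j in a single cluster C_i

def F(mu):
    # F_i(mu) = sum over the cluster of ||x_j - mu||^2
    return np.sum(np.linalg.norm(X - mu, axis=1) ** 2)

mu_star = X.mean(axis=0)             # candidate global minimizer: the centroid
for _ in range(5):
    mu = mu_star + rng.normal(scale=0.5, size=2)   # random perturbation
    assert F(mu_star) <= F(mu)       # the centroid is never beaten
print(mu_star, F(mu_star))
```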

3.23 Consider the quadratic function

$$f(x)=\frac{1}{2}x^TPx+q^Tx+r$$

where $P$ is symmetric. Show that, if $P$ is positive definite, then $f$ is strongly convex.

3.24 A closed halfspace is a set of the form

$$K=\{x\in\mathbb{R}^d : a^Tx\le b\},$$

for some $a\in\mathbb{R}^d$ and $b\in\mathbb{R}$.

a) Show that $K$ is convex.

b) Show that there exists $x_0\in\mathbb{R}^d$ such that

$$K=\{x\in\mathbb{R}^d : a^T(x-x_0)\le 0\}.$$

3.25 A hyperplane is a set of the form

$$H=\{x\in\mathbb{R}^d : a^Tx=b\},$$

for some $a\in\mathbb{R}^d$ and $b\in\mathbb{R}$.

a) Show that $H$ is convex.

b) Assume $b=0$. Compute $H^{\perp}$ and justify your answer.

3.26 Recall that the $1$-norm is defined as

$$\|x\|_1=\sum_{i=1}^d|x_i|.$$

a) Show that the $1$-norm satisfies the triangle inequality.

b) Show that the closed $\ell_1$ ball

$$\{x\in\mathbb{R}^d : \|x\|_1\le r\},$$

for some $r>0$, is convex.

3.27 The Lorentz cone is the set

$$C=\{(x,t)\in\mathbb{R}^d\times\mathbb{R} : \|x\|_2\le t\}.$$

Show that $C$ is convex.

3.28 A polyhedron is a set of the form

$$K=\{x\in\mathbb{R}^d : a_i^Tx\le b_i,\ i=1,\ldots,m,\ \text{and}\ c_j^Tx=d_j,\ j=1,\ldots,n\},$$

for some $a_i\in\mathbb{R}^d$, $b_i\in\mathbb{R}$, $i=1,\ldots,m$ and $c_j\in\mathbb{R}^d$, $d_j\in\mathbb{R}$, $j=1,\ldots,n$. Show that $K$ is convex.

3.29 The probability simplex in $\mathbb{R}^d$ is the set

$$\Delta=\{x\in\mathbb{R}^d : x\ge 0\ \text{and}\ x^T\mathbf{1}=1\}.$$

Show that $\Delta$ is convex.

3.30 Consider the function

$$f(y)=\sup_{x\in\mathbb{R}^d}\{y^Tx-\|x\|_2\}.$$

a) Show that if $\|y\|_2\le 1$ then $f(y)=0$.

b) Show that if $\|y\|_2>1$ then $f(y)=+\infty$.

3.31 Let $f,g:D\to\mathbb{R}$ for some $D\subseteq\mathbb{R}^d$ and let $x_0$ be an interior point of $D$. Assume $f$ and $g$ are continuously differentiable at $x_0$. Let $h(x)=f(x)g(x)$ for all $x$ in $D$. Compute $\nabla h(x_0)$. [Hint: You can make use of the corresponding single-variable result.]

3.32 Let $f:D\to\mathbb{R}$ for some $D\subseteq\mathbb{R}^d$ and let $x_0$ be an interior point of $D$ where $f(x_0)\neq 0$. Assume $f$ is continuously differentiable at $x_0$. Compute $\nabla(1/f)$. [Hint: You can make use of the corresponding single-variable result without proof.]

3.33 Let $f,g:D\to\mathbb{R}$ for some $D\subseteq\mathbb{R}^d$ and let $x_0$ be an interior point of $D$. Assume $f$ and $g$ are twice continuously differentiable at $x_0$. Let $h(x)=f(x)g(x)$ for all $x$ in $D$. Compute $H_h(x_0)$.

3.34 (Adapted from [Rud]) Let $f:D\to\mathbb{R}$ for some convex open set $D\subseteq\mathbb{R}^d$. Assume that $f$ is continuously differentiable everywhere on $D$ and that further $\frac{\partial f}{\partial x_1}(x_0)=0$ for all $x_0\in D$. Show that $f$ depends only on $x_2,\ldots,x_d$.

3.35 (Adapted from [Rud]) Let $g:D\to\mathbb{R}^d$ for some open set $D\subseteq\mathbb{R}$. Assume that $g$ is continuously differentiable everywhere on $D$ and that further

$$\|g(t)\|_2=1,\qquad\forall t\in D.$$

Show that $g'(t)^Tg(t)=0$ for all $t\in D$. [Hint: Use composition.]

3.36 (Adapted from [Khu]) Let $f:D\to\mathbb{R}$ for some open set $D\subseteq\mathbb{R}^d$. Assume that $f$ is continuously differentiable everywhere on $D$ and that further

$$f(tx)=t^nf(x),$$

for any $x\in D$ and any scalar $t$ such that $tx\in D$. In that case, $f$ is said to be homogeneous of degree $n$. Prove that for all $x\in D$ we have

$$x^T\nabla f(x)=nf(x).$$

3.37 (Adapted from [Khu]) Apply Taylor's Theorem in the neighborhood of $0$ to the function

$$f(x_1,x_2)=\exp(x_2\sin x_1).$$

[Hint: You can use standard formulas for the derivatives of single-variable functions without proof.]

3.38 (Adapted from [Khu]) Apply Taylor's Theorem in the neighborhood of $0$ to the function

$$f(x_1,x_2)=\cos(x_1x_2).$$

[Hint: You can use standard formulas for the derivatives of single-variable functions without proof.]

3.39 (Adapted from [Khu]) Apply Taylor's Theorem in the neighborhood of $0$ to the function

$$f(x_1,x_2,x_3)=\sin(e^{x_1}+x_2^2+x_3^3).$$

[Hint: You can use standard formulas for the derivatives of single-variable functions without proof.]

3.40 (Adapted from [Khu]) Consider the function

$$f(x_1,x_2)=(x_2-x_1^2)(x_2-2x_1^2).$$

a) Compute the gradient and Hessian of $f$ at $(0,0)$.

b) Show that $(0,0)$ is not a strict local minimizer of $f$. [Hint: Consider a parametric curve of the form $x_1=t^{\alpha}$ and $x_2=t^{\beta}$.]

c) Let $g(t)=at$ for some nonzero vector $a=(a_1,a_2)^T\in\mathbb{R}^2$. Show that $t=0$ is a strict local minimizer of $h(t)=f(g(t))$.

3.41 Suppose $f:\mathbb{R}^d\to\mathbb{R}$ is convex. Let $A\in\mathbb{R}^{d\times m}$ and $b\in\mathbb{R}^d$. Show that

$$g(x)=f(Ax+b)$$

is convex over $\mathbb{R}^m$.

3.42 (Adapted from [CVX]) For $x=(x_1,\ldots,x_n)\in\mathbb{R}^n$, let

$$x_{[1]}\ge x_{[2]}\ge\cdots\ge x_{[n]},$$

be the coordinates of $x$ in non-increasing order. Let $1\le r\le n$ be an integer and $w_1\ge w_2\ge\cdots\ge w_r\ge 0$. Show that

$$f(x)=\sum_{i=1}^r w_i x_{[i]},$$

is convex.

3.43 For a fixed $z\in\mathbb{R}^d$, let

$$f(x)=\|x-z\|_2^2.$$

a) Compute the gradient and Hessian of $f$.

b) Show that $f$ is strongly convex.

c) For a finite set $Z\subseteq\mathbb{R}^d$, let

$$g(x)=\max_{z\in Z}\|x-z\|_2^2.$$

Use Problem 3.13 to show that $g$ is convex.

3.44 Let $S_i$, $i\in I$, be a collection of convex subsets of $\mathbb{R}^d$. Here $I$ can be infinite. Show that

$$\bigcap_{i\in I}S_i,$$

is convex.

3.45 a) Let $f_i$, $i\in I$, be a collection of convex real-valued functions over $\mathbb{R}^d$. Use Problems 3.11 and 3.44 to show that

$$g(x)=\sup_{i\in I}f_i(x),$$

is convex.

b) Let $f:\mathbb{R}^{d+f}\to\mathbb{R}$ be a convex function and let $S\subseteq\mathbb{R}^f$ be a (not necessarily convex) set. Use (a) to show that the function

$$g(x)=\sup_{y\in S}f(x,y),$$

is convex. You can assume that $g(x)<+\infty$ for all $x\in\mathbb{R}^d$.

3.46 Use Problem 3.45 to show that the largest eigenvalue of a matrix is a convex function over the set of all symmetric matrices. [Hint: First show that $\langle x,Ax\rangle$ is a linear function of $A$.]

3.47 Let $f:\mathbb{R}^d\to\mathbb{R}$ be a convex function. Show that the set of global minimizers is convex.

3.48 Consider the set of symmetric $n\times n$ matrices

$$S^n=\{X\in\mathbb{R}^{n\times n} : X=X^T\}.$$

a) Show that $S^n$ is a linear subspace of the vector space of $n\times n$ matrices in the sense that for all $X_1,X_2\in S^n$ and $\alpha\in\mathbb{R}$

$$X_1+X_2\in S^n,$$

and

$$\alpha X_1\in S^n.$$

b) Show that there exists a basis of $S^n$ of size $d=\binom{n}{2}+n$, that is, a collection of symmetric matrices $X_1,\ldots,X_d\in S^n$ such that any matrix $Y\in S^n$ can be written as a linear combination

$$Y=\sum_{i=1}^d\alpha_iX_i,$$

for some $\alpha_1,\ldots,\alpha_d\in\mathbb{R}$.

c) Show that the matrices $X_1,\ldots,X_d\in S^n$ you constructed in (b) are linearly independent in the sense that

$$\sum_{i=1}^d\alpha_iX_i=0_{n\times n}\implies\alpha_1=\cdots=\alpha_d=0.$$

3.49 Let $f:D\to\mathbb{R}$ be a convex function over the set

$$D=\{x=(x_1,\ldots,x_d)\in\mathbb{R}^d : x_i\ge 0,\ \forall i\}.$$

a) Show that $D$ is convex.

b) Show that the First-Order Optimality Conditions for Convex Functions on Convex Sets reduce to

$$\frac{\partial f(x^0)}{\partial x_i}\ge 0,\qquad\forall i\in[d]$$

and

$$\frac{\partial f(x^0)}{\partial x_i}=0,\qquad\text{if } x_i^0>0.$$

[Hint: Choose the right $y$ in the condition of the theorem.]

3.50 Let $f:\mathbb{R}^d\to\mathbb{R}$ be twice continuously differentiable and $m$-strongly convex with $m>0$.

a) Show that

$$f(y_n)\to+\infty,$$

along any sequence $(y_n)$ such that $\|y_n\|\to+\infty$ as $n\to+\infty$.

b) Show that, for any $x_0$, the set

$$C=\{x\in\mathbb{R}^d : f(x)\le f(x_0)\},$$

is closed and bounded.

c) Show that there exists a unique global minimizer $x^*$ of $f$.

3.51 Suppose that $f:\mathbb{R}^d\to\mathbb{R}$ is $L$-smooth and $m$-strongly convex with a global minimizer at $x^*$. Apply gradient descent with step size $\alpha=1/L$ started from an arbitrary point $x^0$.

a) Show that $x^t\to x^*$ as $t\to+\infty$. [Hint: Use the Quadratic Bound for Strongly Convex Functions at $x^*$.]

b) Give a bound on $\|x^t-x^*\|$ depending on $t,m,L,f(x^0)$ and $f(x^*)$.

3.52 Let $f:\mathbb{R}^d\to\mathbb{R}$ be twice continuously differentiable.

a) Show that

$$\nabla f(y)=\nabla f(x)+\int_0^1 H_f(x+\xi p)\,p\,d\xi,$$

where $p=y-x$. Here the integral of a vector-valued function is performed entry-wise. [Hint: Let $g_i(\xi)=\frac{\partial f}{\partial x_i}(x+\xi p)$. The fundamental theorem of calculus implies that $g_i(1)-g_i(0)=\int_0^1 g_i'(\xi)\,d\xi$.]

b) Assume $f$ is $L$-smooth. Use (a) to show that the gradient of $f$ is $L$-Lipschitz in the sense that

$$\|\nabla f(y)-\nabla f(x)\|_2\le L\|y-x\|_2,\qquad\forall x,y\in\mathbb{R}^d.$$

3.53 Consider the quadratic function

$$f(x)=\frac{1}{2}x^TPx+q^Tx+r,$$

where $P$ is a symmetric matrix. Use the First-Order Convexity Condition to show that $f$ is convex if and only if $P$ is positive semidefinite.

3.54 Let

$$\ell(x;A,b)=\frac{1}{n}\sum_{i=1}^n\left\{-b_i\log(\sigma(\alpha_i^Tx))-(1-b_i)\log(1-\sigma(\alpha_i^Tx))\right\}$$

be the loss function in logistic regression, where $A\in\mathbb{R}^{n\times d}$ has rows $\alpha_i^T$. Show that

$$H_{\ell}(x;A,b)\preceq\frac{1}{4n}A^TA.$$

[Hint: Read the proof of the Smoothness of logistic regression lemma.]
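
A quick numerical illustration of the claimed bound, as a sketch assuming NumPy and a made-up random dataset: at any $x$, the matrix $\frac{1}{4n}A^TA-H_{\ell}(x;A,b)$ should come out positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 20, 3
A = rng.normal(size=(n, d))              # made-up feature matrix with rows alpha_i^T
x = rng.normal(size=d)

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))
p = sigma(A @ x)

# Hessian of the logistic loss: (1/n) sum_i sigma'(alpha_i^T x) alpha_i alpha_i^T
H = (A * (p * (1 - p))[:, None]).T @ A / n
bound = A.T @ A / (4 * n)

# All eigenvalues of bound - H should be nonnegative (up to floating-point error)
print(np.linalg.eigvalsh(bound - H).min())
```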