3.2. Background: review of differentiable functions of several variables
We review the differential calculus of several variables. We highlight a few key results that will play an important role: the Chain Rule and the Mean Value Theorem.
3.2.1. Gradient
Recall the definition of the gradient.
DEFINITION (Gradient) Let $f : D \to \mathbb{R}$ where $D \subseteq \mathbb{R}^d$, and let $\mathbf{x}_0 \in D$ be an interior point of $D$. The function $f$ is continuously differentiable at $\mathbf{x}_0$ if all its partial derivatives exist and are continuous in an open ball around $\mathbf{x}_0$. The (column) vector

$$\nabla f(\mathbf{x}_0) = \left( \frac{\partial f(\mathbf{x}_0)}{\partial x_1}, \ldots, \frac{\partial f(\mathbf{x}_0)}{\partial x_d} \right)^T$$

is called the gradient of $f$ at $\mathbf{x}_0$.

Note that the gradient is itself a function of $\mathbf{x}$: it assigns the vector $\nabla f(\mathbf{x}_0) \in \mathbb{R}^d$ to each point $\mathbf{x}_0$ where it is defined.
EXAMPLE: Consider the affine function

$$f(\mathbf{x}) = \mathbf{q}^T \mathbf{x} + r = \sum_{j=1}^d q_j x_j + r,$$

where $\mathbf{q} = (q_1, \ldots, q_d)^T \in \mathbb{R}^d$ and $r \in \mathbb{R}$. The partial derivatives are

$$\frac{\partial f(\mathbf{x})}{\partial x_i} = q_i, \qquad i = 1, \ldots, d.$$

So the gradient of $f$ is $\nabla f(\mathbf{x}) = \mathbf{q}$ for all $\mathbf{x}$.
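As a quick numerical sanity check (a sketch using NumPy; the finite-difference helper `fd_grad` below is ours, not part of the text), we can approximate the gradient by central differences and confirm that it equals $\mathbf{q}$ at a randomly chosen point:

```python
import numpy as np

def fd_grad(f, x, h=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

rng = np.random.default_rng(0)
q, r = rng.normal(size=3), 2.0
f = lambda x: q @ x + r                # affine function f(x) = q^T x + r

x0 = rng.normal(size=3)
print(np.allclose(fd_grad(f, x0), q, atol=1e-5))  # True: gradient is q everywhere
```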
EXAMPLE: Consider the quadratic function

$$f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T P \mathbf{x} + \mathbf{q}^T \mathbf{x} + r = \frac{1}{2} \sum_{j=1}^d \sum_{k=1}^d P_{jk} x_j x_k + \sum_{j=1}^d q_j x_j + r,$$

where $P \in \mathbb{R}^{d \times d}$ is a symmetric matrix, $\mathbf{q} \in \mathbb{R}^d$ and $r \in \mathbb{R}$. Differentiating with respect to $x_i$,

$$\frac{\partial f(\mathbf{x})}{\partial x_i} = \frac{1}{2} \left[ \sum_{k=1}^d P_{ik} x_k + \sum_{j=1}^d P_{ji} x_j \right] + q_i,$$

where we used that all terms not including $x_i$ have zero derivative with respect to $x_i$. This last expression is, by the symmetry of $P$,

$$\sum_{k=1}^d P_{ik} x_k + q_i,$$

the $i$-th entry of $P \mathbf{x} + \mathbf{q}$. So the gradient of $f$ is

$$\nabla f(\mathbf{x}) = P \mathbf{x} + \mathbf{q}.$$
If $P$ is not symmetric, the same computation shows that the gradient is instead $\nabla f(\mathbf{x}) = \frac{1}{2}\left(P + P^T\right)\mathbf{x} + \mathbf{q}$.
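The same finite-difference check (again our sketch, not part of the text) confirms the formula $\nabla f(\mathbf{x}) = P \mathbf{x} + \mathbf{q}$ for a random symmetric $P$:

```python
import numpy as np

def fd_grad(f, x, h=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
P = (A + A.T) / 2                      # symmetrize a random matrix
q = rng.normal(size=4)
f = lambda x: 0.5 * x @ P @ x + q @ x + 3.0

x0 = rng.normal(size=4)
print(np.allclose(fd_grad(f, x0), P @ x0 + q, atol=1e-4))  # True
```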
It will be useful to compute the derivative of a function $f : \mathbb{R}^d \to \mathbb{R}$ along a curve, that is, the derivative of the composition $f(g(t))$, where $g : \mathbb{R} \to \mathbb{R}^d$ is a vector-valued function of a single variable. We first record an important special case of such a $g$.
EXAMPLE (Parametric Line): The straight line between $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$ can be parametrized as

$$g(t) = \mathbf{x} + t (\mathbf{y} - \mathbf{x}), \qquad t \in [0, 1],$$

where $g = (g_1, \ldots, g_d) : [0, 1] \to \mathbb{R}^d$, so that $g(0) = \mathbf{x}$ and $g(1) = \mathbf{y}$. Then the components of $g$ are

$$g_i(t) = x_i + t (y_i - x_i), \qquad i = 1, \ldots, d,$$

so that

$$g'(t) = (g_1'(t), \ldots, g_d'(t))^T = \mathbf{y} - \mathbf{x}.$$
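Because $g$ is affine in $t$, its difference quotient equals $\mathbf{y} - \mathbf{x}$ exactly, for any step size; here is a short illustration (our sketch):

```python
import numpy as np

x = np.array([1.0, 0.0, 2.0])
y = np.array([3.0, -1.0, 0.0])
g = lambda t: x + t * (y - x)     # parametric line: g(0) = x, g(1) = y

t, h = 0.3, 1e-4
print((g(t + h) - g(t)) / h)      # equals y - x (up to rounding)
print(y - x)
```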
Recall the Chain Rule in the single-variable case. Quoting Wikipedia:
The simplest form of the chain rule is for real-valued functions of one real variable. It states that if $g$ is a function that is differentiable at a point $c$ (i.e. the derivative $g'(c)$ exists) and $f$ is a function that is differentiable at $g(c)$, then the composite function $f \circ g$ is differentiable at $c$, and the derivative is

$$(f \circ g)'(c) = f'(g(c)) \cdot g'(c).$$
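A short numerical illustration (our sketch, with arbitrary choices $f = \sin$ and $g(x) = x^2$): the finite-difference derivative of $\sin(x^2)$ at $c$ matches $\cos(c^2) \cdot 2c$.

```python
import numpy as np

f, fprime = np.sin, np.cos                  # outer function and its derivative
g, gprime = lambda x: x**2, lambda x: 2*x   # inner function and its derivative

c, h = 0.7, 1e-6
fd = (f(g(c + h)) - f(g(c - h))) / (2 * h)  # difference quotient of f(g(x)) at c
print(np.isclose(fd, fprime(g(c)) * gprime(c)))  # True: (f o g)'(c) = f'(g(c)) g'(c)
```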
Here is a straightforward generalization of the Chain Rule.
THEOREM (Chain Rule) Let $g : \mathbb{R}^d \to \mathbb{R}$ and $f : \mathbb{R} \to \mathbb{R}$ be continuously differentiable. Then the composition $h(\mathbf{x}) = f(g(\mathbf{x}))$ is continuously differentiable and

$$\nabla h(\mathbf{x}) = f'(g(\mathbf{x})) \, \nabla g(\mathbf{x}).$$

Proof: We apply the Chain Rule for functions of one variable to the partial derivatives. For all $i = 1, \ldots, d$, viewing $f(g(\mathbf{x}))$ as a function of $x_i$ alone (with the other variables held fixed) gives

$$\frac{\partial h(\mathbf{x})}{\partial x_i} = f'(g(\mathbf{x})) \, \frac{\partial g(\mathbf{x})}{\partial x_i}.$$

Collecting the partial derivatives in a vector gives the claim.
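To see the formula in action (a sketch; the particular $f$ and $g$ are our arbitrary choices), take $f(u) = e^u$ and $g(\mathbf{x}) = x_1 x_2 + x_2^2$, so the gradient of the composition should be $e^{g(\mathbf{x})} \nabla g(\mathbf{x})$:

```python
import numpy as np

def fd_grad(F, x, h=1e-6):
    """Central finite-difference gradient of F at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (F(x + e) - F(x - e)) / (2 * h)
    return g

g = lambda x: x[0] * x[1] + x[1] ** 2
grad_g = lambda x: np.array([x[1], x[0] + 2 * x[1]])
comp = lambda x: np.exp(g(x))                    # composition f(g(x)) with f = exp

x0 = np.array([0.5, -1.2])
print(np.allclose(fd_grad(comp, x0),
                  np.exp(g(x0)) * grad_g(x0), atol=1e-5))  # True
```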
Here is a different generalization of the Chain Rule. Again the composition involves one multivariable function and one single-variable function, but this time the inner function is the single-variable one: it is a curve $g : \mathbb{R} \to \mathbb{R}^d$ traced inside the domain of $f$.

THEOREM (Chain Rule) Let $g = (g_1, \ldots, g_d) : \mathbb{R} \to \mathbb{R}^d$ be continuously differentiable at $t$ and let $f : \mathbb{R}^d \to \mathbb{R}$ be continuously differentiable at $g(t)$. Then the composition $h(t) = f(g(t))$ is differentiable at $t$ and

$$h'(t) = \nabla f(g(t))^T g'(t) = \sum_{i=1}^d \frac{\partial f(g(t))}{\partial x_i} \, g_i'(t).$$

Proof: To simplify the notation, suppose that $d = 2$, so that $h(t) = f(g_1(t), g_2(t))$. We seek to compute the limit

$$h'(t) = \lim_{\delta \to 0} \frac{f(g_1(t+\delta), g_2(t+\delta)) - f(g_1(t), g_2(t))}{\delta},$$

and we rewrite the numerator as the telescoping sum

$$\left[ f(g_1(t+\delta), g_2(t+\delta)) - f(g_1(t), g_2(t+\delta)) \right] + \left[ f(g_1(t), g_2(t+\delta)) - f(g_1(t), g_2(t)) \right].$$

Applying the Mean Value Theorem to each term gives

$$\frac{\partial f(\xi_1, g_2(t+\delta))}{\partial x_1} \left[ g_1(t+\delta) - g_1(t) \right] + \frac{\partial f(g_1(t), \xi_2)}{\partial x_2} \left[ g_2(t+\delta) - g_2(t) \right],$$

where $\xi_1$ lies between $g_1(t)$ and $g_1(t+\delta)$, and $\xi_2$ lies between $g_2(t)$ and $g_2(t+\delta)$. Dividing by $\delta$ and letting $\delta \to 0$, the bracketed difference quotients converge to $g_1'(t)$ and $g_2'(t)$, while $(\xi_1, g_2(t+\delta))$ and $(g_1(t), \xi_2)$ both converge to $(g_1(t), g_2(t))$. The continuity of the partial derivatives of $f$ then gives the claim.
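We can check this version numerically along the parametric line from the earlier example (a sketch; the quadratic $f$ is an arbitrary smooth choice of ours):

```python
import numpy as np

P = np.array([[2.0, 0.5], [0.5, 1.0]])         # symmetric matrix
q = np.array([1.0, -1.0])
f = lambda v: 0.5 * v @ P @ v + q @ v          # quadratic, so grad f(v) = P v + q
grad_f = lambda v: P @ v + q

x = np.array([0.0, 1.0])
y = np.array([2.0, -1.0])
g = lambda t: x + t * (y - x)                  # parametric line, g'(t) = y - x

t, h = 0.4, 1e-6
fd = (f(g(t + h)) - f(g(t - h))) / (2 * h)     # difference quotient of h(t) = f(g(t))
print(np.isclose(fd, grad_f(g(t)) @ (y - x)))  # True: h'(t) = grad f(g(t))^T g'(t)
```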
As a first application of the Chain Rule, we generalize the Mean Value Theorem to the case of several variables. We will use this result later to prove a multivariable Taylor expansion result that will play a central role in this chapter.
THEOREM (Mean Value) Let $f : D \to \mathbb{R}$ where $D \subseteq \mathbb{R}^d$, let $\mathbf{x}_0 \in D$, and let $\delta > 0$ be such that the open ball $B_\delta(\mathbf{x}_0)$ is contained in $D$. If $f$ is continuously differentiable on $B_\delta(\mathbf{x}_0)$, then for any $\mathbf{x} \in B_\delta(\mathbf{x}_0)$

$$f(\mathbf{x}) = f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0 + \xi (\mathbf{x} - \mathbf{x}_0))^T (\mathbf{x} - \mathbf{x}_0)$$

for some $\xi \in (0, 1)$.
One way to think of the Mean Value Theorem is as a zeroth-order Taylor expansion of $f$ around $\mathbf{x}_0$ whose error is given exactly by a first-order (gradient) correction at an intermediate point.
Proof idea: We apply the single-variable result and the Chain Rule.
Proof: Let $\phi(t) = f(\mathbf{x}_0 + t(\mathbf{x} - \mathbf{x}_0))$ for $t \in [0, 1]$. By the Chain Rule, applied along the parametric line $g(t) = \mathbf{x}_0 + t(\mathbf{x} - \mathbf{x}_0)$,

$$\phi'(t) = \nabla f(\mathbf{x}_0 + t(\mathbf{x} - \mathbf{x}_0))^T (\mathbf{x} - \mathbf{x}_0).$$

In particular, $\phi$ is continuously differentiable on $[0, 1]$, so the single-variable Mean Value Theorem gives

$$\phi(1) = \phi(0) + \phi'(\xi)(1 - 0)$$

for some $\xi \in (0, 1)$. Since $\phi(0) = f(\mathbf{x}_0)$ and $\phi(1) = f(\mathbf{x})$, substituting the expression for $\phi'(\xi)$ gives the claim.
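The theorem guarantees that such a $\xi$ exists but does not say where it is; for a concrete $f$ we can locate one by brute force (our sketch, with an arbitrary smooth $f$):

```python
import numpy as np

f = lambda v: np.exp(v[0]) + v[0] * v[1] ** 2
grad_f = lambda v: np.array([np.exp(v[0]) + v[1] ** 2, 2 * v[0] * v[1]])

x0 = np.array([0.0, 1.0])
x = np.array([1.0, 2.0])
d = x - x0

# residual of the Mean Value identity as a function of xi in (0, 1)
res = lambda xi: f(x) - f(x0) - grad_f(x0 + xi * d) @ d

xis = np.linspace(0.0, 1.0, 100001)
vals = np.array([res(xi) for xi in xis])
best = xis[np.argmin(np.abs(vals))]
print(best, res(best))   # res(best) is numerically zero: a valid xi in (0, 1)
```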
3.2.2. Second-order derivatives
One can also define higher-order derivatives. We start with the single-variable case, where $f : D \to \mathbb{R}$ with $D \subseteq \mathbb{R}$. The second derivative of $f$ at an interior point $x_0$ of $D$, where $f'$ is assumed to exist in an open ball around $x_0$, is

$$f''(x_0) = \lim_{h \to 0} \frac{f'(x_0 + h) - f'(x_0)}{h},$$

provided the limit exists.
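Numerically, $f''(x_0)$ is often approximated without computing $f'$ first, via the central second difference $(f(x_0 + h) - 2 f(x_0) + f(x_0 - h)) / h^2$; a quick illustration (our sketch):

```python
import numpy as np

f = np.sin
x0, h = 0.9, 1e-4
fd2 = (f(x0 + h) - 2 * f(x0) + f(x0 - h)) / h ** 2
print(fd2, -np.sin(x0))   # second derivative of sin is -sin; the values agree closely
```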
In the several-variable case, we have the following:

DEFINITION (Second Partial Derivatives and Hessian) Let $f : D \to \mathbb{R}$ where $D \subseteq \mathbb{R}^d$, let $\mathbf{x}_0 \in D$ be an interior point of $D$, and assume that the partial derivative $\frac{\partial f}{\partial x_j}$ exists in an open ball around $\mathbf{x}_0$. If $\frac{\partial f}{\partial x_j}$ itself has a partial derivative with respect to $x_i$ at $\mathbf{x}_0$, it is called a second partial derivative of $f$ at $\mathbf{x}_0$, namely

$$\frac{\partial}{\partial x_i} \left( \frac{\partial f}{\partial x_j} \right) (\mathbf{x}_0).$$

To simplify the notation, we write this as $\frac{\partial^2 f(\mathbf{x}_0)}{\partial x_i \partial x_j}$, or as $\frac{\partial^2 f(\mathbf{x}_0)}{\partial x_i^2}$ when $i = j$. The matrix of second derivatives is called the Hessian and is denoted by

$$\mathbf{H}_f(\mathbf{x}_0) = \begin{pmatrix} \frac{\partial^2 f(\mathbf{x}_0)}{\partial x_1^2} & \cdots & \frac{\partial^2 f(\mathbf{x}_0)}{\partial x_1 \partial x_d} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f(\mathbf{x}_0)}{\partial x_d \partial x_1} & \cdots & \frac{\partial^2 f(\mathbf{x}_0)}{\partial x_d^2} \end{pmatrix}.$$

Like the gradient, the Hessian is itself a function of $\mathbf{x}$.
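Each entry of the Hessian can be approximated by a four-point finite difference; the helper below (our sketch, not from the text) is exact up to rounding for quadratics and accurate for smooth $f$:

```python
import numpy as np

def fd_hessian(f, x, h=1e-5):
    """Finite-difference approximation of the Hessian of f at x."""
    d = x.size
    H = np.zeros((d, d))
    E = np.eye(d) * h                          # E[i] is h times the i-th basis vector
    for i in range(d):
        for j in range(d):
            H[i, j] = (f(x + E[i] + E[j]) - f(x + E[i] - E[j])
                       - f(x - E[i] + E[j]) + f(x - E[i] - E[j])) / (4 * h ** 2)
    return H

f = lambda v: v[0] * np.sin(v[1]) + v[0] ** 3
x0 = np.array([1.0, 0.5])
print(fd_hessian(f, x0))
# analytically: [[6*x0[0], cos(x0[1])], [cos(x0[1]), -x0[0]*sin(x0[1])]]
```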
When $f$ is twice continuously differentiable, the order in which partial derivatives are taken does not matter.

THEOREM (Symmetry of the Hessian) Let $f : D \to \mathbb{R}$ where $D \subseteq \mathbb{R}^d$, and let $\mathbf{x}_0 \in D$ be an interior point of $D$. If the second partial derivatives of $f$ exist and are continuous in an open ball around $\mathbf{x}_0$, then for all $i, j$

$$\frac{\partial^2 f(\mathbf{x}_0)}{\partial x_i \partial x_j} = \frac{\partial^2 f(\mathbf{x}_0)}{\partial x_j \partial x_i}.$$

In particular, the Hessian $\mathbf{H}_f(\mathbf{x}_0)$ is a symmetric matrix.
Proof idea: Two applications of the Mean Value Theorem show that the limits can be interchanged.
Proof: By definition of the partial derivatives involved, only the variables $x_i$ and $x_j$ play a role, so we may take $d = 2$, $(i, j) = (2, 1)$ and write $\mathbf{x}_0 = (a, b)$. For small $h, k \ne 0$, consider the double difference

$$\Delta(h, k) = f(a+h, b+k) - f(a+h, b) - f(a, b+k) + f(a, b).$$

Applying the single-variable Mean Value Theorem to $u \mapsto f(u, b+k) - f(u, b)$, and then once more in the second variable, gives

$$\Delta(h, k) = h \left[ \frac{\partial f(\xi, b+k)}{\partial x_1} - \frac{\partial f(\xi, b)}{\partial x_1} \right] = h k \, \frac{\partial^2 f(\xi, \eta)}{\partial x_2 \partial x_1}$$

for some $\xi$ between $a$ and $a+h$ and some $\eta$ between $b$ and $b+k$. Because $\Delta(h, k)$ is symmetric in the roles of the two variables, the same argument applied in the opposite order gives

$$\Delta(h, k) = h k \, \frac{\partial^2 f(\xi', \eta')}{\partial x_1 \partial x_2}$$

for some $\xi'$ between $a$ and $a+h$ and some $\eta'$ between $b$ and $b+k$. Equating the two expressions, dividing by $hk$, and letting $h, k \to 0$, all the intermediate points converge to $(a, b)$. The claim then follows from the continuity of the second partial derivatives.
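Finite differences cannot distinguish the two orders (discrete difference operators always commute), so a more convincing check is symbolic; using SymPy (our sketch, with an arbitrary smooth function) the two mixed partials agree exactly:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1 * x2 ** 2 + sp.exp(x1 * x2)

d12 = sp.diff(f, x1, x2)        # differentiate in x1, then in x2
d21 = sp.diff(f, x2, x1)        # differentiate in x2, then in x1
print(sp.simplify(d12 - d21))   # 0: the mixed partials coincide
```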
EXAMPLE: Consider again the quadratic function

$$f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T P \mathbf{x} + \mathbf{q}^T \mathbf{x} + r,$$

where $P \in \mathbb{R}^{d \times d}$ is a symmetric matrix, $\mathbf{q} \in \mathbb{R}^d$ and $r \in \mathbb{R}$. Recall that the gradient of $f$ is $\nabla f(\mathbf{x}) = P \mathbf{x} + \mathbf{q}$.

To simplify the calculation, let $\mathbf{p}_i^T$ denote the $i$-th row of $P$. Each component of $\nabla f(\mathbf{x})$ is then the affine function

$$\frac{\partial f(\mathbf{x})}{\partial x_i} = \mathbf{p}_i^T \mathbf{x} + q_i.$$

Row $i$ of the Hessian collects the partial derivatives of this component with respect to $x_1, \ldots, x_d$, which by the affine example above are the entries of $\mathbf{p}_i$. Putting this together we get

$$\mathbf{H}_f(\mathbf{x}) = P.$$

Observe that this is indeed a symmetric matrix.
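As a final check (our sketch, restating the finite-difference Hessian helper from above so the block runs on its own), the numerical Hessian of a random quadratic matches $P$ at any point:

```python
import numpy as np

def fd_hessian(f, x, h=1e-5):
    """Finite-difference approximation of the Hessian of f at x."""
    d = x.size
    H = np.zeros((d, d))
    E = np.eye(d) * h
    for i in range(d):
        for j in range(d):
            H[i, j] = (f(x + E[i] + E[j]) - f(x + E[i] - E[j])
                       - f(x - E[i] + E[j]) + f(x - E[i] - E[j])) / (4 * h ** 2)
    return H

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))
P = (A + A.T) / 2
q = rng.normal(size=3)
f = lambda v: 0.5 * v @ P @ v + q @ v - 1.0

x0 = rng.normal(size=3)
print(np.allclose(fd_hessian(f, x0), P, atol=1e-4))  # True: Hessian is P everywhere
```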
Self-assessment quiz (with help from Claude, Gemini, and ChatGPT)
1 What does it mean for a function $f : D \to \mathbb{R}$, with $D \subseteq \mathbb{R}^d$, to be continuously differentiable at an interior point $\mathbf{x}_0$ of $D$?

a) $f$ is continuous at $\mathbf{x}_0$.

b) All partial derivatives of $f$ exist at $\mathbf{x}_0$.

c) All partial derivatives of $f$ exist and are continuous in an open ball around $\mathbf{x}_0$.

d) The gradient of $f$ exists at $\mathbf{x}_0$.
2 What is the gradient of a function $f : \mathbb{R}^d \to \mathbb{R}$ at a point $\mathbf{x}_0$?

a) The rate of change of $f$ at $\mathbf{x}_0$ in a single, fixed direction.

b) The vector of all second partial derivatives of $f$ at $\mathbf{x}_0$.

c) The vector of all first partial derivatives of $f$ at $\mathbf{x}_0$.

d) The matrix of all second partial derivatives of $f$ at $\mathbf{x}_0$.
3 Which of the following statements is true about the Hessian matrix of a twice continuously differentiable function?
a) It is always a diagonal matrix.
b) It is always a symmetric matrix.
c) It is always an invertible matrix.
d) It is always a positive definite matrix.
4 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be twice continuously differentiable. Which of the following is the Hessian $\mathbf{H}_f(\mathbf{x})$?

a) $\begin{pmatrix} \frac{\partial^2 f(\mathbf{x})}{\partial x_1^2} & \frac{\partial^2 f(\mathbf{x})}{\partial x_1 \partial x_2} \\ \frac{\partial^2 f(\mathbf{x})}{\partial x_2 \partial x_1} & \frac{\partial^2 f(\mathbf{x})}{\partial x_2^2} \end{pmatrix}$

b) $\begin{pmatrix} \frac{\partial f(\mathbf{x})}{\partial x_1} \\ \frac{\partial f(\mathbf{x})}{\partial x_2} \end{pmatrix}$

c) $\begin{pmatrix} \frac{\partial f(\mathbf{x})}{\partial x_1} & 0 \\ 0 & \frac{\partial f(\mathbf{x})}{\partial x_2} \end{pmatrix}$

d) $\begin{pmatrix} \frac{\partial^2 f(\mathbf{x})}{\partial x_1^2} & 0 \\ 0 & \frac{\partial^2 f(\mathbf{x})}{\partial x_2^2} \end{pmatrix}$
5 What is the Hessian matrix of the quadratic function $f(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T P \mathbf{x} + \mathbf{q}^T \mathbf{x} + r$, where $P$ is symmetric?

a) $\frac{1}{2} P$

b) $P \mathbf{x} + \mathbf{q}$

c) $P$

d) $2 P$
Answer for 1: c. Justification: The text states that $f$ is continuously differentiable at $\mathbf{x}_0$ "if all its partial derivatives exist and are continuous in an open ball around $\mathbf{x}_0$."

Answer for 2: c. Justification: From the text: "The (column) vector $\nabla f(\mathbf{x}_0) = \left( \frac{\partial f(\mathbf{x}_0)}{\partial x_1}, \ldots, \frac{\partial f(\mathbf{x}_0)}{\partial x_d} \right)^T$ is called the gradient of $f$ at $\mathbf{x}_0$."

Answer for 3: b. Justification: The text states that when $f$ is twice continuously differentiable, the order of the partial derivatives does not matter, so the Hessian is a symmetric matrix (Symmetry of the Hessian).

Answer for 4: a. Justification: The Hessian is the matrix of second partial derivatives, $\mathbf{H}_f(\mathbf{x}) = \left( \frac{\partial^2 f(\mathbf{x})}{\partial x_i \partial x_j} \right)_{i,j}$.

Answer for 5: c. Justification: The text shows that the Hessian of the quadratic function is $\mathbf{H}_f(\mathbf{x}) = P$.