This article describes an analogue for functions of multiple variables of the following term/fact/notion for functions of one variable: second derivative
Definition in terms of Jacobian matrix and gradient vector
Suppose $f$ is a real-valued function of $n$ variables $x_1, x_2, \ldots, x_n$. The Hessian matrix of $f$ is an $n \times n$-matrix-valued function with domain a subset of the domain of $f$, defined as follows: the Hessian matrix at any point in the domain is the Jacobian matrix of the gradient vector of $f$ at the point. In point-free notation, we denote by $H(f)$ the Hessian matrix function, and we define it as:

$H(f) := J(\nabla f)$
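As a concrete sketch of this definition (the example function $f(x,y) = x^2y + y^3$ and the use of SymPy are illustrative choices, not part of the definition), one can compute the gradient first and then take the Jacobian matrix of the gradient:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y + y**3  # illustrative example function

# First derivative: the gradient vector of f, as a column vector
grad = sp.Matrix([sp.diff(f, v) for v in (x, y)])

# Second derivative: the Jacobian matrix of the gradient vector
H = grad.jacobian([x, y])
print(H)  # Matrix([[2*y, 2*x], [2*x, 6*y]])

# SymPy's built-in Hessian agrees with "Jacobian of the gradient"
assert H == sp.hessian(f, (x, y))
```

Evaluating at a point, say $(1, 2)$, gives the Hessian matrix at that point.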
Interpretation as second derivative
The Hessian matrix function is the correct notion of second derivative for a real-valued function of $n$ variables. Here's why:
- The correct notion of first derivative for a scalar-valued function of multiple variables is the gradient vector, so the correct notion of first derivative for $f$ is $\nabla f$.
- The gradient vector $\nabla f$ is itself a vector-valued function with $n$-dimensional inputs and $n$-dimensional outputs. The correct notion of derivative for such a function is the Jacobian matrix, with $n$-dimensional inputs and outputs valued in $n \times n$ matrices.
Thus, the Hessian matrix $H(f) = J(\nabla f)$ is the correct notion of second derivative.
Relation with second-order partial derivatives
For further information, refer: Relation between Hessian matrix and second-order partial derivatives
Wherever the Hessian matrix for a function exists, its entries can be described as second-order partial derivatives of the function. Explicitly, if $f$ is a real-valued function of $n$ variables $x_1, x_2, \ldots, x_n$, the Hessian matrix $H(f)$ is an $n \times n$-matrix-valued function whose $(ij)^{th}$ entry is the second-order partial derivative $\frac{\partial^2 f}{\partial x_i \partial x_j}$, which is the same as $f_{x_j x_i}$. Note that the diagonal entries give second-order pure partial derivatives whereas the off-diagonal entries give second-order mixed partial derivatives.
Some people choose to define the Hessian matrix as the matrix whose entries are the second-order partial derivatives as indicated here. However, that is not quite the correct definition of Hessian matrix because it is possible for all the second-order partial derivatives to exist but for the function to not be twice differentiable at the point. The main disadvantage of defining the Hessian matrix in the more expansive sense (i.e., in terms of second-order partial derivatives) is that all the important results about the Hessian matrix crucially rely on the function being twice differentiable, so we don't actually gain anything by using the more expansive definition.
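The entry-by-entry description can also be sketched numerically, entirely independently of the "Jacobian of the gradient" definition (the example function and step size are our own choices; this is a finite-difference sketch, not a robust implementation):

```python
import numpy as np

def hessian_entries(f, x, h=1e-4):
    """Approximate each entry H[i, j] = d^2 f / (dx_i dx_j) at x with a
    central second-difference stencil. Diagonal entries approximate pure
    second partials; off-diagonal entries approximate mixed partials."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h**2)
    return H

# f(x, y) = x^2*y + y^3: pure partials f_xx = 2y, f_yy = 6y; mixed f_xy = 2x
f = lambda p: p[0]**2 * p[1] + p[1]**3
print(hessian_entries(f, [1.0, 2.0]))  # approximately [[4, 2], [2, 12]]
```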
Continuity assumptions and symmetric matrix
If we assume that all the second-order mixed partial derivatives are continuous at and around a point in the domain, and the Hessian matrix exists, then the Hessian matrix must be a symmetric matrix by Clairaut's theorem on equality of mixed partials. Note that we don't need to assume for this that the second-order pure partials are continuous at or around the point.
In symbols, for a function $f$ of $n$ variables $x_1, x_2, \ldots, x_n$, we get:

$\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}$ for all $1 \le i, j \le n$

so that $H(f)$ equals its own transpose.
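A quick symbolic check of this on a smooth example (the specific function is our own choice; any function whose second-order mixed partials are continuous would do):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.exp(x * y) + x**3 * sp.sin(y)  # smooth, so all second partials are continuous

fxy = sp.diff(f, x, y)  # differentiate with respect to x, then y
fyx = sp.diff(f, y, x)  # differentiate with respect to y, then x
print(sp.simplify(fxy - fyx))  # 0: the two mixed partials agree

# Hence the Hessian matrix is symmetric
H = sp.hessian(f, (x, y))
print(H == H.T)  # True
```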
Relation with second-order directional derivatives
For further information, refer: Hessian matrix defines bilinear form that outputs second-order directional derivatives
Suppose $f$ is a function of $n$ variables $x_1, x_2, \ldots, x_n$, which we think of as a vector variable $\vec{x}$. Suppose $\vec{u}, \vec{v}$ are unit vectors in $n$-space. Then, we have the following:

$D_{\vec{u}}(D_{\vec{v}}f) = \vec{u}^T (H(f)) \vec{v}$

where $\vec{u}, \vec{v}$ are treated as column vectors, so $\vec{u}^T$ is a row vector, and $\vec{v}$ is a column vector. The multiplication on the right side is matrix multiplication. Note that this tells us that the bilinear form corresponding to the Hessian matrix outputs second-order directional derivatives.
Note further that if the second-order mixed partials are continuous, this forces the Hessian matrix to be symmetric, which means that the bilinear form we obtain is symmetric, and hence, we will get:

$D_{\vec{u}}(D_{\vec{v}}f) = \vec{u}^T (H(f)) \vec{v} = \vec{v}^T (H(f)) \vec{u} = D_{\vec{v}}(D_{\vec{u}}f)$
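The bilinear-form identity can be checked numerically on an example (the function, the point, the unit vectors, and the hard-coded Hessian values, read off from the second-order partials $f_{xx} = 2y$, $f_{xy} = 2x$, $f_{yy} = 6y$, are all our own illustrative choices):

```python
import numpy as np

def dir_deriv(f, x, u, h=1e-5):
    """Central-difference approximation to the directional derivative D_u f at x."""
    return (f(x + h * u) - f(x - h * u)) / (2 * h)

# Example function f(x, y) = x^2*y + y^3; its Hessian at the point (1, 2)
f = lambda p: p[0]**2 * p[1] + p[1]**3
H = np.array([[4.0, 2.0], [2.0, 12.0]])

x = np.array([1.0, 2.0])
u = np.array([1.0, 0.0])          # unit vectors in 2-space
v = np.array([3.0, 4.0]) / 5.0

second_dir = dir_deriv(lambda p: dir_deriv(f, p, v), x, u)  # D_u(D_v f) at x
bilinear = u @ H @ v                                        # u^T H(f) v
print(second_dir, bilinear)  # both approximately 4.0
```

Swapping $\vec{u}$ and $\vec{v}$ in the nested directional derivative gives the same value, reflecting the symmetry of the bilinear form.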