Hessian matrix

From Calculus

This article describes an analogue, for functions of multiple variables, of the following term/fact/notion for functions of one variable: second derivative

Definition

Definition in terms of Jacobian matrix and gradient vector

Suppose $f$ is a real-valued function of $n$ variables $x_1, x_2, \dots, x_n$. The Hessian matrix of $f$ is an $n \times n$-matrix-valued function with domain a subset of the domain of $f$, defined as follows: the Hessian matrix at any point in the domain is the Jacobian matrix of the gradient vector of $f$ at the point. In point-free notation, we denote by $H(f)$ the Hessian matrix function, and we define it as:

$$H(f) := J(\nabla f)$$
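The definition lends itself to a direct numerical sketch (our own illustration, not part of the original article): approximate the gradient by central differences, then take the Jacobian of that gradient. The function names and step sizes below are arbitrary choices.

```python
# Illustrative sketch (not from the article): the Hessian as the Jacobian
# of the gradient, approximated with central finite differences.

def gradient(f, point, h=1e-5):
    """Approximate the gradient of f at point (a list of coordinates)."""
    n = len(point)
    grad = []
    for i in range(n):
        forward = list(point)
        forward[i] += h
        backward = list(point)
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad

def hessian(f, point, h=1e-4):
    """Approximate the Hessian of f at point as the Jacobian of its gradient:
    column j holds the rate of change of the gradient in direction x_j."""
    n = len(point)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        forward = list(point)
        forward[j] += h
        backward = list(point)
        backward[j] -= h
        g_plus = gradient(f, forward, h)
        g_minus = gradient(f, backward, h)
        for i in range(n):
            # Entry (i, j): derivative of the i-th gradient component
            # with respect to the j-th variable.
            H[i][j] = (g_plus[i] - g_minus[i]) / (2 * h)
    return H
```

For instance, for $f(x, y) = x^2 y$, `hessian(lambda p: p[0]**2 * p[1], [1.0, 2.0])` is approximately $[[4, 2], [2, 0]]$, matching $f_{xx} = 2y$, $f_{xy} = f_{yx} = 2x$, $f_{yy} = 0$ at $(1, 2)$.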

Interpretation as second derivative

The Hessian matrix function is the correct notion of second derivative for a real-valued function of $n$ variables. Here's why:

  • The correct notion of first derivative for a scalar-valued function of multiple variables is the gradient vector, so the correct notion of first derivative for $f$ is $\nabla f$.
  • The gradient vector $\nabla f$ is itself a vector-valued function with $n$-dimensional inputs and $n$-dimensional outputs. The correct notion of derivative for that is the Jacobian matrix, with $n$-dimensional inputs and outputs valued in $n \times n$-matrices.

Thus, the Hessian matrix is the correct notion of second derivative.

Definition in terms of second-order partial derivatives

For further information, refer: Relation between Hessian matrix and second-order partial derivatives

Wherever the Hessian matrix for a function exists, its entries can be described as second-order partial derivatives of the function. Explicitly, if $f$ is a real-valued function of $n$ variables $x_1, x_2, \dots, x_n$, the Hessian matrix $H(f)$ is an $n \times n$-matrix-valued function whose $(ij)^{th}$ entry is the second-order partial derivative $\frac{\partial^2 f}{\partial x_i \partial x_j}$, which is the same as $f_{x_j x_i}$. Note that the diagonal entries give second-order pure partial derivatives whereas the off-diagonal entries give second-order mixed partial derivatives.
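The entry-by-entry description can also be sketched numerically (our own illustration; names and step sizes are arbitrary choices): each entry is estimated directly with a standard central-difference stencil for a second-order partial, rather than by differentiating the gradient.

```python
# Illustrative sketch (not from the article): estimating a single Hessian
# entry as a second-order partial derivative via central-difference stencils.

def second_partial(f, point, i, j, h=1e-4):
    """Approximate the second-order partial of f with respect to
    x_i and x_j at the given point (a list of coordinates)."""
    def shift(p, k, d):
        q = list(p)
        q[k] += d
        return q

    if i == j:
        # Pure second-order partial: three-point stencil.
        return (f(shift(point, i, h)) - 2 * f(point) + f(shift(point, i, -h))) / h**2
    # Mixed second-order partial: four-point stencil.
    pp = f(shift(shift(point, i, h), j, h))
    pm = f(shift(shift(point, i, h), j, -h))
    mp = f(shift(shift(point, i, -h), j, h))
    mm = f(shift(shift(point, i, -h), j, -h))
    return (pp - pm - mp + mm) / (4 * h**2)
```

For $f(x, y) = x^3 + xy^2$, `second_partial(f, [1.0, 2.0], 0, 1)` approximates the mixed partial $2y = 4$ at $(1, 2)$, while the diagonal entries approximate $f_{xx} = 6x = 6$ and $f_{yy} = 2x = 2$.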

Computationally useful definition at a point

For a function of two variables at a point

Suppose $f$ is a real-valued function of two variables $x, y$ and $(x_0, y_0)$ is a point in the domain of $f$ at which $f$ is twice differentiable. In particular, this means that all the four second-order partial derivatives exist at $(x_0, y_0)$, i.e., the two pure second-order partials $f_{xx}(x_0, y_0)$ and $f_{yy}(x_0, y_0)$ exist, and so do the two second-order mixed partial derivatives $f_{xy}(x_0, y_0)$ and $f_{yx}(x_0, y_0)$. Then, the Hessian matrix of $f$ at $(x_0, y_0)$, denoted $H(f)(x_0, y_0)$, can be expressed explicitly as a $2 \times 2$ matrix of real numbers defined as follows:

$$H(f)(x_0, y_0) = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2}(x_0, y_0) & \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0) \\ \frac{\partial^2 f}{\partial y \partial x}(x_0, y_0) & \frac{\partial^2 f}{\partial y^2}(x_0, y_0) \end{pmatrix} = \begin{pmatrix} f_{xx}(x_0, y_0) & f_{yx}(x_0, y_0) \\ f_{xy}(x_0, y_0) & f_{yy}(x_0, y_0) \end{pmatrix}$$
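As a concrete illustration (our own example, not from the original article), take $f(x, y) = x^3 + xy^2$. Its first-order partials are:

$$f_x = 3x^2 + y^2, \qquad f_y = 2xy$$

so the second-order partials are:

$$f_{xx} = 6x, \qquad f_{xy} = f_{yx} = 2y, \qquad f_{yy} = 2x$$

At the point $(x_0, y_0) = (1, 2)$, this gives:

$$H(f)(1, 2) = \begin{pmatrix} 6 & 4 \\ 4 & 2 \end{pmatrix}$$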

For a function of multiple variables at a point

Suppose $f$ is a real-valued function of multiple variables $x_1, x_2, \dots, x_n$. Suppose $(a_1, a_2, \dots, a_n)$ is a point in the domain of $f$ at which $f$ is twice differentiable. In other words, $a_1, a_2, \dots, a_n$ are real numbers and the point has coordinates $x_1 = a_1, x_2 = a_2, \dots, x_n = a_n$. Suppose, further, that all the second-order partials (pure and mixed) of $f$ with respect to these variables exist at the point $(a_1, a_2, \dots, a_n)$. Then, the Hessian matrix of $f$ at $(a_1, a_2, \dots, a_n)$, denoted $H(f)(a_1, a_2, \dots, a_n)$, is an $n \times n$ matrix of real numbers that can be expressed explicitly as follows:

The $(ij)^{th}$ entry (i.e., the entry in the $i^{th}$ row and $j^{th}$ column) is $\frac{\partial^2 f}{\partial x_i \partial x_j}(a_1, a_2, \dots, a_n)$. This is the same as $f_{x_j x_i}(a_1, a_2, \dots, a_n)$. Note that in the two notations, the order in which we write the partials differs because the convention differs (left-to-right versus right-to-left).

The matrix looks like this:

$$H(f)(a_1, a_2, \dots, a_n) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{pmatrix}$$

where each partial derivative is evaluated at $(a_1, a_2, \dots, a_n)$.

Definition as a function

For a function of two variables

Suppose $f$ is a real-valued function of two variables $x, y$. The Hessian matrix of $f$, denoted $H(f)$, is a $2 \times 2$ matrix-valued function that sends each point to the Hessian matrix at that point, if that matrix is defined. It is defined as:

$$H(f)(x, y) = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2}(x, y) & \frac{\partial^2 f}{\partial x \partial y}(x, y) \\ \frac{\partial^2 f}{\partial y \partial x}(x, y) & \frac{\partial^2 f}{\partial y^2}(x, y) \end{pmatrix}$$

In the point-free notation, we can write this as:

$$H(f) = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} \end{pmatrix}$$
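For a concrete instance of the Hessian matrix as a matrix-valued function (our own example, not from the original article), take $f(x, y) = x^2 y$. Then:

$$H(f) = \begin{pmatrix} 2y & 2x \\ 2x & 0 \end{pmatrix}$$

i.e., each entry is itself a function of $(x, y)$, and evaluating all entries at a particular point recovers the Hessian matrix at that point.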

For a function of multiple variables

Suppose $f$ is a function of $n$ variables $x_1, x_2, \dots, x_n$. The Hessian matrix of $f$, denoted $H(f)$, is an $n \times n$ matrix-valued function that sends each point to the Hessian matrix at that point, if the matrix is defined. It is defined as:

$$H(f)(x_1, x_2, \dots, x_n) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{pmatrix}$$

where each partial derivative is evaluated at $(x_1, x_2, \dots, x_n)$.

In the point-free notation, we can write it as:

$$H(f) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{pmatrix}$$

Under continuity assumptions

If we assume that all the second-order partials of $f$ are continuous functions everywhere, then the following happens:

  • The Hessian matrix of $f$ at any point is a symmetric matrix, i.e., its $(ij)^{th}$ entry equals its $(ji)^{th}$ entry. This follows from Clairaut's theorem on equality of mixed partials.
  • We can think of the Hessian matrix as the second derivative of the function, i.e., it is a matrix describing the second derivative.
  • $f$ is twice differentiable as a function. Hence, the Hessian matrix of $f$ is the same as the Jacobian matrix of the gradient vector $\nabla f$, where the latter is viewed as a vector-valued function.

Note that the final conclusion actually only requires the existence of the gradient vector, hence it holds even if the second-order partials are not continuous.
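The symmetry conclusion can be sketched numerically (our own illustration; names and step sizes are arbitrary choices): for a function whose second-order partials are continuous, estimating the mixed partial in either order of differentiation yields the same value, up to numerical error.

```python
import math

# Illustrative sketch (not from the article): with continuous second-order
# partials, the two mixed partials agree (Clairaut's theorem), so the
# estimated Hessian comes out symmetric up to discretization error.

def mixed_partial(f, x, y, first, h=1e-4):
    """Estimate a mixed second-order partial of f(x, y) by nested central
    differences. first="x" differentiates with respect to x first."""
    if first == "x":
        d1 = lambda a, b: (f(a + h, b) - f(a - h, b)) / (2 * h)  # df/dx
        return (d1(x, y + h) - d1(x, y - h)) / (2 * h)           # then d/dy
    d1 = lambda a, b: (f(a, b + h) - f(a, b - h)) / (2 * h)      # df/dy
    return (d1(x + h, y) - d1(x - h, y)) / (2 * h)               # then d/dx

f = lambda x, y: math.exp(x) * math.sin(y)  # all second partials continuous
fxy = mixed_partial(f, 0.5, 1.0, first="x")
fyx = mixed_partial(f, 0.5, 1.0, first="y")
# Both estimates approximate e^{0.5} cos(1), the common value of the
# two mixed partials at (0.5, 1).
```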