Hessian matrix

This article describes an analogue, for functions of multiple variables, of the second derivative of a function of one variable.

Definition

Definition in terms of Jacobian matrix and gradient vector

Suppose $f$ is a real-valued function of $n$ variables $x_{1}, x_{2}, \dots, x_{n}$. The Hessian matrix of $f$ is an $n \times n$-matrix-valued function, with domain a subset of the domain of $f$, defined as follows: the Hessian matrix at any point in the domain is the Jacobian matrix of the gradient vector of $f$ at that point. In point-free notation, we denote the Hessian matrix function by $H (f)$ and define it as:

$H (f) = J (\nabla f)$
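The point-free definition can be mirrored in code by composing two operators. The minimal numerical sketch below builds a gradient operator and a Jacobian operator from central finite differences and defines the Hessian as the Jacobian of the gradient. It is an approximation, not exact differentiation. The helper names `gradient`, `jacobian`, `hessian` and the step size `h` are hypothetical choices made for illustration.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def gradient(f: Callable[[Vector], float], h: float = 1e-4) -> Callable[[Vector], Vector]:
    """Return a function approximating the gradient of f by central differences."""
    def grad_f(a: Vector) -> Vector:
        components = []
        for i in range(len(a)):
            plus, minus = list(a), list(a)
            plus[i] += h
            minus[i] -= h
            components.append((f(plus) - f(minus)) / (2 * h))
        return components
    return grad_f


def jacobian(F: Callable[[Vector], Vector], h: float = 1e-4) -> Callable[[Vector], Matrix]:
    """Return a function approximating the Jacobian matrix of F.

    Entry (i, j) approximates the partial derivative of the i-th component
    of F with respect to x_j, so rows are indexed by output components.
    """
    def jac_F(a: Vector) -> Matrix:
        n = len(a)
        columns = []
        for j in range(n):
            plus, minus = list(a), list(a)
            plus[j] += h
            minus[j] -= h
            F_plus, F_minus = F(plus), F(minus)
            columns.append([(p - q) / (2 * h) for p, q in zip(F_plus, F_minus)])
        rows = len(columns[0])
        # columns[j][i] holds dF_i/dx_j; transpose so the result is indexed [i][j].
        return [[columns[j][i] for j in range(n)] for i in range(rows)]
    return jac_F


def hessian(f: Callable[[Vector], float]) -> Callable[[Vector], Matrix]:
    """Point-free H(f) = J(grad f): the Jacobian of the (approximate) gradient."""
    return jacobian(gradient(f))


if __name__ == "__main__":
    f = lambda v: v[0] ** 2 * v[1] + v[1] ** 3
    print(hessian(f)([1.0, 2.0]))
```

For $f (x, y) = x^{2} y + y^{3}$ at $(1, 2)$, the exact Hessian matrix is $\begin{pmatrix} 4 & 2 \\ 2 & 12 \end{pmatrix}$. The printed approximation should agree with it to several decimal places.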

Interpretation as second derivative

The Hessian matrix function is the correct notion of second derivative for a real-valued function of $n$ variables. Here's why:

The correct notion of first derivative for a scalar-valued function of multiple variables is the gradient vector, so the correct notion of first derivative for $f$ is $\nabla f$.
The gradient vector $\nabla f$ is itself a vector-valued function with $n$-dimensional inputs and $n$-dimensional outputs. The correct notion of derivative for such a function is the Jacobian matrix, which here is a function with $n$-dimensional inputs and outputs valued in $n \times n$ matrices.

Thus, the Hessian matrix is the correct notion of second derivative.

Definition in terms of second-order partial derivatives

For further information, refer: Relation between Hessian matrix and second-order partial derivatives

Wherever the Hessian matrix of a function exists, its entries can be described as second-order partial derivatives of the function. Explicitly, if $f$ is a real-valued function of $n$ variables $x_{1}, x_{2}, \dots, x_{n}$, the Hessian matrix $H (f)$ is an $n \times n$-matrix-valued function whose $(i, j)^{\text{th}}$ entry is the second-order partial derivative $\frac{\partial^{2} f}{\partial x_{j} \partial x_{i}}$, which is the same as $f_{x_{i} x_{j}}$. Note that the diagonal entries are the pure second-order partial derivatives, whereas the off-diagonal entries are the mixed second-order partial derivatives.
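One way to see why this holds uses the usual convention that the $(i, j)^{\text{th}}$ entry of the Jacobian matrix $J (F)$ of a vector-valued function $F = (F_{1}, \dots, F_{n})$ is $\partial F_{i} / \partial x_{j}$. The $i^{\text{th}}$ component of $\nabla f$ is $f_{x_{i}}$, so:

$(J (\nabla f))_{i j} = \frac{\partial}{\partial x_{j}} \left( \frac{\partial f}{\partial x_{i}} \right) = \frac{\partial^{2} f}{\partial x_{j} \partial x_{i}} = f_{x_{i} x_{j}}$

In words, row $i$ of the Hessian matrix is the gradient vector of the first-order partial $f_{x_{i}}$.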

Computationally useful definition at a point

For a function of two variables at a point

Suppose $f$ is a real-valued function of two variables $x, y$ and $(x_{0}, y_{0})$ is a point in the domain of $f$ at which $f$ is twice differentiable. In particular, this means that all four second-order partial derivatives exist at $(x_{0}, y_{0})$. That is, the two pure second-order partials $f_{x x} (x_{0}, y_{0})$ and $f_{y y} (x_{0}, y_{0})$ exist, and so do the two mixed second-order partials $f_{x y} (x_{0}, y_{0})$ and $f_{y x} (x_{0}, y_{0})$. Then the Hessian matrix of $f$ at $(x_{0}, y_{0})$, denoted $H (f) (x_{0}, y_{0})$, is the $2 \times 2$ matrix of real numbers given explicitly by:

$\begin{pmatrix} f_{x x} (x_{0}, y_{0}) & f_{x y} (x_{0}, y_{0}) \\ f_{y x} (x_{0}, y_{0}) & f_{y y} (x_{0}, y_{0}) \end{pmatrix}$
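As an illustration, consider $f (x, y) = x^{3} + x y^{2}$ at $(x_{0}, y_{0}) = (1, 2)$. Both the function and the point are chosen here purely for this example. Then $f_{x} = 3 x^{2} + y^{2}$ and $f_{y} = 2 x y$, so $f_{x x} = 6 x$, $f_{x y} = f_{y x} = 2 y$, and $f_{y y} = 2 x$. Hence:

$H (f) (1, 2) = \begin{pmatrix} 6 & 4 \\ 4 & 2 \end{pmatrix}$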

For a function of multiple variables at a point

Suppose $f$ is a real-valued function of multiple variables $(x_{1}, x_{2}, \dots, x_{n})$, and suppose $(a_{1}, a_{2}, \dots, a_{n})$ is a point in the domain of $f$ at which $f$ is twice differentiable. In other words, $a_{1}, a_{2}, \dots, a_{n}$ are real numbers and the point has coordinates $x_{1} = a_{1}, x_{2} = a_{2}, \dots, x_{n} = a_{n}$. In particular, all the second-order partials (pure and mixed) of $f$ with respect to these variables exist at the point $(a_{1}, a_{2}, \dots, a_{n})$. Then the Hessian matrix of $f$ at $(a_{1}, a_{2}, \dots, a_{n})$, denoted $H (f) (a_{1}, a_{2}, \dots, a_{n})$, is the $n \times n$ matrix of real numbers given explicitly as follows:

The $(i, j)^{\text{th}}$ entry (i.e., the entry in the $i^{\text{th}}$ row and $j^{\text{th}}$ column) is $f_{x_{i} x_{j}} (a_{1}, a_{2}, \dots, a_{n})$. This is the same as $\left. \frac{\partial^{2}}{\partial x_{j} \partial x_{i}} f (x_{1}, x_{2}, \dots, x_{n}) \right|_{(x_{1}, x_{2}, \dots, x_{n}) = (a_{1}, a_{2}, \dots, a_{n})}$. The two notations list the variables in opposite orders because their conventions differ. The subscript notation $f_{x_{i} x_{j}}$ is read left to right (differentiate first with respect to $x_{i}$, then with respect to $x_{j}$), whereas the Leibniz notation is read right to left.

The matrix looks like this:

$\begin{pmatrix} f_{x_{1} x_{1}} (a_{1}, a_{2}, \dots, a_{n}) & f_{x_{1} x_{2}} (a_{1}, a_{2}, \dots, a_{n}) & \cdots & f_{x_{1} x_{n}} (a_{1}, a_{2}, \dots, a_{n}) \\ f_{x_{2} x_{1}} (a_{1}, a_{2}, \dots, a_{n}) & f_{x_{2} x_{2}} (a_{1}, a_{2}, \dots, a_{n}) & \cdots & f_{x_{2} x_{n}} (a_{1}, a_{2}, \dots, a_{n}) \\ \vdots & \vdots & \ddots & \vdots \\ f_{x_{n} x_{1}} (a_{1}, a_{2}, \dots, a_{n}) & f_{x_{n} x_{2}} (a_{1}, a_{2}, \dots, a_{n}) & \cdots & f_{x_{n} x_{n}} (a_{1}, a_{2}, \dots, a_{n}) \end{pmatrix}$
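The entry-wise description at a point can be sketched numerically by estimating each $f_{x_{i} x_{j}} (a_{1}, \dots, a_{n})$ with the standard four-point central-difference formula for a mixed second partial. This is an approximation, and the name `hessian_at` and the step `h` are hypothetical choices. One caveat: the four-point formula is symmetric in $i$ and $j$, so it cannot distinguish $f_{x_{i} x_{j}}$ from $f_{x_{j} x_{i}}$. It is therefore only a sensible estimate when those mixed partials agree, as they do under the hypotheses of the section "Under continuity assumptions".

```python
from typing import Callable, List


def hessian_at(f: Callable[[List[float]], float], a: List[float], h: float = 1e-3) -> List[List[float]]:
    """Approximate the Hessian matrix of f at the point a.

    Entry (i, j) estimates f_{x_i x_j}(a) by the four-point central difference
    [f(a+h e_i+h e_j) - f(a+h e_i-h e_j) - f(a-h e_i+h e_j) + f(a-h e_i-h e_j)] / (4 h^2).
    When i == j the two shifts add up, giving a second difference with step 2h.
    """
    n = len(a)

    def f_shifted(i: int, si: float, j: int, sj: float) -> float:
        # Evaluate f at a + si*h*e_i + sj*h*e_j.
        p = list(a)
        p[i] += si * h
        p[j] += sj * h
        return f(p)

    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            H[i][j] = (
                f_shifted(i, 1, j, 1)
                - f_shifted(i, 1, j, -1)
                - f_shifted(i, -1, j, 1)
                + f_shifted(i, -1, j, -1)
            ) / (4 * h * h)
    return H
```

For instance, for $f (x_{1}, x_{2}, x_{3}) = x_{1}^{2} x_{2} + x_{2} x_{3}^{3}$ at $(1, 2, 1)$, the exact Hessian matrix is $\begin{pmatrix} 4 & 2 & 0 \\ 2 & 0 & 3 \\ 0 & 3 & 12 \end{pmatrix}$, and `hessian_at` should reproduce it up to a small discretization error.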

Definition as a function

For a function of two variables

Suppose $f$ is a real-valued function of two variables $x, y$. The Hessian matrix of $f$, denoted $H (f)$, is a $2 \times 2$-matrix-valued function that sends each point to the Hessian matrix at that point, wherever that matrix is defined. It is defined as:

$(x_{0}, y_{0}) \mapsto H (f) (x_{0}, y_{0}) = \begin{pmatrix} f_{x x} (x_{0}, y_{0}) & f_{x y} (x_{0}, y_{0}) \\ f_{y x} (x_{0}, y_{0}) & f_{y y} (x_{0}, y_{0}) \end{pmatrix}$

In the point-free notation, we can write this as:

$H (f) = \begin{pmatrix} f_{x x} & f_{x y} \\ f_{y x} & f_{y y} \end{pmatrix}$
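For example, consider the function $f (x, y) = x^{2} y + y^{3}$, chosen purely for illustration. We have $f_{x} = 2 x y$ and $f_{y} = x^{2} + 3 y^{2}$, so in point-free notation:

$H (f) = \begin{pmatrix} 2 y & 2 x \\ 2 x & 6 y \end{pmatrix}$

Evaluating this matrix-valued function at a particular point, say $(1, 2)$, recovers the pointwise Hessian matrix $\begin{pmatrix} 4 & 2 \\ 2 & 12 \end{pmatrix}$.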

For a function of multiple variables

Suppose $f$ is a real-valued function of variables $x_{1}, x_{2}, \dots, x_{n}$. The Hessian matrix of $f$, denoted $H (f)$, is an $n \times n$-matrix-valued function that sends each point to the Hessian matrix at that point, wherever that matrix is defined. It is defined as:

$(a_{1}, a_{2}, \dots, a_{n}) \mapsto H (f) (a_{1}, a_{2}, \dots, a_{n})$

In the point-free notation, we can write it as:

$H (f) = \begin{pmatrix} f_{x_{1} x_{1}} & f_{x_{1} x_{2}} & \cdots & f_{x_{1} x_{n}} \\ f_{x_{2} x_{1}} & f_{x_{2} x_{2}} & \cdots & f_{x_{2} x_{n}} \\ \vdots & \vdots & \ddots & \vdots \\ f_{x_{n} x_{1}} & f_{x_{n} x_{2}} & \cdots & f_{x_{n} x_{n}} \end{pmatrix}$
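As an illustrative $n$-variable example, let $A = (a_{i j})$ be a symmetric $n \times n$ matrix of real constants and set $f (x_{1}, \dots, x_{n}) = \frac{1}{2} \sum_{i, j} a_{i j} x_{i} x_{j}$. Using the symmetry $a_{i j} = a_{j i}$, the first-order partials are $f_{x_{k}} = \sum_{j} a_{k j} x_{j}$, so $f_{x_{k} x_{l}} = a_{k l}$ and:

$H (f) = A$

Here the Hessian matrix is a constant matrix-valued function: it sends every point to the same matrix $A$.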

Under continuity assumptions

If we assume that all the second-order partials of $f$ are continuous functions everywhere, then the following happens:

The Hessian matrix of $f$ at any point is a symmetric matrix, i.e., its $(i, j)^{\text{th}}$ entry equals its $(j, i)^{\text{th}}$ entry. This follows from Clairaut's theorem on equality of mixed partials.
The Hessian matrix can be interpreted as the second derivative of $f$: it is the matrix that describes that second derivative, not merely an array of second-order partials.
$f$ is twice differentiable as a function. Hence, the Hessian matrix of $f$ is the same as the Jacobian matrix of the gradient vector $\nabla f$, where the latter is viewed as a vector-valued function.

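Note that the final conclusion does not actually need continuity. It only requires that the entries of the Jacobian matrix of $\nabla f$ (the partial derivatives of the components of the gradient vector) exist, so it holds even if the second-order partials are not continuous.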
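The continuity hypothesis cannot simply be dropped from the symmetry claim. A standard counterexample is $f (x, y) = \frac{x y (x^{2} - y^{2})}{x^{2} + y^{2}}$ with $f (0, 0) = 0$. Here $f_{x} (0, y) = -y$ and $f_{y} (x, 0) = x$, so with the left-to-right subscript convention $f_{x y} (0, 0) = -1$ while $f_{y x} (0, 0) = 1$. The matrix of second-order partials at the origin is therefore $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, which is not symmetric. The sketch below estimates the two mixed partials with nested central differences. The helper names and step sizes are illustrative assumptions, and a numerical estimate only corroborates, rather than proves, the exact values.

```python
from typing import Callable

Func2 = Callable[[float, float], float]


def f(x: float, y: float) -> float:
    """xy(x^2 - y^2)/(x^2 + y^2), extended by f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)


def partial_x(g: Func2, x: float, y: float, h: float) -> float:
    # Central-difference estimate of the partial derivative of g with respect to x.
    return (g(x + h, y) - g(x - h, y)) / (2 * h)


def partial_y(g: Func2, x: float, y: float, h: float) -> float:
    # Central-difference estimate of the partial derivative of g with respect to y.
    return (g(x, y + h) - g(x, y - h)) / (2 * h)


# The inner step is much smaller than the outer one, so the inner derivative
# is effectively taken first, mimicking the iterated limit.
INNER, OUTER = 1e-7, 1e-3

# f_{xy}(0, 0) = (f_x)_y(0, 0): differentiate with respect to x, then y.
f_xy = (partial_x(f, 0.0, OUTER, INNER) - partial_x(f, 0.0, -OUTER, INNER)) / (2 * OUTER)

# f_{yx}(0, 0) = (f_y)_x(0, 0): differentiate with respect to y, then x.
f_yx = (partial_y(f, OUTER, 0.0, INNER) - partial_y(f, -OUTER, 0.0, INNER)) / (2 * OUTER)

print(round(f_xy, 3), round(f_yx, 3))  # prints -1.0 1.0
```

The mixed partials of this $f$ are not continuous at the origin, so there is no conflict with Clairaut's theorem.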