Relation between Hessian matrix and second-order partial derivatives

Relation at a point

For a function of two variables at a point

Suppose $f$ is a real-valued function of two variables $x, y$ and $(x_{0}, y_{0})$ is a point in the domain of $f$ at which $f$ is twice differentiable. In particular, this means that all four second-order partial derivatives exist at $(x_{0}, y_{0})$, i.e., the two pure second-order partials $f_{x x} (x_{0}, y_{0})$ and $f_{y y} (x_{0}, y_{0})$ exist, and so do the two second-order mixed partials $f_{x y} (x_{0}, y_{0})$ and $f_{y x} (x_{0}, y_{0})$. Then, the Hessian matrix of $f$ at $(x_{0}, y_{0})$, denoted $H (f) (x_{0}, y_{0})$, can be expressed explicitly as the following $2 \times 2$ matrix of real numbers:

$\begin{pmatrix} f_{x x} (x_{0}, y_{0}) & f_{x y} (x_{0}, y_{0}) \\ f_{y x} (x_{0}, y_{0}) & f_{y y} (x_{0}, y_{0}) \end{pmatrix}$

Note that $f$ may fail to be twice differentiable at $(x_{0}, y_{0})$ even though all four second-order partial derivatives exist there, so that the above $2 \times 2$ matrix is still defined. In that case, we do not call it the Hessian matrix, because it need not have the properties that we expect of the Hessian matrix.
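To make this caveat concrete, the following is a minimal numerical sketch using a standard textbook example that is not taken from this article: $f(x, y) = x y (x^{2} - y^{2}) / (x^{2} + y^{2})$ with $f(0, 0) = 0$. All four second-order partials exist at the origin, but the two mixed partials disagree there ($f_{x y} (0, 0) = -1$ and $f_{y x} (0, 0) = 1$). Since twice differentiability at a point forces the two mixed partials to agree at that point, this $f$ cannot be twice differentiable at the origin, so the matrix below is not a Hessian matrix. The function names and step sizes are illustrative assumptions, and the finite differences only approximate the true limits.

```python
def f(x, y):
    """f(x, y) = x*y*(x^2 - y^2)/(x^2 + y^2), extended by f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)


def f_x(x, y, h=1e-7):
    # Central-difference estimate of the first partial with respect to x.
    return (f(x + h, y) - f(x - h, y)) / (2 * h)


def f_y(x, y, h=1e-7):
    # Central-difference estimate of the first partial with respect to y.
    return (f(x, y + h) - f(x, y - h)) / (2 * h)


def second_partials_at_origin(k=1e-3):
    # The outer step k is much larger than the inner step h, so each first
    # partial is resolved accurately before it is differenced again.
    f_xx = (f_x(k, 0.0) - f_x(-k, 0.0)) / (2 * k)
    f_xy = (f_x(0.0, k) - f_x(0.0, -k)) / (2 * k)  # differentiate in x, then y
    f_yx = (f_y(k, 0.0) - f_y(-k, 0.0)) / (2 * k)  # differentiate in y, then x
    f_yy = (f_y(0.0, k) - f_y(0.0, -k)) / (2 * k)
    return [[f_xx, f_xy], [f_yx, f_yy]]


matrix = second_partials_at_origin()
print(matrix)  # approximately [[0, -1], [1, 0]]: defined, but not symmetric
```

The nested differences are used deliberately: they keep track of the order of differentiation, which a symmetric finite-difference stencil would hide.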

For a function of multiple variables at a point

Suppose $f$ is a real-valued function of multiple variables $(x_{1}, x_{2}, \dots, x_{n})$. Suppose $(a_{1}, a_{2}, \dots, a_{n})$ is a point in the domain of $f$, i.e., $a_{1}, a_{2}, \dots, a_{n}$ are real numbers and the point has coordinates $x_{1} = a_{1}, x_{2} = a_{2}, \dots, x_{n} = a_{n}$, and suppose $f$ is twice differentiable at this point. It then follows that all the second-order partials (pure and mixed) of $f$ with respect to these variables exist at the point $(a_{1}, a_{2}, \dots, a_{n})$. Then, the Hessian matrix of $f$ at $(a_{1}, a_{2}, \dots, a_{n})$, denoted $H (f) (a_{1}, a_{2}, \dots, a_{n})$, is an $n \times n$ matrix of real numbers that can be expressed explicitly as follows:

The $(i, j)^{\text{th}}$ entry (i.e., the entry in the $i^{\text{th}}$ row and $j^{\text{th}}$ column) is $f_{x_{i} x_{j}} (a_{1}, a_{2}, \dots, a_{n})$. This is the same as $\left. \frac{\partial^{2}}{\partial x_{j} \partial x_{i}} f (x_{1}, x_{2}, \dots, x_{n}) \right|_{(x_{1}, x_{2}, \dots, x_{n}) = (a_{1}, a_{2}, \dots, a_{n})}$. Note that the two notations list the variables in opposite orders because their conventions differ: the subscript notation is read left to right, whereas the Leibniz notation is read right to left. In both, we differentiate first with respect to $x_{i}$ and then with respect to $x_{j}$.

The matrix looks like this:

$\begin{pmatrix} f_{x_{1} x_{1}} (a_{1}, a_{2}, \dots, a_{n}) & f_{x_{1} x_{2}} (a_{1}, a_{2}, \dots, a_{n}) & \cdots & f_{x_{1} x_{n}} (a_{1}, a_{2}, \dots, a_{n}) \\ f_{x_{2} x_{1}} (a_{1}, a_{2}, \dots, a_{n}) & f_{x_{2} x_{2}} (a_{1}, a_{2}, \dots, a_{n}) & \cdots & f_{x_{2} x_{n}} (a_{1}, a_{2}, \dots, a_{n}) \\ \vdots & \vdots & \ddots & \vdots \\ f_{x_{n} x_{1}} (a_{1}, a_{2}, \dots, a_{n}) & f_{x_{n} x_{2}} (a_{1}, a_{2}, \dots, a_{n}) & \cdots & f_{x_{n} x_{n}} (a_{1}, a_{2}, \dots, a_{n}) \end{pmatrix}$

Note that $f$ may fail to be twice differentiable at $(a_{1}, a_{2}, \dots, a_{n})$ even though all the second-order partial derivatives exist there, so that the above $n \times n$ matrix is still defined. In that case, we do not call it the Hessian matrix, because it need not have the properties that we expect of the Hessian matrix.
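The entry-by-entry description can be sketched numerically as follows. This is a minimal illustration, not a definitive implementation: `hessian_at`, the step size `h`, and the test function `example` are all hypothetical names and choices introduced here. The symmetric central-difference stencil used below cannot distinguish $f_{x_{i} x_{j}}$ from $f_{x_{j} x_{i}}$, so the result is only meaningful as a Hessian matrix under the assumption that $f$ is twice differentiable at the point.

```python
def hessian_at(f, a, h=1e-3):
    """Approximate the n x n matrix of second-order partials of f at the point a.

    f takes a list of n floats; entry [i][j] estimates f_{x_i x_j}(a).
    """
    n = len(a)

    def shifted(i, si, j, sj):
        # Evaluate f at a + si*h*e_i + sj*h*e_j, where e_i is the i-th unit vector.
        p = list(a)
        p[i] += si * h
        p[j] += sj * h
        return f(p)

    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Central difference in both directions; for i == j this reduces
            # to the usual second difference with step 2h.
            H[i][j] = (
                shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
                - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)
            ) / (4 * h * h)
    return H


def example(x):
    # Hypothetical test function: f(x1, x2, x3) = x1^2 * x2 + x2 * x3^3.
    return x[0] ** 2 * x[1] + x[1] * x[2] ** 3


H = hessian_at(example, [1.0, 2.0, 3.0])
for row in H:
    print(row)
```

For this polynomial the exact second-order partials at $(1, 2, 3)$ are $f_{x_{1} x_{1}} = 4$, $f_{x_{1} x_{2}} = 2$, $f_{x_{2} x_{3}} = 27$ and $f_{x_{3} x_{3}} = 36$, with the remaining pure and mixed partials determined by symmetry or equal to zero, so the printed rows should be close to those values.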

Relation as functions

For a function of two variables

Suppose $f$ is a real-valued function of two variables $x, y$. The Hessian matrix of $f$, denoted $H (f)$, satisfies the following:

$H (f) (x, y) = \begin{pmatrix} f_{x x} (x, y) & f_{x y} (x, y) \\ f_{y x} (x, y) & f_{y y} (x, y) \end{pmatrix}$

The equality holds whenever the left side makes sense. In other words, the domain of $H (f)$ is contained in the domain of the right side, but the matrix on the right side may be defined on a strictly bigger set than $H (f)$ is.

We can also write the above in point-free notation:

$H (f) = \begin{pmatrix} f_{x x} & f_{x y} \\ f_{y x} & f_{y y} \end{pmatrix}$

Again, equality holds whenever the left side makes sense.
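
As a concrete illustration (the function is chosen for this example only), take $f (x, y) = x^{2} y + y^{3}$. Its first-order partials are $f_{x} (x, y) = 2 x y$ and $f_{y} (x, y) = x^{2} + 3 y^{2}$. Differentiating each of these once more with respect to $x$ and $y$ gives:

$H (f) (x, y) = \begin{pmatrix} 2 y & 2 x \\ 2 x & 6 y \end{pmatrix}$

Because $f$ is a polynomial, it is twice differentiable everywhere, so in this example the two sides have the same domain, namely all of $\mathbb{R}^{2}$.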

For a function of multiple variables

Suppose $f$ is a function of variables $x_{1}, x_{2}, \dots, x_{n}$. The Hessian matrix of $f$, denoted $H (f)$, is related to the second-order partial derivatives as follows (expressed in point-free notation):

$H (f) = \begin{pmatrix} f_{x_{1} x_{1}} & f_{x_{1} x_{2}} & \cdots & f_{x_{1} x_{n}} \\ f_{x_{2} x_{1}} & f_{x_{2} x_{2}} & \cdots & f_{x_{2} x_{n}} \\ \vdots & \vdots & \ddots & \vdots \\ f_{x_{n} x_{1}} & f_{x_{n} x_{2}} & \cdots & f_{x_{n} x_{n}} \end{pmatrix}$

The equality holds whenever the left side makes sense. In other words, the domain of $H (f)$ is contained in the domain of the right side, but the matrix on the right side may be defined on a strictly bigger set than $H (f)$ is.
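
For instance, with $n = 3$ and the illustrative function $f (x_{1}, x_{2}, x_{3}) = x_{1}^{2} x_{2} + x_{2} x_{3}^{3}$ (an assumption made for this example only), the first-order partials are $f_{x_{1}} = 2 x_{1} x_{2}$, $f_{x_{2}} = x_{1}^{2} + x_{3}^{3}$ and $f_{x_{3}} = 3 x_{2} x_{3}^{2}$. Row $i$ of the matrix is obtained by differentiating $f_{x_{i}}$ with respect to $x_{1}, x_{2}, x_{3}$ in turn:

$H (f) = \begin{pmatrix} 2 x_{2} & 2 x_{1} & 0 \\ 2 x_{1} & 0 & 3 x_{3}^{2} \\ 0 & 3 x_{3}^{2} & 6 x_{2} x_{3} \end{pmatrix}$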

Note for functions with continuous second-order partials

If all the second-order partial derivatives (both pure and mixed) of $f$ exist and are continuous at and around a particular point in the domain (i.e., on an open set containing that point), then the Hessian matrix exists at that point and is given by the matrix expression in terms of second-order partials. In particular, if all the second-order partials of $f$ exist and are continuous everywhere, then the Hessian matrix of $f$ exists everywhere and is given by the matrix expression everywhere.

The key role of continuity is as follows: it lets us conclude that the Hessian matrix does in fact exist. Without checking continuity, we can still compute the matrix of second-order partials, and we know that the Hessian matrix, if it exists, must equal this matrix, but we do not know whether the Hessian matrix exists. Continuity bridges that gap, allowing us to go from computing second-order partials to computing the Hessian matrix without reverting to first principles.
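
As a short illustration of this reasoning (the function is chosen for this example only), take $f (x, y) = e^{x} \sin y$. Its second-order partials are $f_{x x} = e^{x} \sin y$, $f_{x y} = f_{y x} = e^{x} \cos y$ and $f_{y y} = - e^{x} \sin y$. All four are continuous on all of $\mathbb{R}^{2}$, so the Hessian matrix exists everywhere and no appeal to the definition of twice differentiability is needed:

$H (f) (x, y) = \begin{pmatrix} e^{x} \sin y & e^{x} \cos y \\ e^{x} \cos y & - e^{x} \sin y \end{pmatrix}$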