Hessian matrix: Difference between revisions

Revision as of 19:52, 24 April 2012

Definition at a point

For a function of two variables at a point

Suppose $f$ is a real-valued function of two variables $x, y$ and $(x_{0}, y_{0})$ is a point in the domain of $f$. Suppose all four second-order partial derivatives exist at $(x_{0}, y_{0})$, i.e., the two pure second-order partials $f_{x x} (x_{0}, y_{0})$ and $f_{y y} (x_{0}, y_{0})$ exist, and so do the two second-order mixed partial derivatives $f_{x y} (x_{0}, y_{0})$ and $f_{y x} (x_{0}, y_{0})$. Then the Hessian matrix of $f$ at $(x_{0}, y_{0})$, denoted $H (f) (x_{0}, y_{0})$, is the $2 \times 2$ matrix of real numbers defined as follows:

$\begin{pmatrix} f_{x x} (x_{0}, y_{0}) & f_{x y} (x_{0}, y_{0}) \\ f_{y x} (x_{0}, y_{0}) & f_{y y} (x_{0}, y_{0}) \end{pmatrix}$
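
As a small worked illustration of this definition (the function and the point are chosen here for the example and do not come from the original text), take $f(x, y) = x^{2} y + y^{3}$ and $(x_{0}, y_{0}) = (1, 2)$. The first-order partials are $f_{x} = 2 x y$ and $f_{y} = x^{2} + 3 y^{2}$, so the second-order partials are:

$f_{x x} = 2 y, \quad f_{x y} = 2 x, \quad f_{y x} = 2 x, \quad f_{y y} = 6 y$

Evaluating at $(1, 2)$ gives:

$H (f) (1, 2) = \begin{pmatrix} 4 & 2 \\ 2 & 12 \end{pmatrix}$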

For a function of multiple variables at a point

Suppose $f$ is a real-valued function of multiple variables $(x_{1}, x_{2}, \dots, x_{n})$. Suppose $(a_{1}, a_{2}, \dots, a_{n})$ is a point in the domain of $f$. In other words, $a_{1}, a_{2}, \dots, a_{n}$ are real numbers and the point has coordinates $x_{1} = a_{1}, x_{2} = a_{2}, \dots, x_{n} = a_{n}$. Suppose, further, that all the second-order partials (pure and mixed) of $f$ with respect to these variables exist at the point $(a_{1}, a_{2}, \dots, a_{n})$. Then the Hessian matrix of $f$ at $(a_{1}, a_{2}, \dots, a_{n})$, denoted $H (f) (a_{1}, a_{2}, \dots, a_{n})$, is an $n \times n$ matrix of real numbers defined as follows:

The $(i, j)^{\text{th}}$ entry (i.e., the entry in the $i^{\text{th}}$ row and $j^{\text{th}}$ column) is $f_{x_{i} x_{j}} (a_{1}, a_{2}, \dots, a_{n})$. This is the same as $\left. \frac{\partial^{2}}{\partial x_{j} \partial x_{i}} f (x_{1}, x_{2}, \dots, x_{n}) \right|_{(x_{1}, x_{2}, \dots, x_{n}) = (a_{1}, a_{2}, \dots, a_{n})}$. Note that the order in which the variables are written differs between the two notations because the conventions differ: the subscripts are read left to right (differentiate first with respect to $x_{i}$, then with respect to $x_{j}$), whereas the Leibniz operators are applied right to left.

The matrix looks like this:

$\begin{pmatrix} f_{x_{1} x_{1}} (a_{1}, a_{2}, \dots, a_{n}) & f_{x_{1} x_{2}} (a_{1}, a_{2}, \dots, a_{n}) & \dots & f_{x_{1} x_{n}} (a_{1}, a_{2}, \dots, a_{n}) \\ f_{x_{2} x_{1}} (a_{1}, a_{2}, \dots, a_{n}) & f_{x_{2} x_{2}} (a_{1}, a_{2}, \dots, a_{n}) & \dots & f_{x_{2} x_{n}} (a_{1}, a_{2}, \dots, a_{n}) \\ \vdots & \vdots & \ddots & \vdots \\ f_{x_{n} x_{1}} (a_{1}, a_{2}, \dots, a_{n}) & f_{x_{n} x_{2}} (a_{1}, a_{2}, \dots, a_{n}) & \dots & f_{x_{n} x_{n}} (a_{1}, a_{2}, \dots, a_{n}) \end{pmatrix}$
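
The entry-by-entry definition above can be approximated numerically. The following is a minimal sketch in plain Python, assuming $f$ is smooth near the point; the names `hessian_at` and `example_f`, the step size `h`, and the choice of central differences are illustrative assumptions, not part of the original text. Note that this particular estimator is symmetric in $i$ and $j$ by construction, so it cannot distinguish $f_{x_{i} x_{j}}$ from $f_{x_{j} x_{i}}$ when the two differ.

```python
def hessian_at(f, point, h=1e-4):
    """Approximate the n x n Hessian of f at `point` by central differences.

    Entry [i][j] estimates the second-order partial of f with respect to
    x_i and x_j. Assumes f is smooth near `point`.
    """
    n = len(point)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                # Evaluate f with coordinate i moved by di*h and j by dj*h.
                # When i == j the two shifts add up on the same coordinate.
                p = list(point)
                p[i] += di * h
                p[j] += dj * h
                return f(*p)

            H[i][j] = (
                shifted(1, 1) - shifted(1, -1) - shifted(-1, 1) + shifted(-1, -1)
            ) / (4 * h * h)
    return H


def example_f(x, y, z):
    # Hypothetical test function: f(x, y, z) = x^2 y + y z^3
    return x**2 * y + y * z**3


H = hessian_at(example_f, (1.0, 2.0, 3.0))
for row in H:
    print([round(v, 3) for v in row])
```

For this test function the exact second-order partials are $f_{x x} = 2 y$, $f_{x y} = 2 x$, $f_{y z} = 3 z^{2}$, $f_{z z} = 6 y z$, and the rest are zero, so the printed rows should be close to $(4, 2, 0)$, $(2, 0, 27)$, $(0, 27, 36)$.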

Definition as a function

For a function of two variables

Suppose $f$ is a real-valued function of two variables $x, y$. The Hessian matrix of $f$, denoted $H (f)$, is a $2 \times 2$ matrix-valued function that sends each point to the Hessian matrix at that point, if that matrix is defined. It is defined as:

$(x_{0}, y_{0}) \mapsto H (f) (x_{0}, y_{0}) = \begin{pmatrix} f_{x x} (x_{0}, y_{0}) & f_{x y} (x_{0}, y_{0}) \\ f_{y x} (x_{0}, y_{0}) & f_{y y} (x_{0}, y_{0}) \end{pmatrix}$

In the point-free notation, we can write this as:

$H (f) = \begin{pmatrix} f_{x x} & f_{x y} \\ f_{y x} & f_{y y} \end{pmatrix}$
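
To illustrate the point-free form (with a function chosen here only as an example), take $f(x, y) = e^{y} \sin x$. Then $f_{x} = e^{y} \cos x$ and $f_{y} = e^{y} \sin x$, and the Hessian matrix, viewed as a matrix-valued function of $(x, y)$, is:

$H (f) = \begin{pmatrix} - e^{y} \sin x & e^{y} \cos x \\ e^{y} \cos x & e^{y} \sin x \end{pmatrix}$

Substituting any particular point $(x_{0}, y_{0})$ into the entries recovers the Hessian matrix at that point.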

For a function of multiple variables

Suppose $f$ is a real-valued function of variables $x_{1}, x_{2}, \dots, x_{n}$. The Hessian matrix of $f$, denoted $H (f)$, is an $n \times n$ matrix-valued function that sends each point to the Hessian matrix at that point, if that matrix is defined. It is defined as:

$(a_{1}, a_{2}, \dots, a_{n}) \mapsto H (f) (a_{1}, a_{2}, \dots, a_{n})$

In the point-free notation, we can write it as:

$\begin{pmatrix} f_{x_{1} x_{1}} & f_{x_{1} x_{2}} & \dots & f_{x_{1} x_{n}} \\ f_{x_{2} x_{1}} & f_{x_{2} x_{2}} & \dots & f_{x_{2} x_{n}} \\ \vdots & \vdots & \ddots & \vdots \\ f_{x_{n} x_{1}} & f_{x_{n} x_{2}} & \dots & f_{x_{n} x_{n}} \end{pmatrix}$

Under continuity assumptions

If we assume that all the second-order partials of $f$ are continuous functions everywhere, then the following hold:

The Hessian matrix of $f$ at any point is a symmetric matrix, i.e., its $(i, j)^{\text{th}}$ entry equals its $(j, i)^{\text{th}}$ entry. This follows from Clairaut's theorem on equality of mixed partials.
The Hessian matrix can be regarded as the second derivative of $f$: it is the matrix that represents the second derivative at each point.
$f$ is twice differentiable as a function. Hence, the Hessian matrix of $f$ is the same as the Jacobian matrix of the gradient vector $\nabla f$, where the latter is viewed as a vector-valued function.

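Note that the final conclusion does not actually require continuity: the Hessian matrix equals the Jacobian matrix of $\nabla f$ whenever the gradient vector exists near the point and all the second-order partials exist at the point, so it holds even if the second-order partials are not continuous.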
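
The role of the continuity assumption can be seen on a standard counterexample, which is not taken from the original text: $f(x, y) = x y (x^{2} - y^{2}) / (x^{2} + y^{2})$ with $f(0, 0) = 0$. Its gradient exists everywhere, but its mixed partials are not continuous at the origin. The sketch below, in plain Python, builds the Jacobian matrix of the gradient by central differences. The names `grad_f` and `jacobian_of_gradient` and the step size `h` are illustrative assumptions, and the gradient formula was derived by hand for this example.

```python
def grad_f(x, y):
    """Gradient of f(x, y) = x*y*(x^2 - y^2)/(x^2 + y^2), with f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        # Both first-order partials vanish at the origin, since f is zero
        # along both coordinate axes.
        return (0.0, 0.0)
    d = (x * x + y * y) ** 2
    fx = y * (x**4 + 4 * x**2 * y**2 - y**4) / d
    fy = x * (x**4 - 4 * x**2 * y**2 - y**4) / d
    return (fx, fy)


def jacobian_of_gradient(grad, point, h=1e-3):
    """Approximate the Jacobian of `grad` at `point` by central differences.

    Entry [i][j] estimates the partial derivative of the i-th gradient
    component with respect to x_j, i.e. the Hessian entry f_{x_i x_j}.
    """
    n = len(point)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        plus, minus = list(point), list(point)
        plus[j] += h
        minus[j] -= h
        g_plus, g_minus = grad(*plus), grad(*minus)
        for i in range(n):
            J[i][j] = (g_plus[i] - g_minus[i]) / (2 * h)
    return J


H_origin = jacobian_of_gradient(grad_f, (0.0, 0.0))
H_away = jacobian_of_gradient(grad_f, (1.0, 2.0))
print("at (0, 0):", H_origin)
print("at (1, 2):", H_away)
```

At the origin the result is approximately $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$: the Hessian matrix is still the Jacobian matrix of the gradient, but it is not symmetric, because $f_{x y} (0, 0) = -1$ while $f_{y x} (0, 0) = 1$. Away from the origin the second-order partials are continuous and the two off-diagonal entries agree up to discretization error.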
