Hessian matrix

This article describes an analogue for functions of multiple variables of the following term/fact/notion for functions of one variable: [[second derivative]]

==Definition==
===Definition in terms of Jacobian matrix and gradient vector===


Suppose <math>f</math> is a real-valued function of <math>n</math> variables <math>x_1,x_2,\dots,x_n</math>. The '''Hessian matrix''' of <math>f</math> is an <math>n \times n</math>-matrix-valued function with [[domain]] a subset of the domain of <math>f</math>, defined as follows: the Hessian matrix at any point in the domain is the [[Jacobian matrix]] of the [[gradient vector]] of <math>f</math> at the point. In point-free notation, we denote by <math>H(f)</math> the Hessian matrix function, and we define it as:


<math>H(f) = J(\nabla f)</math>
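To see this definition in action, here is a minimal computational sketch (assuming the Python library sympy is available; the function <math>f(x,y) = x^2y</math> is an arbitrary illustrative choice, not from the original article) that builds the Hessian matrix as the Jacobian matrix of the gradient vector:

<syntaxhighlight lang="python">
# Minimal sketch (assumes sympy; f(x, y) = x**2 * y is an arbitrary
# illustrative choice): the Hessian matrix computed as the Jacobian
# matrix of the gradient vector, i.e., H(f) = J(grad f).
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y

grad_f = sp.Matrix([sp.diff(f, v) for v in (x, y)])  # gradient vector of f
H = grad_f.jacobian([x, y])                          # Jacobian of the gradient

print(H)                           # Matrix([[2*y, 2*x], [2*x, 0]])
print(sp.hessian(f, (x, y)) == H)  # True: sympy's built-in hessian agrees
</syntaxhighlight>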
===Interpretation as second derivative===

The Hessian matrix function is the correct notion of second derivative for a real-valued function of <math>n</math> variables. Here's why:

* The correct notion of first derivative for a scalar-valued function of multiple variables is the gradient vector, so the correct notion of first derivative for <math>f</math> is <math>\nabla f</math>.
* The gradient vector <math>\nabla f</math> is itself a vector-valued function with <math>n</math>-dimensional inputs and <math>n</math>-dimensional outputs. The correct notion of derivative for that is the Jacobian matrix, with <math>n</math>-dimensional inputs and outputs valued in <math>n \times n</math>-matrices.
Thus, the Hessian matrix is the correct notion of second derivative.


<center>{{#widget:YouTube|id=X84tMY_7hqU}}</center>
 
==Relation with second-order partial derivatives==


{{further|[[Relation between Hessian matrix and second-order partial derivatives]]}}
Wherever the Hessian matrix for a function exists, its entries can be described as second-order partial derivatives of the function. Explicitly, if <math>f</math> is a real-valued function of <math>n</math> variables <math>x_1,x_2,\dots,x_n</math>, the Hessian matrix <math>H(f)</math> is an <math>n \times n</math>-matrix-valued function whose <math>(ij)^{th}</math> entry is the second-order partial derivative <math>\partial^2f/(\partial x_j\partial x_i)</math>, which is the same as <math>f_{x_ix_j}</math>. Note that the diagonal entries give second-order pure partial derivatives whereas the off-diagonal entries give [[second-order mixed partial derivative]]s.
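For example (an illustrative computation, not from the original article), if <math>f(x,y) = x^3y + y^2</math>, then <math>f_{xx} = 6xy</math>, <math>f_{xy} = f_{yx} = 3x^2</math>, and <math>f_{yy} = 2</math>, so:

<math>H(f) = \begin{pmatrix} 6xy & 3x^2 \\ 3x^2 & 2 \end{pmatrix}</math>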


Some people choose to ''define'' the Hessian matrix as the matrix whose entries are the second-order partial derivatives as indicated here. However, that is not quite the correct definition of the Hessian matrix, because it is possible for all the second-order partial derivatives to exist at a point without the function being twice differentiable there. The main disadvantage of defining the Hessian matrix in this more expansive sense (i.e., in terms of second-order partial derivatives) is that all the important results about the Hessian matrix crucially rely on the function being twice differentiable, so we don't actually gain anything by using the more expansive definition.
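For instance (a classical counterexample, added here for illustration), take <math>f(x,y) = \frac{xy(x^2-y^2)}{x^2+y^2}</math> for <math>(x,y) \ne (0,0)</math> and <math>f(0,0) = 0</math>. All four second-order partial derivatives exist at the origin, yet <math>f_{xy}(0,0) = -1</math> while <math>f_{yx}(0,0) = 1</math>; since twice differentiability at a point forces the mixed partials there to be equal, <math>f</math> is not twice differentiable at the origin even though the matrix of second-order partial derivatives is defined there.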
 
==Continuity assumptions and symmetric matrix==


If we assume that all the [[second-order mixed partial derivative]]s are continuous at and around a point in the domain, and the Hessian matrix exists, then the Hessian matrix must be a symmetric matrix by [[Clairaut's theorem on equality of mixed partials]]. Note that we don't need to assume for this that the second-order ''pure'' partials are continuous at or around the point.


In symbols, for a function <math>f</math> of variables <math>x_1,x_2,\dots,x_n</math>, we get:


<math>H(f)_{ij} = f_{x_ix_j} = f_{x_jx_i} = H(f)_{ji} \ \forall i,j \in \{ 1,2,\dots,n \}</math>
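As a quick check of this symmetry (a minimal sketch assuming sympy; the smooth function is an arbitrary choice, not from the original article):

<syntaxhighlight lang="python">
# Minimal sketch (assumes sympy; f is an arbitrary smooth example).
# Because all second-order mixed partials of f are continuous, the
# Hessian matrix is symmetric, as Clairaut's theorem predicts.
import sympy as sp

x, y = sp.symbols('x y')
f = sp.exp(x * y) + sp.sin(x)

H = sp.hessian(f, (x, y))
print(H == H.T)  # True: f_xy equals f_yx
</syntaxhighlight>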


==Relation with second-order directional derivatives==


{{further|[[Hessian matrix defines bilinear form that outputs second-order directional derivatives]]}}


Suppose <math>f</math> is a function of <math>n</math> variables <math>x_1,x_2,\dots,x_n</math>, which we think of as a vector variable <math>\overline{x}</math>. Suppose <math>\overline{u},\overline{v}</math> are unit vectors in <math>n</math>-space. Then, we have the following:


<math>D_{\overline{v}}(D_{\overline{u}}(f)) = \overline{u}^TH(f)\overline{v}</math>


where <math>\overline{u},\overline{v}</math> are treated as column vectors, so <math>\overline{u}^T</math> is <math>\overline{u}</math> as a row vector, and <math>\overline{v}</math> is <math>\overline{v}</math> as a column vector. The multiplication on the right side is matrix multiplication. Note that this tells us that the bilinear form corresponding to the Hessian matrix outputs second-order directional derivatives.
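Here is a minimal sketch of this identity (assuming sympy; the function <math>f</math> and the unit vectors are arbitrary illustrative choices, not from the original article):

<syntaxhighlight lang="python">
# Minimal sketch (assumes sympy; f, u, v are arbitrary illustrative
# choices). It checks that the second-order directional derivative
# D_v(D_u(f)) equals the bilinear form u^T H(f) v.
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y
u = sp.Matrix([sp.Rational(3, 5), sp.Rational(4, 5)])  # unit vector (3/5, 4/5)
v = sp.Matrix([1, 0])                                  # unit vector (1, 0)

def grad(g):
    """Gradient of g as a column vector."""
    return sp.Matrix([sp.diff(g, x), sp.diff(g, y)])

D_u_f = (grad(f).T * u)[0, 0]          # first directional derivative D_u(f)
D_v_D_u_f = (grad(D_u_f).T * v)[0, 0]  # then D_v of that

H = sp.hessian(f, (x, y))
bilinear = (u.T * H * v)[0, 0]

print(sp.simplify(D_v_D_u_f - bilinear))  # 0: the two sides agree
</syntaxhighlight>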


Note further that if the second-order mixed partials are continuous, this forces the Hessian matrix to be symmetric, which means that the bilinear form we obtain is symmetric, and hence, we will get:


<math>D_{\overline{v}}(D_{\overline{u}}(f)) = D_{\overline{u}}(D_{\overline{v}}(f))</math>
