Hessian matrix: Difference between revisions

From Calculus
Line 1: Line 1:
{{multivariable analogue of|second derivative}}
{{multivariable analogue of|second derivative}}
==Definition at a point==
 
==Definition==
 
===Definition in terms of Jacobian matrix and gradient vector===
 
Suppose <math>f</math> is a real-valued function of <math>n</math> variables <math>x_1,x_2,\dots,x_n</math>. The '''Hessian matrix''' of <math>f</math> is an <math>n \times n</math>-matrix-valued function with [[domain]] a subset of the domain of <math>f</math>, defined as follows: the Hessian matrix at any point in the domain is the [[Jacobian matrix]] of the [[gradient vector]] of <math>f</math> at the point. In point-free notation, we denote by <math>H(f)</math> the Hessian matrix function, and we define it as:
 
<math>H(f) = J(\nabla f)</math>
 
===Interpretation as second derivative===
 
The Hessian matrix function is the correct notion of second derivative for a real-valued function of <math>n</math> variables. Here's why:
 
* The correct notion of ''first'' derivative for a scalar-valued function of multiple variables is the [[gradient vector]], so the correct notion of first derivative for <math>f</math> is <math>\nabla f</math>.
* The gradient vector <math>\nabla f</math> is itself a vector-valued function with <math>n</math>-dimensional inputs and <math>n</math>-dimensional outputs. The correct notion of derivative for ''that'' is the [[Jacobian matrix]], with <math>n</math>-dimensional inputs and outputs valued in <math>n \times n</math>-matrices.
 
Thus, the Hessian matrix is the correct notion of second derivative.
 
===Definition in terms of second-order partial derivatives===
 
{{further|[[Relation between Hessian matrix and second-order partial derivatives]]}}
 
Wherever the Hessian matrix for a function exists, its entries can be described as second-order partial derivatives of the function. Explicitly, if <math>f</math> is a real-valued function of <math>n</math> variables <math>x_1,x_2,\dots,x_n</math>, the Hessian matrix <math>H(f)</math> is an <math>n \times n</math>-matrix-valued function whose <math>(ij)^{th}</math> entry is the second-order partial derivative <math>\partial^2 f/(\partial x_j\partial x_i)</math>, which is the same as <math>f_{x_ix_j}</math>. Note that the diagonal entries give second-order pure partial derivatives whereas the off-diagonal entries give [[second-order mixed partial derivative]]s.
 
==Computationally useful definition at a point==


===For a function of two variables at a point===
===For a function of two variables at a point===


Suppose <math>f</math> is a real-valued function of two variables <math>x,y</math> and <math>(x_0,y_0)</math> is a point in the domain of <math>f</math>. Suppose all the four second-order partial derivatives exist at <math>(x_0,y_0)</math>, i.e., the two pure second-order partials <math>f_{xx}(x_0,y_0),f_{yy}(x_0,y_0)</math> exist, and so do the two [[second-order mixed partial derivative]]s <math>f_{xy}(x_0,y_0)</math> and <math>f_{yx}(x_0,y_0)</math>. Then, the Hessian matrix of <math>f</math> at <math>(x_0,y_0)</math>, denoted <math>H(f)(x_0,y_0)</math>, is a <math>2 \times 2</math> matrix of real numbers defined as follows:
Suppose <math>f</math> is a real-valued function of two variables <math>x,y</math> and <math>(x_0,y_0)</math> is a point in the domain of <math>f</math> at which <math>f</math> is twice differentiable. In particular, this means that all four second-order partial derivatives exist at <math>(x_0,y_0)</math>, i.e., the two pure second-order partials <math>f_{xx}(x_0,y_0),f_{yy}(x_0,y_0)</math> exist, and so do the two [[second-order mixed partial derivative]]s <math>f_{xy}(x_0,y_0)</math> and <math>f_{yx}(x_0,y_0)</math>. Then, the Hessian matrix of <math>f</math> at <math>(x_0,y_0)</math>, denoted <math>H(f)(x_0,y_0)</math>, can be expressed explicitly as a <math>2 \times 2</math> matrix of real numbers defined as follows:


<math>\begin{pmatrix} f_{xx}(x_0,y_0) & f_{xy}(x_0,y_0) \\ f_{yx}(x_0,y_0) & f_{yy}(x_0,y_0) \\\end{pmatrix}</math>
<math>\begin{pmatrix} f_{xx}(x_0,y_0) & f_{xy}(x_0,y_0) \\ f_{yx}(x_0,y_0) & f_{yy}(x_0,y_0) \\\end{pmatrix}</math>
Line 12: Line 36:
===For a function of multiple variables at a point===
===For a function of multiple variables at a point===


Suppose <math>f</math> is a real-valued function of multiple variables <math>(x_1,x_2,\dots,x_n)</math>. Suppose <math>(a_1,a_2,\dots,a_n)</math> is a point in the domain of <math>f</math>. In other words, <math>a_1,a_2,\dots,a_n</math> are real numbers and the point has coordinates <math>x_1 = a_1, x_2 = a_2, \dots,x_n = a_n</math>. Suppose, further, that all the second-order partials (pure and mixed) of <math>f</math> with respect to these variables exist at the point <math>(a_1,a_2,\dots,a_n)</math>. Then, the Hessian matrix of <math>f</math> at <math>(a_1,a_2,\dots,a_n)</math>, denoted <math>H(f)(a_1,a_2,\dots,a_n)</math>, is a <math>n \times n</math> matrix of real numbers defined as follows:
Suppose <math>f</math> is a real-valued function of multiple variables <math>(x_1,x_2,\dots,x_n)</math>. Suppose <math>(a_1,a_2,\dots,a_n)</math> is a point in the domain of <math>f</math> at which <math>f</math> is twice differentiable. Here, <math>a_1,a_2,\dots,a_n</math> are real numbers and the point has coordinates <math>x_1 = a_1, x_2 = a_2, \dots,x_n = a_n</math>. Suppose, further, that all the second-order partials (pure and mixed) of <math>f</math> with respect to these variables exist at the point <math>(a_1,a_2,\dots,a_n)</math>. Then, the Hessian matrix of <math>f</math> at <math>(a_1,a_2,\dots,a_n)</math>, denoted <math>H(f)(a_1,a_2,\dots,a_n)</math>, is an <math>n \times n</math> matrix of real numbers that can be expressed explicitly as follows:


The <math>(ij)^{th}</math> entry (i.e., the entry in the <math>i^{th}</math> row and <math>j^{th}</math> column) is <math>f_{x_ix_j}(a_1,a_2,\dots,a_n)</math>. This is the same as <math>\frac{\partial^2}{\partial x_j \partial x_i}f(x_1,x_2,\dots,x_n)|_{(x_1,x_2,\dots,x_n) = (a_1,a_2,\dots,a_n)}</math>. Note that in the two notations, the order in which we write the partials differs because the convention differs (left-to-right versus right-to-left).
The <math>(ij)^{th}</math> entry (i.e., the entry in the <math>i^{th}</math> row and <math>j^{th}</math> column) is <math>f_{x_ix_j}(a_1,a_2,\dots,a_n)</math>. This is the same as <math>\frac{\partial^2}{\partial x_j \partial x_i}f(x_1,x_2,\dots,x_n)|_{(x_1,x_2,\dots,x_n) = (a_1,a_2,\dots,a_n)}</math>. Note that in the two notations, the order in which we write the partials differs because the convention differs (left-to-right versus right-to-left).

Revision as of 16:12, 12 May 2012

This article describes an analogue for functions of multiple variables of the following term/fact/notion for functions of one variable: second derivative

Definition

Definition in terms of Jacobian matrix and gradient vector

Suppose <math>f</math> is a real-valued function of <math>n</math> variables <math>x_1,x_2,\dots,x_n</math>. The Hessian matrix of <math>f</math> is an <math>n \times n</math>-matrix-valued function with domain a subset of the domain of <math>f</math>, defined as follows: the Hessian matrix at any point in the domain is the Jacobian matrix of the gradient vector of <math>f</math> at the point. In point-free notation, we denote by <math>H(f)</math> the Hessian matrix function, and we define it as:

<math>H(f) = J(\nabla f)</math>
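The definition can be tried out numerically. The following is only a minimal sketch, not a production routine: it approximates the gradient by central differences and then approximates the Jacobian of that gradient the same way. The helper names `gradient`, `jacobian`, `hessian`, the step size `h`, and the sample function are all assumptions introduced for illustration.

```python
def gradient(f, point, h=1e-4):
    """Approximate the gradient of a scalar function f at point (central differences)."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def jacobian(g, point, h=1e-4):
    """Approximate the Jacobian of a vector-valued function g at point.

    Entry [i][j] is the partial derivative of the i-th output with respect
    to the j-th input.
    """
    n = len(point)
    columns = []
    for j in range(n):
        forward = list(point)
        backward = list(point)
        forward[j] += h
        backward[j] -= h
        g_forward = g(forward)
        g_backward = g(backward)
        columns.append([(a - b) / (2 * h) for a, b in zip(g_forward, g_backward)])
    m = len(columns[0])
    # columns[j][i] holds d g_i / d x_j, so transpose into rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def hessian(f, point, h=1e-4):
    """Hessian of f at point, computed as the Jacobian of the gradient of f."""
    return jacobian(lambda p: gradient(f, p, h), point, h)


def sample(p):
    # Hypothetical test function: f(x, y) = x^3 + x^2 y + y^2
    x, y = p
    return x**3 + x**2 * y + y**2


H = hessian(sample, [1.0, 2.0])
for row in H:
    print([round(entry, 4) for entry in row])
```

For the sample function, the exact second-order partials are 6x + 2y, 2x, 2x and 2, so the printed matrix should be close to the matrix with rows (10, 2) and (2, 2), up to finite-difference error.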

Interpretation as second derivative

The Hessian matrix function is the correct notion of second derivative for a real-valued function of n variables. Here's why:

  • The correct notion of first derivative for a scalar-valued function of multiple variables is the gradient vector, so the correct notion of first derivative for <math>f</math> is <math>\nabla f</math>.
  • The gradient vector <math>\nabla f</math> is itself a vector-valued function with <math>n</math>-dimensional inputs and <math>n</math>-dimensional outputs. The correct notion of derivative for that is the Jacobian matrix, with <math>n</math>-dimensional inputs and outputs valued in <math>n \times n</math>-matrices.

Thus, the Hessian matrix is the correct notion of second derivative.

Definition in terms of second-order partial derivatives

For further information, refer: Relation between Hessian matrix and second-order partial derivatives

Wherever the Hessian matrix for a function exists, its entries can be described as second-order partial derivatives of the function. Explicitly, if <math>f</math> is a real-valued function of <math>n</math> variables <math>x_1,x_2,\dots,x_n</math>, the Hessian matrix <math>H(f)</math> is an <math>n \times n</math>-matrix-valued function whose <math>(ij)^{th}</math> entry is the second-order partial derivative <math>\partial^2 f/(\partial x_j\partial x_i)</math>, which is the same as <math>f_{x_ix_j}</math>. Note that the diagonal entries give second-order pure partial derivatives whereas the off-diagonal entries give second-order mixed partial derivatives.

Computationally useful definition at a point

For a function of two variables at a point

Suppose <math>f</math> is a real-valued function of two variables <math>x,y</math> and <math>(x_0,y_0)</math> is a point in the domain of <math>f</math> at which <math>f</math> is twice differentiable. In particular, this means that all four second-order partial derivatives exist at <math>(x_0,y_0)</math>, i.e., the two pure second-order partials <math>f_{xx}(x_0,y_0),f_{yy}(x_0,y_0)</math> exist, and so do the two second-order mixed partial derivatives <math>f_{xy}(x_0,y_0)</math> and <math>f_{yx}(x_0,y_0)</math>. Then, the Hessian matrix of <math>f</math> at <math>(x_0,y_0)</math>, denoted <math>H(f)(x_0,y_0)</math>, can be expressed explicitly as a <math>2 \times 2</math> matrix of real numbers defined as follows:

<math>\begin{pmatrix} f_{xx}(x_0,y_0) & f_{xy}(x_0,y_0) \\ f_{yx}(x_0,y_0) & f_{yy}(x_0,y_0) \\\end{pmatrix}</math>
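
As an illustration (the function and point here are chosen only for this example and do not come from the article), take <math>f(x,y) = x^3 + x^2y + y^2</math> and <math>(x_0,y_0) = (1,2)</math>. The first-order partials are <math>f_x = 3x^2 + 2xy</math> and <math>f_y = x^2 + 2y</math>, so the second-order partials are <math>f_{xx} = 6x + 2y</math>, <math>f_{xy} = f_{yx} = 2x</math> and <math>f_{yy} = 2</math>. Evaluating at <math>(1,2)</math> gives:

<math>H(f)(1,2) = \begin{pmatrix} 10 & 2 \\ 2 & 2 \\\end{pmatrix}</math>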

{{#widget:YouTube|id=47WX0VfWS8k}}

For a function of multiple variables at a point

Suppose <math>f</math> is a real-valued function of multiple variables <math>(x_1,x_2,\dots,x_n)</math>. Suppose <math>(a_1,a_2,\dots,a_n)</math> is a point in the domain of <math>f</math> at which <math>f</math> is twice differentiable. Here, <math>a_1,a_2,\dots,a_n</math> are real numbers and the point has coordinates <math>x_1 = a_1, x_2 = a_2, \dots,x_n = a_n</math>. Suppose, further, that all the second-order partials (pure and mixed) of <math>f</math> with respect to these variables exist at the point <math>(a_1,a_2,\dots,a_n)</math>. Then, the Hessian matrix of <math>f</math> at <math>(a_1,a_2,\dots,a_n)</math>, denoted <math>H(f)(a_1,a_2,\dots,a_n)</math>, is an <math>n \times n</math> matrix of real numbers that can be expressed explicitly as follows:

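The <math>(ij)^{th}</math> entry (i.e., the entry in the <math>i^{th}</math> row and <math>j^{th}</math> column) is <math>f_{x_ix_j}(a_1,a_2,\dots,a_n)</math>. This is the same as <math>\frac{\partial^2}{\partial x_j \partial x_i}f(x_1,x_2,\dots,x_n)|_{(x_1,x_2,\dots,x_n) = (a_1,a_2,\dots,a_n)}</math>. Note that in the two notations, the order in which we write the partials differs because the convention differs (left-to-right versus right-to-left).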

The matrix looks like this:

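<math>\begin{pmatrix} f_{x_1x_1}(a_1,a_2,\dots,a_n) & f_{x_1x_2}(a_1,a_2,\dots,a_n) & \dots & f_{x_1x_n}(a_1,a_2,\dots,a_n) \\ f_{x_2x_1}(a_1,a_2,\dots,a_n) & f_{x_2x_2}(a_1,a_2,\dots,a_n) & \dots & f_{x_2x_n}(a_1,a_2,\dots,a_n) \\ \vdots & \vdots & \ddots & \vdots \\ f_{x_nx_1}(a_1,a_2,\dots,a_n) & f_{x_nx_2}(a_1,a_2,\dots,a_n) & \dots & f_{x_nx_n}(a_1,a_2,\dots,a_n) \\\end{pmatrix}</math>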
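
The entrywise description lends itself to a direct numerical approximation. The sketch below is illustrative only: it estimates each second-order partial with a standard four-point central-difference formula and assembles the entries into a matrix. The names `second_partial` and `hessian_at`, the step size `h`, and the sample function of three variables are assumptions made for this example. Because the formula treats the two indices symmetrically, the result is always symmetric, so it is only meaningful for functions whose mixed partials agree.

```python
def second_partial(f, point, i, j, h=1e-4):
    """Approximate the second-order partial of f in variables i and j at point.

    Uses the four-point central-difference formula
    [f(+,+) - f(+,-) - f(-,+) + f(-,-)] / (4 h^2).
    When i == j the two shifts act on the same coordinate, which reduces to
    the usual second difference with step 2h.
    """
    def shifted(di, dj):
        p = list(point)
        p[i] += di * h
        p[j] += dj * h
        return f(p)

    return (shifted(1, 1) - shifted(1, -1) - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)


def hessian_at(f, point, h=1e-4):
    """Assemble the n x n matrix whose (i, j) entry approximates f_{x_i x_j}."""
    n = len(point)
    return [[second_partial(f, point, i, j, h) for j in range(n)] for i in range(n)]


def sample(p):
    # Hypothetical test function: f(x1, x2, x3) = x1^2 x2 + x2 x3^3
    x1, x2, x3 = p
    return x1**2 * x2 + x2 * x3**3


H = hessian_at(sample, [1.0, 2.0, 3.0])
for row in H:
    print([round(entry, 3) for entry in row])
```

For the sample function the exact entries at the point (1, 2, 3) are 4, 2, 0 in the first row, 2, 0, 27 in the second, and 0, 27, 36 in the third, so the printed values should agree with these up to finite-difference error.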

{{#widget:YouTube|id=FIRMeFAeYqc}}

Definition as a function

For a function of two variables

Suppose <math>f</math> is a real-valued function of two variables <math>x,y</math>. The Hessian matrix of <math>f</math>, denoted <math>H(f)</math>, is a <math>2 \times 2</math> matrix-valued function that sends each point to the Hessian matrix at that point, if that matrix is defined. It is defined as:

<math>(x_0,y_0) \mapsto H(f)(x_0,y_0) = \begin{pmatrix} f_{xx}(x_0,y_0) & f_{xy}(x_0,y_0) \\ f_{yx}(x_0,y_0) & f_{yy}(x_0,y_0) \\\end{pmatrix}</math>

In the point-free notation, we can write this as:

<math>H(f) = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \\\end{pmatrix}</math>

{{#widget:YouTube|id=39VO16DieuQ}}

For a function of multiple variables

Suppose <math>f</math> is a function of variables <math>x_1,x_2,\dots,x_n</math>. The Hessian matrix of <math>f</math>, denoted <math>H(f)</math>, is an <math>n \times n</math> matrix-valued function that sends each point to the Hessian matrix at that point, if the matrix is defined. It is defined as:

<math>(a_1,a_2,\dots,a_n) \mapsto H(f)(a_1,a_2,\dots,a_n)</math>

In the point-free notation, we can write it as:

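<math>\begin{pmatrix} f_{x_1x_1} & f_{x_1x_2} & \dots & f_{x_1x_n} \\ f_{x_2x_1} & f_{x_2x_2} & \dots & f_{x_2x_n} \\ \vdots & \vdots & \ddots & \vdots \\ f_{x_nx_1} & f_{x_nx_2} & \dots & f_{x_nx_n} \\\end{pmatrix}</math>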

{{#widget:YouTube|id=DeFoV-NfjQQ}}

Under continuity assumptions

If we assume that all the second-order partials of <math>f</math> are continuous functions everywhere, then the following hold:

  • The Hessian matrix of <math>f</math> at any point is a symmetric matrix, i.e., its <math>(ij)^{th}</math> entry equals its <math>(ji)^{th}</math> entry. This follows from Clairaut's theorem on equality of mixed partials.
  • We can think of the Hessian matrix as the second derivative of the function, i.e., it is a matrix describing the second derivative.
  • <math>f</math> is twice differentiable as a function. Hence, the Hessian matrix of <math>f</math> is the same as the Jacobian matrix of the gradient vector <math>\nabla f</math>, where the latter is viewed as a vector-valued function.

Note that the final conclusion actually only requires the existence of the gradient vector, hence it holds even if the second-order partials are not continuous.
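
To see why the continuity assumption matters for symmetry, it can help to look at a standard counterexample. The function below does not appear in the article; it is a well-known example whose mixed partials both exist at the origin but are not continuous there, and they take the values -1 and 1. The sketch estimates both by nested central differences. The function names and the step sizes `h` and `k` are assumptions chosen for illustration, with the inner step much smaller than the outer one.

```python
def f(x, y):
    # Standard counterexample: x y (x^2 - y^2) / (x^2 + y^2), extended by 0 at the origin.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)


def f_x(x, y, h=1e-7):
    """Central-difference estimate of the first partial with respect to x."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)


def f_y(x, y, h=1e-7):
    """Central-difference estimate of the first partial with respect to y."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)


def f_xy_at_origin(k=1e-3):
    # Subscript convention: differentiate in x first, then in y.
    return (f_x(0.0, k) - f_x(0.0, -k)) / (2 * k)


def f_yx_at_origin(k=1e-3):
    # Differentiate in y first, then in x.
    return (f_y(k, 0.0) - f_y(-k, 0.0)) / (2 * k)


mixed_xy = f_xy_at_origin()
mixed_yx = f_yx_at_origin()
print(mixed_xy, mixed_yx)
```

The two estimates come out close to -1 and 1 respectively, so the matrix of second-order partials at the origin is not symmetric for this function, which is consistent with Clairaut's theorem requiring continuity of the mixed partials.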