Hessian matrix

From Calculus
Revision as of 19:52, 24 April 2012 by Vipul

Definition at a point

For a function of two variables at a point

Suppose $f$ is a real-valued function of two variables $x, y$ and $(x_0, y_0)$ is a point in the domain of $f$. Suppose all four second-order partial derivatives exist at $(x_0, y_0)$, i.e., the two pure second-order partials $f_{xx}(x_0, y_0)$ and $f_{yy}(x_0, y_0)$ exist, and so do the two mixed second-order partials $f_{xy}(x_0, y_0)$ and $f_{yx}(x_0, y_0)$. Then the Hessian matrix of $f$ at $(x_0, y_0)$, denoted $H(f)(x_0, y_0)$, is a $2 \times 2$ matrix of real numbers defined as follows:

$$\begin{pmatrix} f_{xx}(x_0, y_0) & f_{xy}(x_0, y_0) \\ f_{yx}(x_0, y_0) & f_{yy}(x_0, y_0) \end{pmatrix}$$
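
As a worked example (the function is an illustrative choice, not taken from the source), take $f(x, y) = x^3 y^2$ at $(x_0, y_0) = (1, 2)$. Differentiating once and then once more gives the four second-order partials and the Hessian matrix at that point:

$$\begin{aligned}
f_x &= 3x^2 y^2, & f_y &= 2x^3 y, \\
f_{xx} &= 6x y^2, & f_{xy} &= 6x^2 y, \\
f_{yx} &= 6x^2 y, & f_{yy} &= 2x^3,
\end{aligned}
\qquad
H(f)(1, 2) = \begin{pmatrix} 24 & 12 \\ 12 & 2 \end{pmatrix}.$$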

For a function of multiple variables at a point

Suppose $f$ is a real-valued function of multiple variables $(x_1, x_2, \dots, x_n)$. Suppose $(a_1, a_2, \dots, a_n)$ is a point in the domain of $f$. In other words, $a_1, a_2, \dots, a_n$ are real numbers and the point has coordinates $x_1 = a_1, x_2 = a_2, \dots, x_n = a_n$. Suppose, further, that all the second-order partials (pure and mixed) of $f$ with respect to these variables exist at the point $(a_1, a_2, \dots, a_n)$. Then the Hessian matrix of $f$ at $(a_1, a_2, \dots, a_n)$, denoted $H(f)(a_1, a_2, \dots, a_n)$, is an $n \times n$ matrix of real numbers defined as follows:

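The $(i,j)$th entry (i.e., the entry in the $i$th row and $j$th column) is $f_{x_i x_j}(a_1, a_2, \dots, a_n)$. This is the same as

$$\left. \frac{\partial^2}{\partial x_j \, \partial x_i} f(x_1, x_2, \dots, x_n) \right|_{(x_1, x_2, \dots, x_n) = (a_1, a_2, \dots, a_n)}.$$

Note that the two notations write the partials in opposite orders because their conventions differ: the subscript notation $f_{x_i x_j}$ is read left to right (differentiate first with respect to $x_i$, then with respect to $x_j$), while the Leibniz notation is read right to left.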

The matrix looks like this:

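$$\begin{pmatrix}
f_{x_1 x_1}(a_1, a_2, \dots, a_n) & f_{x_1 x_2}(a_1, a_2, \dots, a_n) & \cdots & f_{x_1 x_n}(a_1, a_2, \dots, a_n) \\
f_{x_2 x_1}(a_1, a_2, \dots, a_n) & f_{x_2 x_2}(a_1, a_2, \dots, a_n) & \cdots & f_{x_2 x_n}(a_1, a_2, \dots, a_n) \\
\vdots & \vdots & \ddots & \vdots \\
f_{x_n x_1}(a_1, a_2, \dots, a_n) & f_{x_n x_2}(a_1, a_2, \dots, a_n) & \cdots & f_{x_n x_n}(a_1, a_2, \dots, a_n)
\end{pmatrix}$$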
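
The definition at a point can also be approximated numerically. The following is a minimal Python sketch, not part of the source: `hessian_at` is a hypothetical helper that estimates each entry $f_{x_i x_j}(a_1, \dots, a_n)$ by a central difference in $x_j$ applied to a central difference in $x_i$, with an assumed step size `h`. The resulting formula is symmetric in $i$ and $j$, so it cannot tell $f_{x_i x_j}$ and $f_{x_j x_i}$ apart; it is only meaningful when $f$ is smooth near the point.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def hessian_at(f: Callable[[Vector], float], a: Vector, h: float = 1e-4) -> Matrix:
    """Estimate the Hessian matrix of f at the point a.

    Entry (i, j) approximates f_{x_i x_j}(a): a central difference in x_j
    applied to a central difference in x_i. Assumes f is smooth near a.
    """
    n = len(a)

    def f_shifted(i: int, di: float, j: int, dj: float) -> float:
        # Evaluate f at a + di*e_i + dj*e_j (when i == j the two shifts add up).
        p = list(a)
        p[i] += di
        p[j] += dj
        return f(p)

    H: Matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            H[i][j] = (
                f_shifted(i, h, j, h)
                - f_shifted(i, -h, j, h)
                - f_shifted(i, h, j, -h)
                + f_shifted(i, -h, j, -h)
            ) / (4 * h * h)
    return H


# Example: f(x, y) = x^3 y^2 at (1, 2); the exact Hessian is [[24, 12], [12, 2]].
if __name__ == "__main__":
    H = hessian_at(lambda p: p[0] ** 3 * p[1] ** 2, [1.0, 2.0])
    print([[round(v, 4) for v in row] for row in H])
```

The step size is a trade-off: smaller `h` reduces truncation error but amplifies floating-point cancellation, since the numerator is divided by $4h^2$.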

Definition as a function

For a function of two variables

Suppose $f$ is a real-valued function of two variables $x, y$. The Hessian matrix of $f$, denoted $H(f)$, is a $2 \times 2$ matrix-valued function that sends each point to the Hessian matrix at that point, if that matrix is defined. It is defined as:

$$(x_0, y_0) \mapsto H(f)(x_0, y_0) = \begin{pmatrix} f_{xx}(x_0, y_0) & f_{xy}(x_0, y_0) \\ f_{yx}(x_0, y_0) & f_{yy}(x_0, y_0) \end{pmatrix}$$

In the point-free notation, we can write this as:

$$H(f) = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix}$$
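
The point-free view can be mirrored in code by treating $H(f)$ as a function that returns a matrix. The sketch below is illustrative only: `hessian_function` is a hypothetical helper, and it assumes the four second-order partials are already available as Python functions (here computed by hand for the example $f(x, y) = x^3 y^2$).

```python
from typing import Callable, List

Partial = Callable[[float, float], float]
Matrix = List[List[float]]


def hessian_function(
    fxx: Partial, fxy: Partial, fyx: Partial, fyy: Partial
) -> Callable[[float, float], Matrix]:
    """Return H(f): the map (x0, y0) -> 2x2 Hessian matrix of f at (x0, y0)."""

    def H(x0: float, y0: float) -> Matrix:
        # Row 1 holds f_xx, f_xy; row 2 holds f_yx, f_yy, matching the layout above.
        return [[fxx(x0, y0), fxy(x0, y0)], [fyx(x0, y0), fyy(x0, y0)]]

    return H


# Second-order partials of f(x, y) = x^3 y^2, computed by hand.
H = hessian_function(
    fxx=lambda x, y: 6 * x * y**2,
    fxy=lambda x, y: 6 * x**2 * y,
    fyx=lambda x, y: 6 * x**2 * y,
    fyy=lambda x, y: 2 * x**3,
)
print(H(1, 2))  # [[24, 12], [12, 2]]
print(H(2, 1))  # [[12, 24], [24, 16]]
```

Returning a closure keeps the distinction made in the text: `H` itself is the matrix-valued function $H(f)$, and `H(1, 2)` is the Hessian matrix at one point.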

For a function of multiple variables

Suppose $f$ is a function of variables $x_1, x_2, \dots, x_n$. The Hessian matrix of $f$, denoted $H(f)$, is an $n \times n$ matrix-valued function that sends each point to the Hessian matrix at that point, if that matrix is defined. It is defined as:

$$(a_1, a_2, \dots, a_n) \mapsto H(f)(a_1, a_2, \dots, a_n)$$

In the point-free notation, we can write it as:

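$$H(f) = \begin{pmatrix}
f_{x_1 x_1} & f_{x_1 x_2} & \cdots & f_{x_1 x_n} \\
f_{x_2 x_1} & f_{x_2 x_2} & \cdots & f_{x_2 x_n} \\
\vdots & \vdots & \ddots & \vdots \\
f_{x_n x_1} & f_{x_n x_2} & \cdots & f_{x_n x_n}
\end{pmatrix}$$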
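
For instance, with the illustrative choice $f(x_1, x_2, x_3) = x_1^2 x_2 + x_2 x_3^3$ (not from the source), the first-order partials are $f_{x_1} = 2x_1 x_2$, $f_{x_2} = x_1^2 + x_3^3$, and $f_{x_3} = 3x_2 x_3^2$. Differentiating row $i$'s partial $f_{x_i}$ with respect to each $x_j$ in turn gives the Hessian as a function, which can then be evaluated at any point:

$$H(f) = \begin{pmatrix} 2x_2 & 2x_1 & 0 \\ 2x_1 & 0 & 3x_3^2 \\ 0 & 3x_3^2 & 6x_2 x_3 \end{pmatrix}, \qquad H(f)(1, 2, 1) = \begin{pmatrix} 4 & 2 & 0 \\ 2 & 0 & 3 \\ 0 & 3 & 12 \end{pmatrix}.$$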

Under continuity assumptions

If we assume that all the second-order partials of $f$ are continuous functions everywhere, then the following hold:

  • The Hessian matrix of $f$ at any point is a symmetric matrix, i.e., its $(i,j)$th entry equals its $(j,i)$th entry. This follows from Clairaut's theorem on equality of mixed partials.
  • We can think of the Hessian matrix as the second derivative of $f$: it is the matrix of the quadratic term in the second-order Taylor approximation of $f$ about the point.
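  • $f$ is twice differentiable as a function. Hence, the Hessian matrix of $f$ is the same as the Jacobian matrix of the gradient vector $\nabla f$, where the latter is viewed as a vector-valued function: the $(i,j)$th entry of that Jacobian matrix is the partial derivative of $f_{x_i}$ with respect to $x_j$, which is $f_{x_i x_j}$.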
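
The last two points can be checked numerically on an example. In this minimal sketch, the helper `jacobian_at` and the example function $f(x, y) = x^3 y^2$ are assumptions for illustration: the gradient $\nabla f$ is coded by hand as a vector-valued function, its Jacobian matrix is estimated with central differences (entry $(i, j)$ approximates $\partial (\nabla f)_i / \partial x_j$), and the result is compared with the exact Hessian $H(f)(1, 2)$, which is symmetric as Clairaut's theorem predicts.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def jacobian_at(g: Callable[[Vector], List[float]], a: Vector, h: float = 1e-6) -> Matrix:
    """Estimate the Jacobian of g at a: entry (i, j) ~ d g_i / d x_j (central differences)."""
    n = len(a)
    columns = []
    for j in range(n):
        plus, minus = list(a), list(a)
        plus[j] += h
        minus[j] -= h
        gp, gm = g(plus), g(minus)
        columns.append([(gp_i - gm_i) / (2 * h) for gp_i, gm_i in zip(gp, gm)])
    # columns[j][i] holds d g_i / d x_j; transpose so rows are indexed by i.
    return [list(row) for row in zip(*columns)]


def grad_f(p: Vector) -> List[float]:
    # Gradient of f(x, y) = x^3 y^2, computed by hand: (f_x, f_y).
    x, y = p
    return [3 * x**2 * y**2, 2 * x**3 * y]


exact_hessian = [[24.0, 12.0], [12.0, 2.0]]  # H(f)(1, 2)
J = jacobian_at(grad_f, [1.0, 2.0])
max_error = max(abs(J[i][j] - exact_hessian[i][j]) for i in range(2) for j in range(2))
print(max_error < 1e-5)               # True: Jacobian of the gradient matches the Hessian
print(abs(J[0][1] - J[1][0]) < 1e-5)  # True: mixed partials agree (symmetry)
```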

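Note that the final conclusion (that the Hessian matrix of $f$ equals the Jacobian matrix of $\nabla f$) only requires that the second-order partials exist, since the $(i,j)$th entry of both matrices is then $f_{x_i x_j}$; hence it holds even if the second-order partials are not continuous.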