Quadratic function of multiple variables

==Definition==

Consider variables <math>x_1, x_2, \dots, x_n</math>. A '''quadratic function''' of the variables is a function of the form:

<math>f(x_1, x_2, \dots, x_n) = \sum_{i=1}^n \sum_{j=1}^n a_{ij}x_ix_j + \sum_{i=1}^n b_ix_i + c</math>

In vector form, if we denote by <math>\vec{x}</math> the column vector with coordinates <math>x_1, x_2, \dots, x_n</math>, then we can write the function as:

<math>f(\vec{x}) = \vec{x}^TA\vec{x} + \vec{b}^T\vec{x} + c</math>

where <math>A</math> is an <math>n \times n</math> matrix with entries <math>a_{ij}</math> and <math>\vec{b}</math> is the column vector with entries <math>b_i</math>.

Note that the matrix <math>A</math> is non-unique: if <math>A</math> defines the function, then so does <math>A^T</math>, since <math>\vec{x}^TA\vec{x} = \vec{x}^TA^T\vec{x}</math> for all <math>\vec{x}</math>. Therefore, we could choose to replace <math>A</math> by the matrix <math>(A + A^T)/2</math> and have the advantage of working with a symmetric matrix.
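To make the vector form concrete, here is a small numerical sketch (Python with NumPy; the example matrix, vector, and function name are illustrative choices, not part of the page) that evaluates <math>f(\vec{x}) = \vec{x}^TA\vec{x} + \vec{b}^T\vec{x} + c</math> and checks that replacing <math>A</math> by its symmetric part leaves the function unchanged:

<syntaxhighlight lang="python">
import numpy as np

def quadratic(A, b, c, x):
    """Evaluate f(x) = x^T A x + b^T x + c."""
    return x @ A @ x + b @ x + c

# Illustrative data (not from the page): a non-symmetric A in 3 variables.
A = np.array([[2.0, 1.0, 0.0],
              [3.0, 1.0, 4.0],
              [0.0, 2.0, 5.0]])
b = np.array([1.0, -2.0, 0.5])
c = 3.0
x = np.array([0.7, -1.3, 2.0])

A_sym = (A + A.T) / 2  # symmetric part of A

# The two matrices define the same quadratic function.
print(quadratic(A, b, c, x), quadratic(A_sym, b, c, x))
</syntaxhighlight>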


==Key data==

For the discussion here, assume that <math>A</math> has been made a symmetric matrix.

{| class="sortable" border="1"
! Item !! Value !! Consistency with the case <math>n = 1</math>, where <math>f(x) = ax^2 + bx + c</math>, <math>A = (a)</math> (a <math>1 \times 1</math> matrix), <math>\vec{b} = (b)</math> (a 1-dimensional vector)
|-
| default [[domain]] || the whole of <math>\R^n</math> || the whole of <math>\R</math>
|-
| [[range]] || If the matrix <math>A</math> is not positive semidefinite or negative semidefinite, the range is all of <math>\R</math>.<br>If the matrix is positive definite or (<math>A</math> is positive semidefinite and <math>\vec{b}</math> is in its image), the range is <math>[m, \infty)</math> where <math>m</math> is the minimum value.<br>If the matrix is negative definite or (<math>A</math> is negative semidefinite and <math>\vec{b}</math> is in its image), the range is <math>(-\infty, M]</math> where <math>M</math> is the maximum value. || The case of "not positive semidefinite or negative semidefinite" does not arise for <math>n = 1</math>. Moreover, all the semidefinite cases must be definite, so we only have to consider the positive definite case and the negative definite case.<br>The positive definite case corresponds to <math>a > 0</math><br>The negative definite case corresponds to <math>a < 0</math>
|-
| [[local minimum value]] and points of attainment || If the matrix <math>A</math> is positive definite, then <math>c - \frac{1}{4}\vec{b}^TA^{-1}\vec{b}</math>, attained at <math>\frac{-1}{2}A^{-1}\vec{b}</math><br>If <math>A</math> is positive semidefinite but not positive definite, it depends on whether <math>\vec{b}</math> is in the image of <math>A</math>. If yes, replace <math>A^{-1}\vec{b}</math> with the solution <math>\vec{v}</math> to <math>A\vec{v} = \vec{b}</math>, so we get a local minimum of <math>c - \frac{1}{4}\vec{b}^T\vec{v}</math> attained at <math>\frac{-1}{2}\vec{v}</math><br>If <math>A</math> is not positive semidefinite or if <math>\vec{b}</math> is not in the image of <math>A</math>, no local minimum value || The positive definite case corresponds to <math>a > 0</math>: Here, the local minimum value of <math>c - \frac{b^2}{4a}</math> is attained at <math>\frac{-b}{2a}</math> (consistent with the matrix formulation)<br>The negative definite case corresponds to <math>a < 0</math>, and there is no minimum in this case.
|-
| [[local maximum value]] and points of attainment || If the matrix <math>A</math> is negative definite, then <math>c - \frac{1}{4}\vec{b}^TA^{-1}\vec{b}</math>, attained at <math>\frac{-1}{2}A^{-1}\vec{b}</math><br>If <math>A</math> is negative semidefinite but not negative definite, it depends on whether <math>\vec{b}</math> is in the image of <math>A</math>. If yes, replace <math>A^{-1}\vec{b}</math> with the solution <math>\vec{v}</math> to <math>A\vec{v} = \vec{b}</math>, so we get a local maximum of <math>c - \frac{1}{4}\vec{b}^T\vec{v}</math> attained at <math>\frac{-1}{2}\vec{v}</math><br>If <math>A</math> is not negative semidefinite or if <math>\vec{b}</math> is not in the image of <math>A</math>, no local maximum value || The negative definite case corresponds to <math>a < 0</math>: Here, the local maximum value of <math>c - \frac{b^2}{4a}</math> is attained at <math>\frac{-b}{2a}</math> (consistent with the matrix formulation)<br>The positive definite case corresponds to <math>a > 0</math>, and there is no maximum in this case.
|-
| [[gradient vector]] function (analogous to the [[derivative]]) || <math>\vec{x} \mapsto 2A\vec{x} + \vec{b}</math> || the [[derivative]] is <math>x \mapsto 2ax + b</math> (consistent with the matrix formulation)
|-
| [[Hessian matrix]] (analogous to the [[second derivative]]) || <math>\vec{x} \mapsto 2A</math> (constant matrix-valued function) || the [[second derivative]] is the constant function <math>x \mapsto 2a</math> (consistent with the matrix formulation)
|}

==Differentiation==

===Partial derivatives and gradient vector===


====Case of general matrix====
The [[partial derivative]] with respect to the variable <math>x_i</math>, and therefore also the <math>i^{th}</math> coordinate of the [[gradient vector]], is given by:


<math>\frac{\partial f}{\partial x_i} = \left(\sum_{j=1}^n (a_{ij} + a_{ji})x_j\right) + b_i</math>

In terms of the matrix and vector notation, the gradient vector, expressed as a column vector, is:

<math>(\nabla f)(\vec{x}) = (A + A^T)\vec{x} + \vec{b}</math>


====Case of symmetric matrix====
 
In the case that <math>A</math> is a symmetric matrix, the above expressions simplify as follows.
 
Since <math>a_{ij} = a_{ji}</math> for all <math>i,j</math>, the expression for the partial derivative becomes:
 
<math>\frac{\partial f}{\partial x_i} = \left(\sum_{j=1}^n 2a_{ij} x_j\right) + b_i</math>
 
The expression for the gradient vector becomes:


<math>(\nabla f)(\vec{x}) = 2A\vec{x} + \vec{b}</math>


====Case <math>n = 1</math>====
 
A sanity check for the above expressions is that in the case <math>n = 1</math>, where <math>A = (a), \vec{b} = b</math>, we get the same answers as for the [[quadratic function]] <math>f(x) = ax^2 + bx + c</math>.
 
This is indeed the case. The only partial derivative here is the ordinary derivative, and this also is the gradient vector, and has expression:
 
<math>f'(x) = 2ax + b</math>
 
This agrees with both the expression for <math>\partial f/\partial x_i</math> and the expression for <math>(\nabla f)(\vec{x})</math>.
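To see these formulas in action, here is a small numerical sketch (Python with NumPy; the random matrices, seed, and helper names are illustrative choices, not part of the page) comparing the analytic gradient <math>(A + A^T)\vec{x} + \vec{b}</math> with a finite-difference approximation, and checking the simplification <math>2A\vec{x} + \vec{b}</math> in the symmetric case:

<syntaxhighlight lang="python">
import numpy as np

def quadratic(A, b, c, x):
    return x @ A @ x + b @ x + c

def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))   # general (non-symmetric) matrix
b = rng.standard_normal(4)
c = 1.5
x = rng.standard_normal(4)

analytic = (A + A.T) @ x + b                  # general formula
numeric = numerical_gradient(lambda v: quadratic(A, b, c, v), x)
print(np.allclose(analytic, numeric, atol=1e-4))   # expect True

# For a symmetric matrix the formula reduces to 2 A x + b.
A_sym = (A + A.T) / 2
print(np.allclose(2 * A_sym @ x + b, (A_sym + A_sym.T) @ x + b))  # expect True
</syntaxhighlight>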
 
===Second-order partial derivatives and Hessian matrix===
 
====Case of general matrix====


Recall that we had obtained (we replace the dummy variable <math>j</math> by <math>k</math> to facilitate differentiation with respect to <math>j</math> in the next step):
 
<math>\frac{\partial f}{\partial x_i} = \left(\sum_{k=1}^n (a_{ik} + a_{ki})x_k\right) + b_i</math>
 
Differentiating both sides with respect to <math>x_j</math> (note that <math>j</math> may be equal to <math>i</math> or different from <math>i</math>) we find that the only term with a nonzero derivative is the term where <math>k = j</math>. In this case, the derivative is the coefficient of <math>x_j</math>. Therefore, we obtain:
 
<math>\frac{\partial^2 f}{\partial x_j \partial x_i} = a_{ij} + a_{ji}</math>
 
Thus, the [[Hessian matrix]] of the quadratic function is given as:
 
<math>H(f)(\vec{x}) = A + A^T</math>
 
Note that this is independent of the choice of <math>\vec{x}</math>. This fact is true only because of the nature of the function: for more general functional forms, the Hessian matrix varies with the choice of input vector.
 
We can also see this in matrix form directly. The gradient function is:
 
<math>(\nabla f)(\vec{x}) = (A + A^T)\vec{x} + \vec{b}</math>
 
This is an affine transformation, and the [[Jacobian matrix]] of this transformation computes the Hessian that we want. We can use the well-known fact that the Jacobian matrix of an affine transformation coincides with the matrix describing its linear part, and therefore the Hessian is:
 
<math>H(f)(\vec{x}) = A + A^T</math>
 
====Case of symmetric matrix====
 
We can either plug into the formulas for the general case or perform similar calculations to get the formulas in the case that <math>A</math> is a symmetric matrix:
 
<math>\frac{\partial^2 f}{\partial x_j \partial x_i} = 2a_{ij}</math>
 
<math>H(f)(\vec{x}) = 2A</math>
 
====Case <math>n = 1</math>====
 
A sanity check for the above expressions is that in the case <math>n = 1</math>, where <math>A = (a), \vec{b} = b</math>, we get the same answers as for the [[quadratic function]] <math>f(x) = ax^2 + bx + c</math>.
 
This is indeed the case. The only second-order partial derivative is <math>f''(x) = 2a</math>. This agrees both with the formula for the second-order partial derivative and with the formula for the Hessian matrix.
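As an illustrative numerical check (a sketch only; the example data is arbitrary), a finite-difference approximation of the second-order partial derivatives should reproduce the constant matrix <math>A + A^T</math> at any choice of <math>\vec{x}</math>:

<syntaxhighlight lang="python">
import numpy as np

def quadratic(A, b, c, x):
    return x @ A @ x + b @ x + c

def numerical_hessian(f, x, h=1e-4):
    """Finite-difference approximation of the Hessian of f at x."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei) - f(x + ej) + f(x)) / h**2
    return H

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
b = rng.standard_normal(3)
c = -0.5

for x in (np.zeros(3), rng.standard_normal(3)):   # Hessian is the same at every point
    H = numerical_hessian(lambda v: quadratic(A, b, c, v), x)
    print(np.allclose(H, A + A.T, atol=1e-5))      # expect True
</syntaxhighlight>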


===Higher derivatives===


All higher order partial derivatives (pure or mixed) are zero. This can be seen directly from the fact that the second-order partial derivatives are all constants, so differentiating them further (with respect to any variable) gives zero.
 
Therefore, the higher derivative tensors (the higher-order analogues of the gradient vector and Hessian matrix) are also identically zero.
 
==Points and intervals of interest==
 
For the discussion here, assume that <math>A</math> is symmetric. If it is not, replace <math>A</math> by the matrix <math>(A + A^T)/2</math>.
 
===Critical points===
 
====Case that the matrix <math>A</math> is invertible====
 
To find the critical points, we need to set the gradient vector equal to zero. This gives the vector equation:
 
<math>2A\vec{x} + \vec{b} = \vec{0}</math>
 
In other words:
 
<math>A \vec{x} = \frac{-1}{2}\vec{b}</math>
 
Under the assumption that <math>A</math> is invertible (or equivalently, that all its eigenvalues are nonzero), we can left-multiply both sides by <math>A^{-1}</math> and obtain:
 
<math>\vec{x} = \frac{-1}{2} A^{-1} \vec{b}</math>
 
We thus have a ''unique'' critical point as described above.
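A minimal computational sketch of this step (the example matrix here is an arbitrary symmetric invertible choice, not from the page): solve <math>A\vec{x} = \frac{-1}{2}\vec{b}</math> with a linear solver and confirm that the gradient vanishes at the solution:

<syntaxhighlight lang="python">
import numpy as np

A = np.array([[4.0, 1.0, 0.0],    # symmetric and invertible (in fact positive definite)
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, -2.0, 0.5])

# Unique critical point: x = -(1/2) A^{-1} b, computed by solving A x = -b/2.
x_crit = np.linalg.solve(A, -b / 2)

gradient = 2 * A @ x_crit + b
print(x_crit)
print(np.allclose(gradient, 0))   # expect True
</syntaxhighlight>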
 
====Case that the matrix <math>A</math> is non-invertible====
 
We still need to solve the same linear system:
 
<math>2A \vec{x} + \vec{b} = \vec{0}</math>
 
However, since <math>A</math> is no longer invertible (i.e., it has zero as an eigenvalue, or equivalently, it has a nonzero kernel), two cases arise (a computational check distinguishing them is sketched after the list):
 
* No solution exists. This happens if the vector <math>\vec{b}</math> is not in the image of the linear transformation defined by <math>A</math>. Conceptually, what is happening is that the linear part of <math>f</math> is not constrained by the quadratic part, and therefore the function is unbounded.
* A solution exists, but it is not unique. This happens if the vector <math>\vec{b}</math> is in the image of the linear transformation defined by <math>A</math>. In fact, the solution set is an affine space of dimension equal to the nullity of <math>A</math>. Thus, we get an affine space's worth of critical points.
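Here is a sketch of that check (illustrative only; the singular example matrix and the helper name are choices made here): compare the rank of <math>A</math> with the rank of the augmented matrix to decide whether <math>\frac{-1}{2}\vec{b}</math> lies in the image of <math>A</math>, and read off the dimension of the critical-point set from the nullity:

<syntaxhighlight lang="python">
import numpy as np

def critical_points_info(A, b):
    """Classify the critical-point set of f(x) = x^T A x + b^T x + c for symmetric A."""
    rhs = -b / 2
    rank_A = np.linalg.matrix_rank(A)
    rank_aug = np.linalg.matrix_rank(np.column_stack([A, rhs]))
    if rank_aug > rank_A:
        return "no critical points (b not in the image of A)"
    nullity = A.shape[0] - rank_A
    # One particular solution; the full solution set is this point plus the kernel of A.
    x_part = np.linalg.lstsq(A, rhs, rcond=None)[0]
    return f"affine space of critical points of dimension {nullity}, through {x_part}"

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])          # singular symmetric matrix (nullity 1)

print(critical_points_info(A, np.array([2.0, 2.0])))   # b in the image: a line of critical points
print(critical_points_info(A, np.array([1.0, -1.0])))  # b not in the image: no critical points
</syntaxhighlight>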
 
===Determination of local extremum behavior at critical points===
 
====Case that the matrix <math>A</math> is invertible====
 
Recall that the [[Hessian matrix]] of <math>f</math> is <math>2A</math>. Therefore, by the [[second derivative test for a function of multiple variables]], we obtain the following:
 
* If <math>A</math> is a [[linear:symmetric positive-definite matrix|symmetric positive-definite matrix]], i.e., all its eigenvalues are positive, then the unique critical point <math>\frac{-1}{2}A^{-1}\vec{b}</math> is a point of local minimum and is the unique point of absolute minimum (an alternate derivation of this fact is later in the page).
* If <math>A</math> is a negative-definite matrix, i.e., all its eigenvalues are negative, then the unique critical point <math>\frac{-1}{2}A^{-1}\vec{b}</math> is a point of local maximum and is the unique point of absolute maximum.
* If <math>A</math> has both positive and negative eigenvalues, then the unique critical point <math>\frac{-1}{2}A^{-1}\vec{b}</math> is neither a point of local minimum nor a point of local maximum. In fact, it gives a saddle point for the function. There is no point of local extremum and the range of the function is all of <math>\R</math>.
 
====Case that the matrix <math>A</math> is non-invertible====
 
In this case, the Hessian matrix, <math>2A</math>, is also non-invertible and in particular has zero as an eigenvalue.
 
In the case that there are no critical points, there is nothing to say.
 
In the case that there are critical points, we note that, through any critical point, there is a direction along which the second derivative is zero. In fact, what is happening geometrically is that the set of critical points forms an affine subspace and the quadratic function <math>f</math> is constant on that affine subspace. How the value of the function on that affine subspace compares with its values elsewhere depends on the signs of the nonzero eigenvalues.
 
* In the case that <math>A</math> is a symmetric positive-semidefinite matrix, i.e., all its nonzero eigenvalues are positive, the function attains a local minimum and also its absolute minimum at all its critical points. Note that, since the function is constant at this minimum value on the affine space, none of these is a point of ''strict'' local minimum.
* In the case that <math>A</math> is a symmetric negative-semidefinite matrix, i.e., all its nonzero eigenvalues are negative, the function attains a local maximum and also its absolute maximum value at all its critical points. Note that, since the function is constant at this maximum value on the affine space, none of these is a point of ''strict'' local maximum.
 
==Geometry of the function==
 
The geometry of the quadratic function is largely determined by the spectrum of the matrix <math>A</math> (or equivalently, the spectrum of the matrix <math>2A</math>). As before, we shall assume that <math>A</math> is a symmetric matrix. If not, replace <math>2A</math> in the discussion below by <math>A + A^T</math>.
 
===Eigenvalues of the Hessian===
 
Since <math>2A</math> is a symmetric real matrix, it can be written in the form:
 
<math>2A = USU^{-1}</math>
 
where <math>U</math> is an [[linear:orthogonal matrix|orthogonal matrix]] and <math>S</math> is a diagonal matrix with real entries. The diagonal entries of <math>S</math> are the eigenvalues of <math>2A</math>. Another way of thinking of the above is that with a change of basis that preserves the Euclidean norm, we can convert the Hessian to a diagonal transformation.
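For illustration (a sketch; the example matrix is an arbitrary symmetric choice, not from the page), the orthogonal diagonalization of <math>2A</math> can be computed with a symmetric eigenvalue routine:

<syntaxhighlight lang="python">
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])              # symmetric A; the Hessian is 2A

eigenvalues, U = np.linalg.eigh(2 * A)   # U is orthogonal, columns are eigenvectors
S = np.diag(eigenvalues)

print(np.allclose(2 * A, U @ S @ U.T))   # 2A = U S U^{-1} (= U S U^T since U is orthogonal)
print(np.allclose(U.T @ U, np.eye(2)))   # U is orthogonal
print(eigenvalues)                       # diagonal entries of S
</syntaxhighlight>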
 
The following cases are of interest:
 
{| class="sortable" border="1"
! Case on <math>S</math> (i.e., the set of eigenvalues) !! Corresponding term for matrix <math>2A</math> (and hence also for <math>A</math>)
|-
| all positive || [[linear:symmetric positive-definite matrix|symmetric positive-definite matrix]]
|-
| all nonnegative || symmetric positive-semidefinite matrix
|-
| all negative || symmetric negative-definite matrix
|-
| all nonpositive || symmetric negative-semidefinite matrix
|-
| all nonzero || symmetric invertible matrix
|-
| some positive, some negative (maybe some zero) || indefinite matrix
|}
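The classification in the table can be carried out numerically, as in the following sketch (the tolerance handling and example matrices are illustrative choices, not part of the page):

<syntaxhighlight lang="python">
import numpy as np

def classify(A, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    eig = np.linalg.eigvalsh(A)   # eigenvalues of a symmetric matrix, in ascending order
    if np.all(eig > tol):
        return "positive definite"
    if np.all(eig >= -tol):
        return "positive semidefinite"
    if np.all(eig < -tol):
        return "negative definite"
    if np.all(eig <= tol):
        return "negative semidefinite"
    return "indefinite"

print(classify(np.array([[2.0, 0.0], [0.0, 3.0]])))    # positive definite
print(classify(np.array([[1.0, 1.0], [1.0, 1.0]])))    # positive semidefinite (eigenvalues 0 and 2)
print(classify(np.array([[1.0, 0.0], [0.0, -2.0]])))   # indefinite
</syntaxhighlight>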


==Alternate analysis of extreme values==


For the discussion of cases, assume that <math>A</math> is a symmetric matrix. If <math>A</math> is not symmetric, replace it by the symmetric matrix <math>(A + A^T)/2</math>.

===Positive definite case===

First, we consider the case where <math>A</math> is a symmetric positive definite matrix. In other words, we can write <math>A</math> in the form:

<math>A = W^TW</math>

where <math>W</math> is an invertible <math>n \times n</math> matrix.

We can "complete the square" for this function:

<math>f(\vec{x}) = (W\vec{x})^T(W\vec{x}) + \vec{b}^T\vec{x} + c = \left(W\vec{x} + \frac{1}{2}(W^T)^{-1}\vec{b}\right)^T\left(W\vec{x} + \frac{1}{2}(W^T)^{-1}\vec{b}\right) + c - \frac{1}{4}\vec{b}^TA^{-1}\vec{b}</math>

In other words:

<math>f(\vec{x}) = \left\|W\vec{x} + \frac{1}{2}(W^T)^{-1}\vec{b}\right\|^2 + \left(c - \frac{1}{4}\vec{b}^TA^{-1}\vec{b}\right)</math>

This is minimized when the expression whose norm we are measuring is zero, so that it is minimized when we have:

<math>W\vec{x} + \frac{1}{2}(W^T)^{-1}\vec{b} = \vec{0}</math>

Simplifying, we obtain that the minimum occurs at:

<math>\vec{x} = \frac{-1}{2}W^{-1}(W^T)^{-1}\vec{b} = \frac{-1}{2}A^{-1}\vec{b}</math>

Moreover, the value of the minimum is:

<math>c - \frac{1}{4}\vec{b}^TA^{-1}\vec{b}</math>
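As an illustrative check of this derivation (a sketch assuming <math>A</math> is symmetric positive definite; NumPy's Cholesky factorization returns <math>L</math> with <math>A = LL^T</math>, so <math>W = L^T</math> plays the role of <math>W</math> above, and all numerical data here is arbitrary):

<syntaxhighlight lang="python">
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])              # symmetric positive definite
b = np.array([1.0, -2.0])
c = 0.5

L = np.linalg.cholesky(A)               # A = L L^T, so W = L^T satisfies A = W^T W
W = L.T

x_min = -0.5 * np.linalg.solve(A, b)          # -(1/2) A^{-1} b
f_min = c - 0.25 * b @ np.linalg.solve(A, b)  # c - (1/4) b^T A^{-1} b

def f(x):
    return x @ A @ x + b @ x + c

# The completed-square form: ||W x + (1/2)(W^T)^{-1} b||^2 + f_min.
def completed_square(x):
    shift = W @ x + 0.5 * np.linalg.solve(W.T, b)
    return shift @ shift + f_min

x_test = np.array([0.3, -1.1])
print(np.isclose(f(x_test), completed_square(x_test)))   # expect True
print(np.isclose(f(x_min), f_min))                       # expect True
print(f(x_test) >= f_min)                                # expect True: x_min is the minimum
</syntaxhighlight>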