Quadratic function of multiple variables

==Definition==

Consider variables <math>x_1, x_2, \dots, x_n</math>. A '''quadratic function''' of the variables is a function of the form:

<math>f(x_1, x_2, \dots, x_n) = \sum_{i=1}^n \sum_{j=1}^n a_{ij}x_ix_j + \sum_{i=1}^n b_ix_i + c</math>

In vector form, if we denote by <math>\vec{x}</math> the column vector with coordinates <math>x_1, x_2, \dots, x_n</math>, then we can write the function as:

<math>f(\vec{x}) = \vec{x}^TA\vec{x} + \vec{b}^T\vec{x} + c</math>

where <math>A</math> is an <math>n \times n</math> matrix with entries <math>a_{ij}</math> and <math>\vec{b}</math> is the column vector with entries <math>b_i</math>.

Note that the matrix <math>A</math> is non-unique: if <math>A</math> defines the function, then so does <math>A^T</math>, since <math>\vec{x}^TA\vec{x} = \vec{x}^TA^T\vec{x}</math> for all <math>\vec{x}</math>. Therefore, we could choose to replace <math>A</math> by the matrix <math>(A + A^T)/2</math> and have the advantage of working with a symmetric matrix.
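To make the vector form concrete, here is a small numerical sketch (Python with NumPy; the example matrix, vector, and function name are illustrative choices, not part of the page) that evaluates <math>f(\vec{x}) = \vec{x}^TA\vec{x} + \vec{b}^T\vec{x} + c</math> and checks that replacing <math>A</math> by its symmetric part leaves the function unchanged:

<syntaxhighlight lang="python">
import numpy as np

def quadratic(A, b, c, x):
    """Evaluate f(x) = x^T A x + b^T x + c."""
    return x @ A @ x + b @ x + c

# Illustrative data (not from the page): a non-symmetric A in 3 variables.
A = np.array([[2.0, 1.0, 0.0],
              [3.0, 1.0, 4.0],
              [0.0, 2.0, 5.0]])
b = np.array([1.0, -2.0, 0.5])
c = 3.0
x = np.array([0.7, -1.3, 2.0])

A_sym = (A + A.T) / 2  # symmetric part of A

# The two matrices define the same quadratic function.
print(quadratic(A, b, c, x), quadratic(A_sym, b, c, x))
</syntaxhighlight>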


==Key data==

For the discussion here, assume that <math>A</math> has been made a symmetric matrix.

{| class="sortable" border="1"
! Item !! Value !! Consistency with the case <math>n = 1</math>, where <math>f(x) = ax^2 + bx + c</math>, <math>A = (a)</math> (a <math>1 \times 1</math> matrix), <math>\vec{b} = (b)</math> (a 1-dimensional vector)
|-
| default [[domain]] || the whole of <math>\R^n</math> || the whole of <math>\R</math>
|-
| [[range]] || If the matrix <math>A</math> is not positive semidefinite or negative semidefinite, the range is all of <math>\R</math>.<br>If the matrix is positive definite or (<math>A</math> is positive semidefinite and <math>\vec{b}</math> is in its image), the range is <math>[m, \infty)</math> where <math>m</math> is the minimum value.<br>If the matrix is negative definite or (<math>A</math> is negative semidefinite and <math>\vec{b}</math> is in its image), the range is <math>(-\infty, M]</math> where <math>M</math> is the maximum value. || The case of "not positive semidefinite or negative semidefinite" does not arise for <math>n = 1</math>. Moreover, all the semidefinite cases must be definite, so we only have to consider the positive definite case and the negative definite case.<br>The positive definite case corresponds to <math>a > 0</math><br>The negative definite case corresponds to <math>a < 0</math>
|-
| [[local minimum value]] and points of attainment || If the matrix <math>A</math> is positive definite, then <math>c - \frac{1}{4}\vec{b}^TA^{-1}\vec{b}</math>, attained at <math>\frac{-1}{2}A^{-1}\vec{b}</math><br>If <math>A</math> is positive semidefinite but not positive definite, it depends on whether <math>\vec{b}</math> is in the image of <math>A</math>. If yes, replace <math>A^{-1}\vec{b}</math> with the solution <math>\vec{v}</math> to <math>A\vec{v} = \vec{b}</math>, so we get a local minimum of <math>c - \frac{1}{4}\vec{b}^T\vec{v}</math> attained at <math>\frac{-1}{2}\vec{v}</math><br>If <math>A</math> is not positive semidefinite or if <math>\vec{b}</math> is not in the image of <math>A</math>, no local minimum value || The positive definite case corresponds to <math>a > 0</math>: Here, the local minimum value of <math>c - \frac{b^2}{4a}</math> is attained at <math>\frac{-b}{2a}</math> (consistent with the matrix formulation)<br>The negative definite case corresponds to <math>a < 0</math>, and there is no minimum in this case.
|-
| [[local maximum value]] and points of attainment || If the matrix <math>A</math> is negative definite, then <math>c - \frac{1}{4}\vec{b}^TA^{-1}\vec{b}</math>, attained at <math>\frac{-1}{2}A^{-1}\vec{b}</math><br>If <math>A</math> is negative semidefinite but not negative definite, it depends on whether <math>\vec{b}</math> is in the image of <math>A</math>. If yes, replace <math>A^{-1}\vec{b}</math> with the solution <math>\vec{v}</math> to <math>A\vec{v} = \vec{b}</math>, so we get a local maximum of <math>c - \frac{1}{4}\vec{b}^T\vec{v}</math> attained at <math>\frac{-1}{2}\vec{v}</math><br>If <math>A</math> is not negative semidefinite or if <math>\vec{b}</math> is not in the image of <math>A</math>, no local maximum value || The negative definite case corresponds to <math>a < 0</math>: Here, the local maximum value of <math>c - \frac{b^2}{4a}</math> is attained at <math>\frac{-b}{2a}</math> (consistent with the matrix formulation)<br>The positive definite case corresponds to <math>a > 0</math>, and there is no maximum in this case.
|-
| [[gradient vector]] function (analogous to the [[derivative]]) || <math>\vec{x} \mapsto 2A\vec{x} + \vec{b}</math> || the [[derivative]] is <math>x \mapsto 2ax + b</math> (consistent with the matrix formulation)
|-
| [[Hessian matrix]] (analogous to the [[second derivative]]) || <math>\vec{x} \mapsto 2A</math> (constant matrix-valued function) || the [[second derivative]] is the constant function <math>x \mapsto 2a</math> (consistent with the matrix formulation)
|}

==Differentiation==

===Partial derivatives and gradient vector===


====Case of general matrix====
The [[partial derivative]] with respect to the variable <math>x_i</math>, and therefore also the <math>i^{th}</math> coordinate of the [[gradient vector]], is given by:


<math>\frac{\partial f}{\partial x_i} = \left(\sum_{j=1}^n (a_{ij} + a_{ji})x_j\right) + b_i</math>

In terms of the matrix and vector notation, the gradient vector, expressed as a column vector, is:

<math>(\nabla f)(\vec{x}) = (A + A^T)\vec{x} + \vec{b}</math>


====Case of symmetric matrix====
 
In the case that <math>A</math> is a symmetric matrix, the above expressions simplify as follows.
 
Since <math>a_{ij} = a_{ji}</math> for all <math>i,j</math>, the expression for the partial derivative becomes:
 
<math>\frac{\partial f}{\partial x_i} = \left(\sum_{j=1}^n 2a_{ij} x_j\right) + b_i</math>
 
The expression for the gradient vector becomes:


<math>(\nabla f)(\vec{x}) = 2A\vec{x} + \vec{b}</math>


====Case <math>n = 1</math>====
 
A sanity check for the above expressions is that in the case <math>n = 1</math>, where <math>A = (a), \vec{b} = b</math>, we get the same answers as for the [[quadratic function]] <math>f(x) = ax^2 + bx + c</math>.
 
This is indeed the case. The only partial derivative here is the ordinary derivative, and this also is the gradient vector, and has expression:
 
<math>f'(x) = 2ax + b</math>
 
This agrees with both the expression for <math>\partial f/\partial x_i</math> and the expression for <math>(\nabla f)(\vec{x})</math>.
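To see these formulas in action, here is a small numerical sketch (Python with NumPy; the random matrices, seed, and helper names are illustrative choices, not part of the page) comparing the analytic gradient <math>(A + A^T)\vec{x} + \vec{b}</math> with a finite-difference approximation, and checking the simplification <math>2A\vec{x} + \vec{b}</math> in the symmetric case:

<syntaxhighlight lang="python">
import numpy as np

def quadratic(A, b, c, x):
    return x @ A @ x + b @ x + c

def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))   # general (non-symmetric) matrix
b = rng.standard_normal(4)
c = 1.5
x = rng.standard_normal(4)

analytic = (A + A.T) @ x + b                  # general formula
numeric = numerical_gradient(lambda v: quadratic(A, b, c, v), x)
print(np.allclose(analytic, numeric, atol=1e-4))   # expect True

# For a symmetric matrix the formula reduces to 2 A x + b.
A_sym = (A + A.T) / 2
print(np.allclose(2 * A_sym @ x + b, (A_sym + A_sym.T) @ x + b))  # expect True
</syntaxhighlight>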
 
===Second-order partial derivatives and Hessian matrix===
 
====Case of general matrix====


Recall that we had obtained (we replace the dummy variable <math>j</math> by <math>k</math> to facilitate differentiation with respect to <math>j</math> in the next step):
 
<math>\frac{\partial f}{\partial x_i} = \left(\sum_{k=1}^n (a_{ik} + a_{ki})x_k\right) + b_i</math>
 
Differentiating both sides with respect to <math>x_j</math> (note that <math>j</math> may be equal to <math>i</math> or different from <math>i</math>) we find that the only term with a nonzero derivative is the term where <math>k = j</math>. In this case, the derivative is the coefficient of <math>x_j</math>. Therefore, we obtain:
 
<math>\frac{\partial^2 f}{\partial x_j \partial x_i} = a_{ij} + a_{ji}</math>
 
Thus, the [[Hessian matrix]] of the quadratic function is given as:
 
<math>H(f)(\vec{x}) = A + A^T</math>
 
Note that this is independent of the choice of <math>\vec{x}</math>. This fact is true only because of the nature of the function: for more general functional forms, the Hessian matrix varies with the choice of input vector.
 
We can also see this in matrix form directly. The gradient function is:
 
<math>(\nabla f)(\vec{x}) = (A + A^T)\vec{x} + \vec{b}</math>
 
This is an affine transformation, and the [[Jacobian matrix]] of this transformation computes the Hessian that we want. We can use the well-known fact that the Jacobian matrix of an affine transformation coincides with the matrix describing its linear part, and therefore the Hessian is:
 
<math>H(f)(\vec{x}) = A + A^T</math>
 
====Case of symmetric matrix====
 
We can either plug into the formulas for the general case or perform similar calculations to get the formulas in the case that <math>A</math> is a symmetric matrix:
 
<math>\frac{\partial^2 f}{\partial x_j \partial x_i} = 2a_{ij}</math>
 
<math>H(f)(\vec{x}) = 2A</math>
 
====Case <math>n = 1</math>====
 
A sanity check for the above expressions is that in the case <math>n = 1</math>, where <math>A = (a), \vec{b} = b</math>, we get the same answers as for the [[quadratic function]] <math>f(x) = ax^2 + bx + c</math>.
 
This is indeed the case. The only second-order partial derivative is <math>f''(x) = 2a</math>. This agrees both with the formula for the second-order partial derivative and with the formula for the Hessian matrix.
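As an illustrative numerical check (a sketch only; the example data is arbitrary), a finite-difference approximation of the second-order partial derivatives should reproduce the constant matrix <math>A + A^T</math> at any choice of <math>\vec{x}</math>:

<syntaxhighlight lang="python">
import numpy as np

def quadratic(A, b, c, x):
    return x @ A @ x + b @ x + c

def numerical_hessian(f, x, h=1e-4):
    """Finite-difference approximation of the Hessian of f at x."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei) - f(x + ej) + f(x)) / h**2
    return H

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
b = rng.standard_normal(3)
c = -0.5

for x in (np.zeros(3), rng.standard_normal(3)):   # Hessian is the same at every point
    H = numerical_hessian(lambda v: quadratic(A, b, c, v), x)
    print(np.allclose(H, A + A.T, atol=1e-5))      # expect True
</syntaxhighlight>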


===Higher derivatives===


All higher order partial derivatives (pure or mixed) are zero. This can be seen directly from the fact that the second-order partial derivatives are all constants, so differentiating them further (with respect to any variable) gives zero.
 
Therefore, the higher derivative tensors (the higher-order analogues of the gradient vector and Hessian matrix) are also identically zero.
 
==Points and intervals of interest==
 
For the discussion here, assume that <math>A</math> is symmetric. If it is not, replace <math>A</math> by the matrix <math>(A + A^T)/2</math>.
 
===Critical points===
 
====Case that the matrix <math>A</math> is invertible====
 
To find the critical points, we need to set the gradient vector equal to zero. This gives the vector equation:
 
<math>2A\vec{x} + \vec{b} = \vec{0}</math>
 
In other words:
 
<math>A \vec{x} = \frac{-1}{2}\vec{b}</math>
 
Under the assumption that <math>A</math> is invertible (or equivalently, that all its eigenvalues are nonzero), we can left-multiply both sides by <math>A^{-1}</math> and obtain:
 
<math>\vec{x} = \frac{-1}{2} A^{-1} \vec{b}</math>
 
We thus have a ''unique'' critical point as described above.
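A minimal computational sketch of this step (the example matrix here is an arbitrary symmetric invertible choice, not from the page): solve <math>A\vec{x} = \frac{-1}{2}\vec{b}</math> with a linear solver and confirm that the gradient vanishes at the solution:

<syntaxhighlight lang="python">
import numpy as np

A = np.array([[4.0, 1.0, 0.0],    # symmetric and invertible (in fact positive definite)
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, -2.0, 0.5])

# Unique critical point: x = -(1/2) A^{-1} b, computed by solving A x = -b/2.
x_crit = np.linalg.solve(A, -b / 2)

gradient = 2 * A @ x_crit + b
print(x_crit)
print(np.allclose(gradient, 0))   # expect True
</syntaxhighlight>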
 
====Case that the matrix <math>A</math> is non-invertible====
 
We still need to solve the same linear system:
 
<math>2A \vec{x} + \vec{b} = \vec{0}</math>
 
However, since <math>A</math> is no longer invertible (i.e., it has zero as an eigenvalue, or equivalently, it has a nonzero kernel), two cases arise (a computational check distinguishing them is sketched after the list):
 
* No solution exists. This happens if the vector <math>\vec{b}</math> is not in the image of the linear transformation defined by <math>A</math>. Conceptually, what is happening is that the linear part of <math>f</math> is not constrained by the quadratic part, and therefore the function is unbounded.
* A solution exists, but it is not unique. This happens if the vector <math>\vec{b}</math> is in the image of the linear transformation defined by <math>A</math>. In fact, the solution set is an affine space of dimension equal to the nullity of <math>A</math>. Thus, we get an affine space's worth of critical points.
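Here is a sketch of that check (illustrative only; the singular example matrix and the helper name are choices made here): compare the rank of <math>A</math> with the rank of the augmented matrix to decide whether <math>\frac{-1}{2}\vec{b}</math> lies in the image of <math>A</math>, and read off the dimension of the critical-point set from the nullity:

<syntaxhighlight lang="python">
import numpy as np

def critical_points_info(A, b):
    """Classify the critical-point set of f(x) = x^T A x + b^T x + c for symmetric A."""
    rhs = -b / 2
    rank_A = np.linalg.matrix_rank(A)
    rank_aug = np.linalg.matrix_rank(np.column_stack([A, rhs]))
    if rank_aug > rank_A:
        return "no critical points (b not in the image of A)"
    nullity = A.shape[0] - rank_A
    # One particular solution; the full solution set is this point plus the kernel of A.
    x_part = np.linalg.lstsq(A, rhs, rcond=None)[0]
    return f"affine space of critical points of dimension {nullity}, through {x_part}"

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])          # singular symmetric matrix (nullity 1)

print(critical_points_info(A, np.array([2.0, 2.0])))   # b in the image: a line of critical points
print(critical_points_info(A, np.array([1.0, -1.0])))  # b not in the image: no critical points
</syntaxhighlight>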
 
===Determination of local extremum behavior at critical points===
 
====Case that the matrix <math>A</math> is invertible====
 
Recall that the [[Hessian matrix]] of <math>f</math> is <math>2A</math>. Therefore, by the [[second derivative test for a function of multiple variables]], we obtain the following:
 
* If <math>A</math> is a [[linear:symmetric positive-definite matrix|symmetric positive-definite matrix]], i.e., all its eigenvalues are positive, then the unique critical point <math>\frac{-1}{2}A^{-1}\vec{b}</math> is a point of local minimum and is the unique point of absolute minimum (an alternate derivation of this fact is later in the page).
* If <math>A</math> is a negative-definite matrix, i.e., all its eigenvalues are negative, then the unique critical point <math>\frac{-1}{2}A^{-1}\vec{b}</math> is a point of local maximum and is the unique point of absolute maximum.
* If <math>A</math> has both positive and negative eigenvalues, then the unique critical point <math>\frac{-1}{2}A^{-1}\vec{b}</math> is neither a point of local minimum nor a point of local maximum. In fact, it gives a saddle point for the function. There is no point of local extremum and the range of the function is all of <math>\R</math>.
 
====Case that the matrix <math>A</math> is non-invertible====
 
In this case, the Hessian matrix, <math>2A</math>, is also non-invertible and in particular has zero as an eigenvalue.
 
In the case that there are no critical points, there is nothing to say.
 
In the case that there are critical points, we note that, through any critical point, there is a direction along which the second derivative is zero. In fact, what is happening geometrically is that the set of critical points forms an affine subspace and the quadratic function <math>f</math> is constant on that affine subspace. How the value of the function on that affine subspace compares with its values elsewhere depends on the signs of the nonzero eigenvalues.
 
* In the case that <math>A</math> is a symmetric positive-semidefinite matrix, i.e., all its nonzero eigenvalues are positive, the function attains a local minimum and also its absolute minimum at all its critical points. Note that, since the function is constant at this minimum value on the affine space, none of these is a point of ''strict'' local minimum.
* In the case that <math>A</math> is a symmetric negative-semidefinite matrix, i.e., all its nonzero eigenvalues are negative, the function attains a local maximum and also its absolute maximum value at all its critical points. Note that, since the function is constant at this maximum value on the affine space, none of these is a point of ''strict'' local maximum.
 
==Geometry of the function==
 
The geometry of the quadratic function is largely determined by the spectrum of the matrix <math>A</math> (or equivalently, the spectrum of the matrix <math>2A</math>). As before, we shall assume that <math>A</math> is a symmetric matrix. If not, replace <math>2A</math> in the discussion below by <math>A + A^T</math>.
 
===Eigenvalues of the Hessian===
 
Since <math>2A</math> is a symmetric real matrix, it can be written in the form:
 
<math>2A = USU^{-1}</math>
 
where <math>U</math> is an [[linear:orthogonal matrix|orthogonal matrix]] and <math>S</math> is a diagonal matrix with real entries. The diagonal entries of <math>S</math> are the eigenvalues of <math>2A</math>. Another way of thinking of the above is that with a change of basis that preserves the Euclidean norm, we can convert the Hessian to a diagonal transformation.
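For illustration (a sketch; the example matrix is an arbitrary symmetric choice, not from the page), the orthogonal diagonalization of <math>2A</math> can be computed with a symmetric eigenvalue routine:

<syntaxhighlight lang="python">
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])              # symmetric A; the Hessian is 2A

eigenvalues, U = np.linalg.eigh(2 * A)   # U is orthogonal, columns are eigenvectors
S = np.diag(eigenvalues)

print(np.allclose(2 * A, U @ S @ U.T))   # 2A = U S U^{-1} (= U S U^T since U is orthogonal)
print(np.allclose(U.T @ U, np.eye(2)))   # U is orthogonal
print(eigenvalues)                       # diagonal entries of S
</syntaxhighlight>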
 
The following cases are of interest:
 
{| class="sortable" border="1"
! Case on <math>S</math> (i.e., the set of eigenvalues) !! Corresponding term for matrix <math>2A</math> (and hence also for <math>A</math>)
|-
| all positive || [[linear:symmetric positive-definite matrix|symmetric positive-definite matrix]]
|-
| all nonnegative || symmetric positive-semidefinite matrix
|-
| all negative || symmetric negative-definite matrix
|-
| all nonpositive || symmetric negative-semidefinite matrix
|-
| all nonzero || symmetric invertible matrix
|-
| some positive, some negative (maybe some zero) || indefinite matrix
|}
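The classification in the table can be carried out numerically, as in the following sketch (the tolerance handling and example matrices are illustrative choices, not part of the page):

<syntaxhighlight lang="python">
import numpy as np

def classify(A, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    eig = np.linalg.eigvalsh(A)   # eigenvalues of a symmetric matrix, in ascending order
    if np.all(eig > tol):
        return "positive definite"
    if np.all(eig >= -tol):
        return "positive semidefinite"
    if np.all(eig < -tol):
        return "negative definite"
    if np.all(eig <= tol):
        return "negative semidefinite"
    return "indefinite"

print(classify(np.array([[2.0, 0.0], [0.0, 3.0]])))    # positive definite
print(classify(np.array([[1.0, 1.0], [1.0, 1.0]])))    # positive semidefinite (eigenvalues 0 and 2)
print(classify(np.array([[1.0, 0.0], [0.0, -2.0]])))   # indefinite
</syntaxhighlight>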


==Alternate analysis of extreme values==


For the discussion of cases, assume that <math>A</math> is a symmetric matrix. If <math>A</math> is not symmetric, replace it by the symmetric matrix <math>(A + A^T)/2</math>.

===Positive definite case===

First, we consider the case where <math>A</math> is a symmetric positive definite matrix. In other words, we can write <math>A</math> in the form:

<math>A = W^TW</math>

where <math>W</math> is an invertible <math>n \times n</math> matrix.

We can "complete the square" for this function:

<math>f(\vec{x}) = (W\vec{x})^T(W\vec{x}) + \vec{b}^T\vec{x} + c = \left(W\vec{x} + \frac{1}{2}(W^T)^{-1}\vec{b}\right)^T\left(W\vec{x} + \frac{1}{2}(W^T)^{-1}\vec{b}\right) + c - \frac{1}{4}\vec{b}^TA^{-1}\vec{b}</math>

In other words:

<math>f(\vec{x}) = \left\|W\vec{x} + \frac{1}{2}(W^T)^{-1}\vec{b}\right\|^2 + \left(c - \frac{1}{4}\vec{b}^TA^{-1}\vec{b}\right)</math>

This is minimized when the expression whose norm we are measuring is zero, so that it is minimized when we have:

<math>W\vec{x} + \frac{1}{2}(W^T)^{-1}\vec{b} = \vec{0}</math>

Simplifying, we obtain that the minimum occurs at:

<math>\vec{x} = \frac{-1}{2}W^{-1}(W^T)^{-1}\vec{b} = \frac{-1}{2}A^{-1}\vec{b}</math>

Moreover, the value of the minimum is:

<math>c - \frac{1}{4}\vec{b}^TA^{-1}\vec{b}</math>
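As an illustrative check of this derivation (a sketch assuming <math>A</math> is symmetric positive definite; NumPy's Cholesky factorization returns <math>L</math> with <math>A = LL^T</math>, so <math>W = L^T</math> plays the role of <math>W</math> above, and all numerical data here is arbitrary):

<syntaxhighlight lang="python">
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])              # symmetric positive definite
b = np.array([1.0, -2.0])
c = 0.5

L = np.linalg.cholesky(A)               # A = L L^T, so W = L^T satisfies A = W^T W
W = L.T

x_min = -0.5 * np.linalg.solve(A, b)          # -(1/2) A^{-1} b
f_min = c - 0.25 * b @ np.linalg.solve(A, b)  # c - (1/4) b^T A^{-1} b

def f(x):
    return x @ A @ x + b @ x + c

# The completed-square form: ||W x + (1/2)(W^T)^{-1} b||^2 + f_min.
def completed_square(x):
    shift = W @ x + 0.5 * np.linalg.solve(W.T, b)
    return shift @ shift + f_min

x_test = np.array([0.3, -1.1])
print(np.isclose(f(x_test), completed_square(x_test)))   # expect True
print(np.isclose(f(x_min), f_min))                       # expect True
print(f(x_test) >= f_min)                                # expect True: x_min is the minimum
</syntaxhighlight>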