Quadratic function of multiple variables

Definition

Consider variables $x_{1},x_{2},\dots ,x_{n}$ . A quadratic function of the variables $x_{1},x_{2},\dots ,x_{n}$ is a function of the form:

$\left(\sum _{i=1}^{n}\sum _{j=1}^{n}a_{ij}x_{i}x_{j}\right)+\left(\sum _{i=1}^{n}b_{i}x_{i}\right)+c$

In vector form, if we denote by ${\vec {x}}$ the column vector with coordinates $x_{1},x_{2},\dots ,x_{n}$ , then we can write the function as:

${\vec {x}}^{T}A{\vec {x}}+{\vec {b}}^{T}{\vec {x}}+c$

where $A$ is an $n\times n$ matrix with entries $a_{ij}$ and ${\vec {b}}$ is the column vector with entries $b_{i}$ .
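As a minimal sketch of the two equivalent forms, the plain-Python code below evaluates the function once with the explicit double sum and once in vector form, by first computing $A{\vec {x}}$. The helper names and the example data are hypothetical, and $A$ is deliberately left non-symmetric because neither form requires symmetry.

```python
def quad_sum_form(A, b, c, x):
    """Evaluate sum_i sum_j a_ij x_i x_j + sum_i b_i x_i + c with explicit double sums."""
    n = len(x)
    quadratic_part = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    linear_part = sum(b[i] * x[i] for i in range(n))
    return quadratic_part + linear_part + c


def quad_vector_form(A, b, c, x):
    """Evaluate x^T A x + b^T x + c by first forming the vector A x."""
    n = len(x)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    xT_A_x = sum(x[i] * Ax[i] for i in range(n))
    bT_x = sum(b[i] * x[i] for i in range(n))
    return xT_A_x + bT_x + c


# Hypothetical example data; A need not be symmetric for either formula.
A = [[1, 2], [0, 3]]
b = [4, -1]
c = 5
x = [2, -1]
print(quad_sum_form(A, b, c, x), quad_vector_form(A, b, c, x))  # 17 17
```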

Note that the matrix $A$ is not unique: if $A+A^{T}=F+F^{T}$ , then ${\vec {x}}^{T}A{\vec {x}}={\vec {x}}^{T}F{\vec {x}}$ for every ${\vec {x}}$ , so we could replace $A$ by $F$ . In particular, we can replace $A$ by the matrix $(A+A^{T})/2$ and gain the advantage of working with a symmetric matrix.
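The non-uniqueness is easy to spot-check. The following sketch uses hypothetical matrices and exact `fractions.Fraction` arithmetic. It builds $F=A^{T}$ (so $A+A^{T}=F+F^{T}$ holds trivially) and the symmetrization $(A+A^{T})/2$, then confirms that all three matrices give the same value of ${\vec {x}}^{T}A{\vec {x}}$ at a few points.

```python
from fractions import Fraction


def quadratic_form(A, x):
    """Return x^T A x for a square matrix A given as a list of rows."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def symmetrize(A):
    """Return (A + A^T)/2 with exact rational entries."""
    n = len(A)
    return [[Fraction(A[i][j] + A[j][i], 2) for j in range(n)] for i in range(n)]


A = [[1, 2], [0, 3]]            # hypothetical non-symmetric matrix
F = [[1, 0], [2, 3]]            # F = A^T, so A + A^T = F + F^T
S = symmetrize(A)               # equals [[1, 1], [1, 3]]

for x in ([1, 0], [2, -1], [3, 5]):
    # All three matrices define the same quadratic form.
    assert quadratic_form(A, x) == quadratic_form(F, x) == quadratic_form(S, x)
```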

Key data

For the discussion here, assume that $A$ has been made a symmetric matrix.

Item	Value	Consistency with the case $n=1$ , where $f(x)=ax^{2}+bx+c$ , $A=(a)$ (a $1\times 1$ matrix), ${\vec {b}}=(b)$ (a 1-dimensional vector)
default domain	the whole of $\mathbb {R} ^{n}$	the whole of $\mathbb {R}$
range	If the matrix $A$ is neither positive semidefinite nor negative semidefinite, the range is all of $\mathbb {R}$ . If the matrix $A$ is positive definite, or $A$ is positive semidefinite and ${\vec {b}}$ is in its image, the range is $[m,\infty )$ where $m$ is the minimum value. If the matrix $A$ is negative definite, or $A$ is negative semidefinite and ${\vec {b}}$ is in its image, the range is $(-\infty ,m]$ where $m$ is the maximum value. If $A$ is semidefinite but ${\vec {b}}$ is not in its image, the range is again all of $\mathbb {R}$ .	The case of "neither positive semidefinite nor negative semidefinite" does not arise for $n=1$ . Moreover, since a quadratic function has $a\neq 0$ , all the semidefinite cases are definite, so we only have to consider the positive definite case and the negative definite case. The positive definite case corresponds to $a>0$ . The negative definite case corresponds to $a<0$ .
local minimum value and points of attainment	If the matrix $A$ is positive definite, the local minimum value is $c-{\frac {1}{4}}{\vec {b}}^{T}A^{-1}{\vec {b}}$ , attained at ${\frac {-1}{2}}A^{-1}{\vec {b}}$ . If $A$ is positive semidefinite but not positive definite, the answer depends on whether ${\vec {b}}$ is in the image of $A$ . If it is, replace $A^{-1}{\vec {b}}$ with any solution ${\vec {v}}$ to $A{\vec {v}}={\vec {b}}$ , so we get a local minimum value of $c-{\frac {1}{4}}{\vec {b}}^{T}{\vec {v}}$ attained at ${\frac {-1}{2}}{\vec {v}}$ (and at every point of ${\frac {-1}{2}}{\vec {v}}+\ker A$ ). If $A$ is not positive semidefinite, or if ${\vec {b}}$ is not in the image of $A$ , there is no local minimum value.	The positive definite case corresponds to $a>0$ : here, the local minimum value $c-{\frac {b^{2}}{4a}}$ is attained at ${\frac {-b}{2a}}$ (consistent with the matrix formulation). The negative definite case corresponds to $a<0$ , and there is no minimum in this case.
local maximum value and points of attainment	If the matrix $A$ is negative definite, the local maximum value is $c-{\frac {1}{4}}{\vec {b}}^{T}A^{-1}{\vec {b}}$ , attained at ${\frac {-1}{2}}A^{-1}{\vec {b}}$ . If $A$ is negative semidefinite but not negative definite, the answer depends on whether ${\vec {b}}$ is in the image of $A$ . If it is, replace $A^{-1}{\vec {b}}$ with any solution ${\vec {v}}$ to $A{\vec {v}}={\vec {b}}$ , so we get a local maximum value of $c-{\frac {1}{4}}{\vec {b}}^{T}{\vec {v}}$ attained at ${\frac {-1}{2}}{\vec {v}}$ (and at every point of ${\frac {-1}{2}}{\vec {v}}+\ker A$ ). If $A$ is not negative semidefinite, or if ${\vec {b}}$ is not in the image of $A$ , there is no local maximum value.	The negative definite case corresponds to $a<0$ : here, the local maximum value $c-{\frac {b^{2}}{4a}}$ is attained at ${\frac {-b}{2a}}$ (consistent with the matrix formulation). The positive definite case corresponds to $a>0$ , and there is no maximum in this case.
gradient vector function (analogous to the derivative)	${\vec {x}}\mapsto 2A{\vec {x}}+{\vec {b}}$	the derivative is $x\mapsto 2ax+b$ (consistent with the matrix formulation)
Hessian matrix (analogous to the second derivative)	${\vec {x}}\mapsto 2A$ (constant matrix-valued function)	the second derivative is the constant function $x\mapsto 2a$ (consistent with the matrix formulation)
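For $n=2$ the case analysis in the table can be made concrete. The sketch below is a minimal illustration that assumes a symmetric matrix $A={\begin{pmatrix}p&q\\q&r\end{pmatrix}}$ (the names `p`, `q`, `r` and the helper functions are ours). By Sylvester's criterion, $A$ is positive definite when $p>0$ and $pr-q^{2}>0$ , negative definite when $p<0$ and $pr-q^{2}>0$ , and indefinite when $pr-q^{2}<0$ . When $A$ is invertible, the code also computes the critical point ${\frac {-1}{2}}A^{-1}{\vec {b}}$ and the value $c-{\frac {1}{4}}{\vec {b}}^{T}A^{-1}{\vec {b}}$ from the table, using exact `fractions.Fraction` arithmetic.

```python
from fractions import Fraction


def classify_2x2(p, q, r):
    """Classify the symmetric matrix [[p, q], [q, r]] using its leading principal minors."""
    det = p * r - q * q
    if det > 0:
        # det > 0 forces p != 0, and p and r then have the same sign.
        return "positive definite" if p > 0 else "negative definite"
    if det < 0:
        return "indefinite"
    return "singular semidefinite"


def critical_point_and_value(p, q, r, b, c):
    """For invertible A = [[p, q], [q, r]], return (-1/2 A^{-1} b, c - 1/4 b^T A^{-1} b)."""
    det = Fraction(p * r - q * q)
    if det == 0:
        raise ValueError("A is singular; use a solution of A v = b instead of A^{-1} b")
    # A^{-1} = (1/det) [[r, -q], [-q, p]], so v = A^{-1} b is:
    v = [(r * b[0] - q * b[1]) / det, (-q * b[0] + p * b[1]) / det]
    point = [-v[0] / 2, -v[1] / 2]
    value = c - (b[0] * v[0] + b[1] * v[1]) / 4
    return point, value


# Hypothetical example: A = [[2, 1], [1, 2]], b = (2, -4), c = 1.
print(classify_2x2(2, 1, 2))
print(critical_point_and_value(2, 1, 2, [2, -4], 1))
```

In this example $A$ is positive definite, and the minimum value $-11/3$ is attained at $(-4/3,5/3)$ . Whether the returned value is a minimum or a maximum is read off from the classification, exactly as in the table.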

Differentiation

Partial derivatives and gradient vector

Case of general matrix

The partial derivative with respect to the variable $x_{i}$ , and therefore also the $i^{th}$ coordinate of the gradient vector, is given by:

${\frac {\partial f}{\partial x_{i}}}=\left(\sum _{j=1}^{n}(a_{ij}+a_{ji})x_{j}\right)+b_{i}$

In terms of the matrix and vector notation, the gradient vector, expressed as a column vector, is:

$(\nabla f)({\vec {x}})=(A+A^{T}){\vec {x}}+{\vec {b}}$
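The gradient formula can be checked numerically. A minimal sketch, assuming plain Python with exact `fractions.Fraction` arithmetic: because $f$ is quadratic, the central difference $(f({\vec {x}}+h{\vec {e}}_{i})-f({\vec {x}}-h{\vec {e}}_{i}))/(2h)$ equals $\partial f/\partial x_{i}$ exactly for every step $h$ (the $h^{2}$ terms cancel), so the comparison involves no rounding. The matrix, vector, point and helper names are hypothetical.

```python
from fractions import Fraction


def f(A, b, c, x):
    """Evaluate x^T A x + b^T x + c."""
    n = len(x)
    return (sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
            + sum(b[i] * x[i] for i in range(n)) + c)


def gradient_formula(A, b, x):
    """(A + A^T) x + b, computed entrywise as sum_j (a_ij + a_ji) x_j + b_i."""
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) + b[i] for i in range(n)]


def central_difference_gradient(A, b, c, x, h=Fraction(1)):
    """Central differences; for a quadratic f these are exact for any step h."""
    n = len(x)
    grad = []
    for i in range(n):
        xp = [x[k] + (h if k == i else 0) for k in range(n)]
        xm = [x[k] - (h if k == i else 0) for k in range(n)]
        grad.append((f(A, b, c, xp) - f(A, b, c, xm)) / (2 * h))
    return grad


A = [[1, 2], [0, 3]]   # deliberately non-symmetric
b = [4, -1]
c = 5
x = [2, -1]
print(gradient_formula(A, b, x))
print(central_difference_gradient(A, b, c, x))
```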

Case of symmetric matrix

In the case that $A$ is a symmetric matrix, the above expressions simplify as follows.

Since $a_{ij}=a_{ji}$ for all $i,j$ , the expression for the partial derivative becomes:

${\frac {\partial f}{\partial x_{i}}}=\left(\sum _{j=1}^{n}2a_{ij}x_{j}\right)+b_{i}$

The expression for the gradient vector becomes:

$(\nabla f)({\vec {x}})=2A{\vec {x}}+{\vec {b}}$
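As a quick consistency check (a sketch with hypothetical data and helper names), the symmetric-case formula $2S{\vec {x}}+{\vec {b}}$ applied to $S=(A+A^{T})/2$ reproduces the general formula $(A+A^{T}){\vec {x}}+{\vec {b}}$ , even when $A$ itself is not symmetric. Applying $2A{\vec {x}}+{\vec {b}}$ directly to a non-symmetric $A$ gives a different, incorrect answer.

```python
from fractions import Fraction


def matvec(M, x):
    """Return the matrix-vector product M x."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


def grad_general(A, b, x):
    """(A + A^T) x + b, valid for any square A."""
    n = len(x)
    A_plus_AT = [[A[i][j] + A[j][i] for j in range(n)] for i in range(n)]
    return [v + bi for v, bi in zip(matvec(A_plus_AT, x), b)]


def grad_symmetric(S, b, x):
    """2 S x + b, valid only when S is symmetric."""
    return [2 * v + bi for v, bi in zip(matvec(S, x), b)]


A = [[1, 2], [0, 3]]                                   # hypothetical, not symmetric
S = [[Fraction(A[i][j] + A[j][i], 2) for j in range(2)] for i in range(2)]
b = [4, -1]
x = [2, -1]
print(grad_general(A, b, x) == grad_symmetric(S, b, x))  # True
# Using the symmetric formula on the non-symmetric A itself is wrong:
print(grad_symmetric(A, b, x) == grad_general(A, b, x))  # False
```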

Case $n=1$

A sanity check for the above expressions is that in the case $n=1$ , where $A=(a)$ and ${\vec {b}}=(b)$ , we get the same answers as for the quadratic function $f(x)=ax^{2}+bx+c$ .

This is indeed the case. The only partial derivative is the ordinary derivative, which is also the single entry of the gradient vector. It is given by:

$f'(x)=2ax+b$

This agrees with both the expression for $\partial f/\partial x_{i}$ and the expression for $(\nabla f)({\vec {x}})$ .

Second-order partial derivatives and Hessian matrix

Case of general matrix

Recall that we had obtained (we replace the dummy variable $j$ by $k$ to facilitate differentiation with respect to $j$ in the next step):

${\frac {\partial f}{\partial x_{i}}}=\left(\sum _{k=1}^{n}(a_{ik}+a_{ki})x_{k}\right)+b_{i}$

Differentiating both sides with respect to $x_{j}$ (where $j$ may or may not equal $i$ ), we find that the only term with a nonzero derivative is the term with $k=j$ , and its derivative is its coefficient $a_{ij}+a_{ji}$ . Therefore, we obtain:

${\frac {\partial ^{2}f}{\partial x_{j}\partial x_{i}}}=a_{ij}+a_{ji}$

Thus, the Hessian matrix of the quadratic function is given as:

$H(f)({\vec {x}})=A+A^{T}$

Note that this is independent of ${\vec {x}}$ . This is a special feature of quadratic functions: for more general functions, the Hessian matrix varies with the input vector.

We can also see this in matrix form directly. The gradient function is:

$(\nabla f)({\vec {x}})=(A+A^{T}){\vec {x}}+{\vec {b}}$

This is an affine transformation (a linear transformation followed by adding the constant vector ${\vec {b}}$ ), and its Jacobian matrix is the Hessian that we want. The Jacobian matrix of an affine transformation ${\vec {x}}\mapsto B{\vec {x}}+{\vec {b}}$ is the constant matrix $B$ describing its linear part, so the Hessian is:

$H(f)({\vec {x}})=A+A^{T}$
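The Hessian can be spot-checked in the same way as the gradient. This is a sketch under stated assumptions (plain Python, exact `fractions.Fraction` arithmetic, hypothetical data and helper names). For a quadratic $f$ , the mixed central second difference $[f({\vec {x}}+h{\vec {e}}_{i}+h{\vec {e}}_{j})-f({\vec {x}}+h{\vec {e}}_{i}-h{\vec {e}}_{j})-f({\vec {x}}-h{\vec {e}}_{i}+h{\vec {e}}_{j})+f({\vec {x}}-h{\vec {e}}_{i}-h{\vec {e}}_{j})]/(4h^{2})$ equals $a_{ij}+a_{ji}$ exactly, including when $i=j$ . Evaluating it at several points also illustrates that the Hessian does not depend on ${\vec {x}}$ .

```python
from fractions import Fraction


def f(A, b, c, x):
    """Evaluate x^T A x + b^T x + c."""
    n = len(x)
    return (sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
            + sum(b[i] * x[i] for i in range(n)) + c)


def shifted(x, i, si, j, sj):
    """Return a copy of x with si added to coordinate i and sj added to coordinate j."""
    y = list(x)
    y[i] += si
    y[j] += sj  # when i == j both shifts land on the same coordinate
    return y


def hessian_fd(A, b, c, x, h=Fraction(1)):
    """Mixed central second differences; exact for a quadratic f and any step h."""
    n = len(x)
    H = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = (f(A, b, c, shifted(x, i, h, j, h))
                     - f(A, b, c, shifted(x, i, h, j, -h))
                     - f(A, b, c, shifted(x, i, -h, j, h))
                     + f(A, b, c, shifted(x, i, -h, j, -h)))
            H[i][j] = total / (4 * h * h)
    return H


A = [[1, 2], [0, 3]]   # hypothetical, not symmetric
b = [4, -1]
c = 5
A_plus_AT = [[A[i][j] + A[j][i] for j in range(2)] for i in range(2)]
for x in ([0, 0], [2, -1], [Fraction(7, 3), 5]):
    # The result is A + A^T at every point, independent of x.
    assert hessian_fd(A, b, c, x) == A_plus_AT
```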

Case of symmetric matrix

We can either plug into the formulas for the general case or perform similar calculations to get the formulas in the case that $A$ is a symmetric matrix:

${\frac {\partial ^{2}f}{\partial x_{j}\partial x_{i}}}=2a_{ij}$

$H(f)({\vec {x}})=2A$

Case $n=1$

A sanity check for the above expressions is that in the case $n=1$ , where $A=(a)$ and ${\vec {b}}=(b)$ , we get the same answers as for the quadratic function $f(x)=ax^{2}+bx+c$ .

This is indeed the case. The only second-order partial derivative is $f''(x)=2a$ . This agrees both with the formula for the second-order partial derivative and with the formula for the Hessian matrix.

Higher derivatives

Since the Hessian matrix is constant, all derivative tensors of order three and higher are identically zero.
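A numerical illustration, not a proof: along any line, $t\mapsto f({\vec {x}}+t{\vec {u}})$ is a polynomial of degree at most two, so its third finite difference $f({\vec {x}}+3{\vec {u}})-3f({\vec {x}}+2{\vec {u}})+3f({\vec {x}}+{\vec {u}})-f({\vec {x}})$ vanishes. This is consistent with every third directional derivative being zero. The sketch below uses randomly generated integer data (a fixed seed, and hypothetical helper names) so that the arithmetic is exact.

```python
import random


def f(A, b, c, x):
    """Evaluate x^T A x + b^T x + c."""
    n = len(x)
    return (sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
            + sum(b[i] * x[i] for i in range(n)) + c)


def third_difference(A, b, c, x, u):
    """f(x+3u) - 3 f(x+2u) + 3 f(x+u) - f(x).

    This vanishes whenever t -> f(x + t u) is a polynomial of degree at most 2.
    """
    values = [f(A, b, c, [xi + k * ui for xi, ui in zip(x, u)]) for k in range(4)]
    return values[3] - 3 * values[2] + 3 * values[1] - values[0]


rng = random.Random(0)          # fixed seed; integer data keeps the arithmetic exact
n = 3
A = [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]
b = [rng.randint(-5, 5) for _ in range(n)]
c = rng.randint(-5, 5)
results = []
for _ in range(20):
    x = [rng.randint(-10, 10) for _ in range(n)]
    u = [rng.randint(-3, 3) for _ in range(n)]
    results.append(third_difference(A, b, c, x, u))
print(results)   # twenty zeros
```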

Cases

For the discussion of cases, assume that $A$ is a symmetric matrix. If $A$ is not symmetric, replace it by the symmetric matrix $(A+A^{T})/2$ .

Positive definite case

First, we consider the case where $A$ is a symmetric positive definite matrix. Equivalently, we can write $A$ in the form:

$A=M^{T}M$

where $M$ is an $n\times n$ invertible matrix.
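The text does not say how $M$ is chosen. One standard choice is the Cholesky factorization $A=LL^{T}$ with $L$ lower triangular, which gives $M=L^{T}$ . The following is a minimal plain-Python sketch of that choice (floating point via `math.sqrt`, hypothetical helper names and example matrix). It raises an error when a nonpositive pivot shows that $A$ is not positive definite.

```python
import math


def upper_factor(A):
    """Return an upper-triangular M with A = M^T M.

    Computed as M = L^T from the Cholesky factorization A = L L^T.
    Raises ValueError if A is not positive definite.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = A[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(d)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return [[L[j][i] for j in range(n)] for i in range(n)]  # transpose of L


def gram(M):
    """Return M^T M."""
    n = len(M)
    return [[sum(M[k][i] * M[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


A = [[4, 2], [2, 3]]            # hypothetical symmetric positive definite matrix
M = upper_factor(A)             # [[2.0, 1.0], [0.0, sqrt(2)]]
print(M)
print(gram(M))                  # reproduces A up to rounding
```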

We can "complete the square" for this function:

$f({\vec {x}})=\left(M{\vec {x}}+{\frac {1}{2}}(M^{T})^{-1}{\vec {b}}\right)^{T}\left(M{\vec {x}}+{\frac {1}{2}}(M^{T})^{-1}{\vec {b}}\right)+\left(c-{\frac {1}{4}}{\vec {b}}^{T}A^{-1}{\vec {b}}\right)$

In other words:

$f({\vec {x}})=\left\|M{\vec {x}}+{\frac {1}{2}}(M^{T})^{-1}{\vec {b}}\right\|^{2}+\left(c-{\frac {1}{4}}{\vec {b}}^{T}A^{-1}{\vec {b}}\right)$
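The completed-square identity can be spot-checked exactly. This sketch makes several assumptions: a hypothetical invertible upper-triangular $M$ with rational entries (so `fractions.Fraction` keeps everything exact), $A=M^{T}M$ , and example values for ${\vec {b}}$ and $c$ . Because $M^{T}$ is then lower triangular, $(M^{T})^{-1}{\vec {b}}$ is found by forward substitution, and $A^{-1}{\vec {b}}$ comes from the explicit $2\times 2$ inverse. The code compares $f({\vec {x}})$ with the right-hand side at a few points.

```python
from fractions import Fraction

M = [[Fraction(2), Fraction(1)], [Fraction(0), Fraction(3)]]   # hypothetical invertible M
A = [[sum(M[k][i] * M[k][j] for k in range(2)) for j in range(2)] for i in range(2)]  # A = M^T M
b = [Fraction(4), Fraction(-6)]
c = Fraction(1)


def f(x):
    """Evaluate x^T A x + b^T x + c for the fixed A, b, c above."""
    return (sum(A[i][j] * x[i] * x[j] for i in range(2) for j in range(2))
            + sum(b[i] * x[i] for i in range(2)) + c)


# w = (M^T)^{-1} b: M^T is lower triangular here, so use forward substitution.
MT = [[M[j][i] for j in range(2)] for i in range(2)]
w0 = b[0] / MT[0][0]
w1 = (b[1] - MT[1][0] * w0) / MT[1][1]
w = [w0, w1]

# A^{-1} b via the explicit 2x2 inverse, then the constant c - 1/4 b^T A^{-1} b.
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
Ainv_b = [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
          (-A[1][0] * b[0] + A[0][0] * b[1]) / det]
constant = c - sum(bi * vi for bi, vi in zip(b, Ainv_b)) / 4


def completed_square(x):
    """Evaluate ||M x + (1/2) w||^2 + constant."""
    r = [sum(M[i][j] * x[j] for j in range(2)) + w[i] / 2 for i in range(2)]
    return r[0] ** 2 + r[1] ** 2 + constant


for x in ([0, 0], [1, 2], [Fraction(-3, 2), 5]):
    assert f(x) == completed_square(x)
```

As a side check, ${\vec {w}}^{T}{\vec {w}}={\vec {b}}^{T}M^{-1}(M^{T})^{-1}{\vec {b}}={\vec {b}}^{T}A^{-1}{\vec {b}}$ , so either quantity could be used for the constant term.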

The first term is a squared norm, which is nonnegative and equals zero exactly when the vector inside it is zero. The second term does not depend on ${\vec {x}}$ , so $f$ attains its minimum exactly when:

$M{\vec {x}}+{\frac {1}{2}}(M^{T})^{-1}{\vec {b}}={\vec {0}}$

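Multiplying on the left by $M^{-1}$ gives ${\vec {x}}=-{\frac {1}{2}}M^{-1}(M^{T})^{-1}{\vec {b}}=-{\frac {1}{2}}(M^{T}M)^{-1}{\vec {b}}$ , so the minimum occurs at: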

${\vec {x}}=-{\frac {1}{2}}A^{-1}{\vec {b}}$

Moreover, the value of the minimum is:

$c-{\frac {1}{4}}{\vec {b}}^{T}A^{-1}{\vec {b}}$
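The result can be checked on a concrete example. The sketch below assumes a hypothetical symmetric positive definite $3\times 3$ matrix (its leading minors are $2$ , $5$ and $8$ ) and uses exact `fractions.Fraction` arithmetic. It computes ${\vec {v}}=A^{-1}{\vec {b}}$ by solving $A{\vec {v}}={\vec {b}}$ with Gauss-Jordan elimination rather than forming $A^{-1}$ , then forms the minimizer ${\frac {-1}{2}}{\vec {v}}$ and the minimum value $c-{\frac {1}{4}}{\vec {b}}^{T}{\vec {v}}$ . It confirms that $f$ takes that value at the minimizer and is larger after moving away in a few directions.

```python
from fractions import Fraction


def solve(A, rhs):
    """Solve A v = rhs exactly by Gauss-Jordan elimination (A assumed invertible)."""
    n = len(A)
    M = [[Fraction(A[i][j]) for j in range(n)] + [Fraction(rhs[i])] for i in range(n)]
    for col in range(n):
        # Pick the first row at or below `col` with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def f(A, b, c, x):
    """Evaluate x^T A x + b^T x + c."""
    n = len(x)
    return (sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
            + sum(b[i] * x[i] for i in range(n)) + c)


A = [[2, 1, 0], [1, 3, 1], [0, 1, 2]]   # hypothetical symmetric positive definite matrix
b = [1, -2, 3]
c = 4

v = solve(A, b)                                        # v = A^{-1} b
x_star = [-vi / 2 for vi in v]                         # minimizer -1/2 A^{-1} b
min_value = c - sum(bi * vi for bi, vi in zip(b, v)) / 4

print(x_star, min_value)
# f at the minimizer equals the predicted minimum value ...
assert f(A, b, c, x_star) == min_value
# ... and moving away by a nonzero d increases f (by exactly d^T A d > 0).
for d in ([1, 0, 0], [0, -1, 1], [Fraction(1, 2), 2, -3]):
    moved = [xi + di for xi, di in zip(x_star, d)]
    assert f(A, b, c, moved) > min_value
```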
