Quadratic function of multiple variables

Definition

Consider variables $x_{1}, x_{2}, \dots, x_{n}$ . A quadratic function of the variables $x_{1}, x_{2}, \dots, x_{n}$ is a function of the form:

$(\sum_{i = 1}^{n} \sum_{j = 1}^{n} a_{i j} x_{i} x_{j}) + (\sum_{i = 1}^{n} b_{i} x_{i}) + c$

In vector form, if we denote by $\vec{x}$ the column vector with coordinates $x_{1}, x_{2}, \dots, x_{n}$ , then we can write the function as:

${\vec{x}}^{T} A \vec{x} + {\vec{b}}^{T} \vec{x} + c$

where $A$ is an $n \times n$ matrix with entries $a_{i j}$ and $\vec{b}$ is the column vector with entries $b_{i}$ .

Note that the matrix $A$ is not unique: since $x_{i} x_{j} = x_{j} x_{i}$ , only the sums $a_{i j} + a_{j i}$ affect the function, so any matrix $F$ with $A + A^{T} = F + F^{T}$ could replace $A$ . In particular, we can replace $A$ by the matrix $(A + A^{T}) / 2$ and have the advantage of working with a symmetric matrix.
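
As a quick numerical illustration of the non-uniqueness remark, the following minimal Python sketch evaluates the double-sum form directly and compares a non-symmetric coefficient matrix with its symmetrization. The helper names `quadratic` and `symmetrize` and the sample numbers are illustrative assumptions, not part of the definition.

```python
def quadratic(A, b, c, x):
    """Evaluate sum_ij a_ij x_i x_j + sum_i b_i x_i + c (A, b, x are plain lists)."""
    n = len(x)
    quad = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    lin = sum(b[i] * x[i] for i in range(n))
    return quad + lin + c


def symmetrize(A):
    """Return the symmetric matrix (A + A^T) / 2."""
    n = len(A)
    return [[(A[i][j] + A[j][i]) / 2 for j in range(n)] for i in range(n)]


# Sample data chosen only for illustration; A is deliberately not symmetric.
A = [[1.0, 4.0], [0.0, 3.0]]
b = [2.0, -1.0]
c = 5.0
x = [1.5, -2.0]

value_original = quadratic(A, b, c, x)
value_symmetric = quadratic(symmetrize(A), b, c, x)
print(value_original, value_symmetric)  # the two values agree
```

Both calls describe the same function, so they return the same number at every input, not just at this sample point.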

Key data

For the discussion here, assume that $A$ has been made a symmetric matrix.

Item	Value	Consistency with the case $n = 1$ , where $f (x) = a x^{2} + b x + c$ , $A = (a)$ (a $1 \times 1$ matrix), $\vec{b} = (b)$ (a 1-dimensional vector)
default domain	the whole of $R^{n}$	the whole of $R$
range	If the matrix $A$ is neither positive semidefinite nor negative semidefinite, or if $A$ is semidefinite but $\vec{b}$ is not in its image, the range is all of $R$ . If the matrix $A$ is positive definite or ( $A$ is positive semidefinite and $\vec{b}$ is in its image), the range is $[m, \infty)$ where $m$ is the minimum value. If the matrix $A$ is negative definite or ( $A$ is negative semidefinite and $\vec{b}$ is in its image), the range is $(- \infty, m]$ where $m$ is the maximum value.	The case of "neither positive semidefinite nor negative semidefinite" does not arise for $n = 1$ . Moreover, assuming $a \ne 0$ , all the semidefinite cases must be definite, so we only have to consider the positive definite case and the negative definite case. The positive definite case corresponds to $a > 0$ . The negative definite case corresponds to $a < 0$ .
local minimum value and points of attainment	If the matrix $A$ is positive definite, then $c - \frac{1}{4} {\vec{b}}^{T} A^{- 1} \vec{b}$ , attained at $\frac{- 1}{2} A^{- 1} \vec{b}$ . If $A$ is positive semidefinite but not positive definite, it depends on whether $\vec{b}$ is in the image of $A$ . If yes, replace $A^{- 1} \vec{b}$ with any solution $\vec{v}$ to $A \vec{v} = \vec{b}$ , so we get a local minimum of $c - \frac{1}{4} {\vec{b}}^{T} \vec{v}$ attained at $\frac{- 1}{2} \vec{v}$ . If $A$ is not positive semidefinite or if $\vec{b}$ is not in the image of $A$ , there is no local minimum value.	The positive definite case corresponds to $a > 0$ : Here, the local minimum value of $c - \frac{b^{2}}{4 a}$ is attained at $\frac{- b}{2 a}$ (consistent with the matrix formulation). The negative definite case corresponds to $a < 0$ , and there is no minimum in this case.
local maximum value and points of attainment	If the matrix $A$ is negative definite, then $c - \frac{1}{4} {\vec{b}}^{T} A^{- 1} \vec{b}$ , attained at $\frac{- 1}{2} A^{- 1} \vec{b}$ . If $A$ is negative semidefinite but not negative definite, it depends on whether $\vec{b}$ is in the image of $A$ . If yes, replace $A^{- 1} \vec{b}$ with any solution $\vec{v}$ to $A \vec{v} = \vec{b}$ , so we get a local maximum of $c - \frac{1}{4} {\vec{b}}^{T} \vec{v}$ attained at $\frac{- 1}{2} \vec{v}$ . If $A$ is not negative semidefinite or if $\vec{b}$ is not in the image of $A$ , there is no local maximum value.	The negative definite case corresponds to $a < 0$ : Here, the local maximum value of $c - \frac{b^{2}}{4 a}$ is attained at $\frac{- b}{2 a}$ (consistent with the matrix formulation). The positive definite case corresponds to $a > 0$ , and there is no maximum in this case.
gradient vector function (analogous to the derivative)	$\vec{x} \mapsto 2 A \vec{x} + \vec{b}$	the derivative is $x \mapsto 2 a x + b$ (consistent with the matrix formulation)
Hessian matrix (analogous to the second derivative)	$\vec{x} \mapsto 2 A$ (constant matrix-valued function)	the second derivative is the constant function $x \mapsto 2 a$ (consistent with the matrix formulation)
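
The role of the condition "$\vec{b}$ is in the image of $A$" in the semidefinite rows can be seen in a small Python sketch. It assumes the illustrative matrix $A$ with rows $(1, 0)$ and $(0, 0)$, which is positive semidefinite but not positive definite, and $c = 0$; the function name `f` and the sample vectors are hypothetical choices made for this example.

```python
def f(x1, x2, b, c=0.0):
    """x^T A x + b^T x + c for the illustrative choice A = [[1, 0], [0, 0]]."""
    return x1 * x1 + b[0] * x1 + b[1] * x2 + c


# Case 1: b = (2, 0) lies in the image of A (the x1-axis).
b_in = (2.0, 0.0)
v = (2.0, 0.0)  # one solution of A v = b
min_value = 0.0 - 0.25 * (b_in[0] * v[0] + b_in[1] * v[1])  # c - (1/4) b^T v
min_point = (-0.5 * v[0], -0.5 * v[1])                      # -(1/2) v
value_at_min_point = f(min_point[0], min_point[1], b_in)

# No point of a sample grid goes below the claimed minimum value.
grid = [k / 4.0 for k in range(-12, 13)]
grid_ok = all(f(x1, x2, b_in) >= min_value for x1 in grid for x2 in grid)

# Case 2: b = (2, 1) is not in the image of A.
# Moving along the kernel direction (0, 1) drives f down without bound.
b_out = (2.0, 1.0)
values_along_kernel = [f(-1.0, t, b_out) for t in (-10.0, -1000.0, -100000.0)]

print(min_value, value_at_min_point, grid_ok)
print(values_along_kernel)
```

In the first case the minimum value is attained along the whole line $x_{1} = - 1$ , since adding a kernel vector of $A$ to $\vec{v}$ gives another solution of $A \vec{v} = \vec{b}$ .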

Differentiation

Partial derivatives and gradient vector

Case of general matrix

The partial derivative with respect to the variable $x_{i}$ , and therefore also the $i^{t h}$ coordinate of the gradient vector, is given by:

$\frac{\partial f}{\partial x_{i}} = (\sum_{j = 1}^{n} (a_{i j} + a_{j i}) x_{j}) + b_{i}$

In terms of the matrix and vector notation, the gradient vector, expressed as a column vector, is:

$(\nabla f) (\vec{x}) = (A + A^{T}) \vec{x} + \vec{b}$
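
The gradient formula can be checked numerically. The sketch below compares $(A + A^{T}) \vec{x} + \vec{b}$ with central finite differences for a deliberately non-symmetric sample matrix. The helper names, the step size `h`, and the sample data are assumptions made for illustration.

```python
def quadratic(A, b, c, x):
    """Evaluate sum_ij a_ij x_i x_j + sum_i b_i x_i + c."""
    n = len(x)
    quad = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return quad + sum(b[i] * x[i] for i in range(n)) + c


def gradient(A, b, x):
    """Closed form (A + A^T) x + b, valid for any square matrix A."""
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) + b[i]
            for i in range(n)]


def numerical_gradient(A, b, c, x, h=1e-5):
    """Central-difference approximation of each partial derivative."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((quadratic(A, b, c, x_plus) - quadratic(A, b, c, x_minus)) / (2 * h))
    return grad


# Illustrative data: f = x1^2 + 4 x1 x2 + 3 x2^2 + 2 x1 - x2 + 5.
A = [[1.0, 4.0], [0.0, 3.0]]
b = [2.0, -1.0]
c = 5.0
x = [1.5, -2.0]

exact = gradient(A, b, x)
approx = numerical_gradient(A, b, c, x)
print(exact, approx)
```

For a quadratic function the central difference has no truncation error, so the two results differ only by floating-point rounding.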

Case of symmetric matrix

In the case that $A$ is a symmetric matrix, the above expressions simplify as follows.

Since $a_{i j} = a_{j i}$ for all $i, j$ , the expression for the partial derivative becomes:

$\frac{\partial f}{\partial x_{i}} = (\sum_{j = 1}^{n} 2 a_{i j} x_{j}) + b_{i}$

The expression for the gradient vector becomes:

$(\nabla f) (\vec{x}) = 2 A \vec{x} + \vec{b}$

Case $n = 1$

A sanity check for the above expressions is that in the case $n = 1$ , where $A = (a), \vec{b} = (b)$ , we get the same answers as for the quadratic function $f (x) = a x^{2} + b x + c$ .

This is indeed the case. The only partial derivative here is the ordinary derivative, which is also the single coordinate of the gradient vector:

$f^{'} (x) = 2 a x + b$

This agrees with both the expression for $\partial f / \partial x_{i}$ and the expression for $(\nabla f) (\vec{x})$ .

Second-order partial derivatives and Hessian matrix

Case of general matrix

Recall that we had obtained (we replace the dummy variable $j$ by $k$ to facilitate differentiation with respect to $x_{j}$ in the next step):

$\frac{\partial f}{\partial x_{i}} = (\sum_{k = 1}^{n} (a_{i k} + a_{k i}) x_{k}) + b_{i}$

Differentiating both sides with respect to $x_{j}$ (note that $j$ may be equal to $i$ or different from $i$ ) we find that the only term with a nonzero derivative is the term where $k = j$ . In this case, the derivative is the coefficient of $x_{j}$ . Therefore, we obtain:

$\frac{\partial^{2} f}{\partial x_{j} \partial x_{i}} = a_{i j} + a_{j i}$

Thus, the Hessian matrix of the quadratic function is given as:

$H (f) (\vec{x}) = A + A^{T}$

Note that this is independent of the choice of $\vec{x}$ . This holds because $f$ is quadratic: for more general functions, the Hessian matrix varies with the choice of input vector.

We can also see this in matrix form directly. The gradient function is:

$(\nabla f) (\vec{x}) = (A + A^{T}) \vec{x} + \vec{b}$

This is an affine map (a linear transformation plus the constant vector $\vec{b}$ ), and its Jacobian matrix is the Hessian that we want. The Jacobian matrix of an affine map is the matrix describing its linear part, and therefore the Hessian is:

$H (f) (\vec{x}) = A + A^{T}$
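
The statement that the Hessian is the Jacobian of the gradient, and that it does not depend on $\vec{x}$ , can be checked with a short Python sketch. It differentiates the closed-form gradient numerically at two different points. The function names, step size, and sample data are illustrative assumptions.

```python
def gradient(A, b, x):
    """Closed form (A + A^T) x + b."""
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) + b[i]
            for i in range(n)]


def hessian(A):
    """Closed form A + A^T (constant in x)."""
    n = len(A)
    return [[A[i][j] + A[j][i] for j in range(n)] for i in range(n)]


def jacobian_of_gradient(A, b, x, h=1e-4):
    """Central-difference Jacobian of the gradient map at the point x."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        g_plus, g_minus = gradient(A, b, x_plus), gradient(A, b, x_minus)
        for i in range(n):
            J[i][j] = (g_plus[i] - g_minus[i]) / (2 * h)
    return J


# Illustrative, deliberately non-symmetric coefficient matrix.
A = [[1.0, 4.0], [0.0, 3.0]]
b = [2.0, -1.0]

H = hessian(A)
J_first = jacobian_of_gradient(A, b, [1.5, -2.0])
J_second = jacobian_of_gradient(A, b, [-7.0, 0.25])
print(H, J_first, J_second)
```

Both numerical Jacobians agree with $A + A^{T}$ up to rounding, even though they are computed at different points.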

Case of symmetric matrix

We can either plug into the formulas for the general case or perform similar calculations to get the formulas in the case that $A$ is a symmetric matrix:

$\frac{\partial^{2} f}{\partial x_{j} \partial x_{i}} = 2 a_{i j}$

$H (f) (\vec{x}) = 2 A$

Case $n = 1$

A sanity check for the above expressions is that in the case $n = 1$ , where $A = (a), \vec{b} = (b)$ , we get the same answers as for the quadratic function $f (x) = a x^{2} + b x + c$ .

This is indeed the case. The only second-order partial derivative is $f^{''} (x) = 2 a$ . This agrees both with the formula for the second-order partial derivative and with the formula for the Hessian matrix.

Higher derivatives

All partial derivatives of order three or higher (pure or mixed) are zero. This can be seen directly from the fact that the second-order partial derivatives are all constants, so differentiating them further (with respect to any variable) gives zero.

Therefore, the higher derivative tensors (the higher-order analogues of the gradient vector and Hessian matrix) are also identically zero.

Cases

For the discussion of cases, assume that $A$ is a symmetric matrix. If $A$ is not symmetric, replace it by the symmetric matrix $(A + A^{T}) / 2$ .

Positive definite case

First, we consider the case where $A$ is a symmetric positive definite matrix. Equivalently, we can write $A$ in the form:

$A = M^{T} M$

where $M$ is an $n \times n$ invertible matrix.

We can "complete the square" for this function, using $A^{- 1} = M^{- 1} (M^{T})^{- 1}$ :

$f (\vec{x}) = {(M \vec{x} + \frac{1}{2} (M^{T})^{- 1} \vec{b})}^{T} (M \vec{x} + \frac{1}{2} (M^{T})^{- 1} \vec{b}) + (c - \frac{1}{4} {\vec{b}}^{T} A^{- 1} \vec{b})$

In other words:

$f (\vec{x}) = {‖ M \vec{x} + \frac{1}{2} (M^{T})^{- 1} \vec{b} ‖}^{2} + (c - \frac{1}{4} {\vec{b}}^{T} A^{- 1} \vec{b})$

Since the squared norm is nonnegative and the remaining term is constant, $f$ is minimized exactly when the expression whose norm we are measuring is zero, that is, when we have:

$M \vec{x} + \frac{1}{2} (M^{T})^{- 1} \vec{b} = \vec{0}$

Multiplying both sides by $M^{- 1}$ and using $A^{- 1} = M^{- 1} (M^{T})^{- 1}$ , we obtain that the minimum occurs at:

$\vec{x} = - \frac{1}{2} A^{- 1} \vec{b}$

Moreover, the value of the minimum is:

$c - \frac{1}{4} {\vec{b}}^{T} A^{- 1} \vec{b}$
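
The formulas for the minimizer and the minimum value can be tried on a concrete example. The sketch below solves $A \vec{v} = \vec{b}$ with a small Gaussian elimination routine instead of forming $A^{- 1}$ explicitly, then checks the results against direct evaluation. The helper names `solve` and `quadratic` and the sample matrix (symmetric, with eigenvalues 1 and 3, hence positive definite) are assumptions chosen for illustration.

```python
def quadratic(A, b, c, x):
    """Evaluate x^T A x + b^T x + c."""
    n = len(x)
    quad = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return quad + sum(b[i] * x[i] for i in range(n)) + c


def solve(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    v = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][k] * v[k] for k in range(i + 1, n))
        v[i] = (M[i][n] - s) / M[i][i]
    return v


# Illustrative symmetric positive definite example.
A = [[2.0, 1.0], [1.0, 2.0]]
b = [-6.0, 0.0]
c = 10.0

v = solve(A, b)                                   # v = A^{-1} b
minimizer = [-0.5 * vi for vi in v]               # -(1/2) A^{-1} b
min_value = c - 0.25 * sum(bi * vi for bi, vi in zip(b, v))

# Direct evaluation at the minimizer and at a few perturbed points.
value_at_minimizer = quadratic(A, b, c, minimizer)
steps = [(0.1, 0.0), (0.0, -0.1), (-0.3, 0.2), (1.0, 1.0)]
perturbed = [quadratic(A, b, c, [minimizer[0] + dx, minimizer[1] + dy])
             for dx, dy in steps]
print(minimizer, min_value, value_at_minimizer, perturbed)
```

Here the minimizer works out to $(2, - 1)$ with minimum value $4$ , and every perturbed point gives a strictly larger value, as the completed-square form predicts.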
