L1-regularized quadratic function of multiple variables

Definition

An $L^{1}$-regularized quadratic function of the variables $x_{1}, x_{2}, \dots, x_{n}$ is a function of the following form (subject to the positive definiteness condition below):

$f(x_{1}, x_{2}, \dots, x_{n}) := (\sum_{i = 1}^{n} \sum_{j = 1}^{n} a_{i j} x_{i} x_{j}) + (\sum_{i = 1}^{n} b_{i} x_{i}) + \lambda \sum_{i = 1}^{n} | x_{i} | + c$

In vector form, if we denote by $\vec{x}$ the column vector with coordinates $x_{1}, x_{2}, \dots, x_{n}$ , then we can write the function as:

${\vec{x}}^{T} A \vec{x} + {\vec{b}}^{T} \vec{x} + \lambda \| \vec{x} \|_{1} + c$

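where $A$ is the $n \times n$ matrix with entries $a_{i j}$, $\vec{b}$ is the column vector with entries $b_{i}$, $\lambda$ is a real constant (the regularization parameter, typically taken to be positive), and $c$ is a real constant.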

Note that the matrix $A$ is non-unique: if $A + A^{T} = F + F^{T}$ then we could replace $A$ by $F$ . Therefore, we could choose to replace $A$ by the matrix $(A + A^{T}) / 2$ . We will thus assume that $A$ is a symmetric matrix.

We impose the further restriction that the matrix $A$ be a symmetric positive definite matrix.
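The definition and the symmetrization remark can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation; the function names `l1_quadratic` and `symmetrize` and all the sample values are hypothetical, chosen only for illustration.

```python
def l1_quadratic(A, b, lam, c, x):
    """Evaluate x^T A x + b^T x + lam * ||x||_1 + c using plain lists."""
    n = len(x)
    quad = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    lin = sum(b[i] * x[i] for i in range(n))
    l1 = sum(abs(t) for t in x)
    return quad + lin + lam * l1 + c


def symmetrize(A):
    """Return (A + A^T) / 2, which defines the same quadratic form as A."""
    n = len(A)
    return [[(A[i][j] + A[j][i]) / 2 for j in range(n)] for i in range(n)]


# Hypothetical sample data (not taken from the text).
A = [[2.0, 1.0], [0.0, 3.0]]  # deliberately not symmetric
b = [1.0, -1.0]
lam, c = 0.5, 4.0
x = [1.0, -2.0]

value_original = l1_quadratic(A, b, lam, c, x)
value_symmetric = l1_quadratic(symmetrize(A), b, lam, c, x)
print(value_original, value_symmetric)  # both evaluate to 20.5
```

Here `symmetrize(A)` is `[[2.0, 0.5], [0.5, 3.0]]`, which is symmetric positive definite, so the sample function satisfies the restriction above.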

Key data

Item	Value
default domain	the whole of $\mathbb{R}^{n}$

Differentiation

Partial derivatives and gradient vector

The partial derivative with respect to the variable $x_{i}$, and therefore also the $i^{\text{th}}$ coordinate of the gradient vector (if it exists), is given as follows when $x_{i} \neq 0$:

$\frac{\partial f}{\partial x_{i}} = (\sum_{j = 1}^{n} (a_{i j} + a_{j i}) x_{j}) + b_{i} + \lambda \operatorname{sgn}(x_{i})$

By the symmetry assumption, this becomes:

$\frac{\partial f}{\partial x_{i}} = (\sum_{j = 1}^{n} 2 a_{i j} x_{j}) + b_{i} + \lambda \operatorname{sgn}(x_{i})$

The partial derivative is undefined when $x_{i} = 0$ .
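
To see why, assume $\lambda > 0$ (an assumption made here; the text does not state the sign of $\lambda$) and compare the one-sided difference quotients at a point with $x_{i} = 0$, using the symmetric form of $A$:

$\lim_{h \to 0^{\pm}} \frac{f(\vec{x} + h \vec{e}_{i}) - f(\vec{x})}{h} = (\sum_{j = 1}^{n} 2 a_{i j} x_{j}) + b_{i} \pm \lambda$

where $\vec{e}_{i}$ denotes the $i^{\text{th}}$ standard basis vector (notation introduced only for this remark). The two one-sided limits differ by $2 \lambda$, so they agree only if $\lambda = 0$.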

The gradient vector exists if and only if all the coordinates are nonzero.

In vector notation, the gradient vector is as follows for all $\vec{x}$ with all coordinates nonzero:

$\nabla f (\vec{x}) = 2 A \vec{x} + \vec{b} + \lambda \operatorname{sgn}(\vec{x})$

where $\operatorname{sgn}(\vec{x})$ is the signum vector function, which applies the signum function to each coordinate of $\vec{x}$.
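
The gradient formula can be sanity-checked against central differences. This is a rough sketch under stated assumptions: the helper names (`sgn`, `f`, `gradient`, `central_difference_gradient`) and the sample data are hypothetical, and $A$ is assumed to be already symmetric.

```python
def sgn(t):
    """Signum of a real number: -1, 0, or 1."""
    return (t > 0) - (t < 0)


def f(A, b, lam, c, x):
    """Evaluate x^T A x + b^T x + lam * ||x||_1 + c."""
    n = len(x)
    quad = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return quad + sum(b[i] * x[i] for i in range(n)) + lam * sum(abs(t) for t in x) + c


def gradient(A, b, lam, x):
    """Return 2 A x + b + lam * sgn(x); A is assumed symmetric."""
    if any(t == 0 for t in x):
        raise ValueError("gradient is undefined when some coordinate is zero")
    n = len(x)
    return [2 * sum(A[i][j] * x[j] for j in range(n)) + b[i] + lam * sgn(x[i])
            for i in range(n)]


def central_difference_gradient(A, b, lam, c, x, h=1e-6):
    """Approximate each partial derivative by a central difference."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(A, b, lam, c, up) - f(A, b, lam, c, down)) / (2 * h))
    return grad


# Hypothetical sample data (not taken from the text).
A = [[2.0, 0.5], [0.5, 3.0]]  # symmetric positive definite
b = [1.0, -1.0]
lam, c = 0.5, 4.0
x = [1.0, -2.0]  # all coordinates nonzero

exact = gradient(A, b, lam, x)  # [3.5, -12.5]
approx = central_difference_gradient(A, b, lam, c, x)
```

The step `h` must be smaller than every $|x_{i}|$, so that the perturbed points stay in the same orthant as $\vec{x}$; otherwise the difference quotient straddles a kink of $|x_{i}|$ and the comparison is not meaningful.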

Hessian matrix

The Hessian matrix of the function, defined wherever all the coordinates are nonzero, is the matrix $2 A$ .
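
The $L^{1}$ term is linear on each open orthant, so it contributes nothing to the second derivatives there. As a hedged illustration (hypothetical helper names and sample data, with $A$ assumed symmetric), a second-order finite difference taken away from the coordinate hyperplanes recovers $2 A$:

```python
def f(A, b, lam, c, x):
    """Evaluate x^T A x + b^T x + lam * ||x||_1 + c."""
    n = len(x)
    quad = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return quad + sum(b[i] * x[i] for i in range(n)) + lam * sum(abs(t) for t in x) + c


def finite_difference_hessian(A, b, lam, c, x, h=1e-3):
    """Approximate the Hessian with a four-point second difference.

    Every |x_i| must exceed 2 * h so that no evaluation point crosses a
    coordinate hyperplane, where f is not differentiable.
    """
    n = len(x)

    def shifted(i, si, j, sj):
        # Evaluate f at x + si*h*e_i + sj*h*e_j.
        y = list(x)
        y[i] += si * h
        y[j] += sj * h
        return f(A, b, lam, c, y)

    return [[(shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
              - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4 * h * h)
             for j in range(n)] for i in range(n)]


# Hypothetical sample data (not taken from the text).
A = [[2.0, 0.5], [0.5, 3.0]]  # symmetric positive definite
b = [1.0, -1.0]
lam, c = 0.5, 4.0
x = [1.0, -2.0]  # all coordinates nonzero

H = finite_difference_hessian(A, b, lam, c, x)  # close to [[4, 1], [1, 6]]
```

Neither $\lambda$ nor $\vec{b}$ affects the result, which matches the statement that the Hessian is $2 A$ wherever it is defined.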