L1-regularized quadratic function of multiple variables: Difference between revisions

Revision as of 19:28, 11 May 2014

Definition

An $L^{1}$-regularized quadratic function of the variables $x_{1},x_{2},\dots ,x_{n}$ is a function of the following form, subject to the positive definiteness condition stated below:

$f(x_{1},x_{2},\dots ,x_{n}):=\left(\sum _{i=1}^{n}\sum _{j=1}^{n}a_{ij}x_{i}x_{j}\right)+\left(\sum _{i=1}^{n}b_{i}x_{i}\right)+\lambda \sum _{i=1}^{n}|x_{i}|+c$

In vector form, if we denote by ${\vec {x}}$ the column vector with coordinates $x_{1},x_{2},\dots ,x_{n}$, then we can write the function as:

${\vec {x}}^{T}A{\vec {x}}+{\vec {b}}^{T}{\vec {x}}+\lambda \|{\vec {x}}\|_{1}+c$

where $A$ is the $n\times n$ matrix with entries $a_{ij}$, ${\vec {b}}$ is the column vector with entries $b_{i}$, and $\lambda$ (the regularization parameter) and $c$ are real constants.
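As an illustration, the vector form above can be evaluated directly. The following is a minimal sketch in plain Python; the function name `l1_quadratic` and the sample values of $A$, ${\vec {b}}$, $\lambda$ and $c$ are assumptions chosen for the example, not part of the definition.

```python
def l1_quadratic(A, b, lam, c, x):
    """Evaluate x^T A x + b^T x + lam * ||x||_1 + c.

    A is an n x n list of lists, b and x are lists of length n,
    lam and c are real numbers. (Hypothetical helper for illustration.)
    """
    n = len(x)
    quad = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    lin = sum(b[i] * x[i] for i in range(n))
    l1 = sum(abs(t) for t in x)
    return quad + lin + lam * l1 + c


# Illustrative data: A is symmetric positive definite.
A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, -1.0]
value = l1_quadratic(A, b, 0.5, 4.0, [1.0, -2.0])
print(value)  # quadratic part 10, linear part 3, L1 part 0.5 * 3, constant 4
```

For these sample values the four terms are $10$, $3$, $1.5$ and $4$, so the function value is $18.5$.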

Note that the matrix $A$ is not unique: if $F$ is any matrix with $A+A^{T}=F+F^{T}$, then replacing $A$ by $F$ gives the same function. In particular, we may replace $A$ by the symmetric matrix $(A+A^{T})/2$. We will thus assume that $A$ is a symmetric matrix.
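The effect of symmetrization can be checked numerically. The sketch below uses hypothetical helpers `quad_form` and `symmetrize` and an arbitrarily chosen non-symmetric matrix; it only illustrates that the quadratic part is unchanged.

```python
def quad_form(M, x):
    """Return x^T M x for a square matrix M given as a list of lists."""
    n = len(x)
    return sum(M[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def symmetrize(M):
    """Return (M + M^T) / 2."""
    n = len(M)
    return [[(M[i][j] + M[j][i]) / 2 for j in range(n)] for i in range(n)]


A = [[2.0, 3.0], [-1.0, 3.0]]  # not symmetric (illustrative values)
S = symmetrize(A)              # symmetric matrix with the same quadratic form
x = [1.0, -2.0]
print(S, quad_form(A, x), quad_form(S, x))
```

Both matrices give the same value of the quadratic part at the sample point, as they do at every other point.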

We impose the further restriction that the matrix $A$ be a symmetric positive definite matrix.
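One possible way to test this restriction for a concrete matrix is to attempt a Cholesky factorization, which succeeds with strictly positive pivots exactly when a symmetric matrix is positive definite. The sketch below is an assumption-laden illustration: the function name and the tolerance `tol` are choices made here, and a numerical library routine would normally be preferable in practice.

```python
import math


def is_symmetric_positive_definite(A, tol=1e-12):
    """Return True if A (list of lists) is symmetric and positive definite.

    Uses a plain Cholesky factorization A = L L^T; the factorization
    breaks down (a pivot is not positive) if A is not positive definite.
    """
    n = len(A)
    # Symmetry check.
    for i in range(n):
        for j in range(i):
            if abs(A[i][j] - A[j][i]) > tol:
                return False
    # Cholesky factorization, row by row.
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = A[i][i] - s
                if pivot <= tol:
                    return False
                L[i][j] = math.sqrt(pivot)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return True


print(is_symmetric_positive_definite([[2.0, 1.0], [1.0, 3.0]]))
print(is_symmetric_positive_definite([[1.0, 2.0], [2.0, 1.0]]))
```

The first sample matrix passes the test; the second is symmetric but has a negative eigenvalue, so it fails.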

Key data

Item	Value
default domain	the whole of $\mathbb {R} ^{n}$

Differentiation

Partial derivatives and gradient vector

When $x_{i}\neq 0$, the partial derivative with respect to the variable $x_{i}$, which is also the $i^{th}$ coordinate of the gradient vector wherever the gradient exists, is given by:

${\frac {\partial f}{\partial x_{i}}}=\left(\sum _{j=1}^{n}(a_{ij}+a_{ji})x_{j}\right)+b_{i}+\lambda \operatorname {sgn} (x_{i})$

By the symmetry assumption, this becomes:

${\frac {\partial f}{\partial x_{i}}}=\left(\sum _{j=1}^{n}2a_{ij}x_{j}\right)+b_{i}+\lambda \operatorname {sgn} (x_{i})$

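The partial derivative is undefined when $x_{i}=0$ (provided $\lambda \neq 0$), because $|x_{i}|$ has one-sided derivatives $-1$ and $+1$ at $0$, which do not agree.

Consequently, for $\lambda \neq 0$, the gradient vector exists if and only if all the coordinates are nonzero.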

In vector notation, the gradient vector is as follows for all ${\vec {x}}$ with all coordinates nonzero:

$\nabla f({\vec {x}})=2A{\vec {x}}+{\vec {b}}+\lambda {\overline {\operatorname {sgn} }}({\vec {x}})$

where ${\overline {\operatorname {sgn} }}$ is the signum vector function.
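The gradient formula can be compared against a central finite-difference approximation. The following sketch assumes $A$ is symmetric, as above; the names `gradient` and `numeric_gradient`, the step size `h`, and the sample data are assumptions made for this illustration.

```python
def sgn(t):
    """Signum of a real number: -1, 0, or 1."""
    return (t > 0) - (t < 0)


def f(A, b, lam, c, x):
    """Evaluate x^T A x + b^T x + lam * ||x||_1 + c."""
    n = len(x)
    quad = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    lin = sum(b[i] * x[i] for i in range(n))
    return quad + lin + lam * sum(abs(t) for t in x) + c


def gradient(A, b, lam, x):
    """Return 2 A x + b + lam * sgn(x), assuming A is symmetric."""
    if any(t == 0 for t in x):
        raise ValueError("formula applies only where all coordinates are nonzero")
    n = len(x)
    return [2 * sum(A[i][j] * x[j] for j in range(n)) + b[i] + lam * sgn(x[i])
            for i in range(n)]


def numeric_gradient(A, b, lam, c, x, h=1e-6):
    """Central-difference approximation; needs every |x_i| > h."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(A, b, lam, c, xp) - f(A, b, lam, c, xm)) / (2 * h))
    return grad


A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, -1.0]
lam, c = 0.5, 4.0
x = [1.0, -2.0]
exact = gradient(A, b, lam, x)
approx = numeric_gradient(A, b, lam, c, x)
print(exact, approx)
```

At the sample point, $2A{\vec {x}}=(0,-10)$, ${\vec {b}}=(1,-1)$ and $\lambda {\overline {\operatorname {sgn} }}({\vec {x}})=(0.5,-0.5)$, so the formula gives $(1.5,-11.5)$, which the finite-difference values should match up to rounding error.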

Hessian matrix

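The Hessian matrix of the function, defined wherever all the coordinates are nonzero, is the matrix $2A$. The $L^{1}$ term contributes nothing to it, because $\operatorname {sgn} (x_{i})$ is locally constant wherever $x_{i}\neq 0$.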
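As a rough numerical check, second-order central differences of $f$ at a point with all coordinates nonzero should reproduce $2A$. The sketch below is only an illustration; the helper `numeric_hessian`, the step size `h`, and the sample data are assumptions, and the stencil is valid only when every $|x_{i}|$ exceeds $2h$ so that no sample point crosses a coordinate hyperplane.

```python
def f(A, b, lam, c, x):
    """Evaluate x^T A x + b^T x + lam * ||x||_1 + c."""
    n = len(x)
    quad = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    lin = sum(b[i] * x[i] for i in range(n))
    return quad + lin + lam * sum(abs(t) for t in x) + c


def numeric_hessian(A, b, lam, c, x, h=1e-3):
    """Approximate the Hessian of f at x by second-order central differences."""
    n = len(x)

    def shifted(i, si, k, sk):
        # Value of f at x + si*h*e_i + sk*h*e_k.
        y = list(x)
        y[i] += si * h
        y[k] += sk * h
        return f(A, b, lam, c, y)

    return [[(shifted(i, 1, k, 1) - shifted(i, 1, k, -1)
              - shifted(i, -1, k, 1) + shifted(i, -1, k, -1)) / (4 * h * h)
             for k in range(n)] for i in range(n)]


A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, -1.0]
H = numeric_hessian(A, b, 0.5, 4.0, [1.0, -2.0])
print(H)  # should be close to 2A = [[4, 2], [2, 6]]
```

Because $f$ is exactly quadratic on each region where no coordinate changes sign, the finite-difference values agree with $2A$ up to floating-point rounding, independently of ${\vec {b}}$, $\lambda$ and $c$.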
