Logistic log-loss function of one variable: Difference between revisions

Revision as of 21:28, 14 September 2014

Definition

The logistic log-loss function of one variable is obtained by composing the logarithmic cost function with the logistic function, and it is of importance in the analysis of logistic regression.

Explicitly, the function has the form:

$f (x) = - (p \ln (g (x)) + (1 - p) \ln (1 - g (x)))$

where $g$ is the logistic function and $\ln$ denotes the natural logarithm. Explicitly, $g (x) = \frac{1}{1 + e^{- x}}$ .

Note that $1 - g (x) = g (- x)$ , so the above can be written as:

$f (x) = - (p \ln (g (x)) + (1 - p) \ln (g (- x)))$

We restrict $p$ to the interval $(0, 1)$ . Conceptually, $p$ is the target probability against which the predicted probability $g (x)$ is scored.

More explicitly, $f$ is the function:

$f (x) = p \ln (1 + e^{- x}) + (1 - p) \ln (1 + e^{x})$
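The two forms of $f$ above can be compared numerically. The following is a minimal Python sketch, not a reference implementation; the function names (`logistic`, `softplus`, `log_loss`, `log_loss_composed`) are illustrative choices and do not come from the article.

```python
import math

def logistic(x: float) -> float:
    """Logistic function g(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x: float) -> float:
    """ln(1 + e^x), rearranged so that exp never receives a large positive argument."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def log_loss(x: float, p: float) -> float:
    """Explicit form: f(x) = p ln(1 + e^{-x}) + (1 - p) ln(1 + e^{x})."""
    return p * softplus(-x) + (1.0 - p) * softplus(x)

def log_loss_composed(x: float, p: float) -> float:
    """Composed form: f(x) = -(p ln g(x) + (1 - p) ln(1 - g(x)))."""
    g = logistic(x)
    return -(p * math.log(g) + (1.0 - p) * math.log(1.0 - g))

# Largest disagreement between the two forms on a few moderate inputs.
max_gap = max(
    abs(log_loss(x, 0.3) - log_loss_composed(x, 0.3))
    for x in (-3.0, 0.0, 2.0)
)
```

For moderate $x$ the two forms should agree up to rounding. The composed form is expected to break down for large $|x|$ , where $g (x)$ rounds to 0 or 1 in floating point, which is one reason to prefer the explicit form in computation.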

Key data

Item	Value
default domain	all of $R$ , i.e., all reals
range	$[m, \infty)$ where $m$ is the minimum value, given as $- (p \ln p + (1 - p) \ln (1 - p))$ .
local maximum value and points of attainment	No local maximum values
local minimum value and points of attainment	Local minimum value $- (p \ln p + (1 - p) \ln (1 - p))$ , attained at $x = \ln (\frac{p}{1 - p})$ .
derivative	$g (x) - p$ where $g$ is the logistic function
second derivative	$g (x) (1 - g (x)) = g (x) g (- x)$
third derivative	$g (x) g (- x) (g (- x) - g (x)) = g (x) (1 - g (x)) (1 - 2 g (x))$
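The third-derivative entry can be sanity-checked by differentiating the second derivative $g (x) (1 - g (x))$ numerically and comparing with $g (x) (1 - g (x)) (1 - 2 g (x))$ . This is a rough sketch; the helper names and the step size `h` are assumptions made for illustration.

```python
import math

def logistic(x):
    """Logistic function g(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))

def second_derivative(x):
    """f''(x) = g(x) (1 - g(x))."""
    g = logistic(x)
    return g * (1.0 - g)

def third_derivative(x):
    """f'''(x) = g(x) (1 - g(x)) (1 - 2 g(x))."""
    g = logistic(x)
    return g * (1.0 - g) * (1.0 - 2.0 * g)

def central_difference(func, x, h=1e-5):
    """Symmetric finite-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

# Largest gap between the finite-difference estimate and the closed form.
third_derivative_error = max(
    abs(central_difference(second_derivative, x) - third_derivative(x))
    for x in (-3.0, -1.0, 0.0, 1.0, 2.5)
)
```

Note the sign: for $x > 0$ we have $g (x) > 1/2$ , so the third derivative is negative there, consistent with the second derivative decreasing away from $x = 0$ .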

Differentiation

WHAT WE USE: chain rule for differentiation, first derivative of the logistic function

First derivative

We use that:

$g^{'} (x) = g (x) (1 - g (x)) = g (x) g (- x)$

or equivalently:

$\frac{d}{d x} (\ln (g (x))) = \frac{g^{'} (x)}{g (x)} = 1 - g (x) = g (- x)$

Similarly:

$\frac{d}{d x} (\ln (g (- x))) = - g (x)$

Plugging these in, we get:

$f^{'} (x) = - (p (1 - g (x)) + (1 - p) (- g (x)))$

This simplifies to:

$f^{'} (x) = g (x) - p$
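The simplified derivative can be checked against a finite-difference estimate. The sketch below is illustrative only: the sample points, the value of `p`, and the step size `h` are arbitrary choices, and the direct `log1p(exp(...))` form is adequate only for moderate $x$ .

```python
import math

def logistic(x):
    """Logistic function g(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))

def log_loss(x, p):
    """f(x) = p ln(1 + e^{-x}) + (1 - p) ln(1 + e^{x}), for moderate x."""
    return p * math.log1p(math.exp(-x)) + (1.0 - p) * math.log1p(math.exp(x))

def derivative(x, p):
    """Closed form f'(x) = g(x) - p."""
    return logistic(x) - p

def central_difference(func, x, h=1e-5):
    """Symmetric finite-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

p = 0.3
max_error = max(
    abs(central_difference(lambda t: log_loss(t, p), x) - derivative(x, p))
    for x in (-4.0, -1.0, 0.0, 0.5, 3.0)
)
```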

Second derivative

Using the first derivative and the expression for $g^{'}$ , we obtain:

$f^{''} (x) = g (x) (1 - g (x)) = g (x) g (- x)$

Note that the second derivative is independent of $p$ .
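One way to see the independence from $p$ concretely is to estimate the second derivative numerically for several values of $p$ at the same point. This is a sketch under assumed values (`x = 0.7`, three sample values of `p`, step `h = 1e-3`), none of which come from the article.

```python
import math

def logistic(x):
    """Logistic function g(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))

def log_loss(x, p):
    """f(x) = p ln(1 + e^{-x}) + (1 - p) ln(1 + e^{x}), for moderate x."""
    return p * math.log1p(math.exp(-x)) + (1.0 - p) * math.log1p(math.exp(x))

def second_difference(func, x, h=1e-3):
    """Symmetric finite-difference estimate of func''(x)."""
    return (func(x + h) - 2.0 * func(x) + func(x - h)) / (h * h)

x = 0.7
expected = logistic(x) * (1.0 - logistic(x))  # g(x) (1 - g(x)), no p involved

# One estimate per value of p; all should be close to `expected`.
estimates = {
    p: second_difference(lambda t: log_loss(t, p), x)
    for p in (0.1, 0.5, 0.9)
}
```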

Points and intervals of interest

Critical points

The expression for the derivative is:

$f^{'} (x) = g (x) - p$

This is always defined, and is zero iff $g (x) = p$ . This happens if and only if $x = g^{- 1} (p)$ , i.e.:

$x = g^{- 1} (p) = \ln (\frac{p}{1 - p})$

The value of $f$ at this point is:

$f (g^{- 1} (p)) = - (p \ln p + (1 - p) \ln (1 - p))$
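The critical point and the value attained there can be evaluated directly. The sketch below uses hypothetical helper names (`logit`, `minimum_value`) and an arbitrary choice `p = 0.8`; the value $- (p \ln p + (1 - p) \ln (1 - p))$ is the binary entropy of $p$ measured in nats.

```python
import math

def logistic(x):
    """Logistic function g(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Inverse of the logistic function: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def log_loss(x, p):
    """f(x) = p ln(1 + e^{-x}) + (1 - p) ln(1 + e^{x}), for moderate x."""
    return p * math.log1p(math.exp(-x)) + (1.0 - p) * math.log1p(math.exp(x))

def minimum_value(p):
    """-(p ln p + (1 - p) ln(1 - p)), the value of f at its critical point."""
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

p = 0.8
x_star = logit(p)               # critical point, here ln 4
value_at_x_star = log_loss(x_star, p)
```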

Intervals of increase and decrease

We have that $f^{'} (x) < 0$ for $g (x) < p$ and $f^{'} (x) > 0$ for $g (x) > p$ . Since $g$ is increasing, we get that $f^{'} (x) < 0$ for $x < g^{- 1} (p)$ and $f^{'} (x) > 0$ for $x > g^{- 1} (p)$ . Thus, $f$ is decreasing to the left of its unique critical point and increasing to the right of the critical point.

Thus, the critical point $g^{- 1} (p)$ is the unique point of local and absolute minimum for $f$ .
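Because $f^{'}$ is negative to the left of the critical point and positive to the right, the minimizer can be located using only the sign of $f^{'}$ . The bisection routine below is one illustrative way to exploit this, not a method taken from the article; the bracket `[-50, 50]` and the iteration count are assumptions, and the bracket must contain $g^{- 1} (p)$ .

```python
import math

def logistic(x):
    """Logistic function g(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))

def minimize_by_bisection(p, lo=-50.0, hi=50.0, iterations=100):
    """Find the zero of f'(x) = g(x) - p by bisecting on its sign."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if logistic(mid) - p < 0.0:
            lo = mid  # f is still decreasing here, so the minimizer is to the right
        else:
            hi = mid  # f is no longer decreasing, so the minimizer is at or left of mid
    return 0.5 * (lo + hi)

p = 0.25
x_min = minimize_by_bisection(p)
```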

Subtle point about how the result is independent of the arithmetic form of the logistic function

The fact that the minimum occurs at $g^{- 1} (p)$ and has value $- (p \ln p + (1 - p) \ln (1 - p))$ is actually independent of $g$ being the logistic function. If we replaced $g$ by any strictly increasing function with range $(0, 1)$ , the minimum would still occur at $g^{- 1} (p)$ and have the same value. The logistic function is particularly nice for other theoretical and practical reasons: for instance, the derivative of $f$ is the very simple expression $g (x) - p$ for the logistic function, but can be complicated for other functions.

The general observation that the minimum occurs at $g^{- 1} (p)$ is based on the fact that logarithmic scoring is a proper scoring rule.
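To illustrate the point with a function other than the logistic function, the sketch below swaps in $g (x) = \frac{1}{2} + \frac{\arctan x}{\pi}$ , which is increasing with range $(0, 1)$ . This particular replacement, the grid, and the names are assumptions chosen purely for illustration; the minimizer is found by brute-force search over a grid rather than by calculus.

```python
import math

def arctan_link(x):
    """An increasing function with range (0, 1): 1/2 + arctan(x) / pi."""
    return 0.5 + math.atan(x) / math.pi

def arctan_link_inverse(p):
    """Inverse of arctan_link: tan(pi (p - 1/2))."""
    return math.tan(math.pi * (p - 0.5))

def log_loss_with_link(x, p, link):
    """-(p ln g(x) + (1 - p) ln(1 - g(x))) for an arbitrary link function g."""
    g = link(x)
    return -(p * math.log(g) + (1.0 - p) * math.log(1.0 - g))

p = 0.7
grid = [i / 1000.0 for i in range(-5000, 5001)]  # -5.0 to 5.0 in steps of 0.001

# Grid point with the smallest loss, and the loss there.
x_best = min(grid, key=lambda x: log_loss_with_link(x, p, arctan_link))
best_value = log_loss_with_link(x_best, p, arctan_link)
```

Up to the grid resolution, `x_best` should coincide with `arctan_link_inverse(p)` and `best_value` with $- (p \ln p + (1 - p) \ln (1 - p))$ .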

Intervals of concave up and concave down

The second derivative is:

$f^{''} (x) = g^{'} (x) = g (x) g (- x) = g (x) (1 - g (x))$

This is always positive, so the function is a convex function and its graph is concave up everywhere. In particular, there are no points of inflection.
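Convexity can also be spot-checked through the midpoint inequality $f (\frac{a + b}{2}) < \frac{f (a) + f (b)}{2}$ for $a \ne b$ . The sample points and the value of `p` below are arbitrary assumptions, and a finite check of this kind is only a sanity test, not a proof.

```python
import itertools
import math

def softplus(x):
    """ln(1 + e^x), rearranged so that exp never receives a large positive argument."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def log_loss(x, p):
    """f(x) = p ln(1 + e^{-x}) + (1 - p) ln(1 + e^{x})."""
    return p * softplus(-x) + (1.0 - p) * softplus(x)

p = 0.4
points = [-6.0, -2.5, -0.3, 0.0, 1.1, 4.0, 7.5]

# Strict midpoint inequality for every pair of distinct sample points.
midpoint_convex = all(
    log_loss(0.5 * (a + b), p) < 0.5 * (log_loss(a, p) + log_loss(b, p))
    for a, b in itertools.combinations(points, 2)
)
```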

@@ Line 13: / Line 13: @@
 <math>f(x) = -(p \ln (g(x)) + (1 - p) \ln (g(-x)))</math>
-We restrict <math>p</math> to the interval <math>[0,1]</math>. Conceptually, <math>p</math> is the corresponding probability.
+We restrict <math>p</math> to the interval <math>(0,1)</math>. Conceptually, <math>p</math> is the corresponding probability.
 More explicitly, <math>f</math> is the function: