Logistic log-loss function of one variable

Definition

The logistic log-loss function of one variable is obtained by composing the logarithmic cost function with the logistic function, and it is of importance in the analysis of logistic regression.

Explicitly, the function has the form:

$$f(x) = -[p_0 \ln(g(x)) + (1 - p_0)\ln(1 - g(x))]$$

where $g$ is the logistic function and $\ln$ denotes the natural logarithm. Explicitly, $g(x) = \frac{1}{1 + e^{-x}} = \frac{e^x}{1 + e^x}$.

Note that $1 - g(x) = g(-x)$, so the above can be written as:

$$f(x) = -[p_0 \ln(g(x)) + (1 - p_0)\ln(g(-x))]$$

We restrict $p_0$ to the interval $[0,1]$. Conceptually, $p_0$ is the true probability of the underlying event, and $g(x)$ is the predicted probability.

More explicitly, $f$ is the function:

$$f(x) = p_0 \ln(1 + e^{-x}) + (1 - p_0)\ln(1 + e^x)$$
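As a quick sanity check that the composed and fully expanded forms agree, here is a minimal Python sketch (the helper names logistic, f_composed, and f_explicit are ours, not from the article):

```python
import math

def logistic(x):
    # Logistic function g(x) = 1 / (1 + e^{-x}), computed stably for both signs of x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    return math.exp(x) / (1.0 + math.exp(x))

def f_composed(x, p0):
    # Definition as a composition: -[p0 ln(g(x)) + (1 - p0) ln(1 - g(x))].
    g = logistic(x)
    return -(p0 * math.log(g) + (1.0 - p0) * math.log(1.0 - g))

def f_explicit(x, p0):
    # Fully expanded form: p0 ln(1 + e^{-x}) + (1 - p0) ln(1 + e^{x}).
    return p0 * math.log1p(math.exp(-x)) + (1.0 - p0) * math.log1p(math.exp(x))

for x in (-3.0, -0.5, 0.0, 1.0, 4.0):
    for p0 in (0.1, 0.5, 0.9):
        assert abs(f_composed(x, p0) - f_explicit(x, p0)) < 1e-12
print("both forms of f agree")
```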

Key data

| Item | Value |
| --- | --- |
| default domain | all of $\mathbb{R}$, i.e., all reals |
| range | $[f_{\min}, \infty)$ where $f_{\min} = -[p_0 \ln(p_0) + (1 - p_0)\ln(1 - p_0)]$ is the minimum value (for $0 < p_0 < 1$) |
| local maximum value and points of attainment | No local maximum values |
| local minimum value and points of attainment | Local minimum value $-[p_0 \ln(p_0) + (1 - p_0)\ln(1 - p_0)]$, attained at $x = \ln\left(\frac{p_0}{1 - p_0}\right)$ |
| derivative | $f'(x) = g(x) - p_0$, where $g$ is the logistic function |
| second derivative | $f''(x) = g(x)(1 - g(x)) = g(x)g(-x)$ |
| third derivative | $f'''(x) = g(x)(1 - g(x))(1 - 2g(x))$ |
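The derivative entries in this table are derived in the next section, except the third derivative, so here is a small sympy sketch (our own) machine-checking that entry:

```python
import sympy as sp

x, p0 = sp.symbols('x p_0', real=True)
g = 1 / (1 + sp.exp(-x))                      # logistic function
f = -(p0 * sp.log(g) + (1 - p0) * sp.log(1 - g))

# The table claims f'''(x) = g(x)(1 - g(x))(1 - 2 g(x)).
assert sp.simplify(sp.diff(f, x, 3) - g * (1 - g) * (1 - 2 * g)) == 0
print("third derivative entry confirmed")
```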

Differentiation

WHAT WE USE: chain rule for differentiation, Logistic function#First derivative

First derivative

We use that, by the chain rule and the first derivative of the logistic function ($g'(x) = g(x)g(-x)$):

$$\frac{d}{dx}[\ln(g(x))] = \frac{g'(x)}{g(x)} = g(-x)$$

or equivalently:

$$\frac{d}{dx}[\ln(g(x))] = 1 - g(x)$$

Similarly:

$$\frac{d}{dx}[\ln(g(-x))] = \frac{-g'(-x)}{g(-x)} = -g(x)$$

Plugging these in, we get:

$$f'(x) = -[p_0(1 - g(x)) + (1 - p_0)(-g(x))]$$

This simplifies to:

$$f'(x) = g(x) - p_0$$
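This simplification is easy to machine-check; a one-off sympy sketch (our own):

```python
import sympy as sp

x, p0 = sp.symbols('x p_0', real=True)
g = 1 / (1 + sp.exp(-x))                      # logistic function
f = -(p0 * sp.log(g) + (1 - p0) * sp.log(1 - g))

# The first derivative should simplify to g(x) - p_0.
assert sp.simplify(sp.diff(f, x) - (g - p0)) == 0
print("f'(x) = g(x) - p_0 confirmed")
```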

Second derivative

Using the first derivative $f'(x) = g(x) - p_0$ and the expression for $g'$, we obtain:

$$f''(x) = g'(x) = g(x)(1 - g(x)) = g(x)g(-x)$$

Note that the second derivative is independent of $p_0$.
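Again, a short sympy sketch (ours) confirming both the formula and the independence from $p_0$:

```python
import sympy as sp

x, p0 = sp.symbols('x p_0', real=True)
g = 1 / (1 + sp.exp(-x))                      # logistic function
f = -(p0 * sp.log(g) + (1 - p0) * sp.log(1 - g))

f2 = sp.diff(f, x, 2)
assert sp.simplify(f2 - g * (1 - g)) == 0     # f''(x) = g(x)(1 - g(x))
assert sp.simplify(sp.diff(f2, p0)) == 0      # ... and it does not depend on p_0
print("second derivative confirmed")
```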

Points and intervals of interest

Critical points

The expression for the derivative is:

$$f'(x) = g(x) - p_0$$

This is always defined, and is zero iff $g(x) = p_0$. For $0 < p_0 < 1$, this happens if and only if $\frac{e^x}{1 + e^x} = p_0$, i.e.:

$$x = \ln\left(\frac{p_0}{1 - p_0}\right)$$

(For $p_0 = 0$ or $p_0 = 1$, the derivative is never zero, so there is no critical point.)

The value of $f$ at this point is:

$$f\left(\ln\left(\frac{p_0}{1 - p_0}\right)\right) = -[p_0 \ln(p_0) + (1 - p_0)\ln(1 - p_0)]$$
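A numeric spot check (our own sketch; the choice $p_0 = 0.8$ is arbitrary) that the critical point is where claimed and that the value there, the binary entropy of $p_0$, is a minimum:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def f(x, p0):
    g = logistic(x)
    return -(p0 * math.log(g) + (1 - p0) * math.log(1 - g))

p0 = 0.8
x_star = math.log(p0 / (1 - p0))              # claimed critical point
h_p0 = -(p0 * math.log(p0) + (1 - p0) * math.log(1 - p0))

assert abs(logistic(x_star) - p0) < 1e-12     # g(x*) = p_0, so f'(x*) = 0
assert abs(f(x_star, p0) - h_p0) < 1e-12      # value at the critical point
assert f(x_star - 0.1, p0) > f(x_star, p0) < f(x_star + 0.1, p0)
print(f"x* = {x_star:.6f}, f(x*) = {h_p0:.6f}")
```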

Intervals of increase and decrease

We have that $f'(x) < 0$ where $g(x) < p_0$ and $f'(x) > 0$ where $g(x) > p_0$. Since $g$ is increasing, we get that $f'(x) < 0$ for $x < \ln\left(\frac{p_0}{1 - p_0}\right)$ and $f'(x) > 0$ for $x > \ln\left(\frac{p_0}{1 - p_0}\right)$. Thus, $f$ is decreasing to the left of its unique critical point and increasing to the right of the critical point.

Thus, the critical point is the unique point of local and absolute minimum for $f$.

Subtle point about how the result is independent of the arithmetic form of the logistic function

The fact that the minimum occurs where $g(x) = p_0$ and has value $-[p_0 \ln(p_0) + (1 - p_0)\ln(1 - p_0)]$ is actually independent of $g$ being a logistic function. If we replaced $g$ by any increasing function with range $(0,1)$, the minimum would occur at the point where $g(x) = p_0$. The logistic function is particularly nice for other theoretical and practical reasons; for instance, the expression for the derivative is very simple for the logistic function, but can be complicated for other functions.

The general observation that the minimum occurs where $g(x) = p_0$ is based on the fact that logarithmic scoring is a proper scoring rule.
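To illustrate, here is a minimal sketch in which the logistic function is swapped for an arctan-based squashing function $h$ (an arbitrary choice of ours, using scipy for the one-dimensional minimization); the minimizer still satisfies $h(x) = p_0$:

```python
import math
from scipy.optimize import minimize_scalar

p0 = 0.7

def h(x):
    # A non-logistic increasing function with range (0, 1).
    return 0.5 + math.atan(x) / math.pi

def loss(x):
    # Logarithmic cost composed with h instead of the logistic function.
    return -(p0 * math.log(h(x)) + (1 - p0) * math.log(1 - h(x)))

res = minimize_scalar(loss, bracket=(-5, 0, 5))
# The minimum occurs where h(x) = p_0, i.e., at x = tan(pi (p_0 - 1/2)).
assert abs(h(res.x) - p0) < 1e-6
print(res.x, math.tan(math.pi * (p0 - 0.5)))  # both approximately 0.7265
```

Note, however, that the derivative of this loss is $h'(x)\frac{h(x) - p_0}{h(x)(1 - h(x))}$, which no longer collapses to the simple form $h(x) - p_0$.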

Intervals of concave up and concave down

The second derivative is:

$$f''(x) = g(x)(1 - g(x))$$

This is always positive, so the function is a convex function and its graph is concave up everywhere. In particular, there are no points of inflection.
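A short sympy sketch (ours) locating the maximum of this second derivative; the resulting uniform bound $f''(x) \le \frac{1}{4}$ is also what makes the learning-rate guidance in the optimization table below safe:

```python
import sympy as sp

x = sp.symbols('x', real=True)
g = 1 / (1 + sp.exp(-x))
f2 = g * (1 - g)                              # the second derivative of f

assert sp.solve(sp.diff(f2, x), x) == [0]     # unique critical point of f'' at x = 0
assert f2.subs(x, 0) == sp.Rational(1, 4)     # maximum value 1/4
print("f'' attains its maximum value 1/4 at x = 0")
```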

Optimization methods

We will denote the point of minimum, $\ln\left(\frac{p_0}{1 - p_0}\right)$, as $x^*$.

| Method | Domain of convergence | Order of convergence (general case) | Convergence rate (general case) | Convergence rate (special cases) |
| --- | --- | --- | --- | --- |
| Gradient descent with constant learning rate for a logistic log-loss function of one variable, with learning rate $\alpha$ | If $\alpha < \frac{2}{f''(x^*)} = \frac{2}{p_0(1 - p_0)}$, with $x^*$ the point of minimum, convergence occurs from any starting point, but can be quite slow. Since we don't know $x^*$, setting $\alpha \le 8$ is a safe choice, because $f''(x^*) = p_0(1 - p_0) \le \frac{1}{4}$. | Linear convergence | $\lvert 1 - \alpha p_0(1 - p_0) \rvert$. If we set $\alpha = \frac{1}{p_0(1 - p_0)}$, we get superlinear convergence. | Case $\alpha = \frac{1}{p_0(1 - p_0)}$ and $p_0 \ne \frac{1}{2}$: quadratic convergence with convergence rate $\left\lvert p_0 - \frac{1}{2} \right\rvert$. Case that $\alpha = 4$ and $p_0 = \frac{1}{2}$: cubic convergence with convergence rate $\frac{1}{12}$ |
| Newton's method for optimization of a logistic log-loss function of one variable | Not all reals. However, the domain includes the interval between $0$ and $x^*$, and convergence is monotone on that interval. | Quadratic convergence | $\left\lvert p_0 - \frac{1}{2} \right\rvert$ | Case that $p_0 = \frac{1}{2}$: cubic convergence with convergence rate $\frac{1}{6}$ |
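The following sketch (our own; the starting point $x_0 = 0.5$ and the case $p_0 = 0.8$, for which $x^* = \ln 4 \approx 1.386$, are arbitrary choices) runs both methods and prints the error after each iterate; the predicted quadratic rate for both is $\lvert p_0 - \tfrac{1}{2} \rvert = 0.3$:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

p0 = 0.8
x_star = math.log(p0 / (1 - p0))              # the point of minimum x^*

def grad_descent_step(x, alpha):
    return x - alpha * (logistic(x) - p0)     # uses f'(x) = g(x) - p_0

def newton_step(x):
    g = logistic(x)
    return x - (g - p0) / (g * (1 - g))       # uses f''(x) = g(x)(1 - g(x))

alpha = 1 / (p0 * (1 - p0))                   # the superlinear choice from the table
for name, step in [("gradient descent", lambda x: grad_descent_step(x, alpha)),
                   ("Newton's method", newton_step)]:
    x = 0.5                                   # start between 0 and x^*
    print(name)
    for i in range(6):
        x = step(x)
        print(f"  iterate {i + 1}: error = {abs(x - x_star):.3e}")
```

On a run of this sketch, the Newton iterates increase monotonically toward $x^*$, consistent with the monotone-convergence claim for starting points between $0$ and $x^*$.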