Logarithmic scoring rule is proper
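Consider a random variable $X$ that can take $n$ distinct values $x_1, x_2, \dots, x_n$. Suppose we estimate probabilities $q_1, q_2, \dots, q_n$ for these values respectively (with $q_i \ge 0$, $\sum_{i=1}^n q_i = 1$). The logarithmic scoring rule works as follows: for every instance of the random variable $X$, we assign a score equal to the negative of the logarithm of the corresponding estimated probability, so an instance with value $x_i$ gets the score $-\log q_i$. Explicitly, if the instances are $x_{i_1}, x_{i_2}, \dots, x_{i_m}$, the total score is:
$$-\sum_{j=1}^m \log q_{i_j}$$
The claim is that, if the actual probabilities are $p_1, p_2, \dots, p_n$, then the assignment that minimizes the expected value of the score is $q_i = p_i$ for all $i$.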
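As a small illustration of how the total score is computed, here is a minimal Python sketch. The function name `total_log_score`, the value labels, and the estimated probabilities are all hypothetical, chosen only for this example. It assumes every observed value has a positive estimated probability, since `math.log(0)` raises an error.

```python
import math

def total_log_score(estimated, observed):
    """Sum of -log(estimated probability) over the observed values.

    `estimated` maps each possible value to its estimated probability;
    `observed` is the sequence of values the random variable actually took.
    """
    return sum(-math.log(estimated[value]) for value in observed)

# Hypothetical estimates for a three-valued random variable.
estimated = {"a": 0.5, "b": 0.25, "c": 0.25}
observed = ["a", "a", "b", "c"]

score = total_log_score(estimated, observed)
print(score)  # equals 2*log(2) + 2*log(4) = 6*log(2)
```

Lower totals are better: a value that was given a high estimated probability contributes a score close to zero when it occurs.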
- Logarithmic scoring rule is the only local proper scoring rule up to affine transformations in case of more than two classes
Reduction to one random instance
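Under the assumption that the instances are independent of each other, it suffices to show the result for one instance: the expected total score is the sum of the expected scores of the individual instances, every instance has the same expected score, and so an assignment that minimizes the expected score for one instance also minimizes the expected total.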
Reduction to the case that all probabilities are strictly between zero and one
We now show that if any particular actual probability $p_i = 0$, the corresponding estimate $q_i$ must equal 0, and if any $p_i = 1$, the corresponding $q_i$ must equal 1.
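Suppose the actual probability $p_i$ of some value is 0 but its estimate $q_i$ is positive. If the expected score is infinite, the assignment cannot be a minimum, because estimating each probability by its actual value gives a finite expected score. Otherwise $q_i < 1$ (if $q_i = 1$, every value that can actually occur has estimate 0, and the expected score is infinite). Replace $q_i$ by 0 and every other estimate $q_j$ by $q_j/(1 - q_i)$. The new estimates still add up to 1, and the score $-\log q_j$ of every value that can actually occur drops by $-\log(1 - q_i) > 0$, so the expected score strictly decreases. Hence a minimizing assignment must have $q_i = 0$. If instead $p_i = 1$, then $p_j = 0$ for all $j \ne i$, so $q_j = 0$ for all $j \ne i$ by the above, which forces $q_i = 1$. In the remaining case we can discard the values whose actual probability is 0 and assume that every actual probability is strictly between zero and one.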
Proof for one instance and where all the actual probabilities are nonzero and less than one
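The expected value for one instance, where $p_i$ is the actual probability and $q_i$ the estimated probability of the $i^{\text{th}}$ value, is:
$$-\sum_{i=1}^n p_i \log q_i$$
In words, we weight each score by the probability that that score is attained.
We are constrained to lie on the codimension one hyperplane given by $\sum_{i=1}^n q_i = 1$. We can therefore use the idea of Lagrange multipliers to find the optima. The gradient vector of the expected value function is the vector with coordinates:
$$\left(-\frac{p_1}{q_1}, -\frac{p_2}{q_2}, \dots, -\frac{p_n}{q_n}\right)$$
The normal vector to the hyperplane is given as the gradient vector of the function $q_1 + q_2 + \dots + q_n$, and is the vector:
$$(1, 1, \dots, 1)$$
By the theory of Lagrange multipliers, we have that at any local extreme value, there exists a value $\lambda$ such that:
$$-\frac{p_i}{q_i} = \lambda$$
for all $i$. In other words:
$$p_i = -\lambda q_i$$
for all $i$. Adding up, we get:
$$\sum_{i=1}^n p_i = -\lambda \sum_{i=1}^n q_i = -\lambda$$
We have that $\sum_{i=1}^n p_i = 1$ as well (these are the actual probabilities, so they add up to 1), so we get:
$$1 = -\lambda$$
so $\lambda = -1$. Plugging back, we get that the only point that could potentially be a point of local extremum satisfies $q_i = p_i$ for all $i$.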
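The conclusion that the estimates should equal the actual probabilities can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions, not a proof: the actual probabilities `[0.5, 0.3, 0.2]`, the helper names, and the number of trials are all hypothetical choices. It compares the expected score at the actual probabilities with the expected score at 1000 randomly chosen estimates.

```python
import math
import random

def expected_log_score(actual, estimated):
    """Expected score -sum_i p_i * log(q_i) for a single instance."""
    return sum(-p * math.log(q) for p, q in zip(actual, estimated))

def random_distribution(n, rng):
    """Draw a random point in the interior of the probability simplex."""
    weights = [rng.random() + 1e-9 for _ in range(n)]  # keep weights positive
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)  # fixed seed so the run is repeatable
actual = [0.5, 0.3, 0.2]  # hypothetical actual probabilities
best = expected_log_score(actual, actual)

# Every randomly chosen estimate should score no better than the truth
# (a tiny tolerance guards against floating-point rounding).
trials = [random_distribution(len(actual), rng) for _ in range(1000)]
never_beaten = all(expected_log_score(actual, q) >= best - 1e-12 for q in trials)
print(best, never_beaten)
```

A random search like this cannot rule out a better estimate it did not sample; it only shows that none of the sampled estimates beats the candidate found by the Lagrange multiplier argument.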
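We can now verify that this is indeed a point of local minimum. The expected score $-\sum_{i=1}^n p_i \log q_i$ has second partial derivative $p_i/q_i^2 > 0$ with respect to each $q_i$, and all of its mixed second partial derivatives are zero. Its Hessian is therefore positive definite wherever all the $q_i$ are positive, so the expected score is strictly convex there, and the critical point $q_i = p_i$ is its unique minimum on the part of the hyperplane where all the $q_i$ are positive.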
We can also verify that the absolute minimum does not occur at the boundary: if the actual probability $p_i > 0$ but we set the estimate $q_i = 0$, then our expected score is $\infty$, because there's a nonzero probability of paying an infinite cost, namely, in the case that the random variable takes the value $x_i$.
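The boundary behaviour can be illustrated with a short Python sketch. The function name and the probabilities are hypothetical. The code adopts two conventions as explicit assumptions: $-\log 0$ is treated as $+\infty$, and a value whose actual probability is 0 contributes nothing to the expected score.

```python
import math

def expected_log_score_with_boundary(actual, estimated):
    """Expected score, treating -log(0) as +infinity and 0 * log(0) as 0."""
    total = 0.0
    for p, q in zip(actual, estimated):
        if p == 0:
            continue  # a value that never occurs contributes nothing
        total += math.inf if q == 0 else -p * math.log(q)
    return total

actual = [0.5, 0.3, 0.2]       # hypothetical actual probabilities
on_boundary = [0.6, 0.4, 0.0]  # assigns probability 0 to a possible value

print(expected_log_score_with_boundary(actual, on_boundary))  # inf
print(expected_log_score_with_boundary(actual, actual))       # finite
```

Because the third value occurs with probability 0.2 but has estimate 0, the infinite cost is paid with positive probability, which makes the expected score infinite.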