Logarithmic scoring rule

Definition

The logarithmic scoring rule is a scoring rule used to measure how well a given assignment of probabilities to the values of a random variable performs on observed real-world instances of the random variable. Explicitly, consider a random variable X that can take n distinct values 1,2,\dots,n. Suppose we estimate probabilities p_1,p_2,\dots,p_n for these values respectively (with p_i \in [0,1], \sum_{i=1}^n p_i = 1). The logarithmic scoring rule works as follows: for every observed instance of the random variable X, we assign a score equal to the negative of the logarithm of the probability estimated for the value that instance took. Explicitly, if the instances are X_1,X_2,\dots,X_m, the total score is:

\sum_{j=1}^m -\ln(p_{X_j})

Note that the base of the logarithm does not matter for comparing assignments: changing the base multiplies every score by the same positive constant, so any fixed base greater than 1 may be used.

The smaller the score under the logarithmic scoring rule, the better the assignment of probabilities has performed according to the rule.
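
To make the definition concrete, here is a minimal sketch in Python (the function name log_score and the example data are illustrative, not from any standard library):

    import math

    def log_score(probs, instances):
        # probs: dict mapping each value of the random variable to its
        # estimated probability p_i (the probabilities should sum to 1).
        # instances: the observed values X_1, ..., X_m.
        # Returns the total score: the sum of -ln(p) over the instances.
        return sum(-math.log(probs[x]) for x in instances)

    # Example: a three-valued random variable scored on five instances.
    probs = {1: 0.5, 2: 0.3, 3: 0.2}
    instances = [1, 1, 2, 3, 1]
    print(log_score(probs, instances))  # -(3*ln 0.5 + ln 0.3 + ln 0.2), about 4.89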

Facts

  • The logarithmic scoring rule is proper: if the actual probabilities of the values are q_1,q_2,\dots,q_n respectively, then the choice of probabilities that minimizes the expected score is p_1 = q_1, p_2 = q_2, \dots, p_n = q_n (see the derivation sketched below).
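
To see why, one can decompose the expected score per instance (a standard argument, sketched here in the notation above). The expected score under the true probabilities is

\sum_{i=1}^n q_i (-\ln p_i) = \sum_{i=1}^n q_i (-\ln q_i) + \sum_{i=1}^n q_i \ln \frac{q_i}{p_i}

The first term does not depend on the p_i, and the second term is the Kullback-Leibler divergence from p to q, which is nonnegative and equals zero precisely when p_i = q_i for all i. Hence the expected score is minimized by reporting the true probabilities.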

Relation with logarithmic loss functions in logistic regression

The logarithmic loss function used in logistic regression relies on the logarithmic scoring rule, but with the following twist: the probabilities are not fixed numbers; rather, each instance's probability is computed, via the model's parameters, from features that vary from instance to instance.
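
As a minimal sketch of this connection (the variable names and the synthetic data are illustrative), the log loss below is exactly the logarithmic score applied to probabilities of the form sigmoid(w*x + b), which change from instance to instance through the feature x:

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def log_loss(w, b, xs, ys):
        # For each instance with feature x and label y in {0, 1}, the model
        # assigns probability sigmoid(w*x + b) to y = 1; the score adds
        # -ln of the probability assigned to the label actually observed.
        total = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            total += -math.log(p if y == 1 else 1.0 - p)
        return total

    # Example: scoring one parameter setting on four (feature, label) pairs;
    # fitting logistic regression means choosing w and b to minimize this.
    xs = [-2.0, -1.0, 1.0, 2.0]
    ys = [0, 0, 1, 1]
    print(log_loss(w=1.0, b=0.0, xs=xs, ys=ys))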