Logarithmic scoring rule is the only proper scoring rule up to affine transformations in case of more than two classes

Statement

Let $f$ be a proper scoring rule for a classification problem into $n$ classes, $n \ge 3$.

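Explicitly, consider a random variable $X$ that can take $n$ distinct values $x_1, x_2, \dots, x_n$. Suppose we estimate probabilities $q_1, q_2, \dots, q_n$ for these values respectively (with $0 < q_i \le 1$ and $\sum_{i=1}^n q_i = 1$). The scoring rule works as follows: for every instance of the random variable $X$, we assign a score of $f(q_i)$, where $x_i$ is the value taken by that instance. Explicitly, if the instances are $x_{i_1}, x_{i_2}, \dots, x_{i_m}$, the total score is:

$$\sum_{j=1}^m f(q_{i_j})$$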
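
The scoring procedure can be sketched in Python as follows. This is a hypothetical helper, `total_score`, not part of any library; it assumes classes are indexed from 0 and, by default, uses $f(q) = -\ln q$ as the logarithmic rule, oriented as a penalty to match the minimization convention in the statement. Any other candidate rule can be passed in as `f`.

```python
import math
from typing import Callable, Sequence


def total_score(
    estimates: Sequence[float],
    observed: Sequence[int],
    f: Callable[[float], float] = lambda q: -math.log(q),
) -> float:
    """Return sum_j f(q_{i_j}) over the observed instances.

    estimates: the estimated probabilities q_1, ..., q_n (stored 0-based).
    observed:  0-based index i of the value x_i taken by each instance.
    f:         the scoring rule; defaults to the logarithmic rule -ln(q),
               oriented as a penalty so that lower totals are better.
    """
    # Reject estimates that are not a valid probability assignment.
    if any(q <= 0 or q > 1 for q in estimates) or not math.isclose(sum(estimates), 1.0):
        raise ValueError("estimates must lie in (0, 1] and sum to 1")
    # Each instance contributes f of the probability assigned to the value it took.
    return sum(f(estimates[i]) for i in observed)


# Three classes with estimates (1/2, 1/4, 1/4); observed instances x_1, x_2, x_1.
score = total_score([0.5, 0.25, 0.25], [0, 1, 0])
print(score)  # 4 * ln 2, roughly 2.7726
```

Because the total is a plain sum of per-instance scores, replacing $f$ by $a f + b$ changes the total for $m$ instances to $a \cdot (\text{total}) + m b$, which is why affine transformations with $a > 0$ do not change which estimates score best.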

$f$ is required to satisfy the condition that, if the actual probabilities are $p_1, p_2, \dots, p_n$, then the assignment that minimizes the expected value of the score, $\sum_{i=1}^n p_i f(q_i)$, is $q_i = p_i$ for all $i$.
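
The propriety condition can be probed numerically. The sketch below (the names `expected_score`, `truthful_is_optimal`, `log_rule` and `linear_rule` are hypothetical) compares the expected score at $q = p$ against many randomly sampled distributions $q$; it is a spot check, not a proof. The linear rule $f(q) = 1 - q$ is an illustrative non-logarithmic rule chosen only as a counterexample: its expected score $1 - \sum_i p_i q_i$ is lowered by shifting estimated mass toward the most likely class, so reporting $q = p$ is not optimal.

```python
import math
import random
from typing import Callable, List, Sequence


def expected_score(p: Sequence[float], q: Sequence[float], f: Callable[[float], float]) -> float:
    """Expected score sum_i p_i * f(q_i) when the true distribution is p."""
    return sum(pi * f(qi) for pi, qi in zip(p, q))


def log_rule(q: float) -> float:
    # Logarithmic rule, oriented as a penalty (lower is better).
    return -math.log(q)


def linear_rule(q: float) -> float:
    # Hypothetical non-logarithmic rule, used only as a counterexample.
    return 1.0 - q


def random_distribution(n: int, rng: random.Random) -> List[float]:
    # Strictly positive weights, normalized to sum to 1.
    w = [rng.random() + 1e-9 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]


def truthful_is_optimal(p: Sequence[float], f: Callable[[float], float],
                        trials: int = 10_000, seed: int = 0) -> bool:
    """Return True if no sampled q has a strictly lower expected score than q = p."""
    rng = random.Random(seed)
    best = expected_score(p, p, f)
    for _ in range(trials):
        q = random_distribution(len(p), rng)
        if expected_score(p, q, f) < best - 1e-12:
            return False
    return True


p = [0.5, 0.3, 0.2]
print(truthful_is_optimal(p, log_rule))     # True
print(truthful_is_optimal(p, linear_rule))  # False
```

For the logarithmic rule the check always passes, since by Gibbs' inequality $\sum_i p_i(-\ln q_i) \ge \sum_i p_i(-\ln p_i)$, with equality only at $q = p$.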

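Then, we must have $f$ of the form:

$$f(x) = -a \ln x + b$$

where $a > 0$ and $b$ is any real number. In other words, $f$ must be the same as the logarithmic scoring rule $-\ln x$, up to affine transformations (multiplication by a positive constant $a$ and addition of a constant $b$).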
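
One way to see where the logarithm comes from, and where $n \ge 3$ is used, is the following sketch. It assumes, beyond the statement, that $f$ is differentiable on $(0,1)$ and that the minimizer can be characterized by the Lagrange condition at interior points; it is an outline rather than a complete proof. In the third line, the dots stand for the remaining probability mass spread over the other classes so that every entry is positive.

```latex
% Sketch only: assumes f is differentiable on (0,1) and the minimizer is interior.
\begin{aligned}
&\text{Minimize } \sum_{i=1}^n p_i f(q_i) \text{ subject to } \sum_{i=1}^n q_i = 1
  \;\Longrightarrow\; p_i f'(q_i) = \lambda \quad (i = 1, \dots, n).\\
&\text{Since the minimizer is } q = p:\quad g(p_i) := p_i f'(p_i) = \lambda(p) \quad \text{for all } i.\\
&n \ge 3:\ \text{for } x, y \in (0,1) \text{ pick } 0 < z < 1 - \max(x, y);\quad
  p = (x, z, \dots) \Rightarrow g(x) = g(z),\quad p = (y, z, \dots) \Rightarrow g(y) = g(z).\\
&\text{Hence } x f'(x) = c \text{ on } (0,1), \text{ so } f(x) = c \ln x + b.\\
&\text{For } q = p \text{ to be a minimum rather than a maximum we need } c < 0;
  \text{ setting } a = -c > 0 \text{ gives } f(x) = -a \ln x + b.
\end{aligned}
```

With only two classes the same argument yields just $g(x) = g(1 - x)$, which does not force $g$ to be constant; this is why the statement requires more than two classes.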

Related facts