Properization: constructing proper scoring rules via Bayes acts

Brehmer, Jonas R.; Gneiting, Tilmann

doi:10.1007/s10463-019-00705-7

Published: 22 February 2019

Volume 72, pages 659–673, (2020)

Annals of the Institute of Statistical Mathematics

Jonas R. Brehmer¹ & Tilmann Gneiting²,³

Abstract

Scoring rules serve to quantify predictive performance. A scoring rule is proper if truth telling is an optimal strategy in expectation. Subject to customary regularity conditions, every scoring rule can be made proper, by applying a special case of the Bayes act construction studied by Grünwald and Dawid (Ann Stat 32:1367–1433, 2004) and Dawid (Ann Inst Stat Math 59:77–93, 2007), to which we refer as properization. We discuss examples from the recent literature and apply the construction to create new types, and reinterpret existing forms, of proper scoring rules and consistent scoring functions. In an abstract setting, we formulate sufficient conditions under which Bayes acts exist and scoring rules can be made proper.
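To make the construction concrete, here is a minimal Python sketch of properization on a binary outcome space. It assumes (our choice, not the paper's) that forecasts are probabilities q = P(Y = 1), that scores are negatively oriented, and that Bayes acts can be approximated by a grid search. The quartic score `quartic` is a hypothetical improper example, and all function names are illustrative.

```python
# Properization on a binary outcome space: a brute-force sketch.
# Assumptions (ours): outcomes y in {0, 1}, a forecast is q = P(Y = 1),
# scores are negatively oriented, and Bayes acts are found by grid search.

GRID = [i / 1000 for i in range(1001)]  # candidate forecasts q in [0, 1]


def quartic(q, y):
    """Hypothetical improper score S(q, y) = |q - y|^4."""
    return abs(q - y) ** 4


def expected_score(score, q, p):
    """E_p[S(q, Y)] for Y ~ Bernoulli(p)."""
    return p * score(q, 1) + (1 - p) * score(q, 0)


def bayes_act(score, p, grid=GRID):
    """Grid approximation of argmin_q E_p[S(q, Y)]."""
    return min(grid, key=lambda q: expected_score(score, q, p))


def properize(score, grid=GRID):
    """Return the properized score S*(p, y) := S(p*, y), p* a Bayes act for p."""
    def properized(p, y):
        return score(bayes_act(score, p, grid), y)
    return properized


proper_quartic = properize(quartic)
truth = 0.3
reports = [i / 100 for i in range(101)]
best_raw = min(reports, key=lambda r: expected_score(quartic, r, truth))
best_proper = min(reports, key=lambda r: expected_score(proper_quartic, r, truth))
print(best_raw, best_proper)  # expected: 0.43 0.3
```

Under the raw quartic score, a forecaster who believes p = 0.3 does best by reporting about 0.43; after properization, reporting 0.3 is optimal. The grid search is crude but keeps the sketch dependency-free.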

Notes

As noted by Parry (2016), the improper score ${S}_1$ shares its (concave) expected score function $P \mapsto {S}_1(P,P)$ with the proper Brier score. This illustrates the importance of the second condition in Theorem 1 of Gneiting and Raftery (2007): For a scoring rule ${S}$, the (strict) concavity of the expected score function $ G(P) := {S}(P,P)$ is equivalent to the (strict) propriety of ${S}$ only if, furthermore, $- {S}(P,\cdot )$ is a subtangent of $- G$ at P.
See, e.g., http://www.fharrell.com/post/class-damage/ and http://www.fharrell.com/post/classification/.

References

Aliprantis, C. D., Border, K. C. (2006). Infinite dimensional analysis (3rd ed.). Berlin: Springer.
Christensen, H. M., Moroz, I. M., Palmer, T. N. (2014). Evaluation of ensemble forecast uncertainty using a new proper score: Application to medium-range and seasonal forecasts. Quarterly Journal of the Royal Meteorological Society, 141, 538–549.
Dawid, A. P. (1986). Probability forecasting. In S. Kotz, N. L. Johnson, C. B. Read (Eds.), Encyclopedia of statistical sciences (Vol. 7, pp. 210–218). New York: Wiley.
Dawid, A. P. (2007). The geometry of proper scoring rules. Annals of the Institute of Statistical Mathematics, 59, 77–93.
Dawid, A. P., Musio, M. (2014). Theory and applications of proper scoring rules. Metron, 72, 169–183.
Diks, C., Panchenko, V., van Dijk, D. (2011). Likelihood-based scoring rules for comparing density forecasts in tails. Journal of Econometrics, 163, 215–230.
Ebert, E., Brown, B., Göber, M., Haiden, T., Mittermaier, M., Nurmi, P., Wilson, L., Jackson, S., Johnston, P., Schuster, D. (2018). The WMO challenge to develop and demonstrate the best new user-oriented forecast verification metric. Meteorologische Zeitschrift, 27, 435–440.
Ebert, E., Wilson, L., Weigel, A., Mittermaier, M., Nurmi, P., Gill, P., Göber, M., Joslyn, S., Brown, B., Fowler, T., Watkins, A. (2013). Progress and challenges in forecast verification. Meteorological Applications, 20, 130–139.
Ehm, W., Gneiting, T., Jordan, A., Krüger, F. (2016). Of quantiles and expectiles: Consistent scoring functions, Choquet representations and forecast rankings. Journal of the Royal Statistical Society Series B. Statistical Methodology, 78, 505–562.
Ferguson, T. S. (1967). Mathematical statistics: A decision theoretic approach. Probability and mathematical statistics (Vol. 1). New York: Academic Press.
Ferri, C., Hernández-Orallo, J., Modroiu, R. (2009). An experimental comparison of performance measures for classification. Pattern Recognition Letters, 30, 27–38.
Ferro, C. A. T. (2017). Measuring forecast performance in the presence of observation error. Quarterly Journal of the Royal Meteorological Society, 143, 2665–2676.
Fissler, T., Ziegel, J. F. (2016). Higher order elicitability and Osband’s principle. The Annals of Statistics, 44, 1680–1707.
Friederichs, P., Thorarinsdottir, T. L. (2012). Forecast verification for extreme value distributions with an application to probabilistic peak wind prediction. Environmetrics, 23, 579–594.
Gelfand, A. E., Ghosh, S. K. (1998). Model choice: A minimum posterior predictive loss approach. Biometrika, 85, 1–11.
Gelman, A., Hwang, J., Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24, 997–1016.
Gneiting, T. (2011). Making and evaluating point forecasts. Journal of the American Statistical Association, 106, 746–762.
Gneiting, T., Katzfuss, M. (2014). Probabilistic forecasting. Annual Review of Statistics and Its Application, 1, 125–151.
Gneiting, T., Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102, 359–378.
Gneiting, T., Ranjan, R. (2011). Comparing density forecasts using threshold- and quantile-weighted scoring rules. Journal of Business & Economic Statistics, 29, 411–422.
Granger, C. W. J., Machina, M. J. (2006). Forecasting and decision theory. In G. Elliott, C. Granger, A. Timmermann (Eds.), Handbook of economic forecasting (Vol. 1, pp. 81–98). Amsterdam: Elsevier.
Granger, C. W. J., Pesaran, M. H. (2000). Economic and statistical measures of forecast accuracy. Journal of Forecasting, 19, 537–560.
Grünwald, P. D., Dawid, A. P. (2004). Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory. The Annals of Statistics, 32, 1367–1433.
Harrell, F. E., Jr. (2015). Regression modeling strategies (2nd ed.). Springer series in statistics. Cham: Springer.
Holzmann, H., Klar, B. (2017). Focusing on regions of interest in forecast evaluation. The Annals of Applied Statistics, 11, 2404–2431.
Laud, P. W., Ibrahim, J. G. (1995). Predictive model selection. Journal of the Royal Statistical Society Series B. Methodological, 57, 247–262.
M4 Team. (2018). M4 competitor’s guide: Prizes and rules. Available online at https://www.m4.unic.ac.cy/wp-content/uploads/2018/03/M4-Competitors-Guide.pdf. Accessed 13 Dec 2018.
Makridakis, S., Spiliotis, E., Assimakopoulos, V. (2018). The M4 competition: Results, findings, conclusion and way forward. International Journal of Forecasting, 34, 802–808.
Müller, W. A., Appenzeller, C., Doblas-Reyes, F. J., Liniger, M. A. (2005). A debiased ranked probability skill score to evaluate probabilistic ensemble forecasts with small ensemble sizes. Journal of Climate, 18, 1513–1523.
Parry, M. (2016). Linear scoring rules for probabilistic binary classification. Electronic Journal of Statistics, 10, 1596–1607.
Reid, M. D., Williamson, R. C. (2010). Composite binary losses. Journal of Machine Learning Research, 11, 2387–2422.
van Erven, T., Reid, M. D., Williamson, R. C. (2012). Mixability is Bayes risk curvature relative to log loss. Journal of Machine Learning Research, 13, 1639–1663.
Werner, D. (2018). Funktionalanalysis (8th ed.). Berlin: Springer.
Williamson, R. C., Vernet, E., Reid, M. D. (2016). Composite multiclass losses. Journal of Machine Learning Research, 17, 1–52.
Wilson, L. J., Burrows, W. R., Lanzinger, A. (1999). A strategy for verification of weather element forecasts from an ensemble prediction system. Monthly Weather Review, 127, 956–970.
Zamo, M., Naveau, P. (2018). Estimation of the continuous ranked probability score with limited information and applications to ensemble weather forecasts. Mathematical Geosciences, 50, 209–234.

Acknowledgements

Tilmann Gneiting is grateful for funding by the Klaus Tschira Foundation and by the European Union Seventh Framework Programme under grant agreement 290976. Part of his research leading to these results was carried out within subproject C7 “Statistical postprocessing and stochastic physics for ensemble predictions” of the Transregional Collaborative Research Center SFB / TRR 165 “Waves to Weather” (www.wavestoweather.de) funded by the German Research Foundation (DFG). Jonas Brehmer gratefully acknowledges support by DFG through Research Training Group RTG 1953. We thank Tobias Fissler, Rafael Frongillo, Alexander Jordan, and Matthew Parry for instructive discussions, and we are grateful to the editor and two anonymous referees for thoughtful comments and suggestions.

Author information

Authors and Affiliations

Institute for Mathematics, University of Mannheim, A5, 6, 68131, Mannheim, Germany
Jonas R. Brehmer
Institute for Stochastics, Karlsruhe Institute of Technology (KIT), Englerstraße 2, 76131, Karlsruhe, Germany
Tilmann Gneiting
Computational Statistics Group, Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, 69118, Heidelberg, Germany
Tilmann Gneiting

Corresponding author

Correspondence to Jonas R. Brehmer.

Appendix: Proofs

Here, we present detailed arguments for the technical claims in Examples 5, 6, 7, and 9 as well as the proofs of Theorems 2 and 3.

1.1 Details for Example 5

We fix some distribution P and start with the case $\alpha > 1$. An application of Fubini’s theorem gives

$$\begin{aligned} {S}_\alpha (Q,P) = \int \int \vert Q(x) - \mathbb {1}\left( y \le x\right) \vert ^\alpha \,\mathrm{d}P(y) \,\mathrm{d}x . \end{aligned} \tag{8}$$

Given $x \in \mathbb {R}$, we seek the value $Q(x) \in [0,1]$ that minimizes the inner integral in (8). If x is such that $P(x) \in \lbrace 0, 1 \rbrace $, the equality $\mathbb {1}\left( y \le x\right) = P(x)$ holds for P-almost all y, hence $Q(x)= P(x)$ is the unique minimizer. If x satisfies $P(x) \in (0,1)$, define the function

$$\begin{aligned} g_{x,P}(q) := \int \vert q - \mathbb {1}\left( y \le x\right) \vert ^\alpha \,\mathrm{d}P(y) = (1-P(x)) q^\alpha + P(x) (1-q)^\alpha , \end{aligned}$$

which is strictly convex in $q \in (0,1)$ with derivative

$$\begin{aligned} g_{x,P}'(q) = \alpha (1- P(x)) q^{\alpha - 1} - \alpha P(x) (1- q)^{\alpha - 1} \end{aligned}$$

and a unique minimum at $q = q_{x,P}^* \in (0,1)$. As a consequence, the minimizing value Q(x) is given by

$$\begin{aligned} Q(x) = q_{x,P}^* = \left( 1 + \left( \frac{1-P(x)}{P(x)} \right) ^{1/(\alpha - 1)} \right) ^{-1}. \end{aligned}$$
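As a sanity check on the closed form, the following Python sketch minimizes $g_{x,P}$ numerically by ternary search, which is valid because $g_{x,P}$ is strictly convex for $\alpha > 1$, and compares the result with $q_{x,P}^*$. The variable `Px` stands for the number $P(x) \in (0,1)$; the test values of $\alpha$ and $P(x)$ are arbitrary.

```python
# Compare the closed-form minimizer q*_{x,P} with a numerical minimizer of
# g_{x,P}(q) = (1 - P(x)) q^alpha + P(x) (1 - q)^alpha for alpha > 1.
# `Px` stands for the number P(x) in (0, 1).


def g(q, Px, alpha):
    return (1 - Px) * q ** alpha + Px * (1 - q) ** alpha


def q_star(Px, alpha):
    """Closed-form minimizer of g on [0, 1] for alpha > 1 and 0 < Px < 1."""
    return 1.0 / (1.0 + ((1 - Px) / Px) ** (1.0 / (alpha - 1)))


def ternary_min(f, lo=0.0, hi=1.0, iters=200):
    """Minimizer of a strictly convex function f on [lo, hi] by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2  # the minimizer cannot lie in (m2, hi]
        else:
            lo = m1  # the minimizer cannot lie in [lo, m1)
    return (lo + hi) / 2


for alpha in (1.5, 2.0, 3.0, 4.0):
    for Px in (0.05, 0.3, 0.5, 0.9):
        numeric = ternary_min(lambda q: g(q, Px, alpha))
        assert abs(numeric - q_star(Px, alpha)) < 1e-6
```

For $\alpha = 2$ the formula reduces to $q_{x,P}^* = P(x)$, in line with the propriety of the CRPS, which is ${S}_2$.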

The function Q defined by the minimizers Q(x), $x \in \mathbb {R}$, is a minimizer of ${S}_\alpha ( \cdot ,P)$, and if ${S}_\alpha (Q, P)$ is finite, it is unique Lebesgue almost everywhere. Since $\alpha >1$, the map $P(x) \mapsto q_{x,P}^*$ is continuous and strictly increasing, with limits 0 and 1 as $P(x)$ tends to 0 and 1, respectively; hence, Q inherits the properties of a distribution function from P, and $P^*$ defined by (4) is a Bayes act for P. Moreover, Eq. (4) shows that the relation between P and $P^*$ is one-to-one.

It remains to check under which conditions the properization of ${S}_\alpha $ is not only proper but strictly proper. The representation (4), along with two Taylor expansions, implies that $P^*$ behaves like $P^{1/(\alpha -1)}$ in the tails. This has two consequences. First, the above arguments show that, for ${S}_\alpha (P^*, P)$ to be finite, $x \mapsto g_{x,P} (P^*(x))$ has to be integrable with respect to Lebesgue measure. Hence, the tail behavior of $P^*$ and the inequality $\alpha /(\alpha - 1) > 1$ for $\alpha > 1$ show that ${S}_\alpha (P^*, P)$ is finite for $P \in \mathscr {P}_1$. Second, $P^*$ has a lighter tail than P for $\alpha \in (1,2)$ and a heavier tail for $\alpha > 2$. In the latter case, $P \in \mathscr {P}_1$ does not necessarily imply $P^* \in \mathscr {P}_1$. Hence, without additional assumptions, strict propriety of the properized score (3) can only be ensured relative to $\mathscr {P}_\mathrm {c}$ for $\alpha > 2$ and relative to the class $\mathscr {P}_1$ for $\alpha \in (1, 2]$.
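The tail claim can be checked numerically. Writing $\beta = 1/(\alpha - 1)$ and $p = P(x)$, the formula for $q_{x,P}^*$ rearranges to $p^\beta / (p^\beta + (1-p)^\beta)$, so the ratio $q_{x,P}^*/p^\beta = 1/(p^\beta + (1-p)^\beta)$ tends to 1 as $p \to 0$. The Python sketch below evaluates this ratio in the left tail for one value of $\alpha$ on each side of 2; the right tail follows from the symmetry $q^*(1-p) = 1 - q^*(p)$. The chosen values of $\alpha$ and p are illustrative only.

```python
# Tail behaviour of the Bayes act for alpha > 1: q*(p) / p**beta -> 1 as p -> 0,
# where beta = 1 / (alpha - 1) and p plays the role of P(x) in the left tail.


def q_star(p, alpha):
    beta = 1.0 / (alpha - 1)
    # Algebraically equal to (1 + ((1 - p) / p) ** beta) ** -1, but stable for small p
    return p ** beta / (p ** beta + (1 - p) ** beta)


for alpha in (1.5, 3.0):  # beta = 2 (lighter tail) and beta = 0.5 (heavier tail)
    beta = 1.0 / (alpha - 1)
    for p in (1e-2, 1e-4, 1e-6, 1e-8):
        print(alpha, p, q_star(p, alpha), q_star(p, alpha) / p ** beta)
```

For $\alpha = 1.5$ the Bayes act puts far less mass in the tail than P (roughly $p^2$), whereas for $\alpha = 3$ it puts far more (roughly $\sqrt{p}$).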

We now turn to $\alpha \in (0,1)$. In this case, the function $g_{x,P}$ is strictly concave, and its unique minimizer on $[0,1]$ is $q = 0$ if $P(x) < \frac{1}{2}$ and $q = 1$ if $P(x) > \frac{1}{2}$. If $P(x) = \frac{1}{2}$, then both 0 and 1 are minimizers. Arguing as above, every Bayes act $P^*$ is a Dirac measure in a median of P.

Finally, for $\alpha = 1$ the function $g_{x,P}$ is linear; thus, as for $\alpha \in (0,1)$, every Dirac measure in a median of P is a Bayes act. The only difference from the case $\alpha \in (0,1)$ is that, if P has more than one median, there are Bayes acts other than Dirac measures, since $g_{x,P}$ is constant for all x satisfying $P(x) = \frac{1}{2}$.
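A small Python sketch of the case $\alpha \in (0,1]$, assuming a hypothetical discrete P with atoms 1, 2, 3, 4 and probabilities 0.1, 0.2, 0.3, 0.4: minimizing $g_{x,P}$ over a grid of q-values at each atom reproduces the step function that jumps from 0 to 1 at the median of P, here 3, since $P(2) = 0.3 < \frac{1}{2} < 0.6 = P(3)$.

```python
# For alpha in (0, 1], g_{x,P} is concave (linear if alpha = 1), so its minimum
# over [0, 1] sits at an endpoint. Building Q pointwise then yields a step
# function that jumps from 0 to 1 at a median of P.


def g(q, Px, alpha):
    return (1 - Px) * q ** alpha + Px * (1 - q) ** alpha


def pointwise_bayes_cdf(cdf_values, alpha, n_grid=100):
    """Brute-force minimizer of g over a q-grid for each value P(x)."""
    grid = [i / n_grid for i in range(n_grid + 1)]
    return [min(grid, key=lambda q: g(q, Px, alpha)) for Px in cdf_values]


# Hypothetical discrete P: atoms and their probabilities
atoms = [1, 2, 3, 4]
probs = [0.1, 0.2, 0.3, 0.4]
cdf = [sum(probs[: i + 1]) for i in range(len(probs))]  # P(x) at each atom

for alpha in (0.5, 1.0):
    Q = pointwise_bayes_cdf(cdf, alpha)
    jump = next(x for x, qx in zip(atoms, Q) if qx == 1.0)
    print(alpha, Q, jump)
```

Between atoms P is constant, so evaluating Q at the atoms suffices to identify the Dirac measure.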

1.2 Details for Example 6

Let P, Q and $\varPhi $ be distribution functions. By the definition of the convolution operator

$$\begin{aligned} \int \mathbb {1}\left( y \le x\right) \,\mathrm{d}(Q * \varPhi ) (y) = \int \varPhi (x-y) \,\mathrm{d}Q(y) \end{aligned}$$

holds for $x \in \mathbb {R}$. Using this identity and Fubini’s theorem leads to

$$\begin{aligned} {S}_\varPhi (P,Q)&= \int \! \int \left( P(x)^2 - 2 P(x) \varPhi (x-y) + \varPhi (x-y)^2 \right) \,\mathrm{d}Q(y) \,\mathrm{d}x \\&= \int \! \int \left( P(x)^2 - 2 P(x) \mathbb {1}\left( y \le x\right) + \mathbb {1}\left( y \le x\right) \right) \,\mathrm{d}(Q * \varPhi )(y) \,\mathrm{d}x \\&\quad + \int \! \int \varPhi (x-y) (\varPhi (x-y) - 1) \,\mathrm{d}Q(y) \,\mathrm{d}x \\&= \int \! \int (P(x) - \mathbb {1}\left( y \le x\right) )^2 \,\mathrm{d}x \,\mathrm{d}(Q * \varPhi )(y) - \int \varPhi (x) (1- \varPhi (x)) \,\mathrm{d}x, \end{aligned}$$

which verifies the equality in (5). Moreover, if $P, Q, \varPhi \in \mathscr {P}_1$, then $Q * \varPhi \in \mathscr {P}_1$ and $\int \varPhi (x) (1- \varPhi (x)) \,\mathrm{d}x < \infty $, so that ${S}_\varPhi (P, Q) < \infty $, and the strict propriety of the CRPS relative to the class $\mathscr {P}_1$ shows that the Bayes act is unique in this situation.
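Identity (5) can also be verified numerically. The Python sketch below uses hypothetical ingredients (uniform P and $\varPhi$, a three-point Q) and a midpoint rule on a bounded interval outside of which all integrands vanish; it compares ${S}_\varPhi (P,Q)$ with $\mathrm{CRPS}(P, Q * \varPhi)$ minus the correction term. For $\varPhi$ uniform on $[-1/2, 1/2]$, the correction $\int \varPhi (x)(1-\varPhi (x))\,\mathrm{d}x$ equals $1/6$.

```python
# Numerical check of identity (5):
#   S_Phi(P, Q) = CRPS(P, Q * Phi) - ∫ Phi(x) (1 - Phi(x)) dx,
# where S_Phi(P, Q) = ∫∫ (P(x) - Phi(x - y))^2 dQ(y) dx.
# Hypothetical choices: uniform P and Phi, discrete Q with three atoms.


def uniform_cdf(a, b):
    def cdf(x):
        return min(1.0, max(0.0, (x - a) / (b - a)))
    return cdf


P = uniform_cdf(-1.0, 2.0)
Phi = uniform_cdf(-0.5, 0.5)
Q_atoms = [(-0.7, 0.25), (0.3, 0.5), (1.2, 0.25)]  # (location y, weight)


def conv_cdf(x):
    """(Q * Phi)(x) = ∫ Phi(x - y) dQ(y)."""
    return sum(w * Phi(x - y) for y, w in Q_atoms)


def integrate(f, lo=-5.0, hi=5.0, n=20_000):
    """Midpoint rule; all integrands below vanish outside [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


lhs = integrate(lambda x: sum(w * (P(x) - Phi(x - y)) ** 2 for y, w in Q_atoms))
# CRPS(P, F) = ∫ (P(x)^2 - 2 P(x) F(x) + F(x)) dx with F = Q * Phi
crps = integrate(lambda x: P(x) ** 2 - 2 * P(x) * conv_cdf(x) + conv_cdf(x))
correction = integrate(lambda x: Phi(x) * (1 - Phi(x)))
print(lhs, crps - correction, correction)  # correction is close to 1/6
```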

1.3 Details for Example 7

For distributions $P, Q \in \mathscr {P}$ and $c > 0$, the Fubini–Tonelli theorem and the definition of the convolution operator give

$$\begin{aligned} {S}^\varphi (P,Q)&= \int \int \varphi (x-y) {S}(P,x) \,\mathrm{d}Q(y) \,\mathrm{d}x \\&= \int \left( \int \varphi (x-y) \,\mathrm{d}Q(y) \right) {S}(P,x) \,\mathrm{d}x = {S}(P, Q * \varPhi ), \end{aligned}$$

so the stated (unique) Bayes act under ${S}^\varphi $ follows from the (strict) propriety of ${S}$. Proceeding as in the details for Example 6, we verify identity (6).
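A discrete analog makes the identity ${S}^\varphi (P,Q) = {S}(P, Q * \varPhi )$ easy to check. The Python sketch below assumes integer-valued outcomes, probability mass functions stored as dicts, and the logarithmic score ${S}(P,x) = -\log p(x)$; `convolve`, `smoothed_expected_score`, and the particular q and phi are illustrative choices, not the paper's. Because the log score is strictly proper, the forecast $Q * \varPhi$ should beat any other forecast, such as the uniform one tried here.

```python
import math

# Discrete analog of Example 7 (our assumption: integer-valued outcomes, pmfs
# as dicts {point: probability}, log score S(P, x) = -log p(x)).


def convolve(q, phi):
    """pmf of Y + D for independent Y ~ q and D ~ phi."""
    out = {}
    for y, qy in q.items():
        for d, pd in phi.items():
            out[y + d] = out.get(y + d, 0.0) + qy * pd
    return out


def log_score(p, x):
    return -math.log(p[x])


def smoothed_expected_score(p, q, phi):
    """S^phi(P, Q) = sum_y q(y) sum_x phi(x - y) S(P, x), with x = y + d."""
    return sum(qy * pd * log_score(p, y + d)
               for y, qy in q.items() for d, pd in phi.items())


def expected_score(p, f):
    """S(P, F) = sum_x f(x) S(P, x)."""
    return sum(fx * log_score(p, x) for x, fx in f.items())


q = {0: 0.5, 3: 0.5}
phi = {-1: 0.25, 0: 0.5, 1: 0.25}
q_phi = convolve(q, phi)                      # candidate Bayes act Q * Phi
uniform = {x: 1 / len(q_phi) for x in q_phi}  # an alternative forecast

print(smoothed_expected_score(q_phi, q, phi), expected_score(q_phi, q_phi))
print(smoothed_expected_score(uniform, q, phi))  # larger than the line above
```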

For $P \in \mathscr {L}$, the same calculations as above show that the probability score satisfies

$$\begin{aligned} \mathrm {PS}_c(P,Q) = 2c \int \frac{Q(x + c) - Q(x - c)}{2c} \, \mathrm {LinS}(P,x) \,\mathrm{d}x, \end{aligned}$$

where $\mathrm {LinS}(P,y) = - p(y)$ is the linear score. Consequently, to demonstrate that Theorem 1 is applicable neither to $\mathrm {PS}_c$ nor to $\mathrm {LinS}$, it suffices to show that there is a distribution Q such that $P \mapsto \mathrm {LinS}(P,Q)$ does not have a minimizer. We use an argument that generalizes the construction in Section 4.1 of Gneiting and Raftery (2007), who show that $\mathrm {LinS}$ is improper. Let q be a density that is symmetric around zero and strictly increasing on $(-\infty , 0)$. Let $\epsilon > 0$ and define the interval $I_k := ((2k - 1) \epsilon , (2k + 1) \epsilon ]$ for $k \in \mathbb {Z}$. Suppose p is a density that assigns positive mass to some interval $I_k$ with $k \ne 0$. Due to the properties of q, the score $\mathrm {LinS}(P,Q)$ can be reduced by substituting the density defined by

$$\begin{aligned} {\tilde{p}}(x) := p(x) - \mathbb {1}\left( x \in I_k\right) \, p(x) + \mathbb {1}\left( x + 2k \epsilon \in I_k\right) \, p(x + 2k \epsilon ) \end{aligned}$$

for p, i.e., by shifting the entire probability mass from $I_k$ to the modal interval $I_0$. Every density assigns positive mass to the complement of $I_0 = (-\epsilon , \epsilon ]$ once $\epsilon > 0$ is small enough, so this argument shows that no density p can be a minimizer of the expected score $\mathrm {LinS}(P,Q)$. The assumptions on q are stronger than necessary; they simplify the argument and can be relaxed at the cost of a more elaborate proof.
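The failure of the linear score can be seen in a simple special case. Rather than reproducing the mass-shifting construction, the Python sketch below takes Q standard normal (symmetric around zero and strictly increasing on $(-\infty, 0)$, as required of q) and evaluates $\mathrm{LinS}(P,Q) = -\int p(y) q(y)\,\mathrm{d}y$ for P uniform on $(-\epsilon, \epsilon)$, which equals $-\mathrm{erf}(\epsilon/\sqrt{2})/(2\epsilon)$. The score keeps decreasing as $\epsilon$ shrinks and approaches $-q(0) = -1/\sqrt{2\pi}$, a value no density attains.

```python
import math

# LinS(P, Q) = -∫ p(y) q(y) dy for P = U(-eps, eps) and Q = N(0, 1)
# equals -(Phi(eps) - Phi(-eps)) / (2 eps) = -erf(eps / sqrt(2)) / (2 eps).


def lin_score_uniform_vs_normal(eps):
    return -math.erf(eps / math.sqrt(2)) / (2 * eps)


infimum = -1 / math.sqrt(2 * math.pi)  # -q(0): approached, never attained
values = [lin_score_uniform_vs_normal(e) for e in (1.0, 0.5, 0.1, 0.01)]
print(values, infimum)
```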

1.4 Details for Example 9

For any probability distribution P and $x \in \mathbb {R}$, we obtain

$$\begin{aligned} s(x,P) = \int \frac{\vert x - y \vert }{\vert x \vert + \vert y \vert } \mathbb {1}\left( x \ne y\right) \,\mathrm{d}P(y) , \end{aligned}$$

which immediately gives $s(0,P) = P(\mathbb {R}\backslash \lbrace 0 \rbrace )$. This representation, together with the dominated convergence theorem, implies the continuity of $x \mapsto s(x,P)$ in $\mathbb {R}\backslash \lbrace 0 \rbrace $ as well as the limits given in (7).
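For a discrete P, the representation above can be evaluated directly. In the Python sketch below, P has hypothetical atoms at $-2$, 0, and 1; the function returns $s(0,P) = P(\mathbb{R}\backslash\lbrace 0\rbrace) = 0.7$, while for small $x \ne 0$ every summand is close to 1 (the atom at zero contributes exactly its weight), so $s(x,P)$ is close to 1. This exhibits the jump at the origin that an atom at zero produces.

```python
def s(x, atoms):
    """s(x, P) = sum_y w_y |x - y| / (|x| + |y|) 1(x != y) for discrete P."""
    total = 0.0
    for y, w in atoms:
        if x != y:  # also avoids 0 / 0 when x = y = 0
            total += w * abs(x - y) / (abs(x) + abs(y))
    return total


atoms = [(-2.0, 0.2), (0.0, 0.3), (1.0, 0.5)]  # hypothetical P: (location, weight)
for x in (0.0, 1e-3, -1e-3, 1e-9):
    print(x, s(x, atoms))
```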

1.5 Proof of Theorem 2

Let $(a_n)_{n \in \mathbb {N}} \subset \mathscr {A}$ be a sequence with $ a := \lim _{n \rightarrow \infty } a_n$. Since s is lower semicontinuous in its first component and uniformly bounded from below by g, Fatou’s lemma gives

$$\begin{aligned} \liminf _{n \rightarrow \infty } \int s(a_n, \omega ) \,\mathrm{d}P(\omega ) \ge \int \liminf _{n \rightarrow \infty } s(a_n,\omega ) \,\mathrm{d}P(\omega ) \ge s(a,P) \end{aligned}$$

for any $P \in \mathscr {P}$. Hence, $a \mapsto s(a, P)$ is lower semicontinuous for every $P \in \mathscr {P}$, and since $\mathscr {A}$ is assumed to be compact, the result follows from Theorem 2.43 in Aliprantis and Border (2006).$\square $
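Theorem 2 asks for lower semicontinuity rather than continuity, which matters for scores such as the zero-one score $s(a, y) = \mathbb{1}(a \ne y)$: it is lower semicontinuous in a, bounded below by 0, and satisfies $s(a,P) = 1 - P(\lbrace a \rbrace)$. The Python sketch below is our own illustration, not an example from the paper; it takes the compact action set $\mathscr{A} = [0, 1]$ and a hypothetical discrete P, and finds a Bayes act at the heaviest atom inside $\mathscr{A}$, even though a heavier atom lies outside.

```python
# Zero-one score s(a, y) = 1(a != y) on the compact action set A = [lo, hi].
# Its expected value is s(a, P) = 1 - P({a}), so only atoms can beat the value 1.


def expected_zero_one(a, atoms):
    """s(a, P) = sum of the weights of atoms y != a."""
    return sum(w for y, w in atoms if y != a)


def bayes_act_zero_one(atoms, lo=0.0, hi=1.0):
    # Candidates: atoms inside A plus one arbitrary point of A (here lo),
    # since every non-atom a in A has s(a, P) = 1.
    candidates = [y for y, _ in atoms if lo <= y <= hi] + [lo]
    return min(candidates, key=lambda a: expected_zero_one(a, atoms))


atoms = [(0.2, 0.25), (0.7, 0.3), (1.5, 0.45)]  # hypothetical P: (location, weight)
a_star = bayes_act_zero_one(atoms)
print(a_star, expected_zero_one(a_star, atoms))
```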

1.6 Proof of Theorem 3

The same arguments as in the proof of Theorem 2 show that $a \mapsto s(a, P)$ is a weakly lower semicontinuous function for any $P \in \mathscr {P}$. If $P \in \mathscr {P}$ is such that this function is also coercive, we conclude by proceeding as in the proof of Theorem (Satz) III.5.8 in Werner (2018). In case $\inf _{a \in \mathscr {A}} s(a, P) = \infty $, there is nothing to prove. Otherwise, if $(a_n)_{n \in \mathbb {N}} \subset \mathscr {A}$ is a sequence such that $\lim _{n \rightarrow \infty } s(a_n, P) = \inf _{a \in \mathscr {A}} s(a, P)$, the coercivity of $a \mapsto s(a,P)$ implies that this sequence is bounded. Since $\mathscr {A}$ is a subset of a reflexive Banach space, we obtain a subsequence $(a_{n_k})_{k \in \mathbb {N}}$ of $(a_n)_{n \in \mathbb {N}}$ that converges weakly to some element $a^*$; see, e.g., Theorem 6.25 in Aliprantis and Border (2006). Since $\mathscr {A}$ is weakly closed, it contains $a^*$, and weak lower semicontinuity gives $s(a^*,P) \le \lim _{k \rightarrow \infty } s(a_{n_k}, P) = \inf _{a \in \mathscr {A}} s(a, P)$, concluding the proof.$\square $
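The role of coercivity can be illustrated in one dimension, where $\mathscr{A} = \mathbb{R}$ is a reflexive Banach space. For the absolute error $s(a,y) = \vert a - y \vert$, the bound $s(a,P) \ge \vert a \vert - \int \vert y \vert \,\mathrm{d}P(y)$ shows coercivity, and since $s(0,P) = \int \vert y \vert \,\mathrm{d}P(y)$, every minimizer lies in the bounded interval with radius $2\int \vert y \vert \,\mathrm{d}P(y)$ around zero. The Python sketch below (a hypothetical discrete P, our own example) restricts a ternary search to this interval and recovers the median of P.

```python
# Coercivity bounds the search. For s(a, y) = |a - y| on A = R,
# s(a, P) >= |a| - E|Y| and s(0, P) = E|Y|, so every minimizer lies in
# [-2 E|Y|, 2 E|Y|]; there, ternary search on the convex s(., P) applies.


def expected_abs_error(a, atoms):
    return sum(w * abs(a - y) for y, w in atoms)


def bayes_act_abs_error(atoms, iters=200):
    radius = 2 * sum(w * abs(y) for y, w in atoms)  # bound from coercivity
    lo, hi = -radius, radius
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if expected_abs_error(m1, atoms) < expected_abs_error(m2, atoms):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


atoms = [(y, 0.2) for y in (-3.0, 1.0, 2.0, 10.0, 4.0)]  # hypothetical P
a_star = bayes_act_abs_error(atoms)
print(a_star)  # close to 2.0, the median of P
```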

About this article

Cite this article

Brehmer, J.R., Gneiting, T. Properization: constructing proper scoring rules via Bayes acts. Ann Inst Stat Math 72, 659–673 (2020). https://doi.org/10.1007/s10463-019-00705-7

Received: 17 August 2018
Revised: 14 December 2018
Published: 22 February 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s10463-019-00705-7
