A Bayesian interpretation of the confusion matrix

  • Olivier Caelen


We propose a way to infer the distribution of any performance indicator computed from the confusion matrix. This allows us to evaluate the variability of an indicator and to assess the importance of an observed difference between two performance indicators. We assume that the values in a confusion matrix are observations drawn from a multinomial distribution. Our method is based on a Bayesian approach in which the unknown parameters of the multinomial distribution are themselves treated as a random vector. We show that these unknown parameters follow a Dirichlet distribution. The Bayesian approach also gives us an elegant way of injecting prior knowledge into the distributions. Experiments on real and synthetic data sets assess our method's ability to construct accurate distributions.
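The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a binary confusion matrix with cells ordered (TP, FP, FN, TN), a uniform Dirichlet prior, and an indicator evaluated directly on the sampled cell probabilities. The function names `indicator_samples`, `precision` and `f1`, and the counts used below, are hypothetical and chosen only for illustration.

```python
import numpy as np


def indicator_samples(counts, indicator, prior=(1.0, 1.0, 1.0, 1.0),
                      n_samples=10_000, seed=0):
    """Sample a performance indicator from a Dirichlet distribution.

    counts    -- observed confusion-matrix cells, ordered (TP, FP, FN, TN)
    indicator -- function of (tp, fp, fn, tn) returning the indicator value
    prior     -- Dirichlet prior parameters (assumption: uniform by default)
    """
    rng = np.random.default_rng(seed)
    # Dirichlet parameters: prior pseudo-counts plus observed counts.
    alpha = np.asarray(counts, dtype=float) + np.asarray(prior, dtype=float)
    # Each row is one plausible vector of cell probabilities.
    theta = rng.dirichlet(alpha, size=n_samples)
    tp, fp, fn, tn = theta.T
    return indicator(tp, fp, fn, tn)


def precision(tp, fp, fn, tn):
    return tp / (tp + fp)


def f1(tp, fp, fn, tn):
    return 2 * tp / (2 * tp + fp + fn)


# Illustrative counts for two classifiers (made-up numbers, not from the paper).
f1_a = indicator_samples((80, 20, 10, 890), f1, seed=1)
f1_b = indicator_samples((70, 10, 20, 900), f1, seed=2)

# Variability of one indicator: a 95% credible interval.
low, high = np.percentile(f1_a, [2.5, 97.5])

# Importance of an observed difference: estimated probability that A beats B.
prob_a_better = float(np.mean(f1_a > f1_b))

print(f"F1 of A: 95% interval [{low:.3f}, {high:.3f}]")
print(f"P(F1 of A > F1 of B) = {prob_a_better:.3f}")
```

Prior knowledge would enter through the `prior` argument: larger pseudo-counts in a cell pull the sampled probabilities toward that cell before any data are observed.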


Keywords

Confusion matrix, Classification, Bayesian statistics

Mathematics Subject Classification (2010)

68T01 62G07 62F15 





Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. R & D, High Processing and Volume, Worldline S.A., Belgium
