Probabilistic Performance Evaluation for Multiclass Classification Using the Posterior Balanced Accuracy

  • Henry Carrillo
  • Kay H. Brodersen
  • José A. Castellanos
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 252)


An important problem in robotics is the empirical evaluation of classification algorithms that allow a robotic system to make accurate categorical predictions about its environment. Current algorithms are often assessed using sample statistics that can be difficult to interpret correctly and do not always provide a principled way of comparing competing algorithms. In this paper, we present a probabilistic alternative based on a Bayesian framework for inference on balanced accuracies. Using the proposed probabilistic evaluation, it is possible to assess the posterior distribution of the balanced accuracy of binary and multiclass classifiers. In addition, competing classifiers can be compared based on their respective posterior distributions. We illustrate the practical utility of our scheme and its properties by reanalyzing the performance of a recently published algorithm in the domain of visual action detection, as well as on synthetic data. To facilitate its use, we provide an open-source MATLAB implementation.
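The abstract describes inferring the posterior distribution of the balanced accuracy and comparing classifiers through their posteriors. The Python sketch below illustrates one way to do this under stated assumptions: each class-wise accuracy (recall) receives an independent uniform Beta(1, 1) prior, which gives a Beta(correct + 1, incorrect + 1) posterior, and the balanced accuracy is the mean of the K class-wise accuracies. The posterior of that mean is approximated here by Monte Carlo sampling; this is not necessarily how the paper or its MATLAB implementation computes it. All function names and confusion matrices are hypothetical.

```python
import random


def posterior_ba_samples(confusion, n_samples=20000, seed=0):
    """Draw Monte Carlo samples from the posterior of the balanced accuracy.

    confusion[i][j] counts test examples of true class i predicted as class j.
    Assumption: each per-class accuracy gets an independent uniform Beta(1, 1)
    prior, so its posterior is Beta(correct_i + 1, incorrect_i + 1).
    """
    rng = random.Random(seed)
    params = []
    for i, row in enumerate(confusion):
        correct = row[i]
        incorrect = sum(row) - correct
        params.append((correct + 1, incorrect + 1))
    k = len(params)
    # Balanced accuracy is the mean of the per-class accuracies, so each
    # sample averages one draw from every class-wise Beta posterior.
    return [sum(rng.betavariate(a, b) for a, b in params) / k
            for _ in range(n_samples)]


def central_interval(samples, level=0.95):
    """Central posterior interval estimated from sample quantiles."""
    s = sorted(samples)
    n = len(s)
    lo = s[int(round((1 - level) / 2 * (n - 1)))]
    hi = s[int(round((1 + level) / 2 * (n - 1)))]
    return lo, hi


def prob_above(samples, threshold):
    """Posterior probability that the balanced accuracy exceeds `threshold`."""
    return sum(x > threshold for x in samples) / len(samples)


def prob_a_better(samples_a, samples_b):
    """Estimate P(BA_A > BA_B), assuming independent posteriors (separate seeds)."""
    return sum(a > b for a, b in zip(samples_a, samples_b)) / len(samples_a)


if __name__ == "__main__":
    # Hypothetical 3-class confusion matrices (rows: true class, cols: predicted).
    clf_a = [[45, 3, 2], [4, 40, 6], [1, 2, 7]]
    clf_b = [[48, 1, 1], [2, 46, 2], [6, 2, 2]]
    ba_a = posterior_ba_samples(clf_a, seed=1)
    ba_b = posterior_ba_samples(clf_b, seed=2)
    for name, s in (("A", ba_a), ("B", ba_b)):
        lo, hi = central_interval(s)
        print(f"{name}: posterior mean {sum(s) / len(s):.3f}, "
              f"95% interval [{lo:.3f}, {hi:.3f}], "
              f"P(BA > 1/3) = {prob_above(s, 1 / 3):.3f}")
    print(f"P(BA_A > BA_B) = {prob_a_better(ba_a, ba_b):.3f}")
```

In this hypothetical example, classifier B has the higher raw accuracy (96 of 110 versus 92 of 110 correct), yet its balanced-accuracy posterior centers lower (around 0.70 versus 0.78) because it rarely recognizes the small third class. Comparing full posteriors, rather than single point estimates, makes such differences and their uncertainty explicit.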


Keywords: multiclass classifiers; accuracy; balanced accuracy; probabilistic performance





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Henry Carrillo (1)
  • Kay H. Brodersen (2)
  • José A. Castellanos (1)

  1. Instituto de Investigación en Ingeniería de Aragón, Universidad de Zaragoza, Zaragoza, Spain
  2. Translational Neuromodeling Unit, Department of Information Technology and Electrical Engineering, Swiss Federal Institute of Technology (ETH Zurich), Zurich, Switzerland
