Statistical Tests for Joint Analysis of Performance Measures

  • Alessio Benavoli
  • Cassio P. de Campos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9505)


Recently there has been increasing interest in methods that use Pareto optimality to handle multi-objective criteria (for example, accuracy and architectural complexity). Once a model has been learned with such a method, the problem is how to compare it with the state of the art. In machine learning, algorithms are typically evaluated by comparing their performance on different data sets by means of statistical tests. Unfortunately, the standard tests used for this purpose cannot jointly consider multiple performance measures. The aim of this paper is to resolve this issue by developing statistical procedures that account for multiple competing measures at the same time. In particular, we develop two tests: a frequentist procedure based on the generalized likelihood-ratio test and a Bayesian procedure based on a multinomial-Dirichlet conjugate model. We further extend both by discovering conditional independences among measures, which reduces the number of parameters of the models; this matters because the number of cases available in such comparisons is usually very small. Real data from a comparison among general-purpose classifiers are used to show a practical application of our tests.
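The abstract's two procedures can be illustrated on joint win/loss outcomes. The sketch below is a minimal, self-contained illustration, not the authors' exact construction: the counts are hypothetical, the uniform Dirichlet prior is an assumption, and the exchangeability null hypothesis for the likelihood-ratio test is chosen purely for illustration.

```python
import math

import numpy as np

# Hypothetical joint outcome counts over 30 data sets. For each data set we
# record whether method A beats method B on each of two measures, giving
# four joint categories: (win, win), (win, loss), (loss, win), (loss, loss).
counts = np.array([14, 6, 4, 6])
n = counts.sum()

# --- Bayesian test: multinomial-Dirichlet conjugate model ----------------
# With a Dirichlet(alpha) prior, the posterior over the category
# probabilities is Dirichlet(alpha + counts).
alpha = np.ones(4)  # uniform prior (an assumption)
rng = np.random.default_rng(0)
theta = rng.dirichlet(alpha + counts, size=100_000)
# Posterior probability that "A wins on both measures" is more probable
# than "A loses on both measures".
p_dominates = float(np.mean(theta[:, 0] > theta[:, 3]))

# --- Frequentist test: generalized likelihood-ratio test -----------------
# Illustrative null: the two methods are exchangeable, i.e.
# theta_ww = theta_ll and theta_wl = theta_lw.
mle_full = counts / n
mle_null = np.array([counts[0] + counts[3], counts[1] + counts[2],
                     counts[1] + counts[2], counts[0] + counts[3]]) / (2 * n)
lrt = 2.0 * np.sum(counts * (np.log(mle_full) - np.log(mle_null)))
# By Wilks' theorem the statistic is asymptotically chi-square with 2 df;
# for 2 degrees of freedom, the chi-square survival function is exp(-x/2).
p_value = math.exp(-lrt / 2.0)

print(f"posterior P(joint win beats joint loss) ~ {p_dominates:.3f}")
print(f"GLRT statistic = {lrt:.3f}, asymptotic p-value = {p_value:.3f}")
```

Note that both tests operate on the same four-category multinomial summary of the joint outcomes, which is what makes a joint (rather than per-measure) comparison possible.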


Keywords: Generalized Likelihood Ratio Test (GLRT) · Bayesian Test · Original Multi-objective Problem · Bayesian Networks (BN) · Bayesian Hypothesis Testing



Copyright information

© Springer International Publishing Switzerland 2015

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License, which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  1. Istituto Dalle Molle di Studi sull'Intelligenza Artificiale (IDSIA), Scuola Universitaria Professionale della Svizzera Italiana (SUPSI), Manno, Switzerland
  2. Università della Svizzera Italiana (USI), Lugano, Switzerland
  3. Queen's University Belfast, Belfast, UK
