
Multivariate Statistical Tests for Comparing Classification Algorithms

  • Conference paper
Learning and Intelligent Optimization (LION 2011)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 6683)

Abstract

The misclassification error, which is usually used in tests to compare classification algorithms, does not distinguish between the sources of error, namely, false positives and false negatives. Instead of summing these into a single number, we propose to collect multivariate statistics and use multivariate tests on them. Information retrieval uses the measures of precision and recall, and signal detection uses the true positive rate (tpr) and false positive rate (fpr); a multivariate test can likewise use two such values instead of combining them into a single value, such as error or average precision. For example, we can have bivariate tests for (precision, recall) or (tpr, fpr). We propose to use the pairwise test based on Hotelling's multivariate T² test to compare two algorithms, or multivariate analysis of variance (MANOVA) to compare L > 2 algorithms. In our experiments, we show that the multivariate tests have higher power than the univariate error test, that is, they can detect differences that the error test cannot, and we also discuss how the decisions made by different multivariate tests differ, to be able to point out where to use which. We also show how multivariate or univariate pairwise tests can be used as post hoc tests after MANOVA to find cliques of algorithms, or to order them along separate dimensions.
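The bivariate pairwise test the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: we assume two classifiers have been evaluated on the same k cross-validation folds, each fold yielding a 2-dimensional performance vector such as (tpr, fpr), and we apply a paired (one-sample) Hotelling T² test to the per-fold differences. The function name `hotelling_t2_paired` is ours.

```python
import numpy as np
from scipy import stats

def hotelling_t2_paired(x, y):
    """Paired Hotelling T^2 test on multivariate per-fold statistics.

    x, y : arrays of shape (k, p) -- p-dimensional performance vectors
           (e.g. (tpr, fpr)) of two classifiers over the same k folds.
    Tests H0: the mean difference vector is zero.
    Returns (t2, f, p_value).
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    k, p = d.shape
    mean = d.mean(axis=0)                       # mean difference vector
    S = np.cov(d, rowvar=False)                 # sample covariance of differences
    t2 = k * mean @ np.linalg.solve(S, mean)    # Hotelling's T^2 statistic
    # Under H0, (k - p) / (p * (k - 1)) * T^2 follows an F(p, k - p) distribution.
    f = (k - p) / (p * (k - 1)) * t2
    p_value = stats.f.sf(f, p, k - p)
    return t2, f, p_value
```

With p = 1 this reduces to the squared paired t test on the error; with p = 2 it jointly tests (tpr, fpr) and so can flag a difference even when opposite movements in the two components cancel out in the combined error.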




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yıldız, O.T., Aslan, Ö., Alpaydın, E. (2011). Multivariate Statistical Tests for Comparing Classification Algorithms. In: Coello, C.A.C. (ed.) Learning and Intelligent Optimization. LION 2011. Lecture Notes in Computer Science, vol 6683. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25566-3_1


  • DOI: https://doi.org/10.1007/978-3-642-25566-3_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25565-6

  • Online ISBN: 978-3-642-25566-3

  • eBook Packages: Computer Science, Computer Science (R0)
