
Multivariate Statistical Tests for Comparing Classification Algorithms

  • Conference paper
Learning and Intelligent Optimization (LION 2011)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 6683)

Abstract

The misclassification error, which is usually used in tests to compare classification algorithms, does not distinguish between the sources of error, namely, false positives and false negatives. Instead of summing these into a single number, we propose to collect multivariate statistics and use multivariate tests on them. Information retrieval uses the measures of precision and recall, and signal detection uses the true positive rate (tpr) and false positive rate (fpr); a multivariate test can likewise use two such values instead of combining them into a single value, such as error or average precision. For example, we can have bivariate tests for (precision, recall) or (tpr, fpr). We propose to use the pairwise test based on Hotelling's multivariate T² test to compare two algorithms, or multivariate analysis of variance (MANOVA) to compare L > 2 algorithms. In our experiments, we show that the multivariate tests have higher power than the univariate error test, that is, they can detect differences that the error test cannot, and we also discuss how the decisions made by different multivariate tests differ, to be able to point out where to use which. We also show how multivariate or univariate pairwise tests can be used as post hoc tests after MANOVA to find cliques of algorithms, or to order them along separate dimensions.
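The bivariate pairwise test the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: we assume two classifiers have been evaluated on the same k cross-validation folds, each fold yielding a 2-dimensional performance vector such as (tpr, fpr), and we apply a paired (one-sample) Hotelling T² test to the per-fold differences. The function name `hotelling_t2_paired` is ours.

```python
import numpy as np
from scipy import stats

def hotelling_t2_paired(x, y):
    """Paired Hotelling T^2 test on multivariate per-fold statistics.

    x, y : arrays of shape (k, p) -- p-dimensional performance vectors
           (e.g. (tpr, fpr)) of two classifiers over the same k folds.
    Tests H0: the mean difference vector is zero.
    Returns (t2, f, p_value).
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    k, p = d.shape
    mean = d.mean(axis=0)                       # mean difference vector
    S = np.cov(d, rowvar=False)                 # sample covariance of differences
    t2 = k * mean @ np.linalg.solve(S, mean)    # Hotelling's T^2 statistic
    # Under H0, (k - p) / (p * (k - 1)) * T^2 follows an F(p, k - p) distribution.
    f = (k - p) / (p * (k - 1)) * t2
    p_value = stats.f.sf(f, p, k - p)
    return t2, f, p_value
```

With p = 1 this reduces to the squared paired t test on the error; with p = 2 it jointly tests (tpr, fpr) and so can flag a difference even when opposite movements in the two components cancel out in the combined error.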




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yıldız, O.T., Aslan, Ö., Alpaydın, E. (2011). Multivariate Statistical Tests for Comparing Classification Algorithms. In: Coello, C.A.C. (ed.) Learning and Intelligent Optimization. LION 2011. Lecture Notes in Computer Science, vol 6683. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25566-3_1


  • DOI: https://doi.org/10.1007/978-3-642-25566-3_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25565-6

  • Online ISBN: 978-3-642-25566-3

  • eBook Packages: Computer Science, Computer Science (R0)
