Deriving Consensus Rankings from Benchmarking Experiments

Hornik, Kurt; Meyer, David

doi:10.1007/978-3-540-70981-7_19

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Whereas benchmarking experiments are very frequently used to investigate the performance of statistical or machine learning algorithms for supervised and unsupervised learning tasks, overall analyses of such experiments are typically only carried out on a heuristic basis, if at all. We suggest determining winners and, more generally, deriving a consensus ranking of the algorithms as the linear order on the algorithms that minimizes the average symmetric distance (Kemeny-Snell distance) to the performance relations on the individual benchmark data sets. This leads to binary programming problems which can typically be solved reasonably efficiently. We apply the approach to a medium-scale benchmarking experiment to assess the performance of Support Vector Machines in regression and classification problems, and compare the obtained consensus ranking with rankings obtained by simple scoring and Bradley-Terry modeling.
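
The idea of a consensus ranking can be illustrated with a minimal Python sketch. It is not the binary programming formulation the chapter refers to: it assumes that every data set yields a tie-free ranking of the same algorithms and searches all linear orders exhaustively, which is only practical for a handful of algorithms. For tie-free rankings, the symmetric difference between two orders is twice the number of pairs they order differently, so counting discordant pairs gives the same minimizer. The function names and the example rankings are hypothetical.

```python
from itertools import combinations, permutations


def discordant_pairs(order_a, order_b):
    """Count pairs of algorithms that two tie-free rankings order differently."""
    pos_a = {alg: i for i, alg in enumerate(order_a)}
    pos_b = {alg: i for i, alg in enumerate(order_b)}
    return sum(
        (pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y])
        for x, y in combinations(order_a, 2)
    )


def consensus_ranking(rankings):
    """Return (order, total_discordant_pairs) minimizing disagreement with `rankings`.

    Exhaustive search over all linear orders (assumption: few algorithms, no ties).
    Minimizing the total is equivalent to minimizing the average over data sets.
    If several orders are equally good, the first one enumerated is returned.
    """
    algorithms = sorted(rankings[0])
    best_order, best_cost = None, None
    for candidate in permutations(algorithms):
        cost = sum(discordant_pairs(candidate, r) for r in rankings)
        if best_cost is None or cost < best_cost:
            best_order, best_cost = list(candidate), cost
    return best_order, best_cost


# Hypothetical per-data-set rankings (best first); not taken from the chapter.
example_rankings = [
    ["svm", "rf", "knn"],
    ["svm", "knn", "rf"],
    ["rf", "svm", "knn"],
]

print(consensus_ranking(example_rankings))  # (['svm', 'rf', 'knn'], 2)
```

Because the number of linear orders grows factorially with the number of algorithms, exhaustive search stops being feasible quickly, which is one reason an optimization formulation such as the binary programs mentioned in the abstract is needed in practice.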

Author information

Authors and Affiliations

Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, A-1090, Wien, Austria
Kurt Hornik
Department of Information Systems and Operations, Wirtschaftsuniversität Wien, A-1090, Wien, Austria
David Meyer

Editor information

Editors and Affiliations

Department of Business Administration and Economics, Bielefeld University, Universitätsstr. 25, 33501, Bielefeld, Germany
Reinhold Decker
Department of Economics, Freie Universität Berlin, Garystraße 21, 14195, Berlin, Germany
Hans-J. Lenz

About this paper

Cite this paper

Hornik, K., Meyer, D. (2007). Deriving Consensus Rankings from Benchmarking Experiments. In: Decker, R., Lenz, H.J. (eds) Advances in Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70981-7_19

DOI: https://doi.org/10.1007/978-3-540-70981-7_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70980-0
Online ISBN: 978-3-540-70981-7
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
