
Deriving Consensus Rankings from Benchmarking Experiments

  • Conference paper
Advances in Data Analysis

Abstract

Benchmarking experiments are very frequently used to investigate the performance of statistical or machine learning algorithms for supervised and unsupervised learning tasks, yet overall analyses of such experiments are typically carried out only on a heuristic basis, if at all. We suggest determining winners, and more generally deriving a consensus ranking of the algorithms, as the linear order on the algorithms that minimizes the average symmetric difference distance (Kemeny-Snell distance) to the performance relations on the individual benchmark data sets. This leads to binary programming problems which can typically be solved reasonably efficiently. We apply the approach to a medium-scale benchmarking experiment that assesses the performance of Support Vector Machines in regression and classification problems, and compare the resulting consensus ranking with rankings obtained by simple scoring and Bradley-Terry modeling.
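To make the idea of a consensus ranking concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each data set yields a strict ranking of the algorithms (the paper allows more general performance relations), and it replaces the binary programming formulation with brute-force enumeration of all linear orders, which is only feasible for a handful of algorithms. The function names and the toy rankings are hypothetical and chosen purely for illustration.

```python
from itertools import permutations


def precedence_pairs(ranking):
    """Return the set of ordered pairs (a, b) with a ranked ahead of b."""
    return {(a, b) for i, a in enumerate(ranking) for b in ranking[i + 1:]}


def symmetric_difference_distance(ranking_1, ranking_2):
    """Count the ordered pairs contained in exactly one of the two relations."""
    return len(precedence_pairs(ranking_1) ^ precedence_pairs(ranking_2))


def consensus_ranking(rankings):
    """Return a linear order minimizing the total distance to all rankings.

    Minimizing the total is equivalent to minimizing the average.
    Brute force over all permutations (assumption: few algorithms);
    ties between optimal orders are broken by enumeration order.
    """
    algorithms = sorted(rankings[0])
    best = min(
        permutations(algorithms),
        key=lambda order: sum(
            symmetric_difference_distance(order, r) for r in rankings
        ),
    )
    return list(best)


# Hypothetical toy data: one ranking (best first) per benchmark data set.
toy_rankings = [
    ["svm", "rf", "lm"],
    ["svm", "lm", "rf"],
    ["rf", "svm", "lm"],
]

consensus = consensus_ranking(toy_rankings)
```

In this toy example every pairwise comparison has a clear majority, so the minimizing order is unique. With more algorithms the number of candidate orders grows factorially, which is why an integer programming formulation, as proposed in the paper, is the practical route.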




Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hornik, K., Meyer, D. (2007). Deriving Consensus Rankings from Benchmarking Experiments. In: Decker, R., Lenz, H.J. (eds) Advances in Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70981-7_19
