The (black) art of runtime evaluation: Are we comparing algorithms or implementations?

Survey Paper

DOI: 10.1007/s10115-016-1004-2

Cite this article as:
Kriegel, HP., Schubert, E. & Zimek, A. Knowl Inf Syst (2016). doi:10.1007/s10115-016-1004-2


Any paper proposing a new algorithm should come with an evaluation of efficiency and scalability (particularly when we are designing methods for “big data”). However, there are several (more or less serious) pitfalls in such evaluations. We would like to draw the community's attention to these pitfalls. We substantiate our points with extensive experiments, using clustering and outlier detection methods with and without index acceleration. We discuss what we can learn from evaluations, whether experiments are properly designed, and what kinds of conclusions we should avoid. We close with some general recommendations but maintain that the design of fair and conclusive experiments will always remain a challenge for researchers and an integral part of the scientific endeavor.


Keywords: Methodology, Efficiency evaluation, Runtime experiments, Implementation matters

Copyright information

© Springer-Verlag London 2016

Authors and Affiliations

  1. Institute for Informatics, Ludwig-Maximilians-Universität München, Munich, Germany
  2. Department of Mathematics and Computer Science, University of Southern Denmark, Odense M, Denmark
