The (black) art of runtime evaluation: Are we comparing algorithms or implementations?
- Cite this article as: Kriegel, HP., Schubert, E. & Zimek, A. Knowl Inf Syst (2016). doi:10.1007/s10115-016-1004-2
Any paper proposing a new algorithm should come with an evaluation of efficiency and scalability (particularly when we are designing methods for “big data”). However, such evaluations are subject to several more or less serious pitfalls, and we would like to draw the community's attention to them. We substantiate our points with extensive experiments, using clustering and outlier detection methods with and without index acceleration. We discuss what we can learn from evaluations, whether experiments are properly designed, and what kinds of conclusions we should avoid. We close with some general recommendations but maintain that the design of fair and conclusive experiments will always remain a challenge for researchers and an integral part of the scientific endeavor.