Precise Regression Benchmarking with Random Effects: Improving Mono Benchmark Results

  • Tomas Kalibera
  • Petr Tuma
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4054)


Benchmarking as a method of assessing software performance is known to suffer from random fluctuations that distort the observed performance. In this paper, we focus on the fluctuations caused by compilation. We show that the design of a benchmarking experiment must reflect the existence of the fluctuations if the performance observed during the experiment is to be representative of reality.

We present a new statistical model of a benchmark experiment that reflects the presence of the fluctuations in compilation, execution and measurement. The model describes the observed performance and makes it possible to calculate the optimum dimensions of the experiment that yield the best precision within a given amount of time.
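The abstract does not spell out the model. One common way to express fluctuations at three levels is a nested random-effects model, y_ijk = mu + a_i + b_ij + e_ijk, where a_i is the effect of the i-th compiled binary, b_ij the effect of the j-th execution of that binary, and e_ijk the measurement noise. The sketch below assumes that form. It is not taken from the paper: the ANOVA estimator, the cost parameters `t_compile`, `t_run` and `t_measure`, and all function names are illustrative assumptions. It estimates the three variance components from a balanced experiment, then searches integer dimensions (binaries, runs per binary, measurements per run) that minimize the variance of the overall mean within a time budget.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class VarianceComponents:
    compile: float  # sigma_c^2: variation between separately compiled binaries
    run: float      # sigma_r^2: variation between executions of one binary
    measure: float  # sigma_m^2: variation between measurements within one execution


def _avg(xs):
    return math.fsum(xs) / len(xs)


def simulate(vc, n_c, n_r, n_m, mu=100.0, seed=0):
    """Hypothetical balanced experiment: data[i][j][k] = mu + a_i + b_ij + e_ijk."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_c):
        a = rng.gauss(0.0, math.sqrt(vc.compile))  # effect of this binary
        runs = []
        for _ in range(n_r):
            b = rng.gauss(0.0, math.sqrt(vc.run))  # effect of this execution
            runs.append([mu + a + b + rng.gauss(0.0, math.sqrt(vc.measure))
                         for _ in range(n_m)])
        data.append(runs)
    return data


def estimate_components(data):
    """ANOVA (method-of-moments) estimates for a balanced nested design.

    Needs at least 2 binaries, 2 runs per binary and 2 measurements per run.
    Negative estimates are clipped to zero.
    """
    n_c, n_r, n_m = len(data), len(data[0]), len(data[0][0])
    run_means = [[_avg(run) for run in binary] for binary in data]
    bin_means = [_avg(rm) for rm in run_means]
    grand = _avg(bin_means)
    ms_m = math.fsum((y - run_means[i][j]) ** 2
                     for i, binary in enumerate(data)
                     for j, run in enumerate(binary)
                     for y in run) / (n_c * n_r * (n_m - 1))
    ms_r = n_m * math.fsum((run_means[i][j] - bin_means[i]) ** 2
                           for i in range(n_c) for j in range(n_r)) / (n_c * (n_r - 1))
    ms_c = n_r * n_m * math.fsum((b - grand) ** 2 for b in bin_means) / (n_c - 1)
    # E[ms_m] = s_m^2, E[ms_r] = s_m^2 + n_m*s_r^2, E[ms_c] = E[ms_r] + n_r*n_m*s_c^2
    return VarianceComponents(
        compile=max(0.0, (ms_c - ms_r) / (n_r * n_m)),
        run=max(0.0, (ms_r - ms_m) / n_m),
        measure=ms_m,
    )


def variance_of_mean(vc, n_c, n_r, n_m):
    """Variance of the overall mean of n_c binaries x n_r runs x n_m measurements."""
    return (vc.compile / n_c
            + vc.run / (n_c * n_r)
            + vc.measure / (n_c * n_r * n_m))


def best_dimensions(vc, budget, t_compile, t_run, t_measure, max_r=50, max_m=1000):
    """Exhaustive search over integer (n_r, n_m); n_c is whatever fits the budget.

    Assumed cost model: each binary costs t_compile, each run t_run
    (startup, warm-up) and each measurement t_measure.
    Returns (variance, n_c, n_r, n_m), or None if nothing fits.
    """
    best = None
    for n_r in range(1, max_r + 1):
        for n_m in range(1, max_m + 1):
            per_binary = t_compile + n_r * (t_run + n_m * t_measure)
            n_c = int(budget // per_binary)
            if n_c < 1:
                break  # cost only grows with n_m
            v = variance_of_mean(vc, n_c, n_r, n_m)
            if best is None or v < best[0]:
                best = (v, n_c, n_r, n_m)
    return best
```

For example, `best_dimensions(VarianceComponents(4.0, 1.0, 9.0), budget=3600, t_compile=60, t_run=5, t_measure=0.01)` returns the design with the smallest variance of the mean among those searched. When compilation variance dominates, the search favors building more binaries over taking more measurements per run, which is the intuition the abstract describes.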

Using a variety of benchmarks, we evaluate the model in the context of regression benchmarking. We show that the model significantly decreases the number of erroneously detected performance changes in regression benchmarking.
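To illustrate why ignoring compilation effects produces false alarms, the sketch below compares two versions whose true performance is identical, so every detected change is erroneous. It is a minimal sketch under stated assumptions and is not the detection procedure from the paper: effects are assumed normal, the parameter values are hypothetical, and a normal approximation (`z = 1.96`) stands in for a t quantile. The naive test treats all measurements as independent; the binary-level test uses one mean per compiled binary, so compilation variance enters the error estimate.

```python
import math
import random


def _avg(xs):
    return math.fsum(xs) / len(xs)


def _var(xs):
    m = _avg(xs)
    return math.fsum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulate_version(rng, mu, n_c, n_r, n_m, sd_c=2.0, sd_r=1.0, sd_m=3.0):
    """Hypothetical data: data[i][j][k] = mu + compile effect + run effect + noise."""
    data = []
    for _ in range(n_c):
        a = rng.gauss(0.0, sd_c)          # effect of this particular compiled binary
        runs = []
        for _ in range(n_r):
            b = rng.gauss(0.0, sd_r)      # effect of this particular execution
            runs.append([mu + a + b + rng.gauss(0.0, sd_m) for _ in range(n_m)])
        data.append(runs)
    return data


def naive_changed(old, new, z=1.96):
    """Treats every measurement as independent (ignores compile and run effects)."""
    xs = [y for binary in old for run in binary for y in run]
    ys = [y for binary in new for run in binary for y in run]
    se = math.sqrt(_var(xs) / len(xs) + _var(ys) / len(ys))
    return abs(_avg(ys) - _avg(xs)) > z * se


def binary_level_changed(old, new, z=1.96):
    """Uses one mean per binary, so compile-time variation enters the error estimate."""
    xs = [_avg([y for run in binary for y in run]) for binary in old]
    ys = [_avg([y for run in binary for y in run]) for binary in new]
    se = math.sqrt(_var(xs) / len(xs) + _var(ys) / len(ys))
    return abs(_avg(ys) - _avg(xs)) > z * se


def count_false_alarms(trials=200, seed=1, n_c=10, n_r=5, n_m=20):
    """Compares two versions with identical true performance; every alarm is false."""
    rng = random.Random(seed)
    naive = binary = 0
    for _ in range(trials):
        old = simulate_version(rng, 100.0, n_c, n_r, n_m)
        new = simulate_version(rng, 100.0, n_c, n_r, n_m)
        naive += naive_changed(old, new)
        binary += binary_level_changed(old, new)
    return naive, binary
```

`count_false_alarms()` returns a pair of counts. With these hypothetical parameters the naive test should flag a large majority of the identical comparisons, while the binary-level test should flag only a small fraction, close to the nominal 5% rate.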


Keywords: performance evaluation, benchmark precision, random effects, regression benchmarking


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Tomas Kalibera (1)
  • Petr Tuma (1)
  1. Distributed Systems Research Group, Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
