Part of the Lecture Notes in Computer Science book series (LNCS, volume 9232)
Benchmarking and Resource Measurement
Proper benchmarking and resource measurement is an important topic because benchmarking is a widely used method for the comparative evaluation of tools and algorithms in many research areas. It is essential for researchers, tool developers, and users, as well as for competitions. We formulate a set of requirements that are indispensable for reproducible benchmarking and reliable resource measurement of automatic solvers, verifiers, and similar tools, and discuss limitations of existing methods and benchmarking tools. Fulfilling these requirements in a benchmarking framework is complex and can (on Linux) currently only be done by using the cgroups feature of the kernel. We provide a ready-to-use, tool-independent, and free implementation of a benchmarking framework that fulfills all presented requirements, making reproducible benchmarking and reliable resource measurement easy. Our framework is able to work with a wide range of different tools and has proven its reliability and usefulness in the International Competition on Software Verification.
Keywords: Memory Usage, Satisfiability Modulo Theory, Memory Page, Child Process, Physical Core
We thank Hubert Garavel, Jiri Slaby, and Aaron Stump for their helpful comments regarding BenchKit, cgroups, and StarExec, respectively.
© Springer International Publishing Switzerland 2015