Run-Time Automatic Performance Tuning for Multicore Applications

  • Thomas Karcher
  • Victor Pankratius
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6852)


Multicore hardware and system software have become complex and differ from platform to platform. Parallel application performance optimization and portability are now a real challenge. In practice, the effects of tuning parameters are hard to predict. Programmers face even more difficulties when several applications run in parallel and influence each other indirectly. We tackle these problems with Perpetuum, a novel operating-system-based auto-tuner that is capable of tuning applications while they are running. We go beyond tuning one application in isolation and are the first to employ OS-based auto-tuning to improve system-wide application performance. Our fully functional auto-tuner extends the Linux kernel, and the application tuning process does not require any user involvement. General multicore applications are automatically re-tuned on new platforms while they are executing, which makes portability easy. Extensive case studies with real applications demonstrate the feasibility and efficiency of our approach. Perpetuum realizes a first milestone in our vision to make every performance-critical multicore application auto-tuned by default.


Execution Time, Block Size, Parallel Application, Performance Tune, Tuning Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Thomas Karcher (1)
  • Victor Pankratius (1)
  1. IPD, Karlsruhe Institute of Technology, Karlsruhe, Germany
