Applicability of the ECM Performance Model to Explicit ODE Methods on Current Multi-core Processors

  • Johannes Seiferth
  • Christie Alappat
  • Matthias Korch
  • Thomas Rauber
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10876)


Autotuning techniques are a promising approach to preserving efficiency when an application from scientific computing is ported to a new HPC system. Ideally, these approaches derive an efficient implementation for a specific HPC system by applying suitable program transformations. Often, this produces a large number of implementation variants, from which the most efficient one must be selected. In this article, we investigate performance modelling and prediction techniques that can support this selection process and that may significantly reduce the selection effort compared to extensive runtime tests. We apply the execution-cache-memory (ECM) performance model to numerical solution methods for ordinary differential equations (ODEs). In particular, we investigate whether the ECM model can predict the performance of the resulting implementation variants accurately enough to support variant selection. We evaluate the accuracy of the prediction for different ODEs and different hardware platforms and show that the prediction reliably selects a set of fast variants and thus limits the search space for any subsequent empirical tuning.
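The ECM-based variant selection described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes the common Intel formulation of the ECM model, T_ECM = max(T_OL, T_nOL + T_L1L2 + T_L2L3 + T_L3Mem) in cycles per unit of work, the usual multicore assumption that performance scales linearly with the core count until the memory bandwidth saturates, and a hypothetical selection rule that keeps every variant predicted to be within 10% of the best one. All class names, function names, variant names and cycle counts are invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class KernelECM:
    """Hypothetical ECM inputs for one loop kernel, in cycles per unit of
    work (e.g. one cache line's worth of loop iterations)."""
    t_ol: float     # in-core time that overlaps with data transfers
    t_nol: float    # in-core time that does not overlap (e.g. L1 loads/stores)
    t_l1l2: float   # L1 <-> L2 transfer time
    t_l2l3: float   # L2 <-> L3 transfer time
    t_l3mem: float  # L3 <-> main memory transfer time (assumed > 0 here)

    def single_core(self) -> float:
        # Non-overlapping hypothesis: T_nOL and all transfer times add up;
        # only T_OL overlaps with them.
        return max(self.t_ol,
                   self.t_nol + self.t_l1l2 + self.t_l2l3 + self.t_l3mem)

    def multi_core(self, cores: int) -> float:
        # Linear scaling until the memory interface saturates, after which
        # T_L3Mem per unit of work is the bottleneck.
        return max(self.single_core() / cores, self.t_l3mem)

    def saturation_cores(self) -> int:
        # Smallest core count at which the memory bandwidth is saturated.
        return math.ceil(self.single_core() / self.t_l3mem)


def predict_variant(kernels, cores):
    """Predicted cycles of one variant (e.g. one ODE time step), modelled
    as a list of (kernel, work units) pairs whose times simply add up."""
    return sum(k.multi_core(cores) * units for k, units in kernels)


def select_fast_variants(variants, cores, tolerance=0.1):
    """Rank variants by predicted time and keep those within `tolerance`
    of the best prediction (a hypothetical selection rule)."""
    predicted = {name: predict_variant(ks, cores)
                 for name, ks in variants.items()}
    best = min(predicted.values())
    ranked = sorted(predicted.items(), key=lambda item: item[1])
    return [(name, t) for name, t in ranked if t <= (1 + tolerance) * best]


# Made-up kernels and variants; the numbers are not measurements.
stream_like = KernelECM(t_ol=4, t_nol=6, t_l1l2=6, t_l2l3=6, t_l3mem=13)
fused = KernelECM(t_ol=10, t_nol=12, t_l1l2=12, t_l2l3=12, t_l3mem=25)
variants = {
    "A_separate_loops": [(stream_like, 1000), (stream_like, 1000)],
    "B_fused_loop": [(fused, 1000)],
    "C_three_loops": [(stream_like, 1000)] * 3,
}

for name, cycles in select_fast_variants(variants, cores=4):
    print(f"{name}: {cycles:.0f} cycles")
```

With these invented inputs and four cores, both kernels are already memory-bound, so variant C is discarded and only B and A are kept as candidates for empirical tuning. The tolerance-based cut-off is just one plausible way to turn a ranking into a reduced search space.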


Keywords: Performance model · ECM model · Performance prediction · Variant selection · Multicore



This work is supported by the German Federal Ministry of Education and Research (BMBF) under project number 01IH16012A. Discussions with Julian Hammer (RRZE) are gratefully acknowledged.


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Johannes Seiferth (1)
  • Christie Alappat (2)
  • Matthias Korch (1)
  • Thomas Rauber (1)

  1. Department of Computer Science, University of Bayreuth, Bayreuth, Germany
  2. Erlangen Regional Computing Center (RRZE), Friedrich-Alexander University of Erlangen-Nuremberg, Erlangen, Germany
