Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels

Hammer, Julian; Eitzinger, Jan; Hager, Georg; Wellein, Gerhard

doi:10.1007/978-3-319-56702-0_1

Abstract

Achieving optimal program performance requires deep insight into the interaction between hardware and software. For software developers without an in-depth background in computer architecture, understanding and fully utilizing modern architectures is close to impossible. Analytic loop performance modeling is a useful way to understand the relevant bottlenecks of code execution based on simple machine models. The Roofline model and the Execution-Cache-Memory (ECM) model are proven approaches to performance modeling of loop nests. In comparison to the Roofline model, the ECM model can also describe the single-core performance and saturation behavior on a multicore chip.

We give an introduction to the Roofline and ECM models, and to stencil performance modeling using layer conditions (LC). We then present Kerncraft, a tool that can automatically construct Roofline and ECM models for loop nests by performing the required code, data transfer, and LC analysis. The layer condition analysis makes it possible to predict optimal spatial blocking factors for loop nests. Together with the models, it enables an ab-initio estimate of the potential benefits of loop blocking optimizations and of useful block sizes. In cases where LC analysis is not easily possible, Kerncraft supports a cache simulator as a fallback option. Using a 25-point long-range stencil, we demonstrate the usefulness and predictive power of the Kerncraft tool.
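
As a rough illustration of the kind of bound the Roofline model provides, the following minimal Python sketch computes the standard textbook estimate: attainable performance is the smaller of the peak compute rate and the product of arithmetic intensity and memory bandwidth. This is not Kerncraft code or its API; the function name, the machine numbers, and the kernel traffic figures are all hypothetical values chosen for the example.

```python
def roofline_performance(peak_gflops, bandwidth_gbs, flops_per_iter, bytes_per_iter):
    """Textbook Roofline bound in GFLOP/s (hypothetical helper, not Kerncraft's API).

    peak_gflops    -- peak in-core compute rate in GFLOP/s
    bandwidth_gbs  -- attainable memory bandwidth in GB/s
    flops_per_iter -- floating-point operations per loop iteration
    bytes_per_iter -- bytes moved to/from memory per loop iteration
    """
    # Arithmetic intensity in FLOP per byte of memory traffic.
    intensity = flops_per_iter / bytes_per_iter
    # Performance is capped by whichever limit is lower.
    return min(peak_gflops, intensity * bandwidth_gbs)


# Assumed machine: 40 GFLOP/s peak, 50 GB/s memory bandwidth (made-up numbers).
# Assumed kernel: a[i] = b[i] + s * c[i] in double precision, i.e. 2 FLOPs and
# 32 bytes per iteration (two loads, one store, plus a write-allocate transfer).
memory_bound = roofline_performance(40.0, 50.0, flops_per_iter=2, bytes_per_iter=32)

# A kernel with much higher intensity on the same machine hits the compute limit.
compute_bound = roofline_performance(40.0, 50.0, flops_per_iter=64, bytes_per_iter=8)

print(memory_bound, compute_bound)
```

With these assumed inputs, the first kernel is limited by memory bandwidth and the second by the compute peak. The ECM model refines this picture by accounting for transfer times between the individual cache levels rather than a single bandwidth limit.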

Notes

1. https://github.com/RRZE-HPC/kerncraft/tree/master/examples/kernels
2. https://github.com/RRZE-HPC/kerncraft/tree/master/examples/machine-files
3. Kerncraft currently only supports Intel Xeon and Core architectures, but pycachesim has been developed with other architectures in mind.

Acknowledgements

This work was in part funded by the German Academic Exchange Service’s (DAAD) FITweltweit program and the Federal Ministry of Education and Research (BMBF) SKAMPY grant.

Author information

Authors and Affiliations

Erlangen Regional Computing Center, Erlangen, Germany
Julian Hammer, Jan Eitzinger, Georg Hager & Gerhard Wellein

Corresponding author

Correspondence to Julian Hammer.

Editor information

Editors and Affiliations

Höchstleistungszentrum Stuttgart (HLRS), Universität Stuttgart, Stuttgart, Germany
Christoph Niethammer
Höchstleistungszentrum Stuttgart (HLRS), Universität Stuttgart, Stuttgart, Germany
José Gracia
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH), Technische Universität Dresden, Dresden, Germany
Tobias Hilbrich
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH), Technische Universität Dresden, Dresden, Germany
Andreas Knüpfer
Höchstleistungszentrum Stuttgart (HLRS), Universität Stuttgart, Stuttgart, Germany
Michael M. Resch
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH), Technische Universität Dresden, Dresden, Germany
Wolfgang E. Nagel

About this paper

Cite this paper

Hammer, J., Eitzinger, J., Hager, G., Wellein, G. (2017). Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels. In: Niethammer, C., Gracia, J., Hilbrich, T., Knüpfer, A., Resch, M., Nagel, W. (eds) Tools for High Performance Computing 2016. Springer, Cham. https://doi.org/10.1007/978-3-319-56702-0_1

DOI: https://doi.org/10.1007/978-3-319-56702-0_1
Published: 09 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56701-3
Online ISBN: 978-3-319-56702-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
