Performance Optimization of 3D Multigrid on Hierarchical Memory Architectures

Kowarschik, Markus; Rüde, Ulrich; Thürey, Nils; Weiß, Christian

doi:10.1007/3-540-48051-X_31

Conference paper
First Online: 01 January 2002

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2367)

Abstract

Today’s computer architectures employ fast cache memories to hide both the low bandwidth and the high latency of main memory, which is slow relative to the floating-point performance of the CPUs. Efficient program execution can only be achieved if the codes respect the hierarchical memory design. Iterative methods for linear systems of equations are characterized by successive sweeps over data sets that are much too large to fit in cache. Standard implementations of these methods thus do not perform efficiently on cache-based machines. In this paper, we present techniques to enhance the cache utilization of multigrid methods on regular mesh structures in 3D, together with various performance results. Most of these techniques extend our previous work on 2D problems.
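The point about successive sweeps can be made concrete with a small sketch. The abstract does not say which smoother or which transformations the paper uses, so everything below is an assumption chosen for illustration: a red-black Gauss-Seidel smoother for a 7-point Poisson stencil on an n x n x n grid, and loop fusion as one commonly described locality transformation. All function names are hypothetical. The standard version traverses the whole grid twice per iteration. The fused version updates the black points of plane k-1 right after the red points of plane k, while those planes are still likely to be in cache.

```python
RED, BLACK = 0, 1  # colour of a point = parity of i + j + k


def make_grid(n, seed=1):
    """Return an n*n*n nested list: zero boundary, pseudo-random interior."""
    state = seed
    u = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for k in range(1, n - 1):
        for j in range(1, n - 1):
            for i in range(1, n - 1):
                # Small linear congruential generator, to stay dependency-free.
                state = (1103515245 * state + 12345) % 2**31
                u[k][j][i] = state / 2**31
    return u


def relax_plane(u, f, h2, k, color):
    """Relax all interior points of plane k with (i + j + k) % 2 == color."""
    n = len(u)
    for j in range(1, n - 1):
        start = 1 + (1 + j + k + color) % 2  # first interior i of this colour
        for i in range(start, n - 1, 2):
            u[k][j][i] = (h2 * f[k][j][i]
                          + u[k][j][i - 1] + u[k][j][i + 1]
                          + u[k][j - 1][i] + u[k][j + 1][i]
                          + u[k - 1][j][i] + u[k + 1][j][i]) / 6.0


def sweep_standard(u, f, h2):
    """Two full passes over the grid: all red points, then all black points."""
    n = len(u)
    for k in range(1, n - 1):
        relax_plane(u, f, h2, k, RED)
    for k in range(1, n - 1):
        relax_plane(u, f, h2, k, BLACK)


def sweep_fused(u, f, h2):
    """One pass: red points of plane k, then black points of plane k - 1.

    A black point in plane k - 1 only reads red neighbours in planes
    k - 2, k - 1 and k, all of which have been updated at that moment,
    so the result matches sweep_standard.
    """
    n = len(u)
    for k in range(1, n - 1):
        relax_plane(u, f, h2, k, RED)
        if k > 1:
            relax_plane(u, f, h2, k - 1, BLACK)
    relax_plane(u, f, h2, n - 2, BLACK)  # last interior plane is still pending


if __name__ == "__main__":
    a, b = make_grid(16), make_grid(16)
    rhs = make_grid(16, seed=7)
    sweep_standard(a, rhs, 0.01)
    sweep_fused(b, rhs, 0.01)
    print("same result:", a == b)
```

Both variants perform the same arithmetic on the same operands, so only the traversal order differs. Any actual speedup depends on the language, grid size, and cache sizes, and this Python sketch is not meant for measuring it.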

Author information

Authors and Affiliations

Lehrstuhl für Systemsimulation (Informatik 10) Institut für Informatik, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
Markus Kowarschik, Ulrich Rüde & Nils Thürey
Lehrstuhl für Rechnertechnik und Rechnerorganisation (LRR-TUM) Fakultät für Informatik, Technische Universität München, Germany
Christian Weiß

Editor information

Editors and Affiliations

CSC, P.O. Box 405, 02101, Espoo, Finland
Juha Fagerholm, Juha Haataja, Jari Järvinen, Mikko Lyly, Peter Råback & Ville Savolainen

About this paper

Cite this paper

Kowarschik, M., Rüde, U., Thürey, N., Weiß, C. (2002). Performance Optimization of 3D Multigrid on Hierarchical Memory Architectures. In: Fagerholm, J., Haataja, J., Järvinen, J., Lyly, M., Råback, P., Savolainen, V. (eds) Applied Parallel Computing. PARA 2002. Lecture Notes in Computer Science, vol 2367. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48051-X_31

DOI: https://doi.org/10.1007/3-540-48051-X_31
Published: 04 July 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43786-4
Online ISBN: 978-3-540-48051-8
eBook Packages: Springer Book Archive
