Abstract
Today’s computer architectures employ fast cache memories to hide both the limited bandwidth and the high latency of main memory, which is slow compared to the floating-point performance of the CPU. Efficient program execution can therefore only be achieved if the code respects this hierarchical memory design. Iterative methods for linear systems of equations are characterized by successive sweeps over data sets that are much too large to fit in cache. Standard implementations of these methods thus do not perform efficiently on cache-based machines. In this paper we present techniques to enhance the cache utilization of multigrid methods on regular 3D mesh structures, together with various performance results. Most of these techniques extend our previous work on 2D problems.
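The abstract does not name the individual techniques. As an illustration only, here is a minimal sketch of one well-known cache-oriented transformation for a 3D red-black Gauss-Seidel smoother. It assumes the model problem is the 7-point Poisson stencil with Dirichlet boundary values stored in the outer layer of the array, which is not stated in the source. A standard implementation makes two full passes per iteration: one over all red points (i+j+k even) and one over all black points. The fused variant instead updates the red points of plane k and then the black points of plane k-1 in the same pass. By that point, every red neighbour of plane k-1 (planes k-2, k-1 and k) is already up to date. If a few consecutive planes fit in cache, the grid streams through the cache about once per iteration instead of twice. All function names are hypothetical. Pure Python shows the access order, not the speed-up.

```python
import random


def idx(i, j, k, n):
    """Flat index into an (n+2)^3 array (interior 1..n plus boundary layer), i fastest."""
    m = n + 2
    return i + m * (j + m * k)


def relax_plane(u, f, n, h2, k, color):
    """Gauss-Seidel update of all points of one color in interior plane k.

    color 0 = red (i+j+k even), color 1 = black (i+j+k odd).
    Solves -Laplace(u) = f with the 7-point stencil: u = (h^2 f + sum of 6 neighbours) / 6.
    """
    m = n + 2
    for j in range(1, n + 1):
        # First i in 1..n whose parity (i+j+k) % 2 matches the requested color.
        i0 = 1 + ((1 + j + k + color) % 2)
        for i in range(i0, n + 1, 2):
            p = idx(i, j, k, n)
            u[p] = (h2 * f[p]
                    + u[p - 1] + u[p + 1]            # x-neighbours
                    + u[p - m] + u[p + m]            # y-neighbours
                    + u[p - m * m] + u[p + m * m]    # z-neighbours (adjacent planes)
                    ) / 6.0


def rb_gs_standard(u, f, n, h2):
    """Two complete passes over the grid: all red points, then all black points."""
    for color in (0, 1):
        for k in range(1, n + 1):
            relax_plane(u, f, n, h2, k, color)


def rb_gs_fused(u, f, n, h2):
    """One pass: red in plane k, then black in plane k-1 (its red neighbours are final)."""
    for k in range(1, n + 1):
        relax_plane(u, f, n, h2, k, 0)
        if k > 1:
            relax_plane(u, f, n, h2, k - 1, 1)
    relax_plane(u, f, n, h2, n, 1)  # black points of the last interior plane


# Usage: both variants perform the same updates on the same inputs,
# so they yield bit-identical results on a random 8^3 problem.
n = 8
h2 = (1.0 / (n + 1)) ** 2
random.seed(0)
f = [random.random() for _ in range((n + 2) ** 3)]
u0 = [random.random() for _ in range((n + 2) ** 3)]
a, b = list(u0), list(u0)
for _ in range(3):
    rb_gs_standard(a, f, n, h2)
    rb_gs_fused(b, f, n, h2)
print(a == b)
```

Each point is computed from the same neighbour values in both variants, so the fusion changes only the order of memory accesses, not the numerical result.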
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kowarschik, M., Rüde, U., Thürey, N., Weiß, C. (2002). Performance Optimization of 3D Multigrid on Hierarchical Memory Architectures. In: Fagerholm, J., Haataja, J., Järvinen, J., Lyly, M., Råback, P., Savolainen, V. (eds) Applied Parallel Computing. PARA 2002. Lecture Notes in Computer Science, vol 2367. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48051-X_31
DOI: https://doi.org/10.1007/3-540-48051-X_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43786-4
Online ISBN: 978-3-540-48051-8
eBook Packages: Springer Book Archive