
Performance Optimization of 3D Multigrid on Hierarchical Memory Architectures

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2367)

Abstract

Today’s computer architectures employ fast cache memories to hide both the low bandwidth and the high latency of main memory accesses, which are slow compared to the floating-point performance of the CPUs. Efficient program execution can only be achieved if the codes respect the hierarchical memory design. Iterative methods for linear systems of equations are characterized by successive sweeps over data sets that are much too large to fit in cache. Standard implementations of these methods therefore do not perform efficiently on cache-based machines. In this paper we present techniques to enhance the cache utilization of multigrid methods on regular 3D mesh structures, as well as various performance results. Most of these techniques extend our previous work on 2D problems.
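
The abstract does not list the individual techniques. As a hedged illustration of the underlying idea, the sketch below uses a red-black Gauss-Seidel smoother for the 7-point discretization of the 3D Poisson equation (a typical multigrid smoother) and compares a standard version, which sweeps over the whole grid twice per iteration, with a loop-fused version that makes a single pass and relaxes the black points of plane k-1 immediately after the red points of plane k. Loop fusion of this kind is a common cache optimization in this line of work, but the paper's exact variants may differ. All function names are hypothetical, and Python only demonstrates the reordered update schedule and that both versions produce identical results; the actual cache benefit would only show up in a compiled implementation.

```python
def idx(n, i, j, k):
    """Linear index into a flat (n+2)^3 array that includes one boundary layer,
    stored with i varying fastest (like a C array u[k][j][i])."""
    m = n + 2
    return (k * m + j) * m + i


def relax_plane(u, f, n, h2, k, colour):
    """Gauss-Seidel update of all points of one colour (0 = red, 1 = black) in
    plane k for the 7-point discretization of -Laplace(u) = f."""
    m = n + 2
    for j in range(1, n + 1):
        # First i in this row with (i + j + k) % 2 == colour.
        i0 = 1 if (1 + j + k) % 2 == colour else 2
        for i in range(i0, n + 1, 2):
            p = idx(n, i, j, k)
            u[p] = (h2 * f[p]
                    + u[p - 1] + u[p + 1]            # neighbours in i
                    + u[p - m] + u[p + m]            # neighbours in j
                    + u[p - m * m] + u[p + m * m]    # neighbours in k
                    ) / 6.0


def rbgs_standard(u, f, n, h2):
    """One red-black iteration as two complete passes over the grid."""
    for k in range(1, n + 1):
        relax_plane(u, f, n, h2, k, 0)
    for k in range(1, n + 1):
        relax_plane(u, f, n, h2, k, 1)


def rbgs_fused(u, f, n, h2):
    """The same iteration in a single pass. Black points of plane k-1 only
    depend on red points of planes k-2, k-1 and k, which are already updated,
    so they can be relaxed while those planes are still in cache."""
    for k in range(1, n + 1):
        relax_plane(u, f, n, h2, k, 0)
        if k > 1:
            relax_plane(u, f, n, h2, k - 1, 1)
    relax_plane(u, f, n, h2, n, 1)


# Demo: constant right-hand side, zero initial guess, zero Dirichlet boundary.
n = 8
h2 = (1.0 / (n + 1)) ** 2
size = (n + 2) ** 3
f = [1.0] * size
a = [0.0] * size
b = [0.0] * size
for _ in range(3):
    rbgs_standard(a, f, n, h2)
    rbgs_fused(b, f, n, h2)
print("identical:", a == b)
```

Because each point is updated from exactly the same operands in the same arithmetic order, the fused schedule reproduces the standard one bit for bit while reading the grid from memory only once per iteration instead of twice (assuming a few planes fit in cache).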

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kowarschik, M., Rüde, U., Thürey, N., Weiß, C. (2002). Performance Optimization of 3D Multigrid on Hierarchical Memory Architectures. In: Fagerholm, J., Haataja, J., Järvinen, J., Lyly, M., Råback, P., Savolainen, V. (eds) Applied Parallel Computing. PARA 2002. Lecture Notes in Computer Science, vol 2367. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48051-X_31

  • DOI: https://doi.org/10.1007/3-540-48051-X_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43786-4

  • Online ISBN: 978-3-540-48051-8

  • eBook Packages: Springer Book Archive
