Performance Optimization of 3D Multigrid on Hierarchical Memory Architectures

  • Markus Kowarschik
  • Ulrich Rüde
  • Nils Thürey
  • Christian Weiß
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2367)


Today’s computer architectures employ fast cache memories to hide both the low bandwidth and the high latency of main memory, which is slow relative to the floating-point performance of the CPUs. Efficient program execution can only be achieved if codes respect this hierarchical memory design. Iterative methods for linear systems of equations are characterized by successive sweeps over data sets that are much too large to fit in cache. Standard implementations of these methods therefore do not perform efficiently on cache-based machines. In this paper we present techniques to enhance the cache utilization of multigrid methods on regular mesh structures in 3D, together with various performance results. Most of these techniques extend our previous work on 2D problems.
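The abstract does not spell out individual techniques, so the following is only an illustrative sketch of one well-known locality idea for such sweeps, not the paper's implementation. It assumes a red-black Gauss-Seidel smoother for the 7-point Laplacian on a cubic grid with fixed boundary values. All names (`make_grid`, `relax_plane`, `sweep_standard`, `sweep_fused`, `demo`) are hypothetical. The standard version passes over the whole grid twice per iteration. The fused version relaxes the black points of plane i-1 right after the red points of plane i, so each plane is revisited while it is still likely to be cached.

```python
# Illustrative sketch only: red-black Gauss-Seidel for the 7-point Laplacian
# on an (n+2)^3 grid with fixed boundary values. All names are hypothetical.

def make_grid(n, value=0.0):
    """Return an (n+2) x (n+2) x (n+2) nested list filled with `value`."""
    m = n + 2
    return [[[value] * m for _ in range(m)] for _ in range(m)]


def relax_plane(u, f, h2, n, i, color):
    """Relax the interior points of plane i with (i + j + k) % 2 == color."""
    for j in range(1, n + 1):
        # First interior k in this row that has the requested parity.
        k_start = 1 + ((i + j + 1 + color) % 2)
        for k in range(k_start, n + 1, 2):
            u[i][j][k] = (h2 * f[i][j][k]
                          + u[i - 1][j][k] + u[i + 1][j][k]
                          + u[i][j - 1][k] + u[i][j + 1][k]
                          + u[i][j][k - 1] + u[i][j][k + 1]) / 6.0


def sweep_standard(u, f, h2, n):
    """Two full passes over the grid: all red points, then all black points."""
    for color in (0, 1):  # 0 = red, 1 = black
        for i in range(1, n + 1):
            relax_plane(u, f, h2, n, i, color)


def sweep_fused(u, f, h2, n):
    """One pass: red points of plane i, then black points of plane i-1."""
    for i in range(1, n + 1):
        relax_plane(u, f, h2, n, i, 0)
        if i > 1:
            # The red neighbors in planes i-2, i-1 and i are already updated.
            relax_plane(u, f, h2, n, i - 1, 1)
    relax_plane(u, f, h2, n, n, 1)  # black points of the last plane


def demo(n=6, iterations=3):
    """Return True if both sweep orders produce identical grids."""
    h2 = (1.0 / (n + 1)) ** 2
    f = make_grid(n, 1.0)
    a, b = make_grid(n), make_grid(n)
    for _ in range(iterations):
        sweep_standard(a, f, h2, n)
        sweep_fused(b, f, h2, n)
    return a == b
```

Both orderings perform the same point updates on the same input values, so `demo()` compares the two grids for equality. Pure Python will not show a speedup; the sketch only illustrates the traversal order. Any actual gain depends on the language, data layout, and cache sizes of the target machine.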


Keywords (machine-generated, not supplied by the authors): Multigrid Method, Black Point, Data Layout, Instruction Cache, Adjacent Plane





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Markus Kowarschik (1)
  • Ulrich Rüde (1)
  • Nils Thürey (1)
  • Christian Weiß (2)

  1. Lehrstuhl für Systemsimulation (Informatik 10), Institut für Informatik, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
  2. Lehrstuhl für Rechnertechnik und Rechnerorganisation (LRR-TUM), Fakultät für Informatik, Technische Universität München, Germany
