Data Layout Optimizations for Variable Coefficient Multigrid

  • Markus Kowarschik
  • Ulrich Rüde
  • Christian Weiß
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2331)


Efficient program execution can only be achieved if codes respect the hierarchical memory design of the underlying architecture: programs must exploit caches to avoid the high latencies of main memory accesses. However, iterative methods such as multigrid perform successive sweeps over data sets that are commonly too large to fit in cache.

This paper is based on our previous work on data access transformations for multigrid methods for constant coefficient problems. However, the case of variable coefficients, which we consider here, requires more complex data structures.

We focus on data layout techniques to enhance the cache efficiency of multigrid codes for variable coefficient problems on regular meshes. We provide performance results which illustrate the effectiveness of our layout optimizations in conjunction with data access transformations.
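The abstract does not spell out the layouts that were compared, so the following is only a rough illustration of why the storage scheme matters for variable coefficient problems. It assumes a 5-point stencil whose coefficients differ at every grid point, and it contrasts two hypothetical schemes: one array per stencil band versus all coefficients of a point stored contiguously (often called array merging). The function names and the choice of five bands are assumptions made for this sketch, not taken from the paper.

```python
# Sketch only: index arithmetic for two hypothetical storage schemes.
# Assumption: a 5-point stencil with one coefficient per band at each
# grid point of an n x n mesh, stored in a flat, row-major address space.

BANDS = 5  # south, west, center, east, north (assumed ordering)


def band_wise_index(band, i, j, n):
    """Offset of coefficient `band` at point (i, j) with one n*n array per band."""
    return band * n * n + i * n + j


def merged_index(band, i, j, n):
    """Offset when the BANDS coefficients of each point are stored contiguously."""
    return (i * n + j) * BANDS + band


def span(index_fn, i, j, n):
    """Distance between the lowest and highest offset touched at one grid point."""
    offsets = [index_fn(b, i, j, n) for b in range(BANDS)]
    return max(offsets) - min(offsets)
```

For n = 1024, `span(band_wise_index, 3, 7, 1024)` is 4 * 1024 * 1024 elements, whereas `span(merged_index, 3, 7, 1024)` is 4 elements. In the merged scheme the coefficients needed to update one point lie next to each other and are likely to share a cache line; in the band-wise scheme they are spread over five widely separated memory regions. Which scheme performs better in practice depends on the cache organization and on the access pattern of the smoother.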


Keywords (machine-generated, not supplied by the authors): Multigrid Method, Storage Scheme, Data Layout, Loop Fusion, Padding Size



Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Markus Kowarschik (1)
  • Ulrich Rüde (1)
  • Christian Weiß (2)
  1. Lehrstuhl für Systemsimulation (Informatik 10), Institut für Informatik, Universität Erlangen-Nürnberg, Germany
  2. Lehrstuhl für Rechnertechnik und Rechnerorganisation (LRR-TUM), Fakultät für Informatik, Technische Universität München, Germany
