Data Layout Optimizations for Variable Coefficient Multigrid
Efficient program execution can only be achieved if codes respect the hierarchical memory design of the underlying architectures; programs must exploit caches to avoid the high latencies associated with main memory accesses. However, iterative methods like multigrid are characterized by successive sweeps over data sets that are commonly too large to fit in cache.
This paper is based on our previous work on data access transformations for multigrid methods for constant coefficient problems. However, the case of variable coefficients, which we consider here, requires more complex data structures.
We focus on data layout techniques to enhance the cache efficiency of multigrid codes for variable coefficient problems on regular meshes. We provide performance results which illustrate the effectiveness of our layout optimizations in conjunction with data access transformations.
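As a rough illustration of what a data layout choice means for a variable coefficient problem, the following C sketch contrasts two storage schemes for a 5-point stencil on a regular 2D grid: a band-wise layout with one array per coefficient, and a merged layout that stores all coefficients of a node next to each other. The names `Node`, `PAD`, `sweep_bands`, `sweep_merged`, and `layout_difference` are hypothetical, the padding size is an arbitrary placeholder, and the sketch does not claim to reproduce the layouts evaluated in this paper.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical merged record: all data belonging to one grid node. */
typedef struct {
    double c, n, s, e, w;  /* centre, north, south, east, west coefficients */
    double f;              /* right-hand side */
} Node;

enum { PAD = 1 };  /* assumed row padding in elements; purely illustrative */

/* Gauss-Seidel sweep, band-wise layout: one array per coefficient. */
void sweep_bands(size_t n, size_t ld, double *u,
                 const double *c, const double *no, const double *so,
                 const double *ea, const double *we, const double *f)
{
    for (size_t i = 1; i + 1 < n; ++i)
        for (size_t j = 1; j + 1 < n; ++j) {
            size_t k = i * ld + j;  /* ld = padded row length */
            u[k] = (f[k] - no[k] * u[k - ld] - so[k] * u[k + ld]
                         - ea[k] * u[k + 1] - we[k] * u[k - 1]) / c[k];
        }
}

/* Same sweep, merged layout: one contiguous record per node. */
void sweep_merged(size_t n, size_t ld, double *u, const Node *a)
{
    for (size_t i = 1; i + 1 < n; ++i)
        for (size_t j = 1; j + 1 < n; ++j) {
            size_t k = i * ld + j;
            const Node *p = &a[k];
            u[k] = (p->f - p->n * u[k - ld] - p->s * u[k + ld]
                         - p->e * u[k + 1] - p->w * u[k - 1]) / p->c;
        }
}

/* Runs both layouts on identical synthetic data and returns the largest
   absolute difference between the two solutions (-1.0 if allocation fails). */
double layout_difference(size_t n, int sweeps)
{
    size_t ld = n + PAD, total = n * ld;
    double *band[6];  /* c, n, s, e, w, f */
    double *u1 = calloc(total, sizeof *u1);
    double *u2 = calloc(total, sizeof *u2);
    Node *a = calloc(total, sizeof *a);
    double diff = -1.0;
    int ok = u1 && u2 && a;

    for (int b = 0; b < 6; ++b) {
        band[b] = calloc(total, sizeof **band);
        ok = ok && band[b];
    }
    if (ok) {
        for (size_t i = 0; i < n; ++i)
            for (size_t j = 0; j < n; ++j) {
                size_t k = i * ld + j;
                /* Made-up, diagonally dominant variable coefficients. */
                Node v = { 4.5 + 0.1 * (double)((i + j) % 3),
                           -1.0 - 0.01 * (double)(i % 2), -1.0,
                           -1.0 - 0.01 * (double)(j % 2), -1.0, 1.0 };
                a[k] = v;
                band[0][k] = v.c; band[1][k] = v.n; band[2][k] = v.s;
                band[3][k] = v.e; band[4][k] = v.w; band[5][k] = v.f;
            }
        for (int s = 0; s < sweeps; ++s) {
            sweep_bands(n, ld, u1, band[0], band[1], band[2],
                        band[3], band[4], band[5]);
            sweep_merged(n, ld, u2, a);
        }
        diff = 0.0;
        for (size_t k = 0; k < total; ++k) {
            double d = u1[k] - u2[k];
            if (d < 0.0) d = -d;
            if (d > diff) diff = d;
        }
    }
    for (int b = 0; b < 6; ++b) free(band[b]);
    free(u1); free(u2); free(a);
    return diff;
}
```

Both sweeps perform the same arithmetic, so a call such as `layout_difference(33, 4)` should return a value at or near zero; the layouts differ only in which memory locations each node update touches. In the band-wise version, one update reads from six separate coefficient arrays, whereas the merged version reads one contiguous record. Which of the two performs better, and what padding is appropriate, depends on the cache parameters of the target machine and has to be measured.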
Keywords: Multigrid Method, Storage Scheme, Data Layout, Loop Fusion, Padding Size