Abstract
On modern computer architectures, memory hierarchies make a program’s data locality a direct determinant of performance. Data locality occurs when a piece of data is still in a cache upon reuse. For dense matrix computations, loop transformations can be used to improve data locality. Sparse matrix computations, however, have non-affine loop bounds and indirect memory references, which prohibit the use of compile-time loop transformations. This paper describes serial sparse tiling, an algorithm that tiles at runtime. We test a runtime-tiled version of sparse Gauss-Seidel on four different architectures, where it exhibits speedups of up to 2.7. The paper also gives a static model for determining tile size and outlines how overhead affects the overall speedup.
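To make the obstacle concrete, the following is a minimal sketch of an untiled Gauss-Seidel sweep over a matrix in compressed sparse row (CSR) form. It is not the serial sparse tiling algorithm itself; it only illustrates the kind of kernel the abstract refers to. The function name, the array names (`row_ptr`, `col_idx`, `vals`), and the small 3x3 system are assumptions chosen for illustration, not taken from the paper.

```python
def gauss_seidel_sweep(row_ptr, col_idx, vals, b, x):
    """One in-place Gauss-Seidel sweep over a CSR matrix (illustrative only)."""
    n = len(row_ptr) - 1
    for i in range(n):
        diag = 0.0
        acc = b[i]
        # The inner loop bounds are read from row_ptr, so they depend on the
        # data and are not affine functions of i.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]  # indirect reference: x is indexed through col_idx
            if j == i:
                diag = vals[k]
            else:
                acc -= vals[k] * x[j]
        x[i] = acc / diag  # assumes every row stores a nonzero diagonal entry
    return x


# Hypothetical system: [[4,-1,0],[-1,4,-1],[0,-1,4]] x = [3,2,3], solution [1,1,1].
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
vals = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
b = [3.0, 2.0, 3.0]
x = [0.0, 0.0, 0.0]
for _ in range(50):
    gauss_seidel_sweep(row_ptr, col_idx, vals, b, x)
```

Because which entries of `x` a row touches is known only once `col_idx` has been read, a compiler cannot determine the reuse pattern statically, which is the motivation for inspecting the matrix and rescheduling the iterations at runtime.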
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Strout, M.M., Carter, L., Ferrante, J. (2001). Rescheduling for Locality in Sparse Matrix Computations. In: Alexandrov, V.N., Dongarra, J.J., Juliano, B.A., Renner, R.S., Tan, C.J.K. (eds) Computational Science — ICCS 2001. ICCS 2001. Lecture Notes in Computer Science, vol 2073. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45545-0_23
DOI: https://doi.org/10.1007/3-540-45545-0_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42232-7
Online ISBN: 978-3-540-45545-5
eBook Packages: Springer Book Archive