Rescheduling for Locality in Sparse Matrix Computations

Strout, Michelle Mills; Carter, Larry; Ferrante, Jeanne

doi:10.1007/3-540-45545-0_23

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2073))

Included in the following conference series:

International Conference on Computational Science

Abstract

In modern computer architectures, memory hierarchies cause a program’s data locality to directly affect performance. Data locality occurs when a piece of data is still in cache when it is reused. For dense matrix computations, loop transformations can be used to improve data locality. Sparse matrix computations, however, have non-affine loop bounds and indirect memory references, which prohibit the use of compile-time loop transformations. This paper describes serial sparse tiling, an algorithm that tiles at run time. We test a run-time tiled version of sparse Gauss-Seidel on four different architectures, where it exhibits speedups of up to 2.7. The paper also gives a static model for determining tile size and outlines how overhead affects the overall speedup.
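
To make the non-affine loop bounds and indirect memory references concrete, here is a minimal sketch of a plain (untiled) Gauss-Seidel sweep over a matrix stored in compressed sparse row (CSR) form. It is not the paper's serial sparse tiling algorithm. The function name `gauss_seidel_csr`, the array names `row_ptr`, `col_idx`, and `vals`, and the small test matrix are all assumptions chosen for illustration.

```python
def gauss_seidel_csr(row_ptr, col_idx, vals, b, x, sweeps=1):
    """Run in-place Gauss-Seidel sweeps for A x = b, with A in CSR form.

    Assumes every row stores a nonzero diagonal entry (illustrative sketch).
    """
    n = len(b)
    for _ in range(sweeps):
        for i in range(n):
            diag = 0.0
            acc = b[i]
            # Non-affine loop bounds: the inner range is read from row_ptr,
            # so it is unknown until the matrix is available at run time.
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]  # indirect memory reference: x[col_idx[k]]
                if j == i:
                    diag = vals[k]
                else:
                    acc -= vals[k] * x[j]
            x[i] = acc / diag  # later rows in this sweep see the new value
    return x


# Hypothetical 3x3 system: A = [[4,-1,0],[-1,4,-1],[0,-1,4]], b = [3,2,3].
# Its exact solution is x = [1, 1, 1].
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
vals = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
b = [3.0, 2.0, 3.0]
x = gauss_seidel_csr(row_ptr, col_idx, vals, b, [0.0, 0.0, 0.0], sweeps=50)
```

Because the order in which entries of `x` are touched depends on the contents of `col_idx`, a compiler cannot analyze the access pattern statically. This is the property that motivates reordering the computation at run time instead.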

Author information

Authors and Affiliations

University of California, San Diego
Michelle Mills Strout, Larry Carter & Jeanne Ferrante

Editor information

Editors and Affiliations

School of Computer Science, Cybernetics and Electronic Engineering, University of Reading, Whiteknights, P.O. Box 225, Reading, RG6 6AY, UK
Vassil N. Alexandrov
Innovative Computing Lab, Computer Science Department, University of Tennessee, 1122 Volunteer Blvd, Knoxville, TN, 37996-3450, USA
Jack J. Dongarra
Computer Science Department, California State University, Chico, CA, 95929-0410, USA
Benjoe A. Juliano & René S. Renner
School of Computer Science, The Queen’s University of Belfast, Belfast, BT7 1NN, Northern Ireland, UK
C. J. Kenneth Tan

About this paper

Cite this paper

Strout, M.M., Carter, L., Ferrante, J. (2001). Rescheduling for Locality in Sparse Matrix Computations. In: Alexandrov, V.N., Dongarra, J.J., Juliano, B.A., Renner, R.S., Tan, C.J.K. (eds) Computational Science — ICCS 2001. ICCS 2001. Lecture Notes in Computer Science, vol 2073. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45545-0_23

DOI: https://doi.org/10.1007/3-540-45545-0_23
Published: 17 July 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42232-7
Online ISBN: 978-3-540-45545-5
eBook Packages: Springer Book Archive
