Addressing Volume and Latency Overheads in 1D-parallel Sparse Matrix-Vector Multiplication
The scalability of sparse matrix-vector multiplication (SpMV) on distributed memory systems depends on multiple factors that involve different communication cost metrics. The irregular sparsity pattern of the coefficient matrix manifests itself as high bandwidth overhead (total and/or maximum volume) and/or high latency overhead (total and/or maximum message count). In this work, we propose a hypergraph partitioning model that combines two earlier models for one-dimensional partitioning, one addressing total and maximum volume, and the other addressing total volume and total message count. Our model relies on the recursive bipartitioning paradigm and simultaneously addresses three cost metrics in a single partitioning phase in order to reduce volume and latency overheads. We demonstrate the validity of our model on a large dataset that contains more than 300 matrices. The results indicate that, compared to the earlier models, our model significantly improves the scalability of SpMV.
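To make the four cost metrics concrete, the following minimal Python sketch counts them for a row-parallel SpMV y = Ax under a given one-dimensional rowwise partition. It is an illustration only, not the partitioning model proposed here. The function name `comm_metrics`, the coordinate-list input, and the conformable partition (the processor that owns row i also owns x[i]) are assumptions made for the example, and the maxima are taken over per-processor send volume and send message count, which is one common convention.

```python
from collections import defaultdict


def comm_metrics(nonzeros, part, num_parts):
    """Count communication metrics for row-parallel y = A x.

    nonzeros  : iterable of (i, j) positions of nonzeros in a square matrix
    part      : part[i] is the processor owning row i and entry x[i]
                (conformable partition; an assumption of this sketch)
    num_parts : number of processors
    """
    # needed[(q, p)] = set of x indices that processor q must send to p
    needed = defaultdict(set)
    for i, j in nonzeros:
        p, q = part[i], part[j]
        if p != q:
            # Row i lives on p but x[j] lives on q, so q sends x[j] to p.
            # The set ensures each x[j] is sent to p only once.
            needed[(q, p)].add(j)

    send_volume = [0] * num_parts  # words sent by each processor
    send_msgs = [0] * num_parts    # messages sent by each processor
    for (q, _p), cols in needed.items():
        send_volume[q] += len(cols)
        send_msgs[q] += 1          # one message per (sender, receiver) pair

    return {
        "total_volume": sum(send_volume),
        "max_volume": max(send_volume),
        "total_messages": sum(send_msgs),
        "max_messages": max(send_msgs),
    }


# Hypothetical 4x4 pattern: full diagonal plus five off-diagonal nonzeros.
pattern = [(0, 0), (1, 1), (2, 2), (3, 3),
           (0, 2), (1, 2), (2, 0), (3, 1), (0, 3)]
metrics = comm_metrics(pattern, part=[0, 0, 1, 2], num_parts=3)
print(metrics)
```

In this toy pattern, rows 0 and 1 both need x[2] from processor 1, but the entry is sent only once, so the volume counts distinct vector entries per sender-receiver pair rather than nonzeros. A partition can lower the total volume while concentrating it on one processor or spreading it over many small messages, which is why the metrics are tracked separately.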
Keywords: Communication cost, Sparse matrix-vector multiplication, Hypergraph partitioning, One-dimensional partitioning
We acknowledge PRACE for awarding us access to resource Marconi (Lenovo NextScale) based in Italy at CINECA Supercomputing Centre. This work was supported by The Scientific and Technological Research Council of Turkey (TUBITAK) under Grant EEEAG-114E545. This article is also based upon work from COST Action CA 15109 (COSTNET).