Parallel Sparse Matrix-Vector Multiplication Using Accelerators

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9787)

Abstract

Sparse matrix-vector multiplication (SpMV) is an essential computational kernel in many application areas, such as scientific computing. The number of computing systems equipped with NVIDIA GPUs and Intel Xeon Phi coprocessors based on the MIC architecture has recently been increasing, and with it the importance of effective SpMV algorithms for these systems. To the best of our knowledge, previous studies have reported CPU and GPU implementations of SpMV for clusters and MIC implementations for a single node, but no implementation of SpMV for a MIC cluster has yet been reported. In this paper, we implemented and evaluated parallel SpMV on a GPU cluster and a MIC cluster. The results show that the MIC implementation achieved relatively high performance for some matrices with a single process, but it could not outperform the other implementations with 64 MPI processes. We therefore also implemented and evaluated the single SpMV kernel, with the aim of improving the performance of parallel SpMV.

Keywords

SpMV, Accelerator, GPU, MIC, Cluster

Notes

Acknowledgments

This research was supported by Core Research for Evolutional Science and Technology (CREST) of Japan Science and Technology Agency (JST).

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan
  2. Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan
