ICCSA 2016: Computational Science and Its Applications – ICCSA 2016 pp 3-18 | Cite as
Parallel Sparse Matrix-Vector Multiplication Using Accelerators
Abstract
Sparse matrix-vector multiplication (SpMV) is an essential computational kernel for many applications such as scientific computing. Recently, the number of computing systems equipped with NVIDIA’s GPU and Intel’s Xeon Phi coprocessor based on the MIC architecture has been increasing. Therefore, the importance of effective algorithms for SpMV in these systems is increasing. To the best of our knowledge, while previous studies have reported CPU and GPU implementations of SpMV for a cluster and MIC implementations for a single node, implementations of SpMV for the MIC cluster have not yet been reported. In this paper, we implemented and evaluated parallel SpMV on a GPU cluster and a MIC cluster. As shown by the results, the implementation for MIC achieved relatively high performance in some matrices with a single process, but it could not achieve higher performance than other implementations with 64 MPI processes. Therefore, we implemented and evaluated the single SpMV kernel to improve the performance of parallel SpMV.
Keywords
SpMV Accelerator GPU MIC ClusterNotes
Acknowledgments
This research was supported by Core Research for Evolutional Science and Technology (CREST) of Japan Science and Technology Agency (JST).
References
- 1.MVAPICH Benchmarks. http://mvapich.cse.ohio-state.edu/benchmarks/
- 2.Davis, T.: University of Florida Sparse Matrix Collection: sparse matrices from a wide range of applications. http://www.cise.ufl.edu/research/sparse/matrices/
- 3.Alexandersen, J., Lazarov, B., Dammann, B.: Parallel Sparse Matrix - Vector Product: Pure MPI and hybrid MPI-OpenMP implementation. IMM-Technical report-2012 (2012)Google Scholar
- 4.Catalyurek, U., Aykanat, C.: Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication. IEEE Trans. Parallel Distrib. Syst. 10(7), 673–693 (1999)CrossRefGoogle Scholar
- 5.Cevahir, A., Nukada, A., Matsuoka, S.: CG on GPU-enhanced clusters. IPSJ SIG Tech. Rep. 2009(15), 1–8 (2009)Google Scholar
- 6.Kudo, M., Kuroda, H., Katagiri, T., Kanada, Y.: The effect of optimal algorithm selection of parallel sparse matrix-vector multiplication. IPSJ SIG Tech. Rep. 2002(22), 151–156 (2002). (in Japanese)Google Scholar
- 7.Lange, M., Gorman, G., Weiland, M., Mitchell, L., Southern, J.: Achieving efficient strong scaling with PETSc using hybrid MPI/OpenMP optimisation. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds.) ISC 2013. LNCS, vol. 7905, pp. 97–108. Springer, Heidelberg (2013)CrossRefGoogle Scholar
- 8.Liu, W., Vinter, B.: bhSPARSEBenchmark SpMV using CSR5. https://github.com/bhSPARSE/Benchmark_SpMV_using_CSR5
- 9.Liu, W., Vinter, B.: CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication. CoRR abs/1503.05032 (2015)Google Scholar
- 10.Liu, X., Smelyanskiy, M., Chow, E., Dubey, P.: Efficient sparse matrix-vector multiplication on x86-based many-core processors. In: Proceedings of the 27th International ACM Conference on International Conference on Supercomputing. ICS 2013, pp. 273–282. ACM (2013)Google Scholar
- 11.Maeda, H., Takahashi, D.: Performance evaluation of sparse matrix-vector multiplication using GPU/MIC cluster. In: 2015 Third International Symposium on Computing and Networking (CANDAR 2015). 3rd International Workshop on Computer Systems and Architectures (CSA 2015), pp. 396–399 (2015)Google Scholar
- 12.Monakov, A., Lokhmotov, A., Avetisyan, A.: Automatically tuning sparse matrix-vector multiplication for GPU architectures. In: Patt, Y.N., Foglia, P., Duesterwald, E., Faraboschi, P., Martorell, X. (eds.) HiPEAC 2010. LNCS, vol. 5952, pp. 111–125. Springer, Heidelberg (2010)CrossRefGoogle Scholar
- 13.Ohshima, S., Sakurai, T., Katagiri, T., Nakajima, K., Kuroda, H., Naono, K., Igai, M., Itoh, S.: Optimized implementation of segmented scan method for CUDA. IPSJ Tech. Rep. 2010-HPC-126(1), 1–7 (2010). (in Japanese)Google Scholar
- 14.Pinar, A., Heath, M.T.: Improving performance of sparse matrix-vector multiplication. In: Proceedings of the 1999 ACM/IEEE Conference on Supercomputing. SC 1999. ACM (1999)Google Scholar
- 15.Saule, E., Kaya, K.: Performance evaluation of sparse matrix multiplication kernels on intel Xeon Phi. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds.) Parallel Processing and Applied Mathematics. LNCS, vol. 8384, pp. 559–570. Springer, Heidelberg (2014)CrossRefGoogle Scholar
- 16.Tang, W.T., Tan, W.J., Ray, R., Wong, Y.W., Chen, W., Kuo, S., Goh, R.S.M., Turner, S.J., Wong, W.: Accelerating sparse matrix-vector multiplication on GPUs using bit-representation-optimized schemes. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. SC 2013, pp. 26:1–26:12 (2013)Google Scholar
- 17.Ye, F., Calvin, C., Petiton, S.G.: A study of SpMV implementation using MPI and OpenMP on intel many-core architecture. In: Daydé, M., Marques, O., Nakajima, K. (eds.) VECPAR 2014. LNCS, vol. 8969, pp. 43–56. Springer, Heidelberg (2015)Google Scholar