
Parallel Efficient Sparse Matrix-Matrix Multiplication on Multicore Platforms

  • Md. Mostofa Ali Patwary (email author)
  • Nadathur Rajagopalan Satish
  • Narayanan Sundaram
  • Jongsoo Park
  • Michael J. Anderson
  • Satya Gautam Vadlamudi
  • Dipankar Das
  • Sergey G. Pudov
  • Vadim O. Pirogov
  • Pradeep Dubey
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9137)

Abstract

Sparse matrix-matrix multiplication (SpGEMM) is a key kernel in many High Performance Computing applications, such as algebraic multigrid solvers and graph analytics. Optimizing SpGEMM on modern processors is challenging due to random data accesses, poor data locality, and load imbalance during computation. In this work, we investigate different partitioning techniques, cache optimizations (using dense arrays instead of hash tables), and dynamic load balancing for SpGEMM on a diverse set of real-world and synthetic datasets. We demonstrate that our implementation outperforms the state of the art on Intel® Xeon® processors: it is up to 3.8X faster than the Intel® Math Kernel Library (MKL) and up to 257X faster than CombBLAS. We also outperform the best published GPU implementation of SpGEMM on the NVIDIA GTX Titan and the AMD Radeon HD 7970 by up to 7.3X and 4.5X, respectively, on their published datasets. Finally, we demonstrate good multicore scalability (geomean speedup of 18.2X on 28 threads), compared to the 7.5X scaling that MKL achieves on 28 threads.
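The cache optimization mentioned above, accumulating each output row in a dense array instead of a hash table, follows the classic row-wise SpGEMM formulation of Gustavson [11]. The sketch below illustrates that idea in plain Python over CSR triples (indptr, indices, values); it is a minimal illustration, and all function and variable names are assumptions, not taken from the paper's implementation:

```python
def spgemm_csr(a, b, n_cols_b):
    """Row-wise SpGEMM (Gustavson) with a dense-array accumulator.

    a, b: CSR matrices as (indptr, indices, values) triples.
    Returns the product C = A @ B as a CSR triple.
    """
    a_ptr, a_idx, a_val = a
    b_ptr, b_idx, b_val = b
    n_rows = len(a_ptr) - 1

    # Dense accumulator reused across rows: O(n_cols_b) memory with
    # contiguous, cache-friendly accesses instead of hash-table probing.
    acc = [0.0] * n_cols_b
    marker = [-1] * n_cols_b  # which row last touched each column

    c_ptr, c_idx, c_val = [0], [], []
    for i in range(n_rows):
        row_cols = []
        for k in range(a_ptr[i], a_ptr[i + 1]):
            j, v = a_idx[k], a_val[k]
            # Scale row j of B by A[i, j] and add it into the accumulator.
            for t in range(b_ptr[j], b_ptr[j + 1]):
                col = b_idx[t]
                if marker[col] != i:  # first contribution to C[i, col]
                    marker[col] = i
                    acc[col] = 0.0
                    row_cols.append(col)
                acc[col] += v * b_val[t]
        row_cols.sort()  # keep column indices of C's row in order
        c_idx.extend(row_cols)
        c_val.extend(acc[c] for c in row_cols)
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val
```

In a parallel implementation, each thread would typically own a private accumulator and marker array and process a partition of rows, which is where the partitioning and dynamic load-balancing choices studied in the paper come into play.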

Keywords

Hash table, Synthetic dataset, Dynamic scheduling, Cache line, Dense array
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1.
  2.
  3. Intel Math Kernel Library (2015). https://software.intel.com/en-us/intel-mkl
  4. Bell, N., Dalton, S., Olson, L.N.: Exposing fine-grained parallelism in algebraic multigrid methods. SIAM J. Sci. Comput. 34(4), C123–C152 (2012)
  5. Buluç, A., Gilbert, J.: On the representation and multiplication of hypersparse matrices. In: Proceedings of IPDPS, pp. 1–11, April 2008
  6. Buluç, A., Gilbert, J.R.: Parallel sparse matrix-matrix multiplication and indexing: implementation and experiments. CoRR abs/1109.3739 (2011)
  7. Chan, T.M.: More algorithms for all-pairs shortest paths in weighted graphs. SIAM J. Comput. 39(5), 2075–2089 (2010)
  8. Davis, T.A., Hu, Y.: The University of Florida sparse matrix collection. ACM Trans. Math. Softw. 38(1), 1:1–1:25 (2011)
  9. Gilbert, J., Moler, C., Schreiber, R.: Sparse matrices in MATLAB: design and implementation. SIAM J. Matrix Anal. Appl. 13(1), 333–356 (1992)
  10. Gilbert, J.R., Reinhardt, S., Shah, V.B.: High-performance graph algorithms from parallel sparse matrices. In: Kågström, B., Elmroth, E., Dongarra, J., Waśniewski, J. (eds.) PARA 2006. LNCS, vol. 4699, pp. 260–269. Springer, Heidelberg (2007)
  11. Gustavson, F.G.: Two fast algorithms for sparse matrices: multiplication and permuted transposition. ACM Trans. Math. Softw. 4(3), 250–269 (1978)
  12. Kaplan, H., Sharir, M., Verbin, E.: Colored intersection searching via sparse rectangular matrix multiplication. In: Symposium on Computational Geometry, pp. 52–60. ACM (2006)
  13. Liu, W., Vinter, B.: An efficient GPU general sparse matrix-matrix multiplication for irregular data. In: Proceedings of IPDPS, pp. 370–381. IEEE (2014)
  14. Murphy, R.C., Wheeler, K.B., Barrett, B.W., Ang, J.A.: Introducing the Graph 500. Cray User's Group (2010)
  15. Siegel, J., et al.: Efficient sparse matrix-matrix multiplication on heterogeneous high performance systems. In: IEEE Cluster Computing, pp. 1–8 (2010)
  16. Sulatycke, P., Ghose, K.: Caching-efficient multithreaded fast multiplication of sparse matrices. In: Proceedings of IPPS/SPDP 1998, pp. 117–123, March 1998
  17. Vassilevska, V., Williams, R., Yuster, R.: Finding heaviest h-subgraphs in real weighted graphs, with applications. CoRR abs/cs/0609009 (2006)
  18. Zhu, Q., Graf, T., Sumbul, H., Pileggi, L., Franchetti, F.: Accelerating sparse matrix-matrix multiplication with 3D-stacked logic-in-memory hardware. In: IEEE HPEC, pp. 1–6 (2013)

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Md. Mostofa Ali Patwary (1, email author)
  • Nadathur Rajagopalan Satish (1)
  • Narayanan Sundaram (1)
  • Jongsoo Park (1)
  • Michael J. Anderson (1)
  • Satya Gautam Vadlamudi (1)
  • Dipankar Das (1)
  • Sergey G. Pudov (2)
  • Vadim O. Pirogov (2)
  • Pradeep Dubey (1)

  1. Parallel Computing Lab, Intel Corporation, Santa Clara, USA
  2. Software and Services Group, Intel Corporation, Santa Clara, USA
