Parallelization of Sparse Matrix Kernels for Big Data Applications

Chapter in: Resource Management for Big Data Platforms

Part of the book series: Computer Communications and Networks (CCN)

Abstract

Analysis of big data on large-scale distributed systems often necessitates efficient parallel graph algorithms that are used to explore the relationships between individual components. Graph algorithms use the basic adjacency list representation for graphs, which can also be viewed as a sparse matrix. This correspondence between the representations of graphs and sparse matrices makes it possible to express many important graph algorithms in terms of basic sparse matrix operations, for which the optimization literature is more mature. For example, graph analytic libraries such as Pegasus and Combinatorial BLAS use sparse matrix kernels for a wide variety of operations on graphs. In this work, we focus on two such important sparse matrix kernels: sparse matrix–sparse matrix multiplication (SpGEMM) and sparse matrix–dense matrix multiplication (SpMM). We propose partitioning models for efficient parallelization of these kernels on large-scale distributed systems. Our models aim at reducing communication volume while balancing computational load, two vital performance metrics on distributed systems. We show that by exploiting the sparsity patterns of the matrices through our models, the parallel performance of SpGEMM and SpMM operations can be significantly improved.
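
To make the graph/sparse-matrix correspondence concrete, the following is a minimal serial sketch, assuming Python with NumPy and SciPy; the toy graph, matrix sizes, and variable names are illustrative and are not taken from the chapter, whose actual contribution is the distributed partitioning of these kernels (not shown here). The sketch builds a CSR adjacency matrix from an adjacency list and applies the two kernels discussed above: SpGEMM (A times A, whose entries count 2-hop paths) and SpMM (A times a small dense matrix of per-vertex feature columns).

# Illustrative sketch only: serial SpGEMM and SpMM with SciPy, not the
# chapter's distributed, partitioned implementation.
import numpy as np
from scipy.sparse import csr_matrix

# Adjacency list of a small directed graph: vertex -> list of out-neighbors.
adjacency = {0: [1, 2], 1: [2], 2: [0, 3], 3: [3]}

# The same graph as a sparse (CSR) adjacency matrix A, where A[i, j] = 1
# if and only if there is an edge i -> j.
rows = [u for u, nbrs in adjacency.items() for _ in nbrs]
cols = [v for nbrs in adjacency.values() for v in nbrs]
A = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(4, 4))

# SpGEMM: sparse matrix - sparse matrix multiplication.
# (A @ A)[i, j] counts the 2-hop paths from vertex i to vertex j.
two_hop = A @ A

# SpMM: sparse matrix - dense matrix multiplication, e.g. propagating
# three feature vectors (the columns of X) over the graph in one step.
X = np.random.rand(4, 3)   # dense 4 x 3 operand
Y = A @ X                  # dense result: one neighbor-sum per column

print(two_hop.toarray())
print(Y)

In the distributed setting targeted by the chapter, the nonzeros of the sparse operands (and the rows of the dense operand in SpMM) would be partitioned across processors, and it is this partitioning that determines the communication volume and computational load balance the proposed models optimize.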

References

  1. Intel Math Kernel Library (2015). https://software.intel.com/en-us/intel-mkl

  2. Agarwal, V., Petrini, F., Pasetto, D., Bader, D.A.: Scalable graph exploration on multicore processors. In: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’10, pp. 1–11. IEEE Computer Society, Washington, DC, USA (2010). doi:10.1109/SC.2010.46

  3. Akbudak, K., Aykanat, C.: Simultaneous input and output matrix partitioning for outer-product–parallel sparse matrix-matrix multiplication. SIAM J. Sci. Comput. 36(5), C568–C590 (2014). doi:10.1137/13092589X

  4. Boldi, P., Rosa, M., Santini, M., Vigna, S.: Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks. In: Srinivasan S., Ramamritham K., Kumar A., Ravindra M.P., Bertino E., Kumar R. (eds.) Proceedings of the 20th International Conference on World Wide Web, pp. 587–596. ACM Press (2011)

  5. Boldi, P., Vigna, S.: The WebGraph framework I: compression techniques. In: Proceedings of the Thirteenth International World Wide Web Conference (WWW 2004), pp. 595–601. ACM Press, Manhattan (2004)

  6. Boman, E., Devine, K., Heaphy, R., Hendrickson, B., Heroux, M., Preis, R.: LDRD report: Parallel repartitioning for optimal solver performance. Tech. Rep. SAND2004–0365, Sandia National Laboratories, Albuquerque, NM (2004)

  7. Buluç, A., Gilbert, J.R.: Parallel sparse matrix-matrix multiplication and indexing: implementation and experiments. SIAM J. Sci. Comput. (SISC) 34(4), 170–191 (2012). doi:10.1137/110848244; http://gauss.cs.ucsb.edu/~aydin/spgemm_sisc12.pdf

  8. Buluç, A., Madduri, K.: Parallel breadth-first search on distributed memory systems. In: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC ’11, pp. 65:1–65:12. ACM, New York, NY, USA (2011). doi:10.1145/2063384.2063471; http://doi.acm.org/10.1145/2063384.2063471

  9. Catalyurek, U.V., Aykanat, C.: Hypergraph-partitioning-based decomposition for parallel sparse-matrix vector multiplication. IEEE Trans. Parallel Distrib. Syst. 10(7), 673–693 (1999)

  10. CP2K: CP2K home page. http://www.cp2k.org/ (accessed 2015)

  11. D’Alberto, P., Nicolau, A.: R-kleene: A high-performance divide-and-conquer algorithm for the all-pair shortest path for densely connected networks. Algorithmica 47(2), 203–213 (2007). doi:10.1007/s00453-006-1224-z

  12. Davis, T.A., Hu, Y.: The University of Florida sparse matrix collection. ACM Trans. Math. Softw. (TOMS) 38(1), 1 (2011)

  13. Dostál, Z., Horák, D., Kučera, R.: Total FETI-an easier implementable variant of the FETI method for numerical solution of elliptic PDE. Commun. Numer. Meth. Eng. 22(12), 1155–1162 (2006)

  14. Feng, Y., Owen, D., Peri, D.: A block conjugate gradient method applied to linear systems with multiple right-hand sides. Comput. Meth. Appl. Mech. Eng. 127(14), 203–215 (1995). http://dx.doi.org/10.1016/0045-7825(95)00832-2; http://www.sciencedirect.com/science/article/pii/0045782595008322

  15. Heroux, M.A., Bartlett, R.A., Howle, V.E., Hoekstra, R.J., Hu, J.J., Kolda, T.G., Lehoucq, R.B., Long, K.R., Pawlowski, R.P., Phipps, E.T., et al.: An overview of the Trilinos project. ACM Trans. Math. Softw. (TOMS) 31(3), 397–423 (2005)

  16. Horowitz, E., Sahni, S.: Fundamentals of Computer Algorithms. Computer Science Press (1978)

  17. Kang, U., Tsourakakis, C.E., Faloutsos, C.: Pegasus: A peta-scale graph mining system implementation and observations. In: Proceedings of the 2009 Ninth IEEE International Conference on Data Mining, ICDM ’09, pp. 229–238. IEEE Computer Society, Washington, DC, USA (2009). doi:10.1109/ICDM.2009.14

  18. Leskovec, J., Krevl, A.: SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data (2014)

  19. Marion-Poty, V., Lefer, W.: A wavelet decomposition scheme and compression method for streamline-based vector field visualizations. Comput. Graphics 26(6), 899–906 (2002). doi:10.1016/S0097-8493(02)00178-4; http://www.sciencedirect.com/science/article/pii/S0097849302001784

  20. Mattson, T., Bader, D., Berry, J., Buluc, A., Dongarra, J., Faloutsos, C., Feo, J., Gilbert, J., Gonzalez, J., Hendrickson, B., Kepner, J., Leiserson, C., Lumsdaine, A., Padua, D., Poole, S., Reinhardt, S., Stonebraker, M., Wallach, S., Yoo, A.: Standards for Graph Algorithm Primitives. ArXiv e-prints (2014)

  21. NVIDIA Corporation: CUSPARSE library (2010)

  22. O’Leary, D.P.: The block conjugate gradient algorithm and related methods. Linear Algebra Appl. 29(0), 293–322 (1980). http://dx.doi.org/10.1016/0024-3795(80)90247-5; http://www.sciencedirect.com/science/article/pii/0024379580902475. Special Volume Dedicated to Alson S. Householder

  23. O’Leary, D.P.: Parallel implementation of the block conjugate gradient algorithm. Parallel Comput. 5(12), 127–139 (1987). http://dx.doi.org/10.1016/0167-8191(87)90013-5; http://www.sciencedirect.com/science/article/pii/0167819187900135. Proceedings of the International Conference on Vector and Parallel Computing-Issues in Applied Research and Development

  24. Sarıyuce, A.E., Saule, E., Kaya, K., Çatalyurek, U.V.: Regularizing graph centrality computations. J. Parallel Distrib. Comput. 76(0), 106–119 (2015). http://dx.doi.org/10.1016/j.jpdc.2014.07.006; http://www.sciencedirect.com/science/article/pii/S0743731514001282. Special Issue on Architecture and Algorithms for Irregular Applications

  25. Sawyer, W., Messmer, P.: Parallel grid manipulations for general circulation models. In: Parallel Processing and Applied Mathematics. Lecture Notes in Computer Science, vol. 2328, pp. 605–608. Springer, Berlin (2006)

  26. Selvitopi, O., Aykanat, C.: Reducing latency cost in 2D sparse matrix partitioning models. Parallel Comput. 57, 1–24 (2016). http://dx.doi.org/10.1016/j.parco.2016.04.004; http://www.sciencedirect.com/science/article/pii/S0167819116300138

  27. Selvitopi, R.O., Ozdal, M.M., Aykanat, C.: A novel method for scaling iterative solvers: avoiding latency overhead of parallel sparse-matrix vector multiplies. IEEE Trans. Parallel Distrib. Syst. 26(3), 632–645 (2015). doi:10.1109/TPDS.2014.2311804

  28. Shi, Z., Zhang, B.: Fast network centrality analysis using GPUs. BMC Bioinf. 12(1), 149 (2011). doi:10.1186/1471-2105-12-149

  29. Uçar, B., Aykanat, C.: Encapsulating multiple communication-cost metrics in partitioning sparse rectangular matrices for parallel matrix-vector multiplies. SIAM J. Sci. Comput. 25(6), 1837–1859 (2004). doi:10.1137/S1064827502410463

  30. Van De Geijn, R.A., Watts, J.: SUMMA: scalable universal matrix multiplication algorithm. Concurrency-Pract. Experience 9(4), 255–274 (1997)

Acknowledgments

This work was supported by The Scientific and Technological Research Council of Turkey (TUBITAK) under Grant EEEAG-115E212. This article is also based upon work from COST Action IC1406 (cHiPSet).

Author information

Corresponding author

Correspondence to Cevdet Aykanat.

Copyright information

© 2016 Springer International Publishing AG

About this chapter

Cite this chapter

Selvitopi, O., Akbudak, K., Aykanat, C. (2016). Parallelization of Sparse Matrix Kernels for Big Data Applications. In: Pop, F., Kołodziej, J., Di Martino, B. (eds) Resource Management for Big Data Platforms. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-44881-7_17

  • DOI: https://doi.org/10.1007/978-3-319-44881-7_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44880-0

  • Online ISBN: 978-3-319-44881-7

  • eBook Packages: Computer Science, Computer Science (R0)
