S-MPEC: Sparse Matrix Multiplication Performance Estimator on a Cloud Environment


Sparse matrix multiplication (SPMM) is widely used in machine learning algorithms. As SPMM applications on large-scale datasets become prevalent, running SPMM jobs on an optimized setup has become very important. The execution environment of a distributed SPMM task on cloud resources can be configured in diverse ways, depending on the input sparse datasets, the SPMM implementation method, and the choice of cloud instance type. In this paper, we propose S-MPEC, which predicts the latency of various SPMM tasks running on Apache Spark in distributed cloud environments. We first characterize various distributed SPMM implementations on Apache Spark. Considering these characteristics and the hardware specifications of cloud instances, we propose unique features to build a gradient boosting (GB) regressor model with Bayesian optimization. The proposed S-MPEC model can accurately predict the latency of an arbitrary SPMM task and recommend an optimal implementation method. A thorough evaluation of the proposed system reveals that a user can expect 44% less latency to complete SPMM tasks compared with the native SPMM implementations in Apache Spark.
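To make the idea of a GB-regressor latency model concrete, the following is a minimal sketch of gradient boosting with squared loss over depth-one regression trees, written in plain Python. It is not the S-MPEC implementation: the three features (non-zero counts of the two input matrices in millions, and the number of cores), the latency values, and the function names `fit_stump`, `fit_gb`, and `predict` are all hypothetical and chosen only for illustration. Bayesian optimization is not shown.

```python
from statistics import mean

# Hypothetical training data (made up for illustration, not from the paper):
# (nnz of left matrix in millions, nnz of right matrix in millions, cores)
TRAIN_X = [(1, 1, 4), (2, 1, 4), (4, 2, 4), (8, 4, 4),
           (1, 1, 16), (2, 1, 16), (4, 2, 16), (8, 4, 16)]
# Hypothetical measured latencies in seconds.
TRAIN_Y = [12.0, 20.0, 55.0, 180.0, 6.0, 9.0, 21.0, 60.0]


def fit_stump(rows, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimizing squared error."""
    best = None
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        # Candidate thresholds lie midway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [res for r, res in zip(rows, residuals) if r[j] <= thr]
            right = [res for r, res in zip(rows, residuals) if r[j] > thr]
            lmean, rmean = mean(left), mean(right)
            sse = (sum((x - lmean) ** 2 for x in left)
                   + sum((x - rmean) ** 2 for x in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, lmean, rmean)
    return best[1:]


def fit_gb(rows, targets, n_rounds=50, learning_rate=0.1):
    """Fit an additive model: each round fits a stump to the current residuals."""
    base = mean(targets)
    preds = [base] * len(rows)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(targets, preds)]
        j, thr, lmean, rmean = fit_stump(rows, residuals)
        stumps.append((j, thr, lmean, rmean))
        preds = [p + learning_rate * (lmean if r[j] <= thr else rmean)
                 for r, p in zip(rows, preds)]
    return base, learning_rate, stumps


def predict(model, row):
    """Sum the base value and the shrunken contribution of every stump."""
    base, lr, stumps = model
    return base + sum(lr * (lm if row[j] <= thr else rm)
                      for j, thr, lm, rm in stumps)


def mse(model, rows, targets):
    """Mean squared error of the model on the given rows."""
    return mean((predict(model, r) - t) ** 2 for r, t in zip(rows, targets))


model = fit_gb(TRAIN_X, TRAIN_Y)
print("predicted latency (s):", predict(model, (4, 2, 8)))
```

In a setting like the one the abstract describes, one such predictor could be queried for each candidate SPMM implementation and instance type, and the configuration with the lowest predicted latency recommended. A production model would use deeper trees and tuned hyperparameters.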






This work is supported by the National Research Foundation of Korea (NRF) Grant funded by the Korean Government (MSIP) (NRF-2020R1A2C1102544, NRF-2016R1C1B2015135, and NRF-2015R1A5A7037615), the ICT R&D program of IITP (2017-0-00396), and Research Credits provided by AWS.


Corresponding author

Correspondence to Kyungyong Lee.


Cite this article

Park, J., Lee, K. S-MPEC: Sparse Matrix Multiplication Performance Estimator on a Cloud Environment. Cluster Comput (2021). https://doi.org/10.1007/s10586-021-03287-3



  • Cloud computing
  • Instance recommendation
  • Sparse matrix multiplication
  • Apache Spark