
Performance Modelling-Driven Optimization of RISC-V Hardware for Efficient SpMV

  • Conference paper
High Performance Computing (ISC High Performance 2023)


The growing need for inference on edge devices brings with it a need for efficient hardware optimized for particular computational kernels, such as Sparse Matrix-Vector Multiplication (SpMV). With the RISC-V Instruction Set Architecture (ISA) providing unprecedented freedom to hardware designers, there is now a greater opportunity to tailor microarchitectures to both the application requirements and the data they are expected to process. In this paper, we demonstrate how the insights provided by the Cache-Aware Roofline Model (CARM) can be used in the hardware design process, optimizing a RISC-V architecture for efficient and performant execution of SpMV. Specifically, we assess the effect that architectural parameters associated with the processor’s cache and floating-point unit have on the architecture and on SpMV performance. Following a reparameterization closely guided by the CARM, we demonstrate a \(2.04\times\) improvement in performance and a significant decrease in underused computational resources.
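To make the two central terms of the abstract concrete, the following is a minimal sketch of an SpMV kernel and of a roofline-style performance bound. It is not taken from the paper: the CSR storage format, the function names (`spmv_csr`, `csr_intensity`, `roofline_bound`), and the byte counts (8-byte values, 4-byte indices) are all assumptions chosen for illustration. The intensity estimate counts every load and store issued by the core, which is the viewpoint the CARM adopts, and it ignores cache effects entirely.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR format (assumed layout)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def csr_intensity(nnz, n_rows, value_bytes=8, index_bytes=4):
    """Rough FLOPs per byte as seen by the core (illustrative assumption).

    Per nonzero: one multiply and one add; loads of a matrix value,
    a column index, and an element of x.
    Per row: one row-pointer load and one store to y.
    """
    flops = 2 * nnz
    traffic = nnz * (2 * value_bytes + index_bytes) + n_rows * (index_bytes + value_bytes)
    return flops / traffic


def roofline_bound(peak_flops, bandwidth, intensity):
    """Attainable FLOP/s: the lower of the compute roof and the memory roof."""
    return min(peak_flops, bandwidth * intensity)


if __name__ == "__main__":
    # A = [[4, 0, 1],
    #      [0, 3, 0],
    #      [2, 0, 5]]
    row_ptr = [0, 2, 3, 5]
    col_idx = [0, 2, 1, 0, 2]
    values = [4.0, 1.0, 3.0, 2.0, 5.0]
    x = [1.0, 2.0, 3.0]
    print(spmv_csr(row_ptr, col_idx, values, x))  # [7.0, 6.0, 17.0]
    ai = csr_intensity(nnz=len(values), n_rows=3)
    # Hypothetical machine: 2 GFLOP/s peak, 16 GB/s to the nearest cache level.
    print(roofline_bound(2e9, 16e9, ai))
```

Under these assumptions the intensity stays well below one FLOP per byte, so the bound is set by the memory roof rather than by the floating-point unit. That is one way to read the abstract's remark about underused computational resources, although the paper's own analysis should be consulted for the actual parameters and results.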





This project has received funding from the European High Performance Computing Joint Undertaking (JU) under Framework Partnership Agreement No 800928 and Specific Grant Agreement No 101036168 (EPI SGA2) and Grant agreement No 956213 (SparCity). The JU receives support from the European Union’s Horizon 2020 research and innovation programme and from Croatia, France, Germany, Greece, Italy, Netherlands, Norway, Portugal, Spain, Sweden, Switzerland and Turkey. It also received funding from FCT (Fundação para a Ciência e a Tecnologia, Portugal), through the UIDB/50021/2020 project.


Corresponding author

Correspondence to Alexandre Rodrigues.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Rodrigues, A., Sousa, L., Ilic, A. (2023). Performance Modelling-Driven Optimization of RISC-V Hardware for Efficient SpMV. In: Bienz, A., Weiland, M., Baboulin, M., Kruse, C. (eds) High Performance Computing. ISC High Performance 2023. Lecture Notes in Computer Science, vol 13999. Springer, Cham.


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-40842-7

  • Online ISBN: 978-3-031-40843-4

  • eBook Packages: Computer Science, Computer Science (R0)
