
Designing Parallel Sparse Matrix Transposition Algorithm Using CSR for GPUs

  • Conference paper
  • First Online:
Intelligent Technologies and Engineering Systems

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 234))

Abstract

In this chapter, we propose a parallel algorithm for sparse matrix transposition in CSR format on many-core GPUs, exploiting the GPU's high computational throughput and memory bandwidth through CUDA parallel programming. Our code runs on a platform with a quad-core Intel Xeon64 E5507 CPU and an NVIDIA GTX 470 GPU. We measure the performance of the algorithm on inputs ranging from small to large matrices; the preliminary results scale well up to 512 threads and are promising for larger matrices.
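
For readers unfamiliar with the data structure, a CSR matrix stores its nonzeros row by row in three arrays (row pointers, column indices, values), and transposing it amounts to regrouping the same nonzeros by column. The sketch below illustrates one common count / prefix-sum / scatter way of doing this in CUDA (with Thrust for the scan). It is only a minimal illustration under stated assumptions, not the algorithm proposed in this chapter, and all identifiers in it (transposeCSR, rowPtr, colIdx, vals, and so on) are hypothetical.

// Hedged sketch (not the chapter's algorithm): transpose an m x n CSR matrix
// into the CSR form of its transpose using a count / prefix-sum / scatter
// scheme. Within each output row the column indices are left unsorted.
#include <cuda_runtime.h>
#include <thrust/device_ptr.h>
#include <thrust/scan.h>

// Kernel 1: histogram of column indices = row lengths of the transpose.
__global__ void countPerColumn(const int *colIdx, int nnz, int *colCount) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nnz) atomicAdd(&colCount[colIdx[i]], 1);
}

// Kernel 2: one thread per source row; scatter each nonzero (r, c, v) into
// output row c, recording r as its new column index.
__global__ void scatter(const int *rowPtr, const int *colIdx, const float *vals,
                        int m, int *tNext, int *tColIdx, float *tVals) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= m) return;
    for (int i = rowPtr[r]; i < rowPtr[r + 1]; ++i) {
        int dst = atomicAdd(&tNext[colIdx[i]], 1);   // next free slot in output row
        tColIdx[dst] = r;
        tVals[dst]   = vals[i];
    }
}

// Host driver: the d_* arrays are assumed to already hold the CSR input on the
// device; d_tRowPtr has n + 1 entries, d_tColIdx / d_tVals have nnz entries.
void transposeCSR(const int *d_rowPtr, const int *d_colIdx, const float *d_vals,
                  int m, int n, int nnz,
                  int *d_tRowPtr, int *d_tColIdx, float *d_tVals) {
    int threads = 256;
    int *d_count;                                    // per-column nonzero counts
    cudaMalloc(&d_count, (n + 1) * sizeof(int));
    cudaMemset(d_count, 0, (n + 1) * sizeof(int));
    countPerColumn<<<(nnz + threads - 1) / threads, threads>>>(d_colIdx, nnz, d_count);

    // Exclusive scan of the counts yields the row pointer of the transpose.
    thrust::device_ptr<int> cnt(d_count), out(d_tRowPtr);
    thrust::exclusive_scan(cnt, cnt + n + 1, out);

    // A copy of the row pointer serves as per-row write cursors for the scatter.
    int *d_next;
    cudaMalloc(&d_next, n * sizeof(int));
    cudaMemcpy(d_next, d_tRowPtr, n * sizeof(int), cudaMemcpyDeviceToDevice);
    scatter<<<(m + threads - 1) / threads, threads>>>(d_rowPtr, d_colIdx, d_vals,
                                                      m, d_next, d_tColIdx, d_tVals);
    cudaFree(d_count);
    cudaFree(d_next);
}

In this scheme the exclusive scan turns per-column counts into the output row pointer, and the atomics in the scatter step resolve conflicts between source rows writing into the same output row, at the cost of leaving each output row's column indices unsorted.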



Acknowledgments

This chapter is based upon work supported in part by Taiwan National Science Council (NSC) grants NSC101-2221-E-126-002 and NSC101-2915-I-126-001, and by NVIDIA. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSC or NVIDIA.

Author information


Corresponding author

Correspondence to Tien-Hsiung Weng.



Copyright information

© 2013 Springer Science+Business Media New York

About this paper

Cite this paper

Weng, TH., Pham, H., Jiang, H., Li, KC. (2013). Designing Parallel Sparse Matrix Transposition Algorithm Using CSR for GPUs. In: Juang, J., Huang, YC. (eds) Intelligent Technologies and Engineering Systems. Lecture Notes in Electrical Engineering, vol 234. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6747-2_31


  • DOI: https://doi.org/10.1007/978-1-4614-6747-2_31

  • Published:

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-6746-5

  • Online ISBN: 978-1-4614-6747-2

  • eBook Packages: Engineering, Engineering (R0)
