Abstract
In this chapter, we propose a parallel algorithm for transposing sparse matrices stored in CSR format on many-core GPUs, exploiting the GPU's computational power and memory bandwidth through CUDA parallel programming. We evaluate our code on a quad-core Intel Xeon E5507 CPU platform with an NVIDIA GeForce GTX 470 GPU. Measuring performance on inputs ranging from small to large matrices, our preliminary results scale well up to 512 threads and are promising for larger matrices.
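The chapter's CUDA kernels are not reproduced on this page. As background, the following is a minimal serial sketch of the classic counting-sort approach to CSR transposition (in the spirit of Gustavson's algorithm), which is the computation a GPU implementation parallelizes; the function name `csr_transpose` and its interface are illustrative, not the authors' code.

```python
def csr_transpose(num_rows, num_cols, row_ptr, col_idx, vals):
    """Transpose a CSR matrix via counting sort over column indices.

    Returns (t_row_ptr, t_col_idx, t_vals): the transpose in CSR form
    (equivalently, the original matrix in CSC form).
    """
    nnz = len(col_idx)
    # 1. Count nonzeros per column of A = nonzeros per row of A^T.
    t_row_ptr = [0] * (num_cols + 1)
    for c in col_idx:
        t_row_ptr[c + 1] += 1
    # 2. Prefix-sum the counts into row offsets of A^T.
    for i in range(num_cols):
        t_row_ptr[i + 1] += t_row_ptr[i]
    # 3. Scatter each nonzero of A into its slot in A^T, advancing
    #    a per-row write cursor as slots fill up.
    cursor = t_row_ptr[:-1].copy()
    t_col_idx = [0] * nnz
    t_vals = [0] * nnz
    for r in range(num_rows):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            dest = cursor[col_idx[k]]
            t_col_idx[dest] = r
            t_vals[dest] = vals[k]
            cursor[col_idx[k]] += 1
    return t_row_ptr, t_col_idx, t_vals
```

On a GPU, steps 1 and 3 become data-parallel count and scatter kernels (with atomics or segmented writes) and step 2 becomes a parallel prefix scan, which is where the thread scalability reported in the abstract comes from.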
Acknowledgments
This chapter is based upon work supported in part by Taiwan National Science Council (NSC) grants no. NSC101-2221-E-126-002 and NSC101-2915-I-126-001 and by NVIDIA. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSC or NVIDIA.
Copyright information
© 2013 Springer Science+Business Media New York
Cite this paper
Weng, TH., Pham, H., Jiang, H., Li, KC. (2013). Designing Parallel Sparse Matrix Transposition Algorithm Using CSR for GPUs. In: Juang, J., Huang, YC. (eds) Intelligent Technologies and Engineering Systems. Lecture Notes in Electrical Engineering, vol 234. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6747-2_31
Print ISBN: 978-1-4614-6746-5
Online ISBN: 978-1-4614-6747-2