European Conference on Parallel Processing

Euro-Par 2011: Parallel Processing Workshops, pp 377–386

High-Performance Matrix-Vector Multiplication on the GPU

  • Hans Henrik Brandenborg Sørensen

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 7155)

Abstract

In this paper, we develop a high-performance GPU kernel for one of the most popular dense linear algebra operations, the matrix-vector multiplication. The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture), which is designed from the ground up for scientific computing. We show that achieving high performance for dense matrix-vector multiplication is essentially a matter of fully utilizing the fine-grained parallelism of the many-core GPU. We also show that auto-tuning can be successfully applied to the GPU kernel so that it performs well for all matrix shapes and sizes.
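
As a rough illustration of what fine-grained parallelism can mean for y = Ax, the sketch below emulates on the CPU one common decomposition: several workers share each row's dot product, and their partial sums are combined by a tree reduction. It is not the kernel developed in the paper. The names `gemv_rowsplit`, `pick_threads_per_row`, and the `threads_per_row` parameter are hypothetical and chosen only for this example, and the toy tuner merely shows the shape of an auto-tuning search, since timings of a Python emulation say nothing about GPU performance.

```python
import time


def gemv_rowsplit(A, x, threads_per_row=4):
    """Compute y = A x while emulating `threads_per_row` workers per row.

    A is a list of m rows (each a list of n numbers); x is a list of n numbers.
    `threads_per_row` is a hypothetical tuning knob and must be a power of two.
    """
    if threads_per_row < 1 or threads_per_row & (threads_per_row - 1):
        raise ValueError("threads_per_row must be a power of two")
    n = len(x)
    y = []
    for row in A:
        if len(row) != n:
            raise ValueError("row length does not match len(x)")
        # Step 1: worker t accumulates elements t, t + T, t + 2T, ... of the row.
        partial = [0.0] * threads_per_row
        for t in range(threads_per_row):
            for j in range(t, n, threads_per_row):
                partial[t] += row[j] * x[j]
        # Step 2: tree reduction; each pass halves the number of active workers.
        stride = threads_per_row // 2
        while stride > 0:
            for t in range(stride):
                partial[t] += partial[t + stride]
            stride //= 2
        y.append(partial[0])
    return y


def pick_threads_per_row(A, x, candidates=(1, 2, 4, 8, 16, 32)):
    """Toy auto-tuner: time every candidate and return the fastest one."""
    best, best_time = None, float("inf")
    for T in candidates:
        start = time.perf_counter()
        gemv_rowsplit(A, x, T)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = T, elapsed
    return best
```

With few workers per row, a short and wide matrix would leave most of a many-core device idle; with many workers per row, a tall and narrow matrix would spend most of its time in the reduction. That trade-off is one reason a per-shape parameter search is a plausible fit for this operation.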

Keywords

  • GPU
  • Matrix-Vector Multiplication
  • Dense linear algebra

Author information

Authors and Affiliations

  1. Informatics and Mathematical Modelling, Technical University of Denmark, Bldg. 321, DK-2800, Lyngby, Denmark

    Hans Henrik Brandenborg Sørensen

Editor information

Editors and Affiliations

  1. Scilytics, Koellnerhofgasse 3/15A, 1010, Vienna, Austria

    Michael Alexander

  2. ICAR-CNR, Via P. Castellino, 111, 80131, Napoli, Italy

    Pasqua D’Ambra

  3. University of Amsterdam, 1090, Amsterdam, Netherlands

    Adam Belloum

  4. Innovative Computing Laboratory, The University of Tennessee, USA

    George Bosilca

  5. Department of Experimental Medicine and Clinic, University Magna Græcia, 88100, Catanzaro, Italy

    Mario Cannataro

  6. Computer Science Department, University of Pisa, Italy

    Marco Danelutto

  7. Second University of Naples, Italy

    Beniamino Di Martino

  8. TU München, Boltzmannstr. 3, 85748, Garching, Germany

    Michael Gerndt

  9. Equipe Runtime, INRIA Bordeaux Sud-Ouest, 33405, Talence Cedex, France

    Emmanuel Jeannot & Raymond Namyst

  10. Equipe HIEPACS, INRIA Bordeaux Sud-Ouest, 33405, Talence Cedex, France

    Jean Roman

  11. Oak Ridge National Laboratory, Computer Science and Mathematics Division, 37831-6164, Oak Ridge, TN, USA

    Stephen L. Scott

  12. Department of Scientific Computing, University of Vienna, Nordbergstr. 15/3C, 1090, Vienna, Austria

    Jesper Larsson Traff

  13. Computer Science and Mathematics Division, Oak Ridge National Laboratory, 37831, Oak Ridge, TN, USA

    Geoffroy Vallée

  14. Technische Universität München, Germany

    Josef Weidendorfer

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sørensen, H.H.B. (2012). High-Performance Matrix-Vector Multiplication on the GPU. In: Alexander, M., et al. Euro-Par 2011: Parallel Processing Workshops. Euro-Par 2011. Lecture Notes in Computer Science, vol 7155. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29737-3_42

  • DOI: https://doi.org/10.1007/978-3-642-29737-3_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29736-6

  • Online ISBN: 978-3-642-29737-3

  • eBook Packages: Computer Science, Computer Science (R0)
