Parallel prefix operations on GPU: tridiagonal system solvers and scan operators

The Journal of Supercomputing

Abstract

Modern GPUs offer high computing power at low cost, but exploiting that power still requires considerable development time and effort. Tridiagonal system solvers and scan operators are examples of widely used algorithms that can take advantage of these devices. In this article, one tridiagonal system solver and two scan primitive operators are implemented on CUDA GPUs. To do so, a three-phase tuning strategy is developed. Additionally, a performance analysis is carried out on two different CUDA GPU architectures, showing a substantial improvement over the state of the art.
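For readers unfamiliar with the scan primitive, the following is a minimal sketch of an inclusive scan built on the Kogge–Stone step pattern, simulated sequentially in Python. It is not the article's CUDA implementation or its tuning strategy. The function names `kogge_stone_scan` and `exclusive_scan` are hypothetical, and the abstract does not state which two scan operators the authors implement, so the inclusive and exclusive variants shown here are only an assumption for illustration.

```python
from operator import add


def kogge_stone_scan(values, op=add):
    """Inclusive scan following the Kogge-Stone step pattern.

    Illustrative only: the inner loop stands in for the threads that a
    GPU kernel would run concurrently in each step.
    """
    data = list(values)
    n = len(data)
    stride = 1
    while stride < n:
        # Snapshot of the previous step; on a GPU this role is played by a
        # synchronization barrier (or double buffering) between steps.
        prev = data[:]
        for i in range(stride, n):
            # Left operand first, so non-commutative operators stay correct.
            data[i] = op(prev[i - stride], prev[i])
        stride *= 2
    return data


def exclusive_scan(values, identity=0, op=add):
    """Exclusive scan derived from the inclusive one by shifting right."""
    inclusive = kogge_stone_scan(values, op)
    return ([identity] + inclusive)[:len(inclusive)]


if __name__ == "__main__":
    print(kogge_stone_scan([3, 1, 7, 0, 4, 1, 6, 3]))
    print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
```

The pattern takes about log2(n) steps for n elements, which is what makes it attractive on wide parallel hardware, at the cost of more total operations than a sequential loop. Any associative operator can be passed as `op`, for example `max` instead of addition.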

Notes

  1. BPLG Library is available at http://bplg.des.udc.es/BPLib.zip.

Acknowledgements

This work is supported by the Ministry of Economy and Competitiveness of Spain, TIN2016-75845-P (AEI/FEDER, UE), by the Galician Government and FEDER funds under the Consolidation Program of Competitive Reference Groups (GRC2013-055) as well as under the Consolidation Programme of Competitive Research Units [Ref. R2014/049 and Ref. R2016/037]; and by the FPU Program of the Ministry of Education of Spain (FPU14/02801).

Author information

Corresponding author

Correspondence to Adrián P. Diéguez.

About this article

Cite this article

Diéguez, A.P., Amor, M. & Doallo, R. Parallel prefix operations on GPU: tridiagonal system solvers and scan operators. J Supercomput 75, 1510–1523 (2019). https://doi.org/10.1007/s11227-018-2676-z

  • DOI: https://doi.org/10.1007/s11227-018-2676-z
