Parallel prefix operations on GPU: tridiagonal system solvers and scan operators

Diéguez, Adrián P.; Amor, Margarita; Doallo, Ramón

doi:10.1007/s11227-018-2676-z

Published: 02 November 2018

Volume 75, pages 1510–1523, (2019)

The Journal of Supercomputing

Abstract

Modern GPUs can deliver high computing power at low cost, but exploiting that power still requires much time and effort. Tridiagonal system solvers and scan operators are examples of widely used algorithms that can take advantage of these devices. In this article, one tridiagonal system solver and two scan primitive operators are implemented on CUDA GPUs. To do so, a three-phase tuning strategy is developed. Additionally, a performance analysis is carried out on two different CUDA GPU architectures, showing a large improvement over the state of the art.
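To make the term "scan" concrete, the following is a minimal sketch of an inclusive and an exclusive prefix scan using the well-known Kogge-Stone doubling pattern. It is a generic CPU illustration only: the abstract does not state which scan variants or algorithms the article implements, and the function names `inclusive_scan` and `exclusive_scan` are hypothetical, chosen for this example. Each pass of the loop stands for one parallel step in which, on a GPU, every element would be updated by its own thread.

```python
from operator import add


def inclusive_scan(values, op=add):
    """Inclusive prefix scan with the Kogge-Stone doubling pattern.

    Illustrative only (not the article's implementation). Each loop pass
    models one parallel step; it is simulated here by building a new list.
    `op` must be associative.
    """
    result = list(values)
    stride = 1
    while stride < len(result):
        # Element i combines with the element `stride` positions to its left.
        result = [
            op(result[i - stride], result[i]) if i >= stride else result[i]
            for i in range(len(result))
        ]
        stride *= 2
    return result


def exclusive_scan(values, identity=0, op=add):
    """Exclusive scan: the inclusive result shifted right by one position."""
    values = list(values)
    if not values:
        return []
    return [identity] + inclusive_scan(values, op)[:-1]


if __name__ == "__main__":
    data = [3, 1, 7, 0, 4, 1, 6, 3]
    print(inclusive_scan(data))  # [3, 4, 11, 11, 15, 16, 22, 25]
    print(exclusive_scan(data))  # [0, 3, 4, 11, 11, 15, 16, 22]
```

For n elements this pattern needs about log2(n) steps but performs more total operations than a sequential loop, which is the usual trade-off between step efficiency and work efficiency in parallel scan designs.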

Notes

BPLG Library is available at http://bplg.des.udc.es/BPLib.zip.

Acknowledgements

This work is supported by the Ministry of Economy and Competitiveness of Spain, TIN2016-75845-P (AEI/FEDER, UE), by the Galician Government and FEDER funds under the Consolidation Program of Competitive Reference Groups (GRC2013-055) as well as under the Consolidation Programme of Competitive Research Units [Ref. R2014/049 and Ref. R2016/037]; and by the FPU Program of the Ministry of Education of Spain (FPU14/02801).

Author information

Authors and Affiliations

Grupo de Arquitectura de Computadores (GAC), Facultade de Informática, Universidade da Coruña, Campus da Coruña, 15071, A Coruña, Spain
Adrián P. Diéguez, Margarita Amor & Ramón Doallo

Corresponding author

Correspondence to Adrián P. Diéguez.

About this article

Cite this article

Diéguez, A.P., Amor, M. & Doallo, R. Parallel prefix operations on GPU: tridiagonal system solvers and scan operators. J Supercomput 75, 1510–1523 (2019). https://doi.org/10.1007/s11227-018-2676-z

Issue Date: 01 March 2019
DOI: https://doi.org/10.1007/s11227-018-2676-z
