NVIDIA GPUs Scalability to Solve Multiple (Batch) Tridiagonal Systems Implementation of cuThomasBatch

Valero-Lara, Pedro; Martínez-Pérez, Ivan; Sirvent, Raül; Martorell, Xavier; Peña, Antonio J.

doi:10.1007/978-3-319-78024-5_22

Pedro Valero-Lara (ORCID: orcid.org/0000-0002-1479-4310), Ivan Martínez-Pérez, Raül Sirvent, Xavier Martorell & Antonio J. Peña

Conference paper
First Online: 23 March 2018

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10777)

Abstract

Solving tridiagonal systems is one of the most computationally expensive parts of many applications, and multiple studies have therefore explored the use of NVIDIA GPUs to accelerate it. However, these studies have mainly focused on parallel algorithms, which exploit shared memory efficiently and can saturate the GPU's capacity with a low number of systems, but scale poorly when the number of systems is relatively high. We propose a new implementation (cuThomasBatch) based on the Thomas algorithm. Achieving good scalability with this approach requires transforming the way the inputs are stored in memory so as to exploit coalescence (contiguous threads accessing contiguous memory locations). The results of this study show that our implementation outperforms the reference code when dealing with a relatively large number of tridiagonal systems (2,000–256,000), being close to \(3{\times }\) (in double precision) and \(4{\times }\) (in single precision) faster on one NVIDIA Kepler GPU.
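To make the data-layout idea concrete, the following is a minimal CPU sketch in Python, not the cuThomasBatch code itself. It assumes an interleaved layout in which row `i` of system `s` is stored at index `i * num_systems + s`, so that consecutive systems (which would map to consecutive GPU threads) touch consecutive memory locations at each step. The function names `interleave` and `thomas_batch_interleaved` are hypothetical, the loop over `s` stands in for the per-thread work of a GPU kernel, and the solver uses the textbook Thomas recurrence without pivoting, so it assumes well-conditioned (for example, diagonally dominant) systems.

```python
def interleave(batch):
    """Convert per-system storage batch[s][i] into a flat interleaved list
    in which row i of system s sits at index i * num_systems + s."""
    num_systems = len(batch)
    n = len(batch[0])
    flat = [0.0] * (n * num_systems)
    for s in range(num_systems):
        for i in range(n):
            flat[i * num_systems + s] = batch[s][i]
    return flat


def thomas_batch_interleaved(lower, diag, upper, rhs, n, num_systems):
    """Solve num_systems tridiagonal systems of size n, all stored interleaved.

    lower[k], diag[k], upper[k] hold the sub-, main and super-diagonal entry
    of the row stored at k. The lower entry of row 0 and the upper entry of
    row n-1 are ignored. Returns the solutions in the same interleaved layout.
    """
    c = list(upper)  # scratch copy: modified super-diagonal
    d = list(rhs)    # scratch copy: modified right-hand side, then solution
    for s in range(num_systems):  # on a GPU, each s would be one thread
        # Forward elimination along system s.
        c[s] = c[s] / diag[s]
        d[s] = d[s] / diag[s]
        for i in range(1, n):
            k = i * num_systems + s   # current row of system s
            p = k - num_systems       # previous row of the same system
            denom = diag[k] - lower[k] * c[p]
            c[k] = c[k] / denom
            d[k] = (d[k] - lower[k] * d[p]) / denom
        # Back substitution; the last row already holds its solution.
        for i in range(n - 2, -1, -1):
            k = i * num_systems + s
            d[k] -= c[k] * d[k + num_systems]
    return d
```

In this layout, for a fixed row `i` the entries of all systems are adjacent in memory, which is the access pattern the abstract describes as coalescence. With the more usual system-by-system storage, neighbouring threads would instead read addresses `n` elements apart.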

Notes

1. BSC-GitLab, https://pm.bsc.es/gitlab/run-math/cuThomasBatch.

Acknowledgements

This project received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 720270 (HBP SGA1), from the Spanish Ministry of Economy and Competitiveness under the project Computación de Altas Prestaciones VII (TIN2015-65316-P), and from the Departament d’Innovació, Universitats i Empresa de la Generalitat de Catalunya under project MPEXPAR: Models de Programació i Entorns d’Execució Paral·lels (2014-SGR-1051). We are grateful for the support of NVIDIA through the BSC/UPC NVIDIA GPU Center of Excellence and for the valuable feedback provided by Lung Sheng Chien (software engineer at NVIDIA) and Alex Fit-Florea (leading algorithms groups at NVIDIA). Antonio J. Peña is cofinanced by the Spanish Ministry of Economy and Competitiveness under Juan de la Cierva fellowship number IJCI-2015-23266.

Author information

Authors and Affiliations

Barcelona Supercomputing Center (BSC), Barcelona, Spain
Pedro Valero-Lara, Ivan Martínez-Pérez, Raül Sirvent, Xavier Martorell & Antonio J. Peña
Universitat Politècnica de Catalunya, Barcelona, Spain
Xavier Martorell

Corresponding author

Correspondence to Pedro Valero-Lara.

Editor information

Editors and Affiliations

Czestochowa University of Technology, Czestochowa, Poland
Roman Wyrzykowski
University of Tennessee, Knoxville, Tennessee, USA
Jack Dongarra
University of Southern California, Marina Del Rey, California, USA
Ewa Deelman
Czestochowa University of Technology, Czestochowa, Poland
Konrad Karczewski

Cite this paper

Valero-Lara, P., Martínez-Pérez, I., Sirvent, R., Martorell, X., Peña, A.J. (2018). NVIDIA GPUs Scalability to Solve Multiple (Batch) Tridiagonal Systems Implementation of cuThomasBatch. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2017. Lecture Notes in Computer Science, vol 10777. Springer, Cham. https://doi.org/10.1007/978-3-319-78024-5_22

DOI: https://doi.org/10.1007/978-3-319-78024-5_22
Published: 23 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78023-8
Online ISBN: 978-3-319-78024-5
eBook Packages: Computer Science, Computer Science (R0)
