Abstract
This paper addresses the solution of a large number of small random symmetric linear systems and its parallel implementation in single precision on graphics processing units (GPUs). The computations for each linear system are independent of the others, and the number of unknowns does not exceed 64. For this purpose, we adapt several widely used methods to our context: LDLt factorization, Householder reduction to a tridiagonal matrix, parallel cyclic reduction (PCR) for sizes that are not a power of two, and the divide and conquer algorithm for tridiagonal eigenproblems. We not only detail the implementation and optimization of each method, but also compare the sustainability of each solution and its performance, which includes both parallel complexity and cache memory occupation. In the context of solving a large number of small random linear systems on GPUs with no information about their conditioning, our research indicates that the best strategy is Householder tridiagonalization + PCR, followed if necessary by a divide and conquer diagonalization.
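To make the PCR step more concrete, the following is a minimal sketch of parallel cyclic reduction for a tridiagonal system whose size need not be a power of two. It is not the paper's GPU implementation: it is a sequential pure-Python illustration in double precision, the names `pcr_solve` and `_row` are hypothetical, and it assumes that no pivot becomes zero (for instance, a diagonally dominant system). Rows that fall outside the system are treated as identity equations, which is one simple way to handle arbitrary sizes.

```python
def _row(a, b, c, d, j):
    """Return equation j, or an identity equation if j is out of range (assumed convention)."""
    n = len(b)
    if 0 <= j < n:
        return a[j], b[j], c[j], d[j]
    return 0.0, 1.0, 0.0, 0.0


def pcr_solve(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] by parallel cyclic reduction.

    a: sub-diagonal (a[0] is ignored), b: diagonal,
    c: super-diagonal (c[n-1] is ignored), d: right-hand side.
    No pivoting is performed, so b must stay nonzero throughout.
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    a[0] = 0.0
    c[n - 1] = 0.0
    stride = 1
    while stride < n:
        na, nb, nc, nd = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
        # On a GPU, each iteration of this loop would be handled by one thread.
        for i in range(n):
            al, bl, cl, dl = _row(a, b, c, d, i - stride)
            ah, bh, ch, dh = _row(a, b, c, d, i + stride)
            alpha = -a[i] / bl  # eliminates x[i - stride]
            gamma = -c[i] / bh  # eliminates x[i + stride]
            na[i] = alpha * al
            nc[i] = gamma * ch
            nb[i] = b[i] + alpha * cl + gamma * ah
            nd[i] = d[i] + alpha * dl + gamma * dh
        a, b, c, d = na, nb, nc, nd
        stride *= 2
    # Once stride >= n, every equation is decoupled: b[i] * x[i] = d[i].
    return [d[i] / b[i] for i in range(n)]
```

With n unknowns the loop runs ceil(log2(n)) reduction steps, so a system of size 5 needs three steps and a system of size 64 needs six. When many independent systems are solved at once, each system could be assigned to its own group of threads, although the mapping used in the paper may differ.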
Acknowledgments
This work was funded by project ARRAND (ANR-15-CE39-0002-01) and partially supported by the project FastRelax (ANR-14-CE25-0018-01).
Cite this article
Abbas-Turki, L.A., Graillat, S. Resolving small random symmetric linear systems on graphics processing units. J Supercomput 73, 1360–1386 (2017). https://doi.org/10.1007/s11227-016-1813-9
DOI: https://doi.org/10.1007/s11227-016-1813-9