Numerical Algorithms, Volume 73, Issue 3, pp 611–630

Updating incomplete factorization preconditioners for model order reduction

  • Hartwig Anzt
  • Edmond Chow
  • Jens Saak
  • Jack Dongarra
Original Paper

Abstract

When solving a sequence of related linear systems by iterative methods, it is common to reuse the preconditioner for several systems, and then to recompute the preconditioner when the matrix has changed significantly. Rather than recomputing the preconditioner from scratch, it is potentially more efficient to update the previous preconditioner. Unfortunately, it is not always known how to update a preconditioner, for example, when the preconditioner is an incomplete factorization. A recently proposed iterative algorithm for computing incomplete factorizations, however, is able to exploit an initial guess, unlike existing algorithms for incomplete factorizations. By treating a previous factorization as an initial guess to this algorithm, an incomplete factorization may thus be updated. We use a sequence of problems from model order reduction. Experimental results using an optimized GPU implementation show that updating a previous factorization can be inexpensive and effective, making solving sequences of linear systems a potential niche problem for the iterative incomplete factorization algorithm.
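
The abstract does not spell out the iteration it refers to. The sketch below therefore assumes the fixed-point formulation known from the fine-grained parallel ILU literature: every nonzero position (i, j) of the matrix contributes one equation, and a sweep re-evaluates all of them from the values of the previous sweep. It is a dense NumPy illustration, not the optimized GPU implementation; the function names, the tridiagonal test matrix, and the diagonal shift of 0.05 are all hypothetical choices made for this example. It compares a standard initial guess with a warm start from the factors of the previous matrix.

```python
import numpy as np

def ilu0_sweeps(A, L0, U0, sweeps):
    """Run Jacobi-type fixed-point sweeps for ILU(0) on the pattern of A.

    L has a unit diagonal that is never updated. Every sweep reads only the
    values of the previous sweep, mimicking a fully parallel update.
    """
    n = A.shape[0]
    L, U = L0.copy(), U0.copy()
    pattern = [(i, j) for i in range(n) for j in range(n) if A[i, j] != 0.0]
    for _ in range(sweeps):
        L_old, U_old = L.copy(), U.copy()
        for i, j in pattern:
            k = min(i, j)
            # a_ij minus the partial inner product over indices below min(i, j)
            s = A[i, j] - L_old[i, :k] @ U_old[:k, j]
            if i > j:
                L[i, j] = s / U_old[j, j]
            else:
                U[i, j] = s
    return L, U

def standard_guess(A):
    """Guess built from A alone: scaled strict lower part and upper part."""
    n = A.shape[0]
    L0 = np.eye(n) + np.tril(A, -1) / np.diag(A)  # column j divided by a_jj
    U0 = np.triu(A)
    return L0, U0

def pattern_residual(A, L, U):
    """Frobenius norm of A - L U restricted to the nonzero pattern of A."""
    mask = A != 0.0
    return np.linalg.norm((A - L @ U)[mask])

# Two related matrices: A2 is A1 with a small diagonal shift (an assumption).
n = 8
A1 = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A2 = A1 + 0.05 * np.eye(n)

# Factor A1 to convergence; 2n sweeps are enough for this tridiagonal case.
L1, U1 = ilu0_sweeps(A1, *standard_guess(A1), sweeps=2 * n)

# One sweep on A2: standard guess versus the previous factors as the guess.
Lc, Uc = ilu0_sweeps(A2, *standard_guess(A2), sweeps=1)
Lw, Uw = ilu0_sweeps(A2, L1, U1, sweeps=1)
cold = pattern_residual(A2, Lc, Uc)
warm = pattern_residual(A2, Lw, Uw)
print(f"residual after one sweep: cold start {cold:.3f}, warm start {warm:.3f}")
```

For this small example the warm start leaves a smaller residual on the sparsity pattern after a single sweep than the standard guess does, which is the effect the abstract describes. How large the benefit is for the model order reduction problems in the paper cannot be inferred from this toy case.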

Keywords

Sequence of linear systems · Preconditioner update · Incomplete factorization · Fine-grained parallelism · Model order reduction · GPU

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Hartwig Anzt (1), corresponding author
  • Edmond Chow (2)
  • Jens Saak (3)
  • Jack Dongarra (1, 4, 5)

  1. Innovative Computing Lab, University of Tennessee, Knoxville, USA
  2. School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, USA
  3. Max Planck Institute for Dynamics of Complex Technical Systems, Magdeburg, Germany
  4. Oak Ridge National Laboratory, Oak Ridge, USA
  5. University of Manchester, Manchester, UK