The Journal of Supercomputing, Volume 73, Issue 5, pp 1852–1904

On the improvement of a scalable sparse direct solver for unsymmetrical linear equations

  • M. Serdar Celebi
  • Ahmet Duran
  • Figen Oztoprak
  • Mehmet Tuncel
  • Bora Akaydin


This paper focuses on application-level improvements in a sparse direct solver used specifically for large-scale unsymmetrical linear equations resulting from unstructured mesh discretization of coupled elliptic/hyperbolic PDEs. Existing sparse direct solvers are designed for distributed server systems, taking advantage of both distributed memory and processing units. We conducted extensive numerical experiments with three state-of-the-art direct linear solvers that can work on distributed-memory parallel architectures, namely MUMPS (MUMPS solver website), WSMP (Technical Report TR RC-21886, IBM, Watson Research Center, Yorktown Heights, 2000), and SuperLU_DIST (ACM Trans Math Softw 29(2):110–140, 2003). The performance of these solvers was analyzed in detail using advanced analysis tools such as Tuning and Analysis Utilities (TAU) and Performance Application Programming Interface (PAPI). Performance is evaluated with respect to robustness, speed, scalability, and efficiency in CPU and memory usage. We have identified application-level issues whose resolution, we believe, can improve the performance of a distributed/shared-memory hybrid variant of this solver, which is proposed in this paper as an alternative solver, SuperLU_MCDT (Many-Core Distributed). The new solver, which uses MPI/OpenMP hybrid programming, is specifically tuned to handle large unsymmetrical systems arising in reservoir simulations, so that higher performance and better scalability can be achieved on large distributed computing systems with many nodes of multicore processors. Two main tasks are accomplished in this study: (i) a comparison of public-domain solver algorithms, in which existing state-of-the-art direct sparse linear system solvers are investigated and their performance and weaknesses on test cases are analyzed; and (ii) an improved direct sparse solver algorithm (SuperLU_MCDT) for many-core distributed systems.
We provided results of numerical tests that were run on up to 16,384 cores, and used many sets of test matrices for reservoir simulations with unstructured meshes. The numerical results showed that SuperLU_MCDT can outperform SuperLU_DIST 3.3 in terms of both speed and robustness.
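
To make the problem class concrete, the following is a minimal sketch of a sparse direct solve for a small unsymmetrical system, written in plain Python. It is not the algorithm of SuperLU_MCDT or of any solver compared above: those solvers use supernodal or multifrontal, parallel algorithms. The function name `sparse_lu_solve`, the row-dictionary storage, and the test matrix are all assumptions made for illustration. The sketch performs Gaussian elimination with partial pivoting and counts fill-in, that is, nonzeros created during factorization, which is one of the main drivers of memory use in direct solvers.

```python
def sparse_lu_solve(rows, b, tol=1e-14):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    rows: list of dicts, rows[i][j] = A[i][j] for stored (nonzero) entries.
    Returns (x, fill_in), where fill_in counts entries created by elimination.
    """
    n = len(rows)
    rows = [dict(r) for r in rows]      # work on copies, keep the input intact
    rhs = list(b)
    fill_in = 0
    for k in range(n):
        # Partial pivoting: choose the row with the largest entry in column k.
        p = max(range(k, n), key=lambda i: abs(rows[i].get(k, 0.0)))
        if abs(rows[p].get(k, 0.0)) < tol:
            raise ValueError("matrix is numerically singular")
        rows[k], rows[p] = rows[p], rows[k]
        rhs[k], rhs[p] = rhs[p], rhs[k]
        pivot = rows[k][k]
        for i in range(k + 1, n):
            if k not in rows[i]:
                continue                # sparsity: nothing to eliminate here
            factor = rows[i].pop(k) / pivot
            for j, v in rows[k].items():
                if j == k:
                    continue
                if j not in rows[i]:
                    fill_in += 1        # a new nonzero appears in row i
                rows[i][j] = rows[i].get(j, 0.0) - factor * v
            rhs[i] -= factor * rhs[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(v * x[j] for j, v in rows[i].items() if j > i)
        x[i] = (rhs[i] - s) / rows[i][i]
    return x, fill_in


# Illustrative unsymmetrical "arrow" matrix: dense first row and first column.
A = [
    {0: 4.0, 1: 1.0, 2: 1.0, 3: 1.0},
    {0: 1.0, 1: 3.0},
    {0: 2.0, 2: 5.0},
    {0: 1.0, 3: 6.0},
]
x_true = [1.0, 2.0, 3.0, 4.0]
b = [sum(v * x_true[j] for j, v in row.items()) for row in A]
x, fill_in = sparse_lu_solve(A, b)
```

Eliminating the dense first column here fills in every remaining row, which is why practical solvers apply a fill-reducing ordering before factorization.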


Keywords: Parallel linear direct solver; Sparse direct solver; Many-core distributed solver; Reservoir simulations; Large scale simulations; Symbolic factorization; Numerical factorization; Scalability; Linear equations; SuperLU



The authors acknowledge the computing resources allocated by PRACE Research Infrastructures ‘Hydra’ at RZG (Rechenzentrum Garching), Germany, and ‘Karadeniz’ at UHeM (National High Performance Computing Center of Turkey) under Grant Agreement RI-28349. The authors thank Aramco Overseas Company B.V. for financial support under contract number 6600028651. The authors appreciate the helpful comments and suggestions of the Editor-in-Chief of The Journal of Supercomputing, Prof. Hamid R. Arabnia, and the anonymous referees.


  1. Gupta A, Karypis G, Kumar V (1997) Highly scalable parallel algorithms for sparse matrix factorization. IEEE Trans Parallel Distrib Syst 8(5):502–520
  2. Gupta A, Koric S, George T (2009) Sparse matrix factorization on massively parallel computers. In: SC09 Proceedings. ACM, Portland, Gresham
  3. Demmel JW, Gilbert JR, Li XS (1999) An asynchronous parallel supernodal algorithm for sparse Gaussian elimination. SIAM J Matrix Anal Appl 20(4):915–952
  4. Amestoy PR, Duff IS, Koster J, L'Excellent JY (2001) A fully asynchronous multifrontal solver using distributed dynamic scheduling. SIAM J Matrix Anal Appl 23(1):15–41
  5. Amestoy PR, Duff IS, L'Excellent JY (2000) Multifrontal parallel distributed symmetric and unsymmetric solvers. Comput Methods Appl Mech Eng 184:501–520
  6. Li XS, Demmel JW (2003) SuperLU_DIST: a scalable distributed-memory sparse direct solver for unsymmetric linear systems. ACM Trans Math Softw 29(2):110–140
  7. Davis TA (2006) Direct methods for sparse linear systems (fundamentals of algorithms 2). SIAM, Philadelphia
  8. Duran A, Celebi MS, Tuncel M (2012) Scalability of SuperLU solvers for large scale complex reservoir simulations. In: Joint Society of Petroleum Engineers (SPE) and Society of Industrial and Applied Mathematics (SIAM) Conference on Mathematical Methods in Fluid Dynamics and Simulation of Giant Oil and Gas Reservoirs, Istanbul, 3–5 September 2012
  9. Demmel JW, Eisenstat SC, Gilbert JR, Li XS, Liu JWH (1999) A supernodal approach to sparse partial pivoting. SIAM J Matrix Anal Appl 20:720–755
  10. Demmel JW, Gilbert JR, Li XS (1999) An asynchronous parallel supernodal algorithm for sparse Gaussian elimination. SIAM J Matrix Anal Appl 20(4):915–952
  11. Amestoy PR, Davis TA, Duff IS (1996) An approximate minimum degree ordering algorithm. SIAM J Matrix Anal Appl 17:886–905
  12. Grigori L, Demmel JW, Li XS (2007) Parallel symbolic factorization for sparse LU with static pivoting. SIAM J Sci Comput 29(3)
  13. Sao P, Vuduc R, Li X (2014) A distributed CPU–GPU sparse direct solver. In: Euro-Par 2014, Parallel Processing, Porto, 25–29 August 2014
  14. Gupta A (2000) WSMP: Watson sparse matrix package. Part I: Direct solution of symmetric sparse systems, version 1.0.0. Technical Report TR RC-21886, IBM, Watson Research Center, Yorktown Heights
  15. Gupta A (2000) WSMP: Watson sparse matrix package. Part II: Direct solution of general sparse systems, version 1.0.0. Technical Report TR RC-21888, IBM, Watson Research Center, Yorktown Heights
  16. MUMPS solver website.
  17. Li XS (2005) An overview of SuperLU: algorithms, implementation, and user interface. ACM Trans Math Softw 31(3):302–325
  18. Stewart GW (1973) Introduction to matrix computations. Academic Press, New York
  19. Davis TA. University of Florida sparse matrix collection.
  20. Celebi MS, Duran A, Tuncel M, Akaydin B (2012) Scalable and improved SuperLU on GPU for heterogeneous systems. In: PRACE PN:283493, PRACE-2IP White Paper, Libraries, WP 44, 13 July 2012
  21. Bhandarkar SM, Arabnia HR, Smith JW (1995) A reconfigurable architecture for image processing and computer vision. Int J Pattern Recognit Artif Intell (IJPRAI) (special issue on VLSI Algorithms and Architectures for Computer Vision, Image Processing, Pattern Recognition and AI) 9(2):201–229
  22. Arabnia HR, Bhandarkar SM (1996) Parallel stereocorrelation on a reconfigurable multi-ring network. J Supercomput (Springer) 10(3):243–270
  23. Arabnia HR, Taha TR (1998) A parallel numerical algorithm on a reconfigurable multi-ring network. Telecommun Syst 10(1–2):185–202
  24. Duran A, Celebi MS, Piskin S, Tuncel M (2015) Scalability of OpenFOAM for bio-medical flow simulations. J Supercomput 71(3):938–951
  25. OpenFOAM main site.
  26. Duran A, Celebi MS, Tuncel M, Akaydin B (2012) Design and implementation of new hybrid algorithm and solver on CPU for large sparse linear systems. In: PRACE PN:283493, PRACE-2IP White Paper, Libraries, WP 43, 13 July 2012
  27. Duran A, Celebi MS, Tuncel M, Oztoprak F (2013) Structural analysis of large sparse matrices for scalable direct solvers. In: PRACE PN:283493, PRACE-2IP White Paper, Scalable Algorithms, WP 82, 20 August 2013
  28. Celebi MS, Duran A, Tuncel M, Akaydin B, Oztoprak F (2013) Performance analysis of BLAS libraries in SuperLU_DIST for SuperLU_MCDT development. In: PRACE PN:283493, PRACE-2IP White Paper, Libraries, WP 83, 20 August 2013

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • M. Serdar Celebi (1)
  • Ahmet Duran (2)
  • Figen Oztoprak (3)
  • Mehmet Tuncel (1, 2)
  • Bora Akaydin (1)

  1. Informatics Institute, Istanbul Technical University, Istanbul, Turkey
  2. Department of Mathematics, Istanbul Technical University, Istanbul, Turkey
  3. Department of Industrial Engineering, Bilgi University, Istanbul, Turkey
