Exploiting a Parametrized Task Graph Model for the Parallelization of a Sparse Direct Multifrontal Solver

  • Emmanuel Agullo
  • George Bosilca
  • Alfredo Buttari
  • Abdou Guermouche
  • Florent Lopez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10104)


The advent of multicore processors requires reconsidering the design of high-performance computing libraries to embrace portable and effective techniques of parallel software engineering. One of the most promising approaches consists in abstracting an application as a directed acyclic graph (DAG) of tasks. While this approach has been popularized for shared-memory environments by the OpenMP 4.0 standard, where dependencies between tasks are automatically inferred, we investigate an alternative approach, capable of describing the DAG of tasks in a distributed setting, where task dependencies are explicitly encoded. So far this approach has mostly been used for algorithms with a regular data access pattern, and we show in this study that it can be efficiently applied to a highly irregular numerical algorithm such as a sparse multifrontal QR method. We present the resulting implementation and discuss the potential and limits of this approach in terms of productivity and effectiveness in comparison with more common parallelization techniques. Although at an early stage of development, preliminary results show the potential of the parallel programming model that we investigate in this work.
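To make the distinction concrete, the sketch below is a generic illustration of a DAG of tasks whose dependencies are explicitly encoded by the programmer, in contrast to having a runtime infer them from data accesses. It is not the paper's actual runtime interface (the PTG model is expressed in PaRSEC's own format); the task names and the `run_task_dag` helper are hypothetical.

```python
# Generic illustration (NOT PaRSEC's actual API): a DAG of tasks where each
# task's dependencies are listed explicitly, in the spirit of the
# parametrized task graph (PTG) model, rather than inferred by the runtime.
from collections import defaultdict, deque

def run_task_dag(tasks, deps):
    """Run `tasks` (name -> callable) in an order consistent with the
    explicit dependency edges in `deps` (name -> list of predecessor names)."""
    indegree = {t: 0 for t in tasks}
    successors = defaultdict(list)
    for t, preds in deps.items():
        for p in preds:
            indegree[t] += 1
            successors[p].append(t)
    ready = deque(t for t in tasks if indegree[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        tasks[t]()  # execute the task body
        for s in successors[t]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order

# A toy multifrontal-style elimination tree: two leaf fronts must be
# factorized before the root front that assembles their contributions.
log = []
tasks = {
    "factor_leaf_0": lambda: log.append("factor_leaf_0"),
    "factor_leaf_1": lambda: log.append("factor_leaf_1"),
    "factor_root":   lambda: log.append("factor_root"),
}
deps = {"factor_root": ["factor_leaf_0", "factor_leaf_1"]}
order = run_task_dag(tasks, deps)
```

In a real PTG runtime the graph is not built explicitly in memory; tasks and their dependencies are described algebraically so that each process can unfold only its local portion of the DAG, which is what makes the model attractive in a distributed setting.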


Keywords: Multicore architectures · Programming models · Runtime system · Parametrized task graph · Numerical scientific library · Sparse direct solver · Multifrontal QR factorization



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Emmanuel Agullo (1)
  • George Bosilca (5)
  • Alfredo Buttari (2)
  • Abdou Guermouche (3)
  • Florent Lopez (4)
  1. INRIA - LaBRI, Bordeaux, France
  2. CNRS - IRIT, Toulouse, France
  3. Université de Bordeaux - LaBRI, Bordeaux, France
  4. RAL - STFC, Didcot, UK
  5. University of Tennessee, Knoxville, USA
