Multifrontal QR Factorization for Multicore Architectures over Runtime Systems

  • Emmanuel Agullo
  • Alfredo Buttari
  • Abdou Guermouche
  • Florent Lopez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8097)


To face the advent of multicore processors and the ever-increasing complexity of hardware architectures, programming models based on DAG parallelism have regained popularity in the high-performance scientific computing community. Modern runtime systems offer a programming interface that complies with this paradigm, together with powerful engines for scheduling the tasks into which the application is decomposed. These tools have already proved their effectiveness on a number of dense linear algebra applications. This paper evaluates the usability of runtime systems for complex applications, namely sparse matrix multifrontal factorizations, which constitute extremely irregular workloads, with tasks of different granularities and characteristics and with variable memory consumption. Experimental results on real-life matrices show that it is possible to achieve the same efficiency as with an ad hoc scheduler that relies on knowledge of the algorithm. A detailed analysis shows the performance behavior of the resulting code and possible ways of improving the effectiveness of runtime systems.
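As a rough illustration of the DAG-parallel model mentioned in the abstract, the following minimal sketch executes tasks in an order that respects their dependencies, using dependency counting (Kahn's algorithm). It is not the scheduler studied in the paper and does not use any runtime system's API; the function `run_dag`, the task names `factor_front_*`, and the three-front elimination tree are hypothetical and serve only to show how a front can become ready once its children have completed.

```python
from collections import deque


def run_dag(tasks, deps):
    """Run tasks in an order that respects deps (Kahn's algorithm).

    tasks: dict mapping task name -> zero-argument callable
    deps:  dict mapping task name -> list of tasks that must finish first
    Returns the list of task names in execution order.
    """
    # Number of unfinished predecessors for each task.
    remaining = {t: len(deps.get(t, [])) for t in tasks}
    # Reverse edges: which tasks are waiting on a given task.
    successors = {t: [] for t in tasks}
    for t, preds in deps.items():
        for p in preds:
            successors[p].append(t)

    # Tasks with no predecessors are ready immediately.
    ready = deque(sorted(t for t, n in remaining.items() if n == 0))
    order = []
    while ready:
        t = ready.popleft()
        tasks[t]()  # execute the task body
        order.append(t)
        for s in successors[t]:
            remaining[s] -= 1
            if remaining[s] == 0:  # all predecessors done: task is ready
                ready.append(s)

    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order


# Hypothetical elimination tree: fronts 1 and 2 are leaves, front 3 is their parent.
log = []
tasks = {f"factor_front_{i}": (lambda i=i: log.append(i)) for i in (1, 2, 3)}
deps = {"factor_front_3": ["factor_front_1", "factor_front_2"]}
order = run_dag(tasks, deps)
```

In this toy setting the two leaf fronts are independent and could run concurrently; a real runtime system would dispatch the ready tasks to worker threads instead of running them one at a time as this sequential sketch does.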


Keywords: sparse matrices, multifrontal method, QR factorization, runtime systems, heterogeneous architectures





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Emmanuel Agullo (1)
  • Alfredo Buttari (2)
  • Abdou Guermouche (3)
  • Florent Lopez (4)
  1. LaBRI, INRIA, Bordeaux, France
  2. CNRS, IRIT, Toulouse, France
  3. LaBRI, Université de Bordeaux, Bordeaux, France
  4. IRIT, Université Paul Sabatier, Toulouse, France
