Multifrontal QR Factorization for Multicore Architectures over Runtime Systems
To face the advent of multicore processors and the ever-increasing complexity of hardware architectures, programming models based on DAG parallelism have regained popularity in the high-performance scientific computing community. Modern runtime systems offer a programming interface that complies with this paradigm, along with powerful engines for scheduling the tasks into which the application is decomposed. These tools have already proved their effectiveness on a number of dense linear algebra applications. This paper evaluates the usability of runtime systems for complex applications, namely sparse matrix multifrontal factorizations, which constitute extremely irregular workloads, with tasks of different granularities and characteristics and with variable memory consumption. Experimental results on real-life matrices show that it is possible to achieve the same efficiency as with an ad hoc scheduler that relies on knowledge of the algorithm. A detailed analysis shows the performance behavior of the resulting code and possible ways of improving the effectiveness of runtime systems.
Keywords: sparse matrices, multifrontal method, QR factorization, runtime systems, heterogeneous architectures