Heterogenous Acceleration for Linear Algebra in Multi-coprocessor Environments

Haidar, Azzam; Luszczek, Piotr; Tomov, Stanimire; Dongarra, Jack

doi:10.1007/978-3-319-17353-5_3

Conference paper
First Online: 01 January 2015

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8969)

Abstract

We present an efficient and scalable programming model for developing linear algebra software in heterogeneous multi-coprocessor environments. The model incorporates some of the current best design and implementation practices for the heterogeneous acceleration of dense linear algebra (DLA). Examples are given for the factorizations that underlie linear system solvers: LU, QR, and Cholesky. To generate the extreme level of parallelism needed for the efficient use of coprocessors, the algorithms of interest are redesigned and then split into well-chosen computational tasks. Task execution is scheduled over the computational components of a hybrid system of multi-core CPUs and coprocessors using a light-weight runtime system. The light-weight runtime keeps scheduling overhead low while enabling the expression of parallelism through otherwise sequential code. This simplifies development and allows exploration of the unique strengths of the various hardware components.
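The idea of expressing parallelism through otherwise sequential code can be illustrated with a toy dependency tracker. The sketch below is not the runtime used in the paper. It is a minimal illustration that assumes a tile Cholesky factorization whose tasks declare the tiles they read and write. The names `TaskGraph`, `insert`, `waves`, and `tile_cholesky` are hypothetical, and no numerical work is performed: the code only derives which tasks could run concurrently.

```python
from collections import defaultdict


class TaskGraph:
    """Toy task recorder (hypothetical, for illustration only).

    Tasks are inserted in plain sequential order. Dependencies are inferred
    from the tiles each task declares as read or written.
    """

    def __init__(self):
        self.tasks = []                    # list of (name, level)
        self.last_writer = {}              # tile -> index of last writing task
        self.readers = defaultdict(list)   # tile -> readers since last write

    def insert(self, name, reads=(), writes=()):
        deps = set()
        for tile in reads:                 # read-after-write
            if tile in self.last_writer:
                deps.add(self.last_writer[tile])
        for tile in writes:
            if tile in self.last_writer:   # write-after-write
                deps.add(self.last_writer[tile])
            deps.update(self.readers[tile])  # write-after-read
        # A task can start one step after its latest dependency.
        level = 1 + max((self.tasks[d][1] for d in deps), default=-1)
        index = len(self.tasks)
        self.tasks.append((name, level))
        for tile in reads:
            self.readers[tile].append(index)
        for tile in writes:
            self.last_writer[tile] = index
            self.readers[tile] = []
        return index

    def waves(self):
        """Group task names by level; tasks in one wave are independent."""
        grouped = defaultdict(list)
        for name, level in self.tasks:
            grouped[level].append(name)
        return [grouped[level] for level in sorted(grouped)]


def tile_cholesky(graph, nt):
    """Insert the tasks of a right-looking tile Cholesky on an nt x nt grid.

    Tiles updated in place are listed under `writes` only, which is enough
    for this tracker because write-after-write ordering is enforced.
    """
    for k in range(nt):
        graph.insert(f"POTRF({k})", writes=[(k, k)])
        for i in range(k + 1, nt):
            graph.insert(f"TRSM({i},{k})", reads=[(k, k)], writes=[(i, k)])
        for i in range(k + 1, nt):
            graph.insert(f"SYRK({i},{k})", reads=[(i, k)], writes=[(i, i)])
            for j in range(k + 1, i):
                graph.insert(f"GEMM({i},{j},{k})",
                             reads=[(i, k), (j, k)], writes=[(i, j)])


if __name__ == "__main__":
    g = TaskGraph()
    tile_cholesky(g, 3)
    for step, names in enumerate(g.waves()):
        print(step, names)
```

The loop nest in `tile_cholesky` reads like a sequential algorithm, yet the recorded read and write sets are enough to recover the task graph. A real runtime would dispatch ready tasks to CPUs or coprocessors as their dependencies complete, rather than grouping them into waves after the fact.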

This research was partially supported by the National Science Foundation under Grants OCI-1032815, ACI-1339822, and Subcontract RA241-G1 on NSF Prime Grant OCI-0910735, DOE under Grants DE-SC0004983 and DE-SC0010042, and Intel.

Acknowledgements

This research was supported in part by the National Science Foundation under Grants OCI-1032815, ACI-1339822, and Subcontract RA241-G1 on NSF Prime Grant OCI-0910735, DOE under Grants DE-SC0004983 and DE-SC0010042, and Intel Corporation.

Author information

Authors and Affiliations

University of Tennessee Knoxville, Knoxville, USA
Azzam Haidar, Piotr Luszczek, Stanimire Tomov & Jack Dongarra
Oak Ridge National Laboratory, Oak Ridge, USA
Jack Dongarra
University of Manchester, Manchester, M13 9PL, UK
Jack Dongarra

Corresponding author

Correspondence to Piotr Luszczek.

Editor information

Editors and Affiliations

IRIT, ENSEEIHT, Toulouse Cedex, France
Michel Daydé
Lawrence Berkeley National Laboratory, Berkeley, California, USA
Osni Marques
Information Technology Center, The University of Tokyo, Tokyo, Japan
Kengo Nakajima

About this paper

Cite this paper

Haidar, A., Luszczek, P., Tomov, S., Dongarra, J. (2015). Heterogenous Acceleration for Linear Algebra in Multi-coprocessor Environments. In: Daydé, M., Marques, O., Nakajima, K. (eds) High Performance Computing for Computational Science -- VECPAR 2014. Lecture Notes in Computer Science, vol 8969. Springer, Cham. https://doi.org/10.1007/978-3-319-17353-5_3

DOI: https://doi.org/10.1007/978-3-319-17353-5_3
Published: 18 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17352-8
Online ISBN: 978-3-319-17353-5
eBook Packages: Computer Science, Computer Science (R0)
