Heterogenous Acceleration for Linear Algebra in Multi-coprocessor Environments

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8969)

Abstract

We present an efficient and scalable programming model for developing linear algebra software in heterogeneous multi-coprocessor environments. The model incorporates some of the current best design and implementation practices for the heterogeneous acceleration of dense linear algebra (DLA). Examples are given for the algorithms that form the basis of solving linear systems: the LU, QR, and Cholesky factorizations. To generate the extreme level of parallelism needed for the efficient use of coprocessors, the algorithms of interest are redesigned and then split into well-chosen computational tasks. Task execution is scheduled over the computational components of a hybrid system of multi-core CPUs and coprocessors using a light-weight runtime system. The light-weight runtime keeps scheduling overhead low while enabling the expression of parallelism through otherwise sequential code. This simplifies development and allows the exploration of the unique strengths of the various hardware components.
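The idea of expressing parallelism through otherwise sequential code can be illustrated with a toy example. The sketch below is not the runtime used in the paper; it is a minimal, assumed model in which tasks of a tile Cholesky factorization are inserted in program order, each declaring the tiles it reads and the tiles it updates, and the dependencies between tasks are inferred from those declarations. The class `TaskGraph`, its methods, and the function `tile_cholesky` are hypothetical names chosen for illustration, and no numerical work is performed.

```python
from collections import defaultdict


class TaskGraph:
    """Toy dependency tracker (illustrative only, not the paper's runtime).

    Tasks are inserted in sequential program order. Dependencies are
    inferred from the tiles each task reads and updates.
    """

    def __init__(self):
        self.tasks = []                    # task names, in insertion order
        self.deps = []                     # deps[t] = ids that task t waits for
        self.last_writer = {}              # tile -> id of the last task that updated it
        self.readers = defaultdict(list)   # tile -> readers since the last update

    def insert(self, name, reads=(), writes=()):
        """Register a task; `writes` lists tiles updated in place (read-write)."""
        tid = len(self.tasks)
        wait = set()
        for tile in reads:
            if tile in self.last_writer:
                wait.add(self.last_writer[tile])   # read-after-write
        for tile in writes:
            if tile in self.last_writer:
                wait.add(self.last_writer[tile])   # write-after-write
            wait.update(self.readers[tile])        # write-after-read
        for tile in reads:
            self.readers[tile].append(tid)
        for tile in writes:
            self.last_writer[tile] = tid
            self.readers[tile] = []
        self.tasks.append(name)
        self.deps.append(wait)
        return tid

    def levels(self):
        """Earliest step at which each task could start, given unlimited workers."""
        level = []
        for wait in self.deps:
            level.append(1 + max((level[d] for d in wait), default=-1))
        return level


def tile_cholesky(nt):
    """Insert the tasks of a right-looking tile Cholesky on an nt x nt tile grid."""
    g = TaskGraph()
    for k in range(nt):
        g.insert(f"POTRF({k})", writes=[(k, k)])
        for i in range(k + 1, nt):
            g.insert(f"TRSM({i},{k})", reads=[(k, k)], writes=[(i, k)])
        for i in range(k + 1, nt):
            g.insert(f"SYRK({i},{k})", reads=[(i, k)], writes=[(i, i)])
            for j in range(k + 1, i):
                g.insert(f"GEMM({i},{j},{k})", reads=[(i, k), (j, k)], writes=[(i, j)])
    return g


if __name__ == "__main__":
    g = tile_cholesky(3)
    for name, lvl in zip(g.tasks, g.levels()):
        print(lvl, name)
```

The loop nest in `tile_cholesky` reads like ordinary sequential code, yet tasks that end up on the same level (for example, the two `TRSM` tasks of the first panel) have no dependency on each other and could run concurrently on different devices. A real runtime would additionally decide where each task runs and move data between host and coprocessor memory, which this sketch leaves out.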

This research was partially supported by the National Science Foundation under Grants OCI-1032815, ACI-1339822, and Subcontract RA241-G1 on NSF Prime Grant OCI-0910735, DOE under Grants DE-SC0004983 and DE-SC0010042, and Intel.

Acknowledgements

This research was supported in part by the National Science Foundation under Grants OCI-1032815, ACI-1339822, and Subcontract RA241-G1 on NSF Prime Grant OCI-0910735, DOE under Grants DE-SC0004983 and DE-SC0010042, and Intel Corporation.

Author information

Corresponding author

Correspondence to Piotr Luszczek.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Haidar, A., Luszczek, P., Tomov, S., Dongarra, J. (2015). Heterogenous Acceleration for Linear Algebra in Multi-coprocessor Environments. In: Daydé, M., Marques, O., Nakajima, K. (eds) High Performance Computing for Computational Science -- VECPAR 2014. VECPAR 2014. Lecture Notes in Computer Science, vol 8969. Springer, Cham. https://doi.org/10.1007/978-3-319-17353-5_3

  • DOI: https://doi.org/10.1007/978-3-319-17353-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17352-8

  • Online ISBN: 978-3-319-17353-5

  • eBook Packages: Computer Science, Computer Science (R0)
