
Optimizing a Conjugate Gradient Solver with Non-Blocking Collective Operations

  • Conference paper
Recent Advances in Parallel Virtual Machine and Message Passing Interface (EuroPVM/MPI 2006)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 4192)

Abstract

This paper presents a case study on the applicability and use of non-blocking collective operations. These operations make it possible to overlap communication with computation and to avoid unnecessary synchronization. We introduce our NBC library, a portable, low-overhead implementation of non-blocking collectives on top of MPI-1. We demonstrate how easy the NBC library is to use by optimizing a conjugate gradient solver with only minor changes to the traditional parallel implementation of the program. The optimized solver runs up to 34% faster and is able to overlap most of the communication. We show that, because of the overlap, there is no performance difference between Gigabit Ethernet and InfiniBand™ for our calculation.
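
The overlap pattern described in the abstract (start a collective, do independent work, wait only when the result is needed) can be illustrated with a minimal sketch. This is not the NBC library's API and not the authors' solver: `simulated_allreduce`, `local_work`, and `overlapped_step` are hypothetical names, the "ranks" are just a list of partial sums, and a Python thread stands in for the network. It shows the control flow only; it makes no claim about performance.

```python
from concurrent.futures import ThreadPoolExecutor


def simulated_allreduce(partial_sums):
    """Hypothetical stand-in for a global sum across ranks (not an MPI call)."""
    return sum(partial_sums)


def local_work(vector, scale):
    """Local computation that does not depend on the reduction result."""
    return [scale * x for x in vector]


def overlapped_step(partial_sums, vector, scale):
    """Start the reduction, compute locally, then wait for the reduction."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # 1. Start the collective; submit() returns a handle immediately.
        handle = pool.submit(simulated_allreduce, partial_sums)
        # 2. Do work that is independent of the collective's result.
        scaled = local_work(vector, scale)
        # 3. Block only at the point where the result is actually needed.
        total = handle.result()
    return total, scaled


if __name__ == "__main__":
    total, scaled = overlapped_step([1.0, 2.0, 3.0], [1.0, 2.0], 2.0)
    print(total, scaled)
```

In a conjugate gradient iteration, the natural candidates for step 1 are the global reductions behind the dot products, and step 2 is whatever local vector or matrix work does not yet need their result; which operations the authors actually overlap is described in the full paper, not in this abstract.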

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hoefler, T., Gottschling, P., Rehm, W., Lumsdaine, A. (2006). Optimizing a Conjugate Gradient Solver with Non-Blocking Collective Operations. In: Mohr, B., Träff, J.L., Worringen, J., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2006. Lecture Notes in Computer Science, vol 4192. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11846802_52

  • DOI: https://doi.org/10.1007/11846802_52

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-39110-4

  • Online ISBN: 978-3-540-39112-8

  • eBook Packages: Computer Science, Computer Science (R0)
