
Automatic Resource-Centric Process Migration for MPI

  • Conference paper
Recent Advances in the Message Passing Interface (EuroMPI 2012)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 7490)


Abstract

Process migration refers to the ability to move a running process from one node to another, where it continues to run. The MPI standard prescribes support for process migration, but so far it has been implemented mostly via checkpoint-restart. This paper presents an automatic and transparent process migration framework that can be used for MPI processes. The framework is advantageous when migrating individual processes, for purposes such as load balancing, is more appropriate than checkpointing the whole job. The paper describes this framework for process migration in clusters and multi-clusters, how it was tuned for Open MPI, and the performance of migrated MPI processes.
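The abstract contrasts moving individual processes for load balancing with checkpointing an entire job. The following is a minimal, hypothetical sketch of the kind of per-process decision such a framework has to make. It is not the algorithm used in the paper, MOSIX, or Open MPI: the `Node` type, the per-core load metric, and the greedy "move one process from the most loaded to the least loaded node if the peak load drops" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of a cluster node (an assumption, not the paper's data model)."""
    name: str
    cpus: int
    # Maps a process id to its CPU demand, expressed as a fraction of one core.
    procs: dict = field(default_factory=dict)

    def load(self) -> float:
        # Load per core: total CPU demand divided by the number of cores.
        return sum(self.procs.values()) / self.cpus


def pick_migration(nodes):
    """Return (pid, src_name, dst_name) for one beneficial migration, or None.

    Hypothetical greedy rule: consider moving a single process from the most
    loaded node to the least loaded node, and accept the move only if it
    strictly lowers the higher of the two nodes' loads.
    """
    src = max(nodes, key=lambda n: n.load())
    dst = min(nodes, key=lambda n: n.load())
    if src is dst or not src.procs:
        return None
    best = None
    best_peak = max(src.load(), dst.load())
    src_total = sum(src.procs.values())
    dst_total = sum(dst.procs.values())
    for pid, demand in src.procs.items():
        # Loads the two nodes would have after moving this one process.
        peak = max((src_total - demand) / src.cpus, (dst_total + demand) / dst.cpus)
        if peak < best_peak:
            best, best_peak = (pid, src.name, dst.name), peak
    return best


def migrate(nodes, pid, src_name, dst_name):
    """Move one process between nodes; only the bookkeeping, no real migration."""
    by_name = {n.name: n for n in nodes}
    demand = by_name[src_name].procs.pop(pid)
    by_name[dst_name].procs[pid] = demand
```

For example, with six single-core ranks on a 4-core node `n1` and one rank on a 4-core node `n2`, repeatedly calling `pick_migration` and applying each result with `migrate` moves two ranks and then stops, leaving four ranks on `n1` and three on `n2`. Only the selected processes move while the rest of the job keeps running, which is the situation the abstract describes as better suited to migration than to checkpointing the whole job.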




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Barak, A., Margolin, A., Shiloh, A. (2012). Automatic Resource-Centric Process Migration for MPI. In: Träff, J.L., Benkner, S., Dongarra, J.J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2012. Lecture Notes in Computer Science, vol 7490. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33518-1_21


  • DOI: https://doi.org/10.1007/978-3-642-33518-1_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33517-4

  • Online ISBN: 978-3-642-33518-1

  • eBook Packages: Computer Science, Computer Science (R0)
