The Journal of Supercomputing

Volume 70, Issue 2, pp 660–670

In-memory application-level checkpoint-based migration for MPI programs

  • Iván Cores
  • Gabriel Rodríguez
  • María J. Martín
  • Patricia González

Abstract

Process migration provides many benefits for parallel environments, including dynamic load balancing, data access locality, and fault tolerance. This paper describes an in-memory, application-level, checkpoint-based migration solution for MPI codes that uses the Hierarchical Data Format 5 (HDF5) to write the checkpoint files. The main features of the proposed solution are: transparency for the user, achieved through the use of CPPC (ComPiler for Portable Checkpointing); portability, since the application-level approach makes the solution suitable for any MPI implementation and operating system, and the HDF5 file format enables restart on different architectures; and high performance, obtained by saving the checkpoint files to memory instead of to disk through HDF5 in-memory files. Experimental results show that the in-memory approach significantly reduces the I/O cost of the migration process.
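The performance idea above is that checkpoint data is written to a memory buffer instead of a disk file, while the checkpointing code still writes through a file-like interface (in the paper this is done with HDF5 in-memory files). The sketch below illustrates only that idea, under stated assumptions: it uses the Python standard library, it is not CPPC's or HDF5's API, and the checkpoint layout and the function names `write_checkpoint`, `read_checkpoint`, `checkpoint_to_memory` and `restart_from_memory` are hypothetical. Values are packed with explicit little-endian, fixed-size `struct` formats so the bytes do not depend on the host architecture, a small stand-in for the portability that HDF5 provides.

```python
import io
import struct
from typing import BinaryIO, Dict, List

# Hypothetical checkpoint layout (an assumption of this sketch, not CPPC's or HDF5's format):
#   uint32 variable count, then for each variable:
#   uint32 name length, UTF-8 name bytes, uint32 element count, float64 elements.
# The "<" prefix fixes byte order (little-endian) and field sizes on every platform.

State = Dict[str, List[float]]


def write_checkpoint(stream: BinaryIO, state: State) -> None:
    """Serialize the registered variables to any writable binary file object."""
    stream.write(struct.pack("<I", len(state)))
    for name, values in state.items():
        encoded = name.encode("utf-8")
        stream.write(struct.pack("<I", len(encoded)))
        stream.write(encoded)
        stream.write(struct.pack("<I", len(values)))
        stream.write(struct.pack(f"<{len(values)}d", *values))


def _read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes or fail, so a truncated image is detected."""
    data = stream.read(n)
    if len(data) != n:
        raise ValueError("truncated checkpoint image")
    return data


def read_checkpoint(stream: BinaryIO) -> State:
    """Rebuild the variables written by write_checkpoint."""
    (count,) = struct.unpack("<I", _read_exact(stream, 4))
    state: State = {}
    for _ in range(count):
        (name_len,) = struct.unpack("<I", _read_exact(stream, 4))
        name = _read_exact(stream, name_len).decode("utf-8")
        (n,) = struct.unpack("<I", _read_exact(stream, 4))
        state[name] = list(struct.unpack(f"<{n}d", _read_exact(stream, 8 * n)))
    return state


def checkpoint_to_memory(state: State) -> bytes:
    """Write the checkpoint into an in-memory buffer instead of a disk file."""
    buffer = io.BytesIO()
    write_checkpoint(buffer, state)
    return buffer.getvalue()


def restart_from_memory(image: bytes) -> State:
    """Restore the variables directly from an in-memory checkpoint image."""
    return read_checkpoint(io.BytesIO(image))


if __name__ == "__main__":
    state = {"iteration": [42.0], "u": [0.5, 1.5, 2.5]}
    image = checkpoint_to_memory(state)
    assert restart_from_memory(image) == state
    print(len(image), "bytes")
```

Because `write_checkpoint` accepts any binary file object, the same call writes to disk when given a file opened with `open(path, "wb")`; replacing that file with an `io.BytesIO` buffer is what removes the file-system I/O. In a migration scenario, the returned bytes are what would have to reach the destination process; how they are transferred is left out of this sketch.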

Keywords

Checkpoint · Migration · MPI · HDF5

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Iván Cores (1)
  • Gabriel Rodríguez (1)
  • María J. Martín (1)
  • Patricia González (1)
  1. Computer Architecture Group, University of A Coruña, A Coruña, Spain
