In-memory application-level checkpoint-based migration for MPI programs

The Journal of Supercomputing

Abstract

Process migration provides many benefits for parallel environments, including dynamic load balancing, data access locality, and fault tolerance. This paper describes an in-memory application-level checkpoint-based migration solution for MPI codes that uses the Hierarchical Data Format 5 (HDF5) to write the checkpoint files. The main features of the proposed solution are transparency for the user, achieved through the use of CPPC (ComPiler for Portable Checkpointing); portability, since the application-level approach makes the solution suitable for any MPI implementation and operating system, and the HDF5 file format enables restart on different architectures; and high performance, achieved by saving the checkpoint files to memory instead of to disk through HDF5 in-memory files. Experimental results show that the in-memory approach significantly reduces the I/O cost of the migration process.

Acknowledgments

This research was supported by the Ministry of Science and Innovation of Spain (Project TIN2010-16735) and by the Galician Government (Project 10PXIB 105180PR and consolidation program of competitive reference groups GRC2013/055).

Corresponding author

Correspondence to Iván Cores.

About this article

Cite this article

Cores, I., Rodríguez, G., Martín, M.J. et al. In-memory application-level checkpoint-based migration for MPI programs. J Supercomput 70, 660–670 (2014). https://doi.org/10.1007/s11227-014-1120-2

  • DOI: https://doi.org/10.1007/s11227-014-1120-2
