
CAPE: A Checkpointing-Based Solution for OpenMP on Distributed-Memory Architectures

  • Van Long Tran
  • Éric Renault
  • Viet Hai Ha
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11657)

Abstract

CAPE, which stands for Checkpointing-Aided Parallel Execution, is a framework that automatically translates OpenMP programs and provides runtime functions to execute them on distributed-memory architectures using checkpointing techniques. To execute an OpenMP program on a distributed-memory system, CAPE uses a set of templates to translate the OpenMP source code into CAPE source code, which is then compiled with a regular C/C++ compiler. The resulting code can be executed on distributed-memory systems with the support of the CAPE framework.
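
As an illustration of the kind of input such a translator starts from, the following is a minimal sketch of an ordinary OpenMP work-sharing loop in C. The function name `scale_vector` and its parameters are hypothetical and do not come from the paper; the sketch shows only standard OpenMP source code, not the CAPE source code produced by the templates.

```c
#include <stddef.h>

/* Hypothetical example (not from the paper): scale a vector in place.
   Each iteration writes a distinct element, so the iterations are
   independent and can be shared among threads. */
void scale_vector(double *v, size_t n, double factor)
{
    /* Standard OpenMP work-sharing construct; a compiler without
       OpenMP support ignores the pragma and runs the loop serially. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        v[i] *= factor;
    }
}
```

Built with an OpenMP-enabled compiler (for example `gcc -fopenmp`), this loop runs across the threads of a single shared-memory node; running the same program across several nodes is the case the abstract describes CAPE as addressing.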

This paper presents the design and implementation of a new execution model based on Time-stamp Incremental Checkpoints. The new execution model allows CAPE to use resources efficiently, avoid the risk of bottlenecks, and remove the requirement that programs satisfy Bernstein’s conditions. As a result, it improves CAPE’s performance, capability, and reliability.
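
For reference, Bernstein's conditions can be stated in their standard textbook form as follows. The symbols are assumptions of this note and are not taken from the paper: for two program fragments $P_1$ and $P_2$, $I_k$ denotes the set of memory locations read by $P_k$ and $O_k$ the set of locations it writes.

```latex
% Bernstein's conditions: P_1 and P_2 may execute in parallel if
\[
  I_1 \cap O_2 = \emptyset, \qquad
  I_2 \cap O_1 = \emptyset, \qquad
  O_1 \cap O_2 = \emptyset .
\]
```

Informally, neither fragment reads what the other writes, and the two do not write to the same location.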

Keywords

CAPE; Checkpointing aided parallel execution; OpenMP on cluster; Parallel programming; Distributed computing; HPC

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Hue Industrial College, Hue City, Vietnam
  2. SAMOVAR, Télécom SudParis, CNRS, Université Paris-Saclay, Evry Cedex, France
  3. College of Education, Hue University, Hue, Vietnam
