Towards Checkpointing Grid Architecture

  • Gracjan Jankowski
  • Jozsef Kovacs
  • Norbert Meyer
  • Radoslaw Januszewski
  • Rafal Mikolajczak
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3911)


Contemporary Grid environments are characterized by ever-increasing virtualization and distribution of resources. This places greater demands on load-balancing and fault-tolerance capabilities. The checkpoint-restart mechanism appears to be the most intuitive tool for meeting these requirements. One of the goals of the CoreGRID Network of Excellence is to define a high-level checkpoint-restart Grid Service and to position it among other Grid Services. We aim to define both the abstract model of that service and the lower-layer interface that will allow the service to cooperate with diverse existing and future checkpoint-restart tools. This paper is the first step towards that goal. It sketches the overall architecture of the proposed service and its connection with actual checkpoint-restart tools. Additionally, it mentions the work on the low-level checkpoint-restart tools to be used in the “proof of concept” implementation and integration.


Parallel Application, Grid Environment, Grid Service, Authentication Service, Globus Toolkit
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Gracjan Jankowski (1)
  • Jozsef Kovacs (2)
  • Norbert Meyer (1)
  • Radoslaw Januszewski (1)
  • Rafal Mikolajczak (1)
  1. Poznan Supercomputing and Networking Center, Poznan, Poland
  2. Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary
