Towards Checkpointing Grid Architecture
Contemporary Grid environments are characterized by ever-increasing virtualization and distribution of resources. These trends place greater demands on load-balancing and fault-tolerance capabilities. The checkpoint-restart mechanism seems to be the most intuitive tool for meeting these requirements. One of the goals of the CoreGRID Network of Excellence is to define a high-level checkpoint-restart Grid Service and to position it among the other Grid Services. We aim to define both an abstract model of that service and a lower-layer interface that will allow the service to cooperate with diverse existing and future checkpoint-restart tools. This paper is a first step towards that goal. It sketches the overall architecture of the proposed service and its connection to the actual checkpoint-restart tools. It also mentions the work on the low-level checkpoint-restart tools to be used in the “proof of concept” implementation and integration.
Keywords: Parallel Application, Grid Environment, Grid Service, Authentication Service, Globus Toolkit