A Replication-Based Fault Tolerance Protocol Using Group Communication for the Grid

  • Kayhan Erciyes
Conference paper

DOI: 10.1007/11946441_62

Part of the Lecture Notes in Computer Science book series (LNCS, volume 4330)
Cite this paper as:
Erciyes K. (2006) A Replication-Based Fault Tolerance Protocol Using Group Communication for the Grid. In: Guo M., Yang L.T., Di Martino B., Zima H.P., Dongarra J., Tang F. (eds) Parallel and Distributed Processing and Applications. ISPA 2006. Lecture Notes in Computer Science, vol 4330. Springer, Berlin, Heidelberg

Abstract

We describe a replication-based protocol that uses group communication for fault tolerance in the Computational Grid. The Grid is partitioned into a number of clusters and each cluster has a designated coordinator that manages the states of the replicas within its cluster. The coordinators belong to a process group and the proposed protocol ensures the correct sequence of message deliveries to the replicas by the coordinators. Any failing node of the Grid is replaced by an active replica to provide correct continuation of the operation of the application. We show the theoretical framework along with illustrations of the replication protocol and its implementation results and analyze its performance and scalability.
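
As a rough illustration of the ideas named in the abstract (a per-cluster coordinator, ordered delivery to replicas, and replacement of a failed node by an active replica), the following minimal sketch may help. It is not the paper's protocol: the class names `Replica` and `Coordinator`, the methods `multicast` and `handle_failure`, and the use of a simple per-coordinator sequence number are all assumptions made for illustration, and the group communication among coordinators is omitted entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica: records the messages it is given, in order."""
    name: str
    alive: bool = True
    log: list = field(default_factory=list)

    def deliver(self, seq, msg):
        self.log.append((seq, msg))


class Coordinator:
    """Hypothetical cluster coordinator (an assumption, not the paper's design).

    It stamps each message with a sequence number so that every live
    replica in the cluster sees the same delivery order.
    """

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.next_seq = 0
        self.primary = self.replicas[0]

    def multicast(self, msg):
        # Assign the next sequence number and forward to all live replicas.
        seq = self.next_seq
        self.next_seq += 1
        for replica in self.replicas:
            if replica.alive:
                replica.deliver(seq, msg)
        return seq

    def handle_failure(self, failed):
        # Mark the node as failed; if it was the primary, promote the
        # first replica that is still alive (None if no replica survives).
        failed.alive = False
        if failed is self.primary:
            self.primary = next((r for r in self.replicas if r.alive), None)
        return self.primary


a, b = Replica("a"), Replica("b")
coordinator = Coordinator([a, b])
coordinator.multicast("m0")
coordinator.multicast("m1")
new_primary = coordinator.handle_failure(a)  # "b" takes over
coordinator.multicast("m2")
```

Because both replicas receive the same sequence of messages before the failure, the promoted replica already holds the state needed to continue; that is the property the sketch is meant to convey, under the assumptions stated above.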

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Kayhan Erciyes (1)
  1. Computer Eng. Dept., Izmir Institute of Technology, Urla, Turkey
