Correlated Set Coordination in Fault Tolerant Message Logging Protocols

  • Aurelien Bouteiller
  • Thomas Herault
  • George Bosilca
  • Jack J. Dongarra
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6853)


Based on current expectations for exascale systems, composed of hundreds of thousands of many-core nodes, the mean time between failures will become short, even under the most optimistic assumptions. Message logging, one of the most scalable checkpoint/restart techniques, is also the one most challenged as the number of cores per node increases, because of the high overhead of saving message payloads. Fortunately, the failures of two processes on the same node are correlated: when one fails, the other is likely to fail as well, so recovering them in a coordinated fashion comes essentially for free. In this paper, we propose an intermediate approach that coordinates correlated processes while retaining the scalability advantage of message logging between independent ones. The algorithm still belongs to the family of event logging protocols, but it eliminates the need for costly payload logging between coordinated processes.
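The payload-logging rule described above can be illustrated with a small Python model. It is a sketch under stated assumptions, not the authors' implementation: ranks placed on the same node form a correlated set, every reception is still recorded as an event (consistent with the protocol remaining in the event-logging family), and a sender keeps a copy of the payload only when the destination lies in another set. The class `CorrelatedSetLogger`, its methods, and the event-record format are hypothetical names chosen for illustration.

```python
"""Hypothetical sketch: payload logging only across correlated sets."""
from collections import defaultdict


class CorrelatedSetLogger:
    """Decides, per message, whether the sender must keep a payload copy.

    Assumptions (illustrative, not taken from the paper's code):
      * a correlated set is the group of ranks placed on the same node;
      * every reception still produces an event (determinant) record;
      * payloads are copied only when sender and receiver are in different sets;
      * delivery is immediate, so send order equals delivery order here.
    """

    def __init__(self, node_of):
        self.node_of = dict(node_of)           # rank -> node id (correlated set)
        self.payload_log = defaultdict(list)   # sender rank -> [(dest, seq, payload)]
        self.event_log = []                    # (receiver, sender, seq) in delivery order
        self.seq = defaultdict(int)            # per-sender send sequence number

    def same_set(self, a, b):
        return self.node_of[a] == self.node_of[b]

    def send(self, src, dst, payload):
        self.seq[src] += 1
        seq = self.seq[src]
        if not self.same_set(src, dst):
            # Inter-set message: keep a sender-side copy so it can be
            # resent if the receiver's whole set rolls back.
            self.payload_log[src].append((dst, seq, bytes(payload)))
        # Intra-set message: no payload copy, because sender and receiver
        # restart together from the same coordinated checkpoint.
        self.event_log.append((dst, src, seq))
        return seq

    def logged_bytes(self):
        """Total payload volume held in sender-side logs."""
        return sum(len(p) for entries in self.payload_log.values()
                   for _, _, p in entries)

    def replay_for(self, failed_node):
        """Logged messages other sets must resend to ranks on a failed node."""
        return [(src, dst, seq, p)
                for src, entries in self.payload_log.items()
                for dst, seq, p in entries
                if self.node_of[dst] == failed_node]


if __name__ == "__main__":
    log = CorrelatedSetLogger({0: 0, 1: 0, 2: 1, 3: 1})
    log.send(0, 1, b"x" * 1000)   # same node: event recorded, payload not copied
    log.send(0, 2, b"y" * 10)     # different nodes: payload copied by the sender
    print(log.logged_bytes())     # 10
    print(log.replay_for(1))      # [(0, 2, 2, b'yyyyyyyyyy')]
```

In this model the large intra-node message costs nothing in the payload log, while the small inter-node message is kept for replay; this mirrors the trade-off the abstract describes, where coordination replaces payload logging only inside a correlated set.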


Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Aurelien Bouteiller (1)
  • Thomas Herault (1)
  • George Bosilca (1)
  • Jack J. Dongarra (1, 2)

  1. Innovative Computing Laboratory, The University of Tennessee, USA
  2. Oak Ridge National Laboratory, USA