Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols

  • George Bosilca
  • Aurelien Bouteiller
  • Thomas Herault
  • Pierre Lemarinier
  • Jack J. Dongarra
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6305)


With the number of computing elements spiraling to hundreds of thousands in modern HPC systems, failures are common events. Few applications are nevertheless fault tolerant; most are in need of a seamless recovery framework. Among the automatic fault-tolerant techniques proposed for MPI, message logging is preferable for its scalable recovery. The major challenge for message logging protocols is the performance penalty on communications during failure-free periods, mostly coming from the payload copy introduced for each message. In this paper, we investigate different approaches for logging payload and compare their impact on network performance.





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • George Bosilca (1)
  • Aurelien Bouteiller (1)
  • Thomas Herault (1, 2)
  • Pierre Lemarinier (1)
  • Jack J. Dongarra (1, 3)
  1. University of Tennessee, USA
  2. Universite Paris-Sud, INRIA, France
  3. Oak Ridge National Laboratory, USA
