Deterministic fault injection of distributed systems

  • Scott Dawson
  • Farnam Jahanian
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 938)

Abstract

Ensuring that a system meets its prescribed specification is a growing challenge that confronts software developers and system engineers. Meeting this challenge is particularly important for distributed systems with strict dependability and timeliness constraints. This paper presents a technique, called script-driven probing and fault injection, for the evaluation and validation of dependable protocols. The proposed approach can be used to demonstrate three aspects of a target protocol: i) detection of design or implementation errors, ii) identification of violations of protocol specifications, and iii) insight into design decisions made by the implementors. To demonstrate the capabilities of this technique, the paper briefly describes a probing and fault injection tool, called the PFI tool, and several experiments on two protocols: the Transmission Control Protocol (TCP) [4, 24] and the Group Membership Protocol (GMP) [19]. The tool can be used to delay, drop, reorder, duplicate, and modify messages. It can also introduce new messages into the system to probe participants. In the case of TCP, we used the PFI tool to duplicate, in a very short time and without access to the vendors' TCP source code, the experiments reported in [7] on several TCP implementations. We also ran several new experiments that are difficult to perform using past approaches based on packet monitoring and filtering. In the case of GMP, we used the tool to test the fault-tolerance capabilities of an implementation under various failure models, including daemon/link crash, send/receive omissions, and timing failures. Furthermore, by selective reordering of messages and spontaneous transmission of new messages, we were able to guide a distributed computation into hard-to-reach global states without instrumenting the protocol implementation.
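
The message manipulations listed in the abstract (drop, duplicate, reorder, modify, and injection of new messages) can be illustrated with a minimal Python sketch. It is not the PFI tool or its scripting interface: the names `Message`, `FaultInjectionLayer`, `make_script`, and `inject` are hypothetical, and the layer is assumed to sit between a protocol and the network and to consult a script for every outgoing message. Delays are left out so that the run does not depend on timing.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Message:
    seq: int
    payload: str


# A script maps one intercepted message to the list of messages to forward.
Script = Callable[[Message], List[Message]]


class FaultInjectionLayer:
    """Hypothetical layer between a protocol and the network."""

    def __init__(self, script: Script, deliver: Callable[[Message], None]):
        self.script = script
        self.deliver = deliver

    def send(self, msg: Message) -> None:
        # Every outgoing message passes through the script first.
        for out in self.script(msg):
            self.deliver(out)

    def inject(self, msg: Message) -> None:
        # Spontaneously introduce a new message, bypassing the script.
        self.deliver(msg)


def make_script() -> Script:
    held: List[Message] = []  # messages held back for reordering

    def script(msg: Message) -> List[Message]:
        if msg.seq == 2:  # drop
            return []
        if msg.seq == 3:  # duplicate
            return [msg, msg]
        if msg.seq == 4:  # hold back so that it arrives after message 5
            held.append(msg)
            return []
        if msg.seq == 5:  # modify, then release the held message
            out = [replace(msg, payload=msg.payload.upper())] + held
            held.clear()
            return out
        return [msg]  # pass through unchanged

    return script


delivered: List[Message] = []
layer = FaultInjectionLayer(make_script(), delivered.append)
for seq in range(1, 6):
    layer.send(Message(seq, f"m{seq}"))
layer.inject(Message(99, "probe"))
print([m.seq for m in delivered])  # prints [1, 3, 3, 5, 4, 99]
```

Because the script decides the fate of each message by its content rather than by chance, the same input sequence always produces the same fault pattern, which is the sense of "deterministic" in this sketch.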

Keywords

Transmission Control Protocol; Fault Injection; Protocol Implementation; Crash Failure; Protocol Participant
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. J. Arlat, Y. Crouzet, and J.-C. Laprie. Fault injection for dependability validation of fault-tolerant computing systems. In Proc. Int'l Symp. on Fault-Tolerant Computing, pages 348–355, June 1989.
  2. Jean Arlat, Martine Aguera, Yves Crouzet, Jean-Charles Fabre, Eliane Martins, and David Powell. Experimental evaluation of the fault tolerance of an atomic multicast system. IEEE Trans. Reliability, 39(4):455–467, October 1990.
  3. D. Avresky, J. Arlat, J.C. Laprie, and Yves Crouzet. Fault injection for the formal testing of fault tolerance. In Proc. Int'l Symp. on Fault-Tolerant Computing, pages 345–354. IEEE, 1992.
  4. R. Braden. RFC-1122: Requirements for internet hosts. Request for Comments, October 1989. Network Information Center.
  5. R. Chillarege and N. S. Bowen. Understanding large system failures - a fault injection experiment. In Proc. Int'l Symp. on Fault-Tolerant Computing, pages 356–363, June 1989.
  6. G. Choi, R. Iyer, and V. Carreno. Simulated fault injection: A methodology to evaluate fault tolerant microprocessor architectures. IEEE Trans. Reliability, 39(4):486–490, October 1990.
  7. Douglas E. Comer and John C. Lin. Probing TCP implementations. In Proc. Summer USENIX Conference, June 1994.
  8. F. Cristian. Reaching agreement on processor-group membership in synchronous distributed systems. Distributed Computing, (4):175–187, 1991.
  9. E. Czeck and D. Siewiorek. Effects of transient gate-level faults on program behaviour. In Proc. Int'l Symp. on Fault-Tolerant Computing, pages 236–243. IEEE, 1990.
  10. Scott Dawson and Farnam Jahanian. Probing and fault injection of protocol implementations. Technical Report CSE-TR-217-94, The University of Michigan, October 1994.
  11. K. Echtle and Y. Chen. Evaluation of deterministic fault injection for fault-tolerant protocol testing. In Proc. Int'l Symp. on Fault-Tolerant Computing, pages 418–425. IEEE, 1991.
  12. Klaus Echtle and Martin Leu. The EFA fault injector for fault-tolerant distributed system testing. In Workshop on Fault-Tolerant Parallel and Distributed Systems, pages 28–35. IEEE, 1992.
  13. G. Finelli. Characterization of fault recovery through fault injection on FTMP. IEEE Trans. Reliability, 36(2):164–170, June 1987.
  14. K. Goswami and R. Iyer. Simulation of software behaviour under hardware faults. In Proc. Int'l Symp. on Fault-Tolerant Computing, pages 218–227. IEEE, 1993.
  15. Vassos Hadzilacos and Sam Toueg. Fault-tolerant broadcasts and related problems. In Sape Mullender, editor, Distributed Systems. Addison Wesley, 1993. Second Edition.
  16. Seungjae Han, Harold A. Rosenberg, and Kang G. Shin. DOCTOR: An IntegrateD sOftware fault injeCTOn enviRonment. Technical Report CSE-TR-192-93, The University of Michigan, December 1993.
  17. Norman C. Hutchinson and Larry L. Peterson. The x-Kernel: An architecture for implementing network protocols. IEEE Trans. Software Engineering, 17(1):1–13, January 1991.
  18. David B. Ingham and Graham D. Parrington. Delayline: A wide-area network emulation tool. Computing Systems, 7(3):313–332, Summer 1994.
  19. Farnam Jahanian, Ragunathan Rajkumar, and Sameh Fakhouri. Processor group membership protocols: Specification, design and implementation. In Proceedings of the 12th Symposium on Reliable Distributed Systems, pages 2–11, Princeton, New Jersey, October 1993.
  20. G.A. Kanawati, N.A. Kanawati, and J.A. Abraham. FERRARI: A tool for the validation of system dependability properties. In Proc. Int'l Symp. on Fault-Tolerant Computing, pages 336–344. IEEE, 1992.
  21. Steven McCanne and Van Jacobson. The BSD packet filter: A new architecture for user-level packet capture. In Winter USENIX Conference, pages 259–269, January 1993.
  22. Shivakant Mishra, Larry L. Peterson, and Richard D. Schlichting. A membership protocol based on partial order. In Second Working Conference on Dependable Computing for Critical Applications, February 1990.
  23. J. Mogul, R. Rashid, and M. Accetta. The packet filter: An efficient mechanism for user-level network code. In Proc. ACM Symp. on Operating Systems Principles, pages 39–51, Austin, TX, November 1987. ACM.
  24. Jon Postel. RFC-793: Transmission control protocol. Request for Comments, September 1981. Network Information Center.
  25. A. M. Ricciardi and K. P. Birman. Using process groups to implement failure detection in asynchronous environments. In Proceedings of the 11th ACM Symposium on Principles of Distributed Computing, Montreal, Quebec, August 1991.
  26. Z. Segall et al. FIAT - fault injection based automated testing environment. In FTCS-18, pages 102–107, 1988.
  27. K. G. Shin and Y. H. Lee. Measurement and application of fault latency. IEEE Trans. Computers, C-35(4):370–375, April 1986.
  28. Masanobu Yuhara, Brian N. Bershad, Chris Maeda, and J. Eliot B. Moss. Efficient packet demultiplexing for multiple endpoints and large messages. In Winter USENIX Conference, January 1994. Second Edition.

Copyright information

© Springer-Verlag Berlin Heidelberg 1995

Authors and Affiliations

  • Scott Dawson (1)
  • Farnam Jahanian (1)
  1. Real-Time Computing Laboratory, Electrical Engineering and Computer Science Department, University of Michigan, Ann Arbor