Simulation-based Testing of Communication Protocols for Dependable Embedded Systems

The Journal of Supercomputing

Abstract

We present a novel approach to testing fault-tolerant and real-time protocol implementations. Cesium, our testing environment, executes the protocols in a centralized simulator of the distributed system. It simulates the occurrence of inputs and the failure scenarios the protocols are designed to tolerate, while automatically verifying that the required safety and timeliness properties hold at all times during test experiments. Within this framework, the human tester can define failure operations that simulate every failure class studied in the literature. We apply our approach to two fault-tolerant protocols typical in embedded systems. The results show that Cesium can pinpoint implementation errors that would be very difficult to identify in a real system, and can also compute accurate performance predictions that would be problematic to measure in the real embedded platform without ad hoc hardware instrumentation.
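The idea of running a protocol inside a centralized simulator, injecting a failure, and checking properties automatically can be illustrated with a minimal sketch. This is not Cesium's actual interface, which the abstract does not describe. The heartbeat protocol, the names `Simulator`, `run_experiment`, and `check_properties`, and the constants `PERIOD`, `DELAY`, and `TIMEOUT` are all assumptions chosen for illustration. The sketch injects a crash failure and checks one safety property (a correct sender is never suspected) and one timeliness property (a crash is detected within `DELAY + TIMEOUT` time units).

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical protocol parameters, in simulated time units.
PERIOD = 10   # the sender emits a heartbeat every PERIOD units
DELAY = 2     # fixed message delay between sender and monitor
TIMEOUT = 15  # the monitor suspects the sender after this much silence


@dataclass
class Simulator:
    """Centralized simulator: one global clock and one event queue."""
    now: int = 0
    _queue: List[Tuple[int, int, Callable[[], None]]] = field(default_factory=list)
    _seq: int = 0  # tie-breaker: same-time events run in scheduling order

    def schedule(self, delay: int, action: Callable[[], None]) -> None:
        heapq.heappush(self._queue, (self.now + delay, self._seq, action))
        self._seq += 1

    def run(self, until: int) -> None:
        while self._queue and self._queue[0][0] <= until:
            self.now, _, action = heapq.heappop(self._queue)
            action()


def run_experiment(crash_at: Optional[int], horizon: int = 100) -> Optional[int]:
    """Run the heartbeat protocol; return when the monitor suspects the sender."""
    sim = Simulator()
    state = {"crashed": False, "last_heard": 0, "suspected_at": None}

    def send_heartbeat() -> None:
        if state["crashed"]:
            return  # a crashed sender stays silent forever
        sim.schedule(DELAY, receive_heartbeat)
        sim.schedule(PERIOD, send_heartbeat)

    def receive_heartbeat() -> None:
        state["last_heard"] = sim.now

    def check_timeout() -> None:
        silent_for = sim.now - state["last_heard"]
        if state["suspected_at"] is None and silent_for >= TIMEOUT:
            state["suspected_at"] = sim.now
        sim.schedule(1, check_timeout)

    def crash() -> None:
        # Injected failure operation: the sender crashes at this instant.
        state["crashed"] = True

    sim.schedule(0, send_heartbeat)
    sim.schedule(1, check_timeout)
    if crash_at is not None:
        sim.schedule(crash_at, crash)
    sim.run(until=horizon)
    return state["suspected_at"]


def check_properties(crash_at: Optional[int]) -> bool:
    """Verify safety (no false suspicion) and timeliness (bounded detection)."""
    suspected_at = run_experiment(crash_at)
    if crash_at is None:
        return suspected_at is None
    return suspected_at is not None and suspected_at - crash_at <= DELAY + TIMEOUT
```

Calling `check_properties(None)` exercises the failure-free run, and calling it over a range of crash times sweeps the failure scenario. Because the clock is simulated, detection latency can be read directly from the run without instrumenting any hardware.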

About this article

Cite this article

Alvarez, G.A., Cristian, F. Simulation-based Testing of Communication Protocols for Dependable Embedded Systems. The Journal of Supercomputing 16, 93–116 (2000). https://doi.org/10.1023/A:1008185530601

  • DOI: https://doi.org/10.1023/A:1008185530601
