Achieving Critical System Survivability Through Software Architectures

  • John C. Knight
  • Elisabeth A. Strunk
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3069)

Abstract

Software-intensive systems often reach a size and complexity that exceed what system designers and analysts can comprehend. With this complexity comes the potential for undetected errors in the system. While software often causes or exacerbates this problem, its form can also be exploited to ameliorate the difficulty through what is referred to as a survivability architecture. In a system with a survivability architecture, under adverse conditions such as system damage or software failures, some desirable function will be eliminated but critical services will be retained. Making a system survivable rather than highly reliable or highly available has many advantages, including overall system simplification and reduced demands on assurance technology. In this paper, we explore the motivation for survivability, how it might be used, what the concept means in a precise and testable sense, and how it is being implemented in two very different application areas.

Keywords

IEEE Computer Society; Embed System; Software Architecture; Critical Infrastructure; Dependability Requirement

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • John C. Knight (1)
  • Elisabeth A. Strunk (1)
  1. Department of Computer Science, University of Virginia, Charlottesville, USA
