Fault-Adaptivity in Hard Real-Time Component-Based Software Systems

  • Abhishek Dubey
  • Gabor Karsai
  • Nagabhushan Mahadevan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7475)


Complexity in embedded software systems has reached the point where run-time mechanisms that provide fault management services are needed. Testing and verification may not cover every scenario that a system encounters; hence a simpler, yet formally specified, run-time monitoring, diagnosis, and fault mitigation architecture is needed to increase the software system’s dependability. The approach described in this paper borrows concepts and principles from the field of ‘Systems Health Management’ for complex aerospace systems and implements a novel two-level health management architecture that can be applied in the context of a model-based software development process.

At the first level, a Component-level Health Manager (CLHM) provides a localized and limited service for managing the health of an individual software component. At the second level, a System-level Health Manager (SLHM) manages the health of the overall system. The SLHM includes a diagnosis engine that uses a Timed Failure Propagation Graph (TFPG) model, automatically synthesized from the system specification built in the model-based design environment that accompanies the runtime system. The SLHM also includes a reactive timed state machine used for mitigation, whose code is likewise generated from the model-based specification. This paper uses simple examples to illustrate the approach.
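
The two-level structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names, the "ESCALATE" marker, the alarm and failure-mode labels, and the scoring rule are all assumptions made for illustration. Each component-level manager applies a local action when it knows one and reports every anomaly upward. The system-level manager then ranks candidate failure modes by how many observed alarms they explain through propagation links whose timing bounds are respected, which is a simplified stand-in for TFPG-based diagnosis. The mitigation state machine is omitted.

```python
class SystemHealthManager:
    """System-level manager (hypothetical): collects alarms and ranks failure modes."""

    def __init__(self, edges):
        # edges: (source, target, t_min, t_max) propagation links; a failure on
        # `source` is assumed to show up on `target` within [t_min, t_max] seconds.
        self.edges = edges
        self.alarms = {}  # alarm name -> time of first observation

    def report(self, name, time):
        # Keep only the first observation of each alarm.
        self.alarms.setdefault(name, time)

    def _explained(self, failure_mode):
        """Observed alarms reachable from failure_mode over timing-consistent links."""
        explained = set()
        frontier = [(failure_mode, None)]
        while frontier:
            node, t_node = frontier.pop()
            for src, dst, t_min, t_max in self.edges:
                if src != node or dst not in self.alarms or dst in explained:
                    continue
                t_dst = self.alarms[dst]
                # The failure mode itself is unobserved, so links leaving it
                # are accepted without a timing check.
                if t_node is None or t_min <= t_dst - t_node <= t_max:
                    explained.add(dst)
                    frontier.append((dst, t_dst))
        return explained

    def diagnose(self, failure_modes):
        """Return the failure modes that explain the most observed alarms."""
        scores = {fm: len(self._explained(fm)) for fm in failure_modes}
        best = max(scores.values(), default=0)
        return sorted(fm for fm, s in scores.items() if s == best and s > 0)


class ComponentHealthManager:
    """Component-level manager (hypothetical): local response, then report upward."""

    def __init__(self, component, local_actions, system_manager):
        self.component = component
        self.local_actions = local_actions  # anomaly kind -> local action name
        self.system_manager = system_manager

    def on_anomaly(self, kind, time):
        # Every anomaly is reported so that system-level diagnosis can use it.
        self.system_manager.report(f"{self.component}.{kind}", time)
        return self.local_actions.get(kind, "ESCALATE")


if __name__ == "__main__":
    slhm = SystemHealthManager([
        ("FM_gps_stale", "gps.late", 0.0, 1.0),
        ("gps.late", "nav.bad_estimate", 0.0, 2.0),
        ("FM_nav_bug", "nav.bad_estimate", 0.0, 1.0),
    ])
    gps = ComponentHealthManager("gps", {"late": "USE_LAST_VALUE"}, slhm)
    nav = ComponentHealthManager("nav", {}, slhm)
    print(gps.on_anomaly("late", 1.0))
    print(nav.on_anomaly("bad_estimate", 2.5))
    print(slhm.diagnose(["FM_gps_stale", "FM_nav_bug"]))
```

In this sketch a single upstream failure mode that accounts for both alarms within the timing bounds is preferred over a failure mode that explains only one of them. A real diagnoser would also have to handle missing or false alarms, which this example ignores.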





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Abhishek Dubey (1)
  • Gabor Karsai (1)
  • Nagabhushan Mahadevan (1)
  1. Institute for Software-Integrated Systems, Vanderbilt University, Nashville, USA
