Availability analysis of software architecture decomposition alternatives for local recovery


We present an efficient and easy-to-use methodology to predict—at design time—the availability of systems that support local recovery. Our analysis techniques work at the architectural level, where the software designer simply inputs the software modules’ decomposition annotated with failure and repair rates. From this decomposition, we automatically generate an analytical model (a continuous-time Markov chain), from which an availability measure is then computed, in a completely automated way. A crucial step is the use of intermediate models in the input/output interactive Markov chain formalism, which makes our techniques efficient, mathematically rigorous, and easy to adapt. In particular, we use aggressive minimization techniques to keep the size of the generated state spaces small. We have applied our methodology on a realistic case study, namely the MPlayer open-source software. We have investigated four different decomposition alternatives and compared our analytical results with the measured availability on a running MPlayer. We found that our predicted results closely match the measured ones .

This is a preview of subscription content, access via your institution.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12


  1. 1.

    An important component used within a software architecture that supports local recovery.

  2. 2.

    Proactively restarting a software component to mitigate its aging and thus its failure.

  3. 3.

    Interaction between the RUs is redirected through Inter-Process Communication.

  4. 4.

    Modeled as part of the RM as mentioned in Sect. 4.

  5. 5.

    As described later, for each RU, two models are in fact generated.

  6. 6.

    The recovery time includes the time for restarting failed modules and also the time for error detection, error notification and diagnosis.

  7. 7.

    An exponential distribution might not be, in some cases, a realistic choice; however, it is also possible to use a phase-type distribution which approximates any distribution arbitrarily closely.

  8. 8.

    Note that these models are used for availability estimation only and they do not necessarily reflect software implementation. In an actual implementation, a module might not be aware of its failure to notify it. External error detection mechanisms can be employed (Sozer et al. 2009) for this purpose.


  1. Alvarez, G., & Cristian, F. (1997). Centralized failure injection for distributed, fault-tolerant protocol testing. In Proceedings of the 17th international conference on distributed computing systems, pp. 78–85.

  2. Avizienis, A., Laprie, J. C., Randell, B., & Landwehr, C. (2004). Basic concepts and taxonomy of dependable and secure computing. IEEE Transactions on Dependable and Secure Computing, 1(1), 11–33.

    Article  Google Scholar 

  3. Bernardi, S., Merseguer, J., & Petriu, D. (2011). A dependability profile within MARTE. Software and Systems Modeling, 10(3), 313–336.

    Article  Google Scholar 

  4. Bernardi, S., Merseguer, J., Petriu, D., & Dorina, C. (2012). Dependability modeling and analysis of software systems specified with UML. ACM Computing Surveys, 45(1), 1–48.

    Article  MATH  Google Scholar 

  5. Boudali, H., Crouzen, P., & Stoelinga, M. (2007a). A compositional semantics for dynamic fault trees in terms of Interactive Markov Chains. In Proceedings of the 5th international symposium on automated technology for verification and analysis, lecture notes on computer science (LNCS), pp. 441–456.

  6. Boudali, H., Crouzen, P., & Stoelinga, M. (2007b). Dynamic fault tree analysis using input/output interactive Markov chains. In Proceedings of the 37th annual IEEE/IFIP international conference on dependable systems and networks (DSN), pp. 708–717.

  7. Boudali, H., Crouzen, P., Haverkort, B. R., Kuntz, M., & Stoelinga, M. (2008). Architectural dependability evaluation with arcade. In Proceedings of the 38th IEEE/IFIP international conference on dependable systems and networks (DSN), IEEE, pp. 512–521.

  8. Bowles, J., Dobbins, J., & Gregory, J. (2004). Approximate reliability and availability models for high availability and fault-tolerant systems with repair. Quality and Reliability Engineering International, 20(7), 679–697.

    Article  Google Scholar 

  9. Bozzano, M., Cimatti, A., Katoen, J. P., Nguyen, V. Y., Noll, T., & Roveri, M. (2011). Safety, dependability and performance analysis of extended AADL models. The Computer Journal, 54(5), 754–775.

    Article  Google Scholar 

  10. Brosch, F., Koziolek, H., Buhnova, B., & Reussner, R. (2012). Architecture-based reliability prediction with the palladio component model. IEEE Transactions on Software Engineering, 38(6), 1319–1339.

    Article  Google Scholar 

  11. Candea, G., Kawamoto, S., Fujiki, Y., Friedman, G., & Fox, A. (2004b). Microreboot: A technique for cheap recovery. In Proceedings of the 6th symposium on operating systems design and implementation (OSDI), San Francisco, CA, pp. 31–44.

  12. Candea, G., Cutler, J., & Fox, A. (2004a). Improving availability with recursive micro-reboots: A soft-state system case study. Performance Evaluation, 56(1–4), 213–248.

    Article  Google Scholar 

  13. Clements, P., Bachman, F., Bass, L., Garlan, D., Ivers, J., Little, R., et al. (2010). Documenting software architectures: Views and beyond (2nd ed.). Reading, MA: Addison-Wesley.

    Google Scholar 

  14. Das, O., & Woodside, C. (1998). The fault-tolerant layered queueing network model for performability of distributed systems. In Proceedings of the international performance and dependability symposium (IPDS) (pp. 132–141). Durham, NC.

  15. Dashofy, E., van der Hoek, A., & Taylor, R. (2002). An infrastructure for the rapid development of XML-based architecture description languages. In International conference on software engineering (ICSE) (pp. 266–276). Orlando, Florida: ACM.

  16. Devroye, L. (1986). Non-uniform random variate generation. Berlin: Springer.

    Book  MATH  Google Scholar 

  17. Dugan, J., & Lyu, M. (1995). Dependability modeling for fault-tolerant software and systems. In M. R. Lyu (Ed.), Software fault tolerance, chapter 5 (pp. 109–138). London: Wiley.

    Google Scholar 

  18. Durares, J., & Henrique, S. (2006). Emulation of software faults: A field data study and a practical approach. IEEE Transactions on Software Engineering, 32(11), 849–867.

    Article  Google Scholar 

  19. ECLIPSE (2015) Eclipse foundation. http://www.eclipse.org/

  20. Franco, J., Barbosa, R., & Zenha-Rela, M. (2012). Automated reliability prediction from formal architectural descriptions. In Proceedings of joint working IEEE/IFIP conference on software architecture (WICSA) and European conference on software architecture (ECSA), pp. 302–309.

  21. Franco, J., Barbosa, R., & Zenha-Rela, M. (2014). Availability evaluation of software architectures through formal methods. In Proceedings of the 9th international conference on the quality of information and communications technology (QUATIC), pp. 282–287.

  22. Garavel, H., & Lang, F. (2001). SVL: A scripting language for compositional verification. In Proceedings of the international conference on formal techniques for networked and distributed systems (FORTE), pp. 377–394.

  23. Garavel, H., Lang, F., Mateescu, R., & Serwe, W. (2007). CADP 2006: A toolbox for the construction and analysis of distributed processes. In Computer-aided verification (CAV), Springer, Lecture Notes on Computer Science (LNCS), vol. 4590, pp. 158–163.

  24. Garlan, D., Monroe, R., & Wile, D. (1997). Acme: An architecture description interchange language. In Proceedings of conference of the centre for advanced studies on collaborative research (CASCON), pp. 169–183.

  25. Garland, S., Lynch, N., Tauber, J., & Vaziri, M. (2004). IOA user guide and reference manual. Tech. rep., MIT CSAI Laboratory, Cambridge, MA.

  26. Geist, R., & Trivedi, K. (1990). Reliability estimation of fault-tolerant systems: Tools and techniques. IEEE Computer, 23(7), 52–61.

    Article  Google Scholar 

  27. Hermanns, H. (2002). Interactive Markov Chains: The quest for quantified quality, lecture notes on computer science (LNCS), vol. 2428.

  28. Hermanns, H., & Katoen, J. P. (2000). Automated compositional Markov chain generation for a plain-old telephone system. Science of Computer Programming, 36(1), 97–127.

    Article  MATH  Google Scholar 

  29. Hunt, G. C., et al. (2007). Sealing OS processes to improve dependability and safety. ACM SIGOPS Operating Systems Review, 41(3), 341–354.

    Article  Google Scholar 

  30. Immonen, A., & Niemel, E. (2008). Survey of reliability and availability prediction methods from the viewpoint of software architecture. Software and Systems Modeling, 7(1), 49–65.

    Article  Google Scholar 

  31. Joyce, J. (2007). Architecting dependable systems with the sae architecture analysis and description language (AADL). In R. de Lemos, C. Gacek, & A. Romanovsky (Eds.), Architecting dependable systems, IV, lecture notes in computer science, vol. 4615 (pp. 1–13). Berlin: Springer.

    Google Scholar 

  32. Kuntz, M., & Haverkort, B. R. (2008). Formal dependability engineering with MIOA. Technical Report TR-CTIT-08-39.

  33. Lai, C. D., et al. (2002). A model for availability analysis of distributed software/hardware systems. Information and Software Technology, 44(6), 343–350.

    Article  Google Scholar 

  34. Lynch, N., & Tuttle, M. (1989). An introduction to input/output automata. CWI Quarterly, 2(3), 219–246.

    MathSciNet  MATH  Google Scholar 

  35. Maier, M., Emery, D., & Hilliard, R. (2001). Software architecture: Introducing IEEE standard 1471. IEEE Computer, 34(4), 107–109.

    Article  Google Scholar 

  36. Majzik, I., & Huszerl, G. (2002). Towards dependability modeling of FT-CORBA architectures. In Proceedings of the 4th European dependable computing conference, lecture notes on computer science (LNCS), pp. 121–139.

  37. Monnet, S., & Bertier, M. (2007). Using failure injection mechanisms to experiment and evaluate a grid failure detector. In M. Dayde, J. Palma, A. Coutinho, E. Pacitti, & J. Lopes (Eds.), High performance computing for computational science, vol. 4395 (pp. 610–621). Berlin, Heidelberg: Springer.

  38. MPLAYER (2015) MPlayer official website. http://www.mplayerhq.hu/

  39. Rugina, A. E., Kanoun, K., & Kaaniche, M. (2007). A system dependability modeling framework using AADL and GSPNs. In R. de Lemos, C. Gacek, & A. Romanovsky (Eds.), Architecting dependable systems, IV, lecture notes in computer science, vol. 4615 (pp. 14–38). Berlin: Springer.

    Google Scholar 

  40. Sozer, H. (2009). Architecting fault-tolerant software systems. Ph.D. thesis, University of Twente, Enschede, The Netherlands.

  41. Sozer, H., & Tekinerdogan, B. (2008). Introducing recovery style for modeling and analyzing system recovery. In Proceedings of the 7th working IEEE/IFIP conference on software architecture (WICSA), (pp. 167–176). Canada: Vancouver.

  42. Sozer, H., Tekinerdogan, B., & Aksit, M. (2009). FLORA: A framework for decomposing software architecture to introduce local recovery. Software: Practice and Experience, 39(10), 869–889.

    Google Scholar 

  43. Sozer, H., Tekinerdogan, B., & Aksit, M. (2013). Optimizing decomposition of software architecture for local recovery. Software Quality Journal, 21(2), 203–240.

    Article  Google Scholar 

  44. Vaidyanathan, K., & Trivedi, K. S. (2005). A comprehensive model for software rejuvenation. IEEE Transactions on Dependable and Secure Computing, 2(2), 124–137.

    Article  Google Scholar 

Download references


We thank Pepijn Crouzen for his help using CADP and Boudewijn Haverkort for his comments on an earlier version of this paper. This work has been carried out as a part of the TRADER project under the responsibilities of the Embedded Systems Institute. This work is partially supported by the Dutch Ministry of Economic Affairs under the BSIK program; by the Netherlands Organization for Scientific Research (NWO) under FOCUS/BRICKS Grant Number 642.000.505 (MOQS); and by the EU under Grants Numbers IST-004527 (ARTIST2) and FP7-ICT-2007-1 (QUASIMODO).

Author information



Corresponding author

Correspondence to Hasan Sözer.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Sözer, H., Stoelinga, M., Boudali, H. et al. Availability analysis of software architecture decomposition alternatives for local recovery. Software Qual J 25, 553–579 (2017). https://doi.org/10.1007/s11219-016-9315-9

Download citation


  • Dependability
  • Availability
  • Fault tolerance
  • Local recovery
  • Software architecture evaluation