We present an efficient and easy-to-use methodology to predict, at design time, the availability of systems that support local recovery. Our analysis techniques work at the architectural level, where the software designer simply inputs the decomposition of the software into modules, annotated with failure and repair rates. From this decomposition, we automatically generate an analytical model (a continuous-time Markov chain), from which an availability measure is then computed without manual intervention. A crucial step is the use of intermediate models in the input/output interactive Markov chain formalism, which makes our techniques efficient, mathematically rigorous, and easy to adapt. In particular, we use aggressive minimization techniques to keep the size of the generated state spaces small. We have applied our methodology to a realistic case study, namely the MPlayer open-source software. We have investigated four different decomposition alternatives and compared our analytical results with the measured availability on a running MPlayer. We found that our predicted results closely match the measured ones.
An important component used within a software architecture that supports local recovery.
Proactively restarting a software component to mitigate its aging and thus its failure.
Interaction between the RUs is redirected through Inter-Process Communication.
Modeled as part of the RM as mentioned in Sect. 4.
As described later, for each RU, two models are in fact generated.
The recovery time includes the time for restarting failed modules and also the time for error detection, error notification and diagnosis.
An exponential distribution might not be, in some cases, a realistic choice; however, it is also possible to use a phase-type distribution which approximates any distribution arbitrarily closely.
Note that these models are used for availability estimation only and they do not necessarily reflect the software implementation. In an actual implementation, a module might not be aware of its own failure and therefore might be unable to report it. External error detection mechanisms can be employed (Sozer et al. 2009) for this purpose.
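The notes above on recovery time and on exponential versus phase-type distributions can be illustrated with a minimal sketch. It is not the I/O-IMC tool chain used in the paper: it builds a small continuous-time Markov chain for a single recoverable unit by hand and solves it exactly. The function names and the failure and recovery rates are hypothetical and chosen only for illustration. With `phases=1` the recovery time is exponential; with `phases=k` it is an Erlang-k distribution, one simple kind of phase-type distribution, with the same mean.

```python
from fractions import Fraction

def steady_state(Q):
    """Solve pi * Q = 0 with sum(pi) = 1 for a small irreducible CTMC generator Q."""
    n = len(Q)
    # Transposed balance equations; the last one is replaced by normalization.
    A = [[Fraction(Q[j][i]) for j in range(n)] for i in range(n)]
    A[-1] = [Fraction(1)] * n
    b = [Fraction(0)] * (n - 1) + [Fraction(1)]
    # Gauss-Jordan elimination with a simple pivot search.
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]

def unit_generator(fail_rate, recovery_rate, phases=1):
    """Generator matrix: state 0 is 'up', states 1..phases are recovery stages."""
    n = phases + 1
    Q = [[Fraction(0)] * n for _ in range(n)]
    Q[0][0], Q[0][1] = -fail_rate, fail_rate
    stage_rate = recovery_rate * phases  # keeps the mean recovery time fixed
    for i in range(1, n):
        nxt = i + 1 if i < phases else 0  # last stage returns to 'up'
        Q[i][i], Q[i][nxt] = -stage_rate, stage_rate
    return Q

def availability(fail_rate, recovery_rate, phases=1):
    """Steady-state probability of being in the 'up' state."""
    return steady_state(unit_generator(fail_rate, recovery_rate, phases))[0]

# Hypothetical rates (per hour): one failure per 100 h, mean recovery of 0.5 h.
lam, mu = Fraction(1, 100), Fraction(2)
print(availability(lam, mu, phases=1))  # exponential recovery
print(availability(lam, mu, phases=3))  # Erlang-3 recovery, same mean
```

For this single-unit model both calls give the same steady-state availability, because it depends only on the mean time to failure and the mean recovery time. The shape of the recovery distribution can matter in larger models where units interact, for example when they compete for a shared recovery manager.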
Alvarez, G., & Cristian, F. (1997). Centralized failure injection for distributed, fault-tolerant protocol testing. In Proceedings of the 17th international conference on distributed computing systems, pp. 78–85.
Avizienis, A., Laprie, J. C., Randell, B., & Landwehr, C. (2004). Basic concepts and taxonomy of dependable and secure computing. IEEE Transactions on Dependable and Secure Computing, 1(1), 11–33.
Bernardi, S., Merseguer, J., & Petriu, D. (2011). A dependability profile within MARTE. Software and Systems Modeling, 10(3), 313–336.
Bernardi, S., Merseguer, J., & Petriu, D. C. (2012). Dependability modeling and analysis of software systems specified with UML. ACM Computing Surveys, 45(1), 1–48.
Boudali, H., Crouzen, P., & Stoelinga, M. (2007a). A compositional semantics for dynamic fault trees in terms of Interactive Markov Chains. In Proceedings of the 5th international symposium on automated technology for verification and analysis, lecture notes on computer science (LNCS), pp. 441–456.
Boudali, H., Crouzen, P., & Stoelinga, M. (2007b). Dynamic fault tree analysis using input/output interactive Markov chains. In Proceedings of the 37th annual IEEE/IFIP international conference on dependable systems and networks (DSN), pp. 708–717.
Boudali, H., Crouzen, P., Haverkort, B. R., Kuntz, M., & Stoelinga, M. (2008). Architectural dependability evaluation with Arcade. In Proceedings of the 38th IEEE/IFIP international conference on dependable systems and networks (DSN), IEEE, pp. 512–521.
Bowles, J., Dobbins, J., & Gregory, J. (2004). Approximate reliability and availability models for high availability and fault-tolerant systems with repair. Quality and Reliability Engineering International, 20(7), 679–697.
Bozzano, M., Cimatti, A., Katoen, J. P., Nguyen, V. Y., Noll, T., & Roveri, M. (2011). Safety, dependability and performance analysis of extended AADL models. The Computer Journal, 54(5), 754–775.
Brosch, F., Koziolek, H., Buhnova, B., & Reussner, R. (2012). Architecture-based reliability prediction with the Palladio component model. IEEE Transactions on Software Engineering, 38(6), 1319–1339.
Candea, G., Cutler, J., & Fox, A. (2004a). Improving availability with recursive micro-reboots: A soft-state system case study. Performance Evaluation, 56(1–4), 213–248.
Candea, G., Kawamoto, S., Fujiki, Y., Friedman, G., & Fox, A. (2004b). Microreboot: A technique for cheap recovery. In Proceedings of the 6th symposium on operating systems design and implementation (OSDI), San Francisco, CA, pp. 31–44.
Clements, P., Bachman, F., Bass, L., Garlan, D., Ivers, J., Little, R., et al. (2010). Documenting software architectures: Views and beyond (2nd ed.). Reading, MA: Addison-Wesley.
Das, O., & Woodside, C. (1998). The fault-tolerant layered queueing network model for performability of distributed systems. In Proceedings of the international performance and dependability symposium (IPDS) (pp. 132–141). Durham, NC.
Dashofy, E., van der Hoek, A., & Taylor, R. (2002). An infrastructure for the rapid development of XML-based architecture description languages. In International conference on software engineering (ICSE) (pp. 266–276). Orlando, Florida: ACM.
Devroye, L. (1986). Non-uniform random variate generation. Berlin: Springer.
Dugan, J., & Lyu, M. (1995). Dependability modeling for fault-tolerant software and systems. In M. R. Lyu (Ed.), Software fault tolerance, chapter 5 (pp. 109–138). London: Wiley.
Durães, J., & Madeira, H. (2006). Emulation of software faults: A field data study and a practical approach. IEEE Transactions on Software Engineering, 32(11), 849–867.
ECLIPSE (2015) Eclipse foundation. http://www.eclipse.org/
Franco, J., Barbosa, R., & Zenha-Rela, M. (2012). Automated reliability prediction from formal architectural descriptions. In Proceedings of joint working IEEE/IFIP conference on software architecture (WICSA) and European conference on software architecture (ECSA), pp. 302–309.
Franco, J., Barbosa, R., & Zenha-Rela, M. (2014). Availability evaluation of software architectures through formal methods. In Proceedings of the 9th international conference on the quality of information and communications technology (QUATIC), pp. 282–287.
Garavel, H., & Lang, F. (2001). SVL: A scripting language for compositional verification. In Proceedings of the international conference on formal techniques for networked and distributed systems (FORTE), pp. 377–394.
Garavel, H., Lang, F., Mateescu, R., & Serwe, W. (2007). CADP 2006: A toolbox for the construction and analysis of distributed processes. In Computer-aided verification (CAV), Springer, Lecture Notes on Computer Science (LNCS), vol. 4590, pp. 158–163.
Garlan, D., Monroe, R., & Wile, D. (1997). Acme: An architecture description interchange language. In Proceedings of conference of the centre for advanced studies on collaborative research (CASCON), pp. 169–183.
Garland, S., Lynch, N., Tauber, J., & Vaziri, M. (2004). IOA user guide and reference manual. Tech. rep., MIT CSAI Laboratory, Cambridge, MA.
Geist, R., & Trivedi, K. (1990). Reliability estimation of fault-tolerant systems: Tools and techniques. IEEE Computer, 23(7), 52–61.
Hermanns, H. (2002). Interactive Markov Chains: The quest for quantified quality, lecture notes on computer science (LNCS), vol. 2428.
Hermanns, H., & Katoen, J. P. (2000). Automated compositional Markov chain generation for a plain-old telephone system. Science of Computer Programming, 36(1), 97–127.
Hunt, G. C., et al. (2007). Sealing OS processes to improve dependability and safety. ACM SIGOPS Operating Systems Review, 41(3), 341–354.
Immonen, A., & Niemelä, E. (2008). Survey of reliability and availability prediction methods from the viewpoint of software architecture. Software and Systems Modeling, 7(1), 49–65.
Joyce, J. (2007). Architecting dependable systems with the SAE architecture analysis and description language (AADL). In R. de Lemos, C. Gacek, & A. Romanovsky (Eds.), Architecting dependable systems, IV, lecture notes in computer science, vol. 4615 (pp. 1–13). Berlin: Springer.
Kuntz, M., & Haverkort, B. R. (2008). Formal dependability engineering with MIOA. Technical Report TR-CTIT-08-39.
Lai, C. D., et al. (2002). A model for availability analysis of distributed software/hardware systems. Information and Software Technology, 44(6), 343–350.
Lynch, N., & Tuttle, M. (1989). An introduction to input/output automata. CWI Quarterly, 2(3), 219–246.
Maier, M., Emery, D., & Hilliard, R. (2001). Software architecture: Introducing IEEE standard 1471. IEEE Computer, 34(4), 107–109.
Majzik, I., & Huszerl, G. (2002). Towards dependability modeling of FT-CORBA architectures. In Proceedings of the 4th European dependable computing conference, lecture notes on computer science (LNCS), pp. 121–139.
Monnet, S., & Bertier, M. (2007). Using failure injection mechanisms to experiment and evaluate a grid failure detector. In M. Dayde, J. Palma, A. Coutinho, E. Pacitti, & J. Lopes (Eds.), High performance computing for computational science, vol. 4395 (pp. 610–621). Berlin, Heidelberg: Springer.
MPLAYER (2015) MPlayer official website. http://www.mplayerhq.hu/
Rugina, A. E., Kanoun, K., & Kaaniche, M. (2007). A system dependability modeling framework using AADL and GSPNs. In R. de Lemos, C. Gacek, & A. Romanovsky (Eds.), Architecting dependable systems, IV, lecture notes in computer science, vol. 4615 (pp. 14–38). Berlin: Springer.
Sozer, H. (2009). Architecting fault-tolerant software systems. Ph.D. thesis, University of Twente, Enschede, The Netherlands.
Sozer, H., & Tekinerdogan, B. (2008). Introducing recovery style for modeling and analyzing system recovery. In Proceedings of the 7th working IEEE/IFIP conference on software architecture (WICSA) (pp. 167–176). Vancouver, Canada.
Sozer, H., Tekinerdogan, B., & Aksit, M. (2009). FLORA: A framework for decomposing software architecture to introduce local recovery. Software: Practice and Experience, 39(10), 869–889.
Sozer, H., Tekinerdogan, B., & Aksit, M. (2013). Optimizing decomposition of software architecture for local recovery. Software Quality Journal, 21(2), 203–240.
Vaidyanathan, K., & Trivedi, K. S. (2005). A comprehensive model for software rejuvenation. IEEE Transactions on Dependable and Secure Computing, 2(2), 124–137.
We thank Pepijn Crouzen for his help using CADP and Boudewijn Haverkort for his comments on an earlier version of this paper. This work has been carried out as a part of the TRADER project under the responsibilities of the Embedded Systems Institute. This work is partially supported by the Dutch Ministry of Economic Affairs under the BSIK program; by the Netherlands Organization for Scientific Research (NWO) under FOCUS/BRICKS Grant Number 642.000.505 (MOQS); and by the EU under Grant Numbers IST-004527 (ARTIST2) and FP7-ICT-2007-1 (QUASIMODO).
Sözer, H., Stoelinga, M., Boudali, H. et al. Availability analysis of software architecture decomposition alternatives for local recovery. Software Qual J 25, 553–579 (2017). https://doi.org/10.1007/s11219-016-9315-9
- Fault tolerance
- Local recovery
- Software architecture evaluation