Optimizing decomposition of software architecture for local recovery

Software Quality Journal

Abstract

The increasing size and complexity of software systems have led to a growing number of potential failures, which makes it harder to ensure software reliability. Since it is usually hard to prevent all failures, fault tolerance techniques have become more important. An essential element of fault tolerance is recovery from failures. Local recovery is an effective approach whereby only the erroneous parts of the system are recovered while the other parts remain available. To achieve local recovery, the architecture needs to be decomposed into separate units that can be recovered in isolation. Usually, there are many alternative ways to decompose the system into recoverable units, and each of these decomposition alternatives performs differently with respect to availability and performance metrics. We propose a systematic approach dedicated to optimizing the decomposition of software architecture for local recovery. The approach provides systematic guidelines to depict the design space of the possible decomposition alternatives, to reduce the design space with respect to domain and stakeholder constraints, and to balance the feasible alternatives with respect to availability and performance. The approach is supported by an integrated set of tools and illustrated for the open-source MPlayer software.
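
As a rough illustration of what depicting and reducing the design space can look like, the sketch below enumerates every way to group a handful of modules into recoverable units (RUs) and then filters the result by one constraint. It assumes the design space is the set of all partitions of the module set; the module names and the keep-together constraint are hypothetical placeholders, not taken from the MPlayer case study.

```python
from typing import Iterator, List

Partition = List[List[str]]


def partitions(modules: List[str]) -> Iterator[Partition]:
    """Yield every way to split modules into non-empty recoverable units."""
    if not modules:
        yield []
        return
    first, rest = modules[0], modules[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing unit in turn...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ...or give it a unit of its own.
        yield [[first]] + smaller


def keeps_together(partition: Partition, a: str, b: str) -> bool:
    """Hypothetical constraint: modules a and b must share one unit."""
    return any(a in unit and b in unit for unit in partition)


# Hypothetical module names, used only for illustration.
modules = ["gui", "audio", "video", "demuxer"]
alternatives = list(partitions(modules))
feasible = [p for p in alternatives if keeps_together(p, "audio", "video")]
```

For four modules this yields 15 alternatives, of which 5 keep the two chosen modules in the same unit. The number of alternatives grows very quickly with the number of modules, which is why reducing the design space through constraints matters before any alternative is evaluated.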

Notes

  1. In principle, UML or any ADL can be used for architecture definition. We have used an XML-based ADL for its extensibility and to be able to utilize its tool support (ArchStudio), which is also highly extensible and allows us to integrate our own tools. Other extensible ADLs also exist (di Ruscio et al. 2010) and could be utilized in our approach.

  2. In another study, we have worked on the derivation of Criticality values based on scenario-based analysis (Tekinerdogan et al. 2008).

  3. The graphical representation of MDG is provided only for illustration purposes. This representation can be too complicated for manual interpretation, and as such, it is not exposed to the user. The module dependency data is stored in a database, and it is processed by a tool.

  4. In our implementation, we have used sockets for communication. The performance overhead is introduced mainly due to the marshalling of exchanged messages.

  5. Each RU is a partition in the parlance of Mitchell and Mancoridis (2006).

  6. The tool is available online at http://srl.ozyegin.edu.tr/tools/ard/.

  7. An RU detects if an expected response to a message is not received within a period of time.

  8. The Recovery Manager is the parent process of all RUs; it receives and handles a signal when a child process dies.
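
Notes 7 and 8 describe two detection mechanisms: a timeout on an expected response, and the parent process noticing that a child has died. The sketch below illustrates both in Python under simplifying assumptions: a local socket pair stands in for the communication channel between RUs, a short-lived child interpreter stands in for an RU, and the parent waits on the child's exit status instead of installing a signal handler. All function names are hypothetical and do not come from the authors' implementation.

```python
import socket
import subprocess
import sys


def response_missing(sock: socket.socket, request: bytes, timeout_s: float) -> bool:
    """Send a request and report True if no reply arrives within timeout_s."""
    sock.settimeout(timeout_s)
    sock.sendall(request)
    try:
        reply = sock.recv(1024)
    except socket.timeout:
        return True
    # An empty read means the peer closed the connection without replying.
    return reply == b""


def start_unit(exit_code: int) -> subprocess.Popen:
    """Start a stand-in RU: a child process that exits with exit_code."""
    script = f"import sys; sys.exit({exit_code})"
    return subprocess.Popen([sys.executable, "-c", script])


def recover_if_dead(unit: subprocess.Popen) -> subprocess.Popen:
    """Wait for the child; if it exited abnormally, start a replacement."""
    if unit.wait() != 0:
        return start_unit(0)
    return unit
```

Calling `response_missing` on one end of `socket.socketpair()` while the other end stays silent returns `True` once the timeout expires. Passing a child started with a non-zero exit code to `recover_if_dead` returns a fresh process object, which mirrors the idea of restarting only the failed unit.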

References

  • Aleti, A., Björnander, S., Grunske, L., & Meedeniya, I. (2009). ArcheOpterix: An extendable tool for architecture optimization of AADL models. In Proceedings of the ICSE 2009 workshop on model-based methodologies for pervasive and embedded software (MOMPES), Vancouver, Canada, pp. 61–71.

  • Alexander, C. (1964). Notes on the synthesis of form. Cambridge, MA: Harvard University Press.

  • Anquetil, N., Fourrier, C., & Lethbridge, T. (1999). Experiments with clustering as a software remodularization method. In Proceedings of the 6th working conference on reverse engineering (WCRE), IEEE Computer Society, pp. 235–245.

  • Athon, T., & Papalambros, P. (1996). A note on weighted criteria methods for compromise solutions in multi-objective optimization. Engineering Optimization, 27(2), 155–176.

  • Avizienis, A., Laprie, J. C., Randell, B., & Landwehr, C. (2004). Basic concepts and taxonomy of dependable and secure computing. IEEE Transactions on Dependable and Secure Computing, 1(1), 11–33.

  • Bachmann, F., Bass, L., & Klein, M. (2003). Deriving architectural tactics: A step toward methodical architectural design. Tech. Rep. CMU/SEI-2003-TR-004, SEI, Pittsburgh, PA, USA.

  • Boudali, H., Sozer, H., & Stoelinga, M. (2009). Architectural availability analysis of software decomposition for local recovery. In Proceedings of the third IEEE international conference on secure software integration and reliability improvement, Shanghai, China, pp. 14–22.

  • Buschmann, F., Meunier, R., Rohnert, H., Sommerlad, P., & Stal, M. (1996). Pattern-oriented software architecture, a system of patterns. Wiley.

  • Candea, G., Cutler, J., & Fox, A. (2004a). Improving availability with recursive micro-reboots: A soft-state system case study. Performance Evaluation, 56(1–4), 213–248.

  • Candea, G., Kawamoto, S., Fujiki, Y., Friedman, G., & Fox, A. (2004b). Microreboot: A technique for cheap recovery. In Proceedings of the 6th symposium on operating systems design and implementation (OSDI), San Francisco, CA, USA, pp. 31–44.

  • Clements, P., Bachmann, F., Bass, L., Garlan, D., Ivers, J., Little, R., Nord, R., & Stafford, J. (2002a). Documenting software architectures: Views and beyond. Boston, MA: Addison-Wesley.

  • Clements, P., Kazman, R., & Klein, M. (2002b). Evaluating software architectures: Methods and case studies. Boston: Addison-Wesley.

  • Patterson, D. et al. (2002). Recovery oriented computing (ROC): Motivation, definition, techniques, and case studies. Technical Report UCB/CSD-02-1175, University of California, Berkeley.

  • Dashofy, E., van der Hoek, A., & Taylor, R. (2002). An infrastructure for the rapid development of XML-based architecture description languages. In Proceedings of the 24th international conference on software engineering (ICSE), ACM, Orlando, FL, USA, pp. 266–276.

  • Davey, J., & Burd, E. (2000). Evaluating the suitability of data clustering for software remodularization. In Proceedings of the 7th working conference on reverse engineering (WCRE). IEEE Computer Society, pp. 268–278.

  • Dobrica, L., & Niemela, E. (2002). A survey on software architecture analysis methods. IEEE Transactions on Software Engineering, 28(7), 638–654.

  • Fenlason, J., & Stallman, R. (2000). GNU gprof: The GNU profiler. Free Software Foundation, http://www.gnu.org.

  • Gokhale, S. (2007). Architecture-based software reliability analysis: Overview and limitations. IEEE Transactions on Dependable and Secure Computing, 4(1), 32–40.

  • Grassi, V., Mirandola, R., & Sabetta, A. (2005). An XML-based language to support performance and reliability modeling and analysis in software architectures. In R. Reussner, J. Mayer, J. Stafford, S. Overhage, S. Becker, & P. Schroeder (Eds.), QoSA/SOQUA, Springer, Lecture Notes in Computer Science, Vol. 3712, pp. 71–87.

  • Grunske, L., Lindsay, P., Bondarev, E., Papadopoulos, Y., & Parker, D. (2007). An outline of an architecture-based method for optimizing dependability attributes of software-intensive systems. In R. de Lemos, C. Gacek, & A. B. Romanovsky (Eds.), Architecting dependable systems IV (pp. 188–209). Berlin: Springer.

  • Harris, J., Hirst, J., & Mossinghoff, M. (2000). Combinatorics and graph theory. New York: Springer.

  • Herder, J., Bos, H., Gras, B., Homburg, P., & Tanenbaum, A. (2007). Failure resilience for device drivers. In Proceedings of the 37th annual IEEE/IFIP international conference on dependable systems and networks (DSN). Edinburgh, UK, pp. 41–50.

  • Heyliger, G. (1994). Coupling. In J. Marciniak (Ed.), Encyclopedia of software engineering (pp. 220–228). Wiley.

  • Huang, Y., & Kintala, C. (1995). Software fault tolerance in the application layer. In M. R. Lyu (Ed.), Software fault tolerance, chapter 10 (pp. 231–248). New York: Wiley.

  • Hunt, G., Aiken, M., Fähndrich, M., Hawblitzel, C., Hodson, O., Larus, J., Levi, S., Steensgaard, B., Tarditi, D., & Wobber, T. (2007). Sealing OS processes to improve dependability and safety. SIGOPS Operating Systems Review, 41(3), 341–354.

  • Jokiaho, T., Herrmann, F., Penkler, D., & Moser, L. (2003). The service availability forum application interface specification. RTC Magazine, 12(6), 52–58.

  • Kang, K., Cohen, S., Hess, J., Novak, W., & Peterson, A. (1990). Feature-oriented domain analysis (FODA) feasibility study. Tech. Rep. CMU/SEI-90-TR-21, SEI.

  • Laprie, J. C., Arlat, J., Beounes, C., & Kanoun, K. (1995). Architectural issues in software fault tolerance. In M. R. Lyu (Ed.), Software fault tolerance, chapter 3 (pp. 47–80). Chichester: Wiley.

  • Lung, C. H., Xu, X., & Zaman, M. (2007). Software architecture decomposition using attributes. International Journal of Software Engineering and Knowledge Engineering, 17, 599–613.

  • Medvidovic, N., & Taylor, R. N. (2000). A classification and comparison framework for software architecture description languages. IEEE Transactions on Software Engineering, 26(1), 70–93.

  • Meedeniya, I., Buhnova, B., Aleti, A., & Grunske, L. (2011). Reliability-driven deployment optimization for embedded systems. Journal of Systems and Software, 84(5), 835–846.

  • Mitchell, B. S., & Mancoridis, S. (2006). On the automatic modularization of software systems using the bunch tool. IEEE Transactions on Software Engineering, 32(3), 193–208.

  • MPlayer (2010). MPlayer official website. http://www.mplayerhq.hu/. Accessed 31 Mar 2011.

  • Necula, G., McPeak, S., Rahul, S., & Weimer, W. (2002). CIL: Intermediate language and tools for analysis and transformation of C programs. In Proceedings of the conference on compiler construction, pp. 213–228.

  • Nethercote, N., & Seward, J. (2007). Valgrind: a framework for heavyweight dynamic binary instrumentation. SIGPLAN Notices, 42(6), 89–100.

  • Nguyen, G., Hluchý, L., Tran, V., & Kotocova, M. (2001). DDG task recovery for cluster computing. In Proceedings of the 4th international conference on parallel processing and applied mathematics, Springer, Naleczow, Poland, Lecture Notes in Computer Science, Vol. 2328, pp. 369–378.

  • Object Management Group (2001). Fault tolerant CORBA. Tech. Rep. OMG Document formal/2001-09-29, Object Management Group.

  • Pareto, V. (1896). Cours d'économie politique. Lausanne, Switzerland: F. Rouge.

  • Ross, S. (2007). Introduction to probability models. San Diego: Elsevier Inc.

  • di Ruscio, D., Malavolta, I., Muccini, H., Pelliccione, P., & Pierantonio, A. (2010). Developing next generation ADLs through MDE techniques. In Proceedings of the 32nd international conference on software engineering (ICSE), Cape Town, South Africa, pp. 85–94.

  • Ruskey, F. (1993). Simple combinatorial gray codes constructed by reversing sublists. In Proceedings of the 4th international symposium on algorithms and computation (ISAAC), Springer, Lecture Notes in Computer Science, Vol. 762, pp. 201–208.

  • Ruskey, F. (2003). Combinatorial generation. University of Victoria, Victoria, BC, Canada, manuscript CSC-425/520.

  • Santos, G., Duarte, A., Rexachs, D., & Luque, E. (2008). Increasing the performability of computer clusters using RADIC II. In Proceedings of the third international conference on availability, reliability and security, IEEE Computer Society, pp. 653–658.

  • Sozer, H., & Tekinerdogan, B. (2008). Introducing recovery style for modeling and analyzing system recovery. In Proceedings of the 7th working IEEE/IFIP conference on software architecture (WICSA). Vancouver, BC, Canada, pp. 167–176.

  • Sozer, H., Tekinerdogan, B., & Aksit, M. (2009). FLORA: A framework for decomposing software architecture to introduce local recovery. Software: Practice and Experience, 39(10), 869–889.

  • Teitelbaum, T. (2000). Codesurfer. SIGSOFT Software Engineering Notes, 25(1), 99.

  • Tekinerdogan, B., Sozer, H., & Aksit, M. (2008). Software architecture reliability analysis using failure scenarios. Journal of Systems and Software, 81(4), 558–575.

  • TRADER (2011). Trader project, ESI. http://www.esi.nl/projects/trader. Accessed 31 Mar 2011.

  • de Visser, I. (2008). Analyzing user perceived failure severity in consumer electronics products. PhD thesis, Technische Universiteit Eindhoven, Eindhoven, The Netherlands.

  • White, J., Dougherty, B., Strowd, H., & Schmidt, D. (2009). Creating self-healing service compositions with feature models and microrebooting. International Journal of Business Process Integration and Management, 4, 35–46.

  • Wiggerts, T. (1997). Using clustering algorithms in legacy systems remodularization. In Proceedings of the 4th Working Conference on Reverse Engineering (WCRE), IEEE Computer Society, pp. 33–43.

  • Yacoub, S., Cukic, B., & Ammar, H. (2004). A scenario-based reliability analysis approach for component-based software. IEEE Transactions on Reliability, 53(4), 465–480.

Acknowledgments

We acknowledge the feedback from discussions with our TRADER project (TRADER 2011) partners from NXP Research, NXP Semiconductors, TASS, Philips Consumer Electronics, Design Technology Institute, Embedded Systems Institute, IMEC, Leiden University and Delft University of Technology. We thank the anonymous reviewers for their feedback, which helped to improve this paper.

Author information

Corresponding author

Correspondence to Hasan Sözer.

Additional information

This work has been carried out as part of the TRADER project (TRADER 2011) under the responsibility of the Embedded Systems Institute. This project is partially supported by the Netherlands Ministry of Economic Affairs under the Bsik program.

About this article

Cite this article

Sözer, H., Tekinerdoğan, B. & Akşit, M. Optimizing decomposition of software architecture for local recovery. Software Qual J 21, 203–240 (2013). https://doi.org/10.1007/s11219-011-9171-6

  • DOI: https://doi.org/10.1007/s11219-011-9171-6
