A Framework for Reconfiguration-Based Fault-Tolerance in Distributed Systems

  • Stefano Porcarelli
  • Marco Castaldi
  • Felicita Di Giandomenico
  • Andrea Bondavalli
  • Paola Inverardi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3069)


Nowadays, many critical services are provided by complex distributed systems that result from the reuse and integration of a large number of components. Because these components are meant to be used in many different contexts, they are generally not designed to achieve high dependability by themselves, so their behavior in the presence of faults can vary widely. Nevertheless, it is paramount that such systems survive failures of individual components, as well as attacks and intrusions, even if with degraded functionality. To provide control over unanticipated events, we focus on fault-handling strategies, and in particular on system reconfiguration. The paper describes a framework that provides fault tolerance for component-based applications by detecting failures through monitoring and by recovering through system reconfiguration. The framework is based on Lira, an agent-based distributed infrastructure for remote control and reconfiguration, and on a decision maker that selects suitable new configurations. Lira allows monitoring and reconfiguration at the component and application levels, while decisions are taken according to the feedback provided by the evaluation of stochastic Petri net models.
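The detect, decide, and reconfigure cycle summarized above can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in and not part of Lira or of the paper's decision maker: the `Configuration` type, the functions `detect_failures` and `choose_configuration`, the component names, and the numeric estimates are assumptions made only for illustration. In particular, the `score` callback merely takes the place of the feedback that the framework obtains by evaluating stochastic Petri net models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Configuration:
    """Hypothetical description of one candidate system configuration."""
    name: str
    active_components: FrozenSet[str]


def detect_failures(status: Dict[str, bool]) -> Set[str]:
    """Return the components whose (assumed) monitoring agent reports a failure."""
    return {name for name, alive in status.items() if not alive}


def choose_configuration(
    candidates: List[Configuration],
    failed: Set[str],
    score: Callable[[Configuration], float],
) -> Optional[Configuration]:
    """Pick the best-scoring candidate that relies on no failed component.

    `score` is an assumption: it stands in for a model-based evaluation
    of each candidate. Returns None when no candidate is usable.
    """
    usable = [c for c in candidates if not (c.active_components & failed)]
    if not usable:
        return None
    return max(usable, key=score)


# Illustrative data only: component names and estimates are made up.
candidates = [
    Configuration("full", frozenset({"A", "B", "C"})),
    Configuration("degraded", frozenset({"A", "C"})),
    Configuration("minimal", frozenset({"A"})),
]
estimates = {"full": 0.99, "degraded": 0.95, "minimal": 0.80}

failed = detect_failures({"A": True, "B": False, "C": True})
chosen = choose_configuration(candidates, failed, lambda c: estimates[c.name])
```

With component `B` reported as failed, the `full` configuration is excluded and the sketch falls back to the best remaining candidate, which mirrors the idea of surviving a component failure with degraded functionality.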


Failure Probability, Fault Tolerance, Faulty Node, Application Agent, Simple Network Management Protocol
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Stefano Porcarelli (1)
  • Marco Castaldi (2)
  • Felicita Di Giandomenico (1)
  • Andrea Bondavalli (3)
  • Paola Inverardi (2)

  1. ISTI Dept., Italian National Research Council, Italy
  2. Dip. Informatica, University of L’Aquila, Italy
  3. Dip. Sistemi e Informatica, University of Firenze, Italy
