An Architecture for Supporting Network Fault Recovery Management

  • Feng Liu
  • Antonis M. Hadjiantonis
  • Ha Manh Tran
  • Mina Amin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5127)


Highly available and resilient networks play a decisive role in today’s networked world. Because network faults are inevitable and networks are becoming increasingly intricate, finding effective fault recovery solutions in a timely manner is becoming a challenging task for administrators. An automated mechanism to support fault resolution is therefore essential to an efficient fault-handling process. In this paper we propose an architecture to support automated fault recovery in terms of traffic engineering, recovery knowledge discovery, and automated recovery planning. We base our discussion on an application scenario in which the network recovers from a border router failure while maintaining an optimized configuration of outbound inter-domain traffic.


Fault Management, Fault Recovery, Automated Planning, Policy-Based Management, Case-Based Reasoning, Peer-to-Peer, Inter-Domain Traffic Engineering



Copyright information

© IFIP International Federation for Information Processing 2008

Authors and Affiliations

  • Feng Liu (1)
  • Antonis M. Hadjiantonis (2)
  • Ha Manh Tran (3)
  • Mina Amin (2)

  1. MNM Team, Ludwig-Maximilians-University Munich, Germany
  2. Centre for Communications Systems Research, University of Surrey, UK
  3. Computer Science, Jacobs University Bremen, Germany
