Science China Information Sciences, Volume 55, Issue 5, pp 1186–1200

Probabilistic fault localization with sliding windows

  • Cheng Zhang
  • JianXin Liao
  • TongHong Li
  • XiaoMin Zhu
Research Paper


Fault localization is a central element of network fault management. This paper takes a weighted bipartite graph as the fault propagation model and presents a heuristic fault localization algorithm based on the idea of incremental coverage, which is resilient to an inaccurate fault propagation model and a noisy environment. Furthermore, a sliding window mechanism is proposed to tackle the inaccuracy of the algorithm in the presence of improper time windows. In the simulation study, the scheme achieves a higher detection rate and a lower false positive rate than current fault localization algorithms, both in a noisy environment and in the presence of inaccurate windows.
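As a rough illustration of the incremental coverage idea only, the following Python sketch greedily builds a fault hypothesis over a weighted bipartite graph. It is not the algorithm from the paper: the abstract does not give the scoring rule, so the gain function here (summed edge weight to still-unexplained observed symptoms) is an assumption, as are the names `Model` and `greedy_cover` and the example weights. Time windows are not modeled.

```python
from typing import Dict, List, Set

# Hypothetical model: fault -> {symptom: P(symptom observed | fault occurs)}.
Model = Dict[str, Dict[str, float]]


def greedy_cover(model: Model, observed: Set[str]) -> List[str]:
    """Pick faults one at a time by the extra observed-symptom weight each adds."""
    uncovered = set(observed)
    hypothesis: List[str] = []

    def gain(fault: str) -> float:
        # Assumed scoring rule: summed edge weight to symptoms that are
        # observed but not yet explained by the current hypothesis.
        return sum(w for s, w in model[fault].items() if s in uncovered)

    while uncovered:
        candidates = [f for f in sorted(model) if f not in hypothesis]
        if not candidates:
            break
        best = max(candidates, key=gain)
        if gain(best) == 0.0:
            # Remaining symptoms match no fault in the model
            # (for example, spurious alarms), so stop.
            break
        hypothesis.append(best)
        uncovered -= set(model[best])
    return hypothesis


# Illustrative data only; the weights are made up for this example.
example_model: Model = {
    "f1": {"s1": 0.9, "s2": 0.8},
    "f2": {"s2": 0.7, "s3": 0.6},
    "f3": {"s3": 0.2},
}
```

Calling `greedy_cover(example_model, {"s1", "s2", "s3"})` selects `f1` first because it explains the most observed weight, then `f2` for the remaining symptom `s3`. An observed symptom that no fault in the model can produce is left unexplained rather than forcing an extra fault into the hypothesis, which is one simple way to tolerate spurious alarms.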


fault management, fault diagnosis, fault localization, fault propagation model, time windows, incremental coverage





Copyright information

© Science China Press and Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Cheng Zhang (1, 2)
  • JianXin Liao (2, 3)
  • TongHong Li (4)
  • XiaoMin Zhu (2, 3)
  1. School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing, China
  2. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
  3. EBUPT Information Technology Co. Ltd., Beijing, China
  4. Computer Science Department, Technical University of Madrid, Madrid, Spain
