Software Fault Localization Using N-gram Analysis

  • Syeda Nessa
  • Muhammad Abedin
  • W. Eric Wong
  • Latifur Khan
  • Yu Qi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5258)


A major portion of software development effort is spent on testing and debugging. Execution sequences collected in the testing phase can be a rich source of information for locating a fault in the program, but the exact execution sequence of a program, i.e., the actual order in which its statements execute, is seldom used because of its huge volume. In this study, we apply data mining techniques to this data to reduce debugging time by narrowing down the possible location of the fault. Our method applies N-gram analysis to rank the executable statements of a program by level of suspicion. We conducted three case studies to demonstrate the effectiveness of the proposed method. We also present a comparison with other approaches and illustrate the potential of our method.
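To make the idea of ranking statements through N-grams of an execution trace concrete, here is a minimal Python sketch. It is not the authors' algorithm: the scoring rule (the fraction of traces containing an N-gram that come from failing runs, with each statement taking the best ratio among its N-grams), the function names `ngrams` and `rank_statements`, and the toy traces are all assumptions made for illustration.

```python
from collections import defaultdict


def ngrams(trace, n):
    """Return the set of distinct length-n windows over a statement trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def rank_statements(passing, failing, n=2):
    """Rank statement ids, most suspicious first (hypothetical scoring).

    Each N-gram is scored by the fraction of traces containing it that
    are failing traces; a statement inherits the highest score of any
    N-gram it appears in. Ties are broken by statement id.
    """
    fail_count = defaultdict(int)
    total_count = defaultdict(int)
    for traces, failed in ((passing, False), (failing, True)):
        for trace in traces:
            for gram in ngrams(trace, n):
                total_count[gram] += 1
                if failed:
                    fail_count[gram] += 1

    score = defaultdict(float)
    for gram, total in total_count.items():
        ratio = fail_count[gram] / total
        for stmt in gram:
            score[stmt] = max(score[stmt], ratio)

    return sorted(score, key=lambda s: (-score[s], s))


# Toy traces: statement 4 runs only in the failing execution.
passing_traces = [[1, 2, 3, 5], [1, 2, 3, 5]]
failing_traces = [[1, 2, 4, 5]]
ranking = rank_statements(passing_traces, failing_traces, n=2)
```

In this toy input, statements 2, 4, and 5 tie at the top because each appears in an N-gram seen only in the failing run, which shows how N-grams implicate the neighbors of a suspicious statement as well as the statement itself.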





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Syeda Nessa (1)
  • Muhammad Abedin (1)
  • W. Eric Wong (1)
  • Latifur Khan (1)
  • Yu Qi (1)
  1. Department of Computer Science, The University of Texas at Dallas, USA
