Statistical Debugging Using Latent Topic Models

  • David Andrzejewski
  • Anne Mulhern
  • Ben Liblit
  • Xiaojin Zhu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4701)


Statistical debugging uses machine learning to model program failures and help identify root causes of bugs. We approach this task using a novel Delta-Latent-Dirichlet-Allocation model. We model execution traces attributed to failed runs of a program as being generated by two types of latent topics: normal usage topics and bug topics. Execution traces attributed to successful runs of the same program, however, are modeled by usage topics only. Joint modeling of both kinds of traces allows us to identify weak bug topics that would otherwise remain undetected. We perform model inference with collapsed Gibbs sampling. In quantitative evaluations on four real programs, our model produces bug topics highly correlated to the true bugs, as measured by the Rand index. Qualitative evaluation by domain experts suggests that our model outperforms existing statistical methods for bug cause identification, and may help support other software tasks not addressed by earlier models.
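The central modeling idea above is that failed runs can draw tokens from both usage and bug topics, while successful runs are restricted to usage topics. This is what lets the shared usage topics "explain away" normal behavior, leaving bug topics to absorb what is specific to failures. Below is a minimal collapsed Gibbs sampling sketch of that restriction, plus the Rand index used for evaluation. It is not the authors' implementation, and several details are assumptions:

- each run is treated as a bag of hypothetical integer feature ids;
- `alpha` and `beta` are symmetric, with made-up default values;
- the "usage topics only" constraint is enforced by sampling successful runs' tokens from usage topics alone, which amounts to giving bug topics zero prior weight in those runs.

```python
import random
from itertools import combinations


def delta_lda_gibbs(docs, failed, n_usage, n_bug, vocab_size,
                    alpha=0.5, beta=0.1, iters=100, seed=0):
    """Collapsed Gibbs sampler for a Delta-LDA-style model (simplified sketch).

    docs:   one list of integer feature ids per run (hypothetical encoding)
    failed: one bool per run; failed runs may use usage AND bug topics,
            successful runs may use usage topics only
    Topics 0..n_usage-1 are usage topics; the remaining n_bug are bug topics.
    alpha, beta: assumed symmetric Dirichlet hyperparameters.
    """
    rng = random.Random(seed)
    K = n_usage + n_bug
    allowed = [list(range(K)) if f else list(range(n_usage)) for f in failed]
    ndk = [[0] * K for _ in docs]               # topic counts per run
    nkw = [[0] * vocab_size for _ in range(K)]  # feature counts per topic
    nk = [0] * K                                # token totals per topic
    z = []                                      # topic assignment of every token

    # Random initialisation that respects each run's allowed topics.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.choice(allowed[d])
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | rest) up to a constant. The run-length
                # denominator is identical for every t, so it is dropped.
                # Disallowed topics get probability zero by being left out.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + vocab_size * beta) for t in allowed[d]]
                k = rng.choices(allowed[d], weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return z, ndk, nkw


def rand_index(a, b):
    """Fraction of item pairs on which two clusterings agree (Rand, 1971)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


if __name__ == "__main__":
    # Hypothetical toy data: features 0-3 are ordinary usage; 4 and 5 are
    # each tied to a different made-up bug.
    docs = [[0, 1, 2, 3] * 3, [1, 2, 3, 0] * 3,   # successful runs
            [0, 1, 4, 4, 4], [2, 3, 4, 4, 4],     # failed runs, bug A
            [0, 2, 5, 5, 5], [1, 3, 5, 5, 5]]     # failed runs, bug B
    failed = [False, False, True, True, True, True]
    n_usage, n_bug = 1, 2
    z, ndk, nkw = delta_lda_gibbs(docs, failed, n_usage, n_bug, vocab_size=6)
    # Cluster each failed run by its dominant bug topic, then score against truth.
    pred = [max(range(n_usage, n_usage + n_bug), key=lambda t: ndk[d][t])
            for d, f in enumerate(failed) if f]
    print("predicted bug clusters:", pred)
    print("Rand index vs. truth:", rand_index(pred, [0, 0, 1, 1]))
```

Because the Rand index compares only whether pairs of runs are grouped together, it does not matter which bug topic receives which label. This is why it suits comparing unlabeled bug topics against known bugs.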


Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • David Andrzejewski, Anne Mulhern, Ben Liblit, Xiaojin Zhu: Computer Sciences Department, University of Wisconsin, Madison, WI 53706, USA