
Hierarchy-Debug: a scalable statistical technique for fault localization

Software Quality Journal

Abstract

Faults may manifest as an undesired mutual effect of program predicates on one another. Based on this observation, this paper presents Hierarchy-Debug, a new approach for localizing latent bugs. To analyze the vertical effect of predicates on each other and on the program termination status, the predicates are fitted into a logistic lasso model. To support scalability, a hierarchical clustering algorithm groups the predicates according to their presence in different executions. Treating each cluster as a pseudo-predicate, a distinct lasso model is built for each intermediate level of the hierarchy. A majority voting technique then scores the predicates according to their lasso coefficients at the different levels of the hierarchy, and the predicates with relatively higher scores are ranked as fault-relevant predicates. To provide the context of failure, faulty sub-paths are identified as sequences of fault-relevant predicates. The grouping effect of Hierarchy-Debug helps programmers detect multiple bugs. Four case studies were designed to evaluate the proposed approach on three well-known test suites: Space, Siemens, and Bash. The evaluations show that Hierarchy-Debug produces more precise results than prior fault localization techniques on the subject programs.
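The pipeline summarized in the abstract (logistic lasso, clustering of predicates by their presence in executions, and voting across levels) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a two-level hierarchy only, fits the lasso with plain proximal gradient descent, clusters with single linkage over Jaccard distance, treats a pseudo-predicate as true when any member predicate is true, and casts one vote per level for a positive coefficient. All function names, parameter values, and the toy data are hypothetical.

```python
import math

def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_logistic_lasso(rows, labels, penalty=0.05, step=0.5, iterations=2000):
    """L1-penalised logistic regression via proximal gradient descent (ISTA)."""
    n, p = len(rows), len(rows[0])
    weights, bias = [0.0] * p, 0.0
    for _ in range(iterations):
        grad_w, grad_b = [0.0] * p, 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y  # predicted P(fail) - outcome
            grad_b += error / n
            for j in range(p):
                grad_w[j] += error * x[j] / n
        bias -= step * grad_b  # the intercept is not penalised
        weights = [soft_threshold(w - step * g, step * penalty)
                   for w, g in zip(weights, grad_w)]
    return weights

def jaccard_distance(a, b):
    """Distance between two sets of run indices."""
    union = len(a | b)
    return 1.0 - len(a & b) / union if union else 0.0

def cluster_predicates(rows, n_clusters):
    """Single-linkage agglomerative clustering of predicate columns."""
    runs = [{i for i, x in enumerate(rows) if x[j]} for j in range(len(rows[0]))]
    clusters = [[j] for j in range(len(runs))]

    def linkage(pair):
        return min(jaccard_distance(runs[i], runs[j])
                   for i in clusters[pair[0]] for j in clusters[pair[1]])

    while len(clusters) > n_clusters:
        pairs = [(a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        a, b = min(pairs, key=linkage)
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    return clusters

def fault_relevant_predicates(rows, labels, n_clusters):
    """Vote across two levels: individual predicates and their clusters."""
    leaf_weights = fit_logistic_lasso(rows, labels)
    clusters = cluster_predicates(rows, n_clusters)
    # Assumption: a pseudo-predicate is true when any member predicate is true.
    pseudo_rows = [[int(any(x[j] for j in c)) for c in clusters] for x in rows]
    cluster_weights = fit_logistic_lasso(pseudo_rows, labels)
    votes = [int(w > 0) for w in leaf_weights]
    for members, w in zip(clusters, cluster_weights):
        for j in members:
            votes[j] += int(w > 0)
    levels = 2
    return [j for j, v in enumerate(votes) if v > levels / 2]

# Toy data: one row per execution, one column per predicate (1 = observed true).
RUNS = [
    [1, 1, 0, 1], [1, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1],
    [0, 0, 0, 1], [0, 0, 1, 0], [0, 0, 1, 1], [1, 1, 1, 1],
]
FAILED = [1, 1, 1, 0, 0, 0, 0, 1]  # 1 = failing run, 0 = passing run

print(fault_relevant_predicates(RUNS, FAILED, n_clusters=2))
```

On this toy data, predicates 0 and 1 are true in exactly the failing runs. They end up in the same cluster, receive a positive coefficient at both levels, and are the only predicates flagged. Because the two columns are identical, the lasso alone cannot say which one matters, which is one motivation for scoring groups of correlated predicates together.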

Acknowledgments

The authors would like to thank G. Rothermel for making the Siemens test suite available, W. Motycka for his invaluable help and support with the execution of the SIR programs, and B. Liblit for his insightful comments. Above all, the authors deeply appreciate the insightful questions, comments, and recommendations from the journal’s Editor-in-Chief, editor, and anonymous referees during the preparation of this paper.

Author information

Corresponding author

Correspondence to Mojtaba Vahidi-Asl.

About this article

Cite this article

Parsa, S., Vahidi-Asl, M. & Asadi-Aghbolaghi, M. Hierarchy-Debug: a scalable statistical technique for fault localization. Software Qual J 22, 427–466 (2014). https://doi.org/10.1007/s11219-013-9199-x

  • DOI: https://doi.org/10.1007/s11219-013-9199-x
