Inforence: effective fault localization based on information-theoretic analysis and statistical causal inference

Feyzi, Farid; Parsa, Saeed

doi:10.1007/s11704-017-6512-z

Inforence: effective fault localization based on information-theoretic analysis and statistical causal inference

Research Article
Published: 18 June 2019

Volume 13, pages 735–759, (2019)
Cite this article

Frontiers of Computer Science Aims and scope Submit manuscript

Farid Feyzi¹ &
Saeed Parsa¹

138 Accesses
17 Citations
1 Altmetric
Explore all metrics

Abstract

In this paper, a novel approach, Inforence, is proposed to isolate the suspicious codes that likely contain faults. Inforence employs a feature selection method, based on mutual information, to identify those bug-related statements that may cause the program to fail. Because the majority of a program faults may be revealed as undesired joint effect of the program statements on each other and on program termination state, unlike the state-of-the-art methods, Inforence tries to identify and select groups of interdependent statements which altogether may affect the program failure. The interdependence amongst the statements is measured according to their mutual effect on each other and on the program termination state. To provide the context of failure, the selected bug-related statements are chained to each other, considering the program static structure. Eventually, the resultant cause-effect chains are ranked according to their combined causal effect on program failure. To validate Inforence, the results of our experiments with seven sets of programs include Siemens suite, gzip, grep, sed, space, make and bash are presented. The experimental results are then compared with those provided by different fault localization techniques for the both single-fault and multi-fault programs. The experimental results prove the outperformance of the proposed method compared to the state-of-the-art techniques.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

FTFL: A Fisher’s test-based approach for fault localization

Article 17 June 2021

Evaluation of Measures for Statistical Fault Localisation and an Optimising Scheme

Lightweight fault localization combined with fault context to improve fault absolute rank

Article 27 July 2017

References

Parsa S, Vahidi-AslM, Asadi-Aghbolaghi M. Hierarchy-Debug: a scalable statistical technique for fault localization. Software Quality Journal, 2014, 22(3): 427–466
Article Google Scholar
Wong W E, Debroy V, Gao R, Li Y. The DStar method for effective software fault localization. IEEE Transactions on Reliability, 2014, 63(1): 290–308
Article Google Scholar
Wong W E, Debroy V, Xu D. Towards better fault localization: a crosstab-based statistical approach. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 2012, 42(3): 378–396
Article Google Scholar
Wong W E, Debroy V, Choi B. A family of code coverage-based heuristics for effective fault localization. Journal of Systems and Software, 2010, 83(2): 188–208
Article Google Scholar
Wong W E, Qi Y. BP neural network-based effective fault localization. International Journal of Software Engineering and Knowledge Engineering, 2009, 19(4): 573–597
Article Google Scholar
Wong W E, Debroy V, Golden R, Xu X, Thuraisingham B. Effective software fault localization using an RBF neural network. IEEE Transactions on Reliability, 2012, 61(1): 149–169
Article Google Scholar
Abreu R, Zoeteweij P, Van Gemund A J. An evaluation of similarity coefficients for software fault localization. In: Proceedings of the International Symposium on Dependable Computing. 2006, 39–46
Google Scholar
Naish L, Lee H, Ramamohanarao K. A model for spectra-based software diagnosis. ACM Transactions on Software Engineering and Methodology (TOSEM), 2011, 20(3): 11
Article Google Scholar
Roychowdhury S, Khurshid S. Software fault localization using feature selection. In: Proceedings of the International Workshop on Machine Learning Technologies in Software Engineering. 2011, 11–18
Google Scholar
Roychowdhury S, Khurshid S. A family of generalized entropies and its application to software fault localization. In: Proceedings of the 6th IEEE International Conference on Intelligent Systems. 2012, 368–373
Google Scholar
Jiang L, Su Z. Context-aware statistical debugging: from bug predictors to faulty control flow paths. In: Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering. 2007, 184–193
Google Scholar
Liu H, Sun J, Liu L, Zhang H. Feature selection with dynamic mutual information. Pattern Recognition, 2009, 42(7): 1330–1339
Article MATH Google Scholar
Sun X, Liu Y, Xu M, Chen H, Han J, Wang K. Feature selection using dynamic weights for classification. Knowledge-Based Systems, 2013, 37: 541–549
Article Google Scholar
Estévez P A, Tesmer M, Perez C A, Zurada J M. Normalized mutual information feature selection. IEEE Transactions on Neural Networks, 2009, 20(2): 189–201
Article Google Scholar
Baah G K, Podgurski A, Harrold M J. Causal inference for statistical fault localization. In: Proceedings of the 19th International Symposium on Software Testing and Analysis. 2010, 73–84
Google Scholar
Baah G K, Podgurski A, Harrold M J. Mitigating the confounding effects of program dependences for effective fault localization. In: Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering. 2011, 146–156
Google Scholar
Baah G K, Podgurski A, Harrold M J. Matching Test Cases for Effective Fault Localization. Georgia Institute of Technology. 2011
Google Scholar
Shu G. Statistical estimation of software reliability and failure-causing effect. Doctoral Dissertation, Case Western Reserve University, 2014
Google Scholar
Weiser M. Program slicing. In: Proceedings of the 5th International Conference on Software Engineering. 1981, 439–449
Google Scholar
Denmat T, Ducassé M, Ridoux O. Data mining and cross-checking of execution traces: a re-interpretation of Jones, Harrold and Stasko test information. In: Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering. 2005, 396–399
Chapter Google Scholar
Sun S F, Podgurski A. Properties of effective metrics for coveragebased statistical fault localization. In: Proceedings of the IEEE International Conference on Software Testing, Verification and Validation (ICST). 2016, 124–134
Google Scholar
Bai Z, Shu G, Podgurski A. NUMFL: localizing faults in numerical software using a value-based causal model. In: Proceedings of the 8th IEEE International Conference on Software Testing, Verification and Validation (ICST). 2015, 1–10
Google Scholar
Bai Z, Shu G, Podgurski A. Causal inference based fault localization for numerical software with NUMFL. Software Testing, Verification and Reliability, 2017, 27(6): e1613
Article Google Scholar
Gore R, Reynolds P F. Reducing confounding bias in predicate-level statistical debugging metrics. In: Proceedings of the 34th International Conference on Software Engineering. 2012, 463–473
Google Scholar
Wang X, Cheung S C, Chan W K, Zhang Z. Taming coincidental correctness: coverage refinement with context patterns to improve fault localization. In: Proceedings of the 31st International Conference on Software Engineering. 2009, 45–55
Google Scholar
Masri W, Assi R A. Prevalence of coincidental correctness and mitigation of its impact on fault localization. ACM Transactions on Software Engineering and Methodology (TOSEM), 2014, 23(1): 8
Article Google Scholar
Miao Y, Chen Z, Li S, Zhao Z, Zhou Y. Identifying coincidental correctness for fault localization by clustering test cases. SEKE. 2012, 267–272
Google Scholar
Zhang X, Gupta N, Gupta R. Locating faulty code by multiple points slicing. Software: Practice and Experience, 2007, 37(9), 935–961
Google Scholar
Zhang X, Gupta N, Gupta R. Pruning dynamic slices with confidence. ACM SIGPLAN Notices, 2006, 41(6): 169–180
Article Google Scholar
Cover T M, Thomas J A. Elements of Information Theory. Hoboken: John Wiley & Sons, 2012
MATH Google Scholar
Burbea J, Rao C R. On the convexity of some divergence measures based on entropy functions. IEEE Transactions on Information Theory, 1982, 28: 489–495
Article MathSciNet MATH Google Scholar
Le T D B, Lo D, Li M. Constrained feature selection for localizing faults. In: Proceedings of the International Conference on Software Maintenance and Evolution. 2015, 501–505
Google Scholar
Xu J, Zhang Z, Chan W K, Tse T H, Li S. A general noise-reduction framework for fault localization of Java programs. Information and Software Technology, 2013, 55(5): 880–896
Article Google Scholar
Zeller A, Hildebrandt R. Simplifying and isolating failure-inducing input. IEEE Transactions on Software Engineering, 2002, 28(2): 183–200
Article Google Scholar
Zeller A. Isolating cause-effect chains from computer programs. In: Proceedings of the 10th ACM SIGSOFT Symposium on Foundations of Software Engineering. 2002, 1–10
Google Scholar
Pearl J. Causality: models, reasoning and inference. Econometric Theory, 2003, 19: 675–685
Article Google Scholar
Ferrante J, Ottenstein K J, Warren J D. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems (TOPLAS), 1987, 9(3): 319–349
Article MATH Google Scholar
Podgurski A, Clarke L A. A formal model of program dependences and its implications for software testing, debugging, and maintenance. IEEE Transactions on Software Engineering, 1990, 16(9): 965–979
Article Google Scholar
Shu G, Sun B, Podgurski A, Cao F. MFL: Method-level fault localization with causal inference. In: Proceedings of the 6th IEEE International Conference on Software Testing, Verification and Validation. 2013, 124–133
Google Scholar
Austin P C. Statistical criteria for selecting the optimal number of untreated subjects matched to each treated subject when using many-toone matching on the propensity score. American Journal of Epidemiology, 2010, 172(9): 1092–1097
Article Google Scholar
Hosmer D W, Lemeshow S. Applied Logistic Regression. Hoboken: John Wiley & Sons, 2013
Book MATH Google Scholar
Hansen B B, Klopfer S O. Optimal full matching and related designs via network flows. Journal of Computational and Graphical Statistics, 2006, 15(3): 609–627
Article MathSciNet Google Scholar
Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 2010, 33(1): 1–22
Article Google Scholar
Ho D, Imai K, King G, Stuart E A. MatchIt: nonparametric preprocessing for parametric causal inference. Journal of Statistical Software, 2011, 42(8): 1–28
Article Google Scholar
Hutchins M, Foster H, Goradia T, Ostrand T. Experiments on the effectiveness of dataflow-and control-flow-based test adequacy criteria. In: Proceedings of the 16th International Conference on Software Engineering. 1994, 191–200
Google Scholar
Xie X, Kuo F C, Chen T Y, Yoo S, Harman M. Provably optimal and human-competitive results in sbse for spectrum based fault localisation. In: Proceedings of International Symposium on Search Based Software Engineering, Springer Berlin Heidelberg. 2013, 224–238
Chapter Google Scholar
Yu Y, Jones J A, Harrold M J. An empirical study of the effects of test-suite reduction on fault localization. In: Proceedings of the 30th International Conference on Software Engineering. 2008, 201–210
Google Scholar
Witten I H, Frank E. Data Mining: Practical Machine Learning Tools and Techniques. San Francisco: Morgan Kaufmann, 2005
MATH Google Scholar
Ascari L C, Araki L Y, Pozo A R, Vergilio S R. Exploring machine learning techniques for fault localization. In: Proceedings of the 10th Latin American Test Workshop. 2009, 1–6
Google Scholar
Lo D, Jiang L, Budi A. Comprehensive evaluation of association measures for fault localization. In: Proceedings of IEEE International Conference on Software Maintenance. 2010, 1–10
Google Scholar
Jones J A, Harrold MJ. Empirical evaluation of the tarantula automatic fault-localization technique. In: Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering. 2005, 273–282
Chapter Google Scholar
Roychowdhury S. A mixed approach to spectrum-based fault localization using information theoretic foundations. Doctoral Dissertation, the University of Texas at Austin, 2013
Google Scholar
Jiang B, Zhang Z, Tse T H, Chen T Y. How well do test case prioritization techniques support statistical fault localization. In: Proceedings of the 33rd Annual IEEE International Computer Software and Applications Conference. 2009, 99–106
Google Scholar
Jiang B, Zhang Z, Chan W K, Tse T H. Adaptive random test case prioritization. In: Proceedings of the 24th IEEE/ACM International Conference on Automated Software Engineering. 2009, 233–244
Google Scholar
Yoo S, Harman M, Clark D. Fault localization prioritization: comparing information-theoretic and coverage-based approaches. ACM Transactions on Software Engineering and Methodology (TOSEM), 2013, 22(3): 19
Article Google Scholar
Roychowdhury S, Khurshid S. A novel framework for locating software faults using latent divergences. In: Proceedings of Joint European Conference on Machine Learning and Knowledge Discovery in Databases. 2011, 49–64
Chapter Google Scholar
Abreu R, Zoeteweij P, Golsteijn R, Van Gemund A J. A practical evaluation of spectrum-based fault localization. Journal of Systems and Software, 2009, 82(11): 1780–1792
Article Google Scholar
Yoo S. Evolving human competitive spectra-based fault localisation techniques. In: Proceedings of International Symposium on Search Based Software Engineering. 2012, 244–258
Chapter Google Scholar
Wong W E, Gao R, Li Y, Abreu R, Wotawa F. A survey on software fault localization. IEEE Transactions on Software Engineering, 2016, 42(8): 707–740
Article Google Scholar
Le T D B, Thung F, Lo D. Theory and practice, do they matchfi a case with spectrum-based fault localization. In: Proceedings of the 29th IEEE International Conference on Software Maintenance (ICSM). 2013, 380–383
Google Scholar
Yoo S, Xie X, Kuo F C, Chen T Y, Harman M. No pot of gold at the end of program spectrum rainbow: greatest risk evaluation formula does not exist. Research Note RN, 2014, 14(14): 14
Google Scholar
Ju X, Jiang S, Chen X, Wang X, Zhang Y, Cao H. HSFal: effective fault localization using hybrid spectrum of full slices and execution slices. Journal of Systems and Software, 2014, 90: 3–17
Article Google Scholar
Gupta N, He H, Zhang X, Gupta R. Locating faulty code using failureinducing chops. In: Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering. 2005, 263–272
Chapter Google Scholar
Zhang X, Gupta N, Gupta R. Locating faults through automated predicate switching. In: Proceedings of the 28th International Conference on Software Engineering. 2006, 272–281
Google Scholar
Jeffrey D, Gupta N, Gupta R. Fault localization using value replacement. In: Proceedings of the International Symposium on Software Testing and Analysis. 2008, 167–178
Google Scholar
Zhang X, Tallam S, Gupta N, Gupta R. Towards locating execution omission errors. ACM Sigplan Notices, 2007, 42(6): 415–424
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
Farid Feyzi & Saeed Parsa

Authors

Farid Feyzi
View author publications
You can also search for this author in PubMed Google Scholar
Saeed Parsa
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Saeed Parsa.

Additional information

Farid Feyzi received the MS degree in Software Engineering from the Sharif University of Technology, Iran in 2012. He is currently a PhD candidate in the Department of Computer Engineering at Iran University of Science and Technology. His research focus is on developing statistical algorithms to improve software quality with an emphasis on statistical fault localization and automated test data generation.

Saeed Parsa received his BS in Mathematics and Computer Science from Sharif University of Technology, Iran, his MS degree in Computer Science from the University of Salford in England, and his PhD in Computer Science from the University of Salford, England. He is an associate professor of Computer Science at the Iran University of Science and Technology. His research interests include software engineering, software testing and debugging, and algorithms.

Electronic supplementary material

Supplementary material, approximately 765 KB.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Feyzi, F., Parsa, S. Inforence: effective fault localization based on information-theoretic analysis and statistical causal inference. Front. Comput. Sci. 13, 735–759 (2019). https://doi.org/10.1007/s11704-017-6512-z

Download citation

Received: 25 October 2016
Accepted: 23 June 2017
Published: 18 June 2019
Issue Date: August 2019
DOI: https://doi.org/10.1007/s11704-017-6512-z

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Inforence: effective fault localization based on information-theoretic analysis and statistical causal inference

Abstract

Access this article

Similar content being viewed by others

FTFL: A Fisher’s test-based approach for fault localization

Evaluation of Measures for Statistical Fault Localisation and an Optimising Scheme

Lightweight fault localization combined with fault context to improve fault absolute rank

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Electronic supplementary material

Supplementary material, approximately 765 KB.

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Inforence: effective fault localization based on information-theoretic analysis and statistical causal inference

Abstract

Access this article

Similar content being viewed by others

FTFL: A Fisher’s test-based approach for fault localization

Evaluation of Measures for Statistical Fault Localisation and an Optimising Scheme

Lightweight fault localization combined with fault context to improve fault absolute rank

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Electronic supplementary material

Supplementary material, approximately 765 KB.

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation