Skip to main content
Log in

Locating faults with program slicing: an empirical analysis

  • Published:
Empirical Software Engineering Aims and scope Submit manuscript

Abstract

Statistical fault localization is an easily deployed technique for quickly determining candidates for faulty code locations. If a human programmer has to search the fault beyond the top candidate locations, though, more traditional techniques of following dependencies along dynamic slices may be better suited. In a large study of 457 bugs (369 single faults and 88 multiple faults) in 46 open source C programs, we compare the effectiveness of statistical fault localization against dynamic slicing. For single faults, we find that dynamic slicing was eight percentage points more effective than the best performing statistical debugging formula; for 66% of the bugs, dynamic slicing finds the fault earlier than the best performing statistical debugging formula. In our evaluation, dynamic slicing is more effective for programs with single fault, but statistical debugging performs better on multiple faults. Best results, however, are obtained by a hybrid approach: If programmers first examine at most the top five most suspicious locations from statistical debugging, and then switch to dynamic slices, on average, they will need to examine 15% (30 lines) of the code. These findings hold for 18 most effective statistical debugging formulas and our results are independent of the number of faults (i.e. single or multiple faults) and error type (i.e. artificial or real errors).

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13

Similar content being viewed by others

Notes

  1. In our evaluation, dynamic slicing is more effective than SBFL on single faults. However, other factors such as multiple faults (see RQ7), test generation (Yang et al. 2017), test reduction (Yu et al. 2008) and program sizes may influence its effectiveness (see RQ6 and Section 6).

  2. The scores for the faulty statement in Line 8 are tarantula\((s_{8})=\frac {1}{1}/\left (\frac {1}{1}+\frac {1}{5}\right )\), ochiai\((s_{8})=\frac {1}{\sqrt {1(1+1)}}\), and naish2\((s_{8})=1 - \frac {1}{1+4+1}\).

  3. In ordinal ranking, lines with the same score are ranked by line number.

  4. This is to avoid duplication of inspected statements, i.e. avoid double inspection.

  5. http://frama-c.com/

  6. https://pygraphviz.github.io/

  7. https://networkx.github.io/

  8. http://matplotlib.org/

  9. https://gcc.gnu.org/onlinedocs/gcc/Gcov.html

  10. https://git-scm.com/docs/git-diff

  11. https://www.gnu.org/software/gdb/documentation/

  12. The executable statements refers to statements for which coverage information are obtainable by Gcov, in particular, all program statements except spaces, blanks and comments.

  13. https://codeforces.com/

  14. http://www.comp.nus.edu.sg/~release/corebench/

  15. To the best of our knowledge, there is no known benchmark of real-world programs containing multiple faults.

  16. Note that all executable program statements are ranked in the suspiciousness rank, executable statements that are not contained in the dynamic slice are ranked lowest.

  17. Ranking ties are broken in ascending order, i.e. if both lines 10 and 50 have the same score, then line number 10 is ranked above line number 50.

  18. In this case, when several statements have the same rank as the faulty statement, we made the conservative assumption that a developer finds the faulty statement among other statements with the same rank.

  19. Further evaluations (on single faults) in this paper use Kulczynski2 as the default “statistical debugging” formula.

  20. Percentage improvement is measured as \(\frac {1-0.794}{1-0.737}\). Note that score by itself gives the number of statements that need not be examined.

  21. Further evaluations of the hybrid approach use the best values of N (i.e. N = 2 and N = 5).

  22. The SIR benchmark is the most used subject for the evaluation of AFL techniques, especially statistical fault localization (Wong et al. 2016).

  23. Table 4 shows that Tarantula and Kulczynski2 performed best on small programs with effectiveness scores 0.76 and 0.70 for IntroClass and Codeflaws, respectively. Despite the fact that DStar performs best on large programs (i.e. CoREBench) by slightly outperforming Tarantula and Kulczynski2 (0.80 vs. 0.79); it performed significantly worse than Tarantula and Kulczynski2 on small programs, with effectiveness score 0.62 and 0.56 for IntroClass and Codeflaws, respectively.

References

  • Abreu R, Zoeteweij P, van Gemund AJC (2006) An evaluation of similarity coefficients for software fault localization. In: Proceedings of the 12th Pacific Rim international symposium on dependable computing, PRDC’06, pp 39–46

  • Abreu R, Zoeteweij P, van Gemund AJC (2007) On the accuracy of spectrum-based fault localization. In: Proceedings of the testing: academic and industrial conference practice and research techniques—MUTATION, TAICPART-MUTATION ’07, pp 89–98

  • Abreu R, Zoeteweij P, Golsteijn R, van Gemund A J C (2009a) A practical evaluation of spectrum-based fault localization. J Syst Softw 82 (11):1780–1792

    Article  Google Scholar 

  • Abreu R, Zoeteweij P, Van Gemund AJ (2009b) Spectrum-based multiple fault localization. In: 2009 IEEE/ACM international conference on automated software engineering. IEEE, pp 88–99

  • Agrawal H, Horgan JR (1990) Dynamic program slicing. In: Proceedings of the ACM SIGPLAN 1990 conference on programming language design and implementation, PLDI ’90, pp 246–256

  • Agrawal H, DeMillo R A, Spafford E H (1990) Efficient debugging with slicing and backtracking. Tech. rep., Purdue University

  • Agrawal H, Horgan JR, Krauser EW, London S (1993) Incremental regression testing. In: Proceedings of the conference on software maintenance, ICSM ’93, pp 348–357

  • Alves E, Gligoric M, Jagannath V, d’Amorim M (2011) Fault-localization using dynamic slicing and change impact analysis. In: 2011 26th IEEE/ACM international conference on automated software engineering (ASE 2011). IEEE, pp 520–523

  • Assiri F Y, Bieman J M (2017) Fault localization for automated program repair: effectiveness, performance, repair correctness. Software Quality Journal 25 (1):171–199. Springer

    Article  Google Scholar 

  • Binkley D, Gold N, Harman M (2007) An empirical study of static program slice size. ACM Trans Softw Eng Methodol 16(2)

  • Böhme M, Roychoudhury A (2014) Corebench: studying complexity of regression errors. In: Proceedings of the 23rd ACM/SIGSOFT international symposium on software testing and analysis, ISSTA, pp 105–115

  • Böhme M, Soremekun EO, Chattopadhyay S, Ugherughe E, Zeller A (2017) Where is the bug and how is it fixed? An experiment with practitioners. In: Proceedings of the 2017 11th joint meeting on foundations of software engineering, pp 117–128

  • Burger M, Zeller A (2011) Minimizing reproduction of software failures. In: Proceedings of the 2011 international symposium on software testing and analysis, ISSTA ’11, pp 221–231

  • Chen MY, Kiciman E, Fratkin E, Fox A, Brewer E (2002) Pinpoint: problem determination in large, dynamic internet services. In: Proceedings of the 2002 international conference on dependable systems and networks, DSN ’02, pp 595–604

  • Christi A, Olson M L, Alipour M A, Groce A (2018) Reduce before you localize: delta-debugging and spectrum-based fault localization. In: 2018 IEEE international symposium on software reliability engineering workshops. IEEE, ISSREW, pp 184–191

  • DiGiuseppe N, Jones JA (2011) On the influence of multiple faults on coverage-based fault localization. In: Proceedings of the 2011 international symposium on software testing and analysis, pp 210–220

  • Gopinath R, Kampmann A, Havrikov N, Soremekun EO, Zeller A (2020) Abstracting failure-inducing inputs. In: Proceedings of the 29th ACM SIGSOFT international symposium on software testing and analysis, pp 237–248

  • Grissom R, Kim J (2005) Effect sizes for research: a broad practical approach. Lawrence Erlbaum

  • Guo A, Mao X, Yang D, Wang S (2018) An empirical study on the effect of dynamic slicing on automated program repair efficiency. In: 2018 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 554–558

  • Gyimóthy T, Beszédes A, Forgács I (1999) An efficient relevant slicing method for debugging. In: Proceedings of the 7th European software engineering conference held jointly with the 7th ACM SIGSOFT international symposium on foundations of software engineering, ESEC/FSE-7, pp 303–321

  • Hammoudi M, Burg B, Bae G, Rothermel G (2015) On the use of delta debugging to reduce recordings and facilitate debugging of web applications. In: Proceedings of the 2015 10th joint meeting on foundations of software engineering. ESEC/FSE 2015. https://doi.org/10.1145/2786805.2786846. ACM, New York, pp 333–344

  • Heiden S, Grunske L, Kehrer T, Keller F, van Hoorn A, Filieri A, Lo D (2019) An evaluation of pure spectrum-based fault localization techniques for large-scale software systems. Softw: Pract Exp 49(8):1197–1224

    Google Scholar 

  • Hofer B, Wotawa F (2012) Spectrum enhanced dynamic slicing for better fault localization. In: ECAI, vol 12, pp 420–425

  • Hutchins M, Foster H, Goradia T, Ostrand T (1994) Experiments on the effectiveness of dataflow-and control-flow-based test adequacy criteria. In: Proceedings of 16th international conference on software engineering. IEEE, pp 191–200

  • Jones JA, Harrold MJ (2005) Empirical evaluation of the tarantula automatic fault-localization technique. In: Proceedings of the 20th IEEE/ACM international conference on automated software engineering, ASE ’05, pp 273–282

  • Jones JA, Harrold MJ, Stasko J (2002) Visualization of test information to assist fault localization. In: Proceedings of the 24th international conference on software engineering, ICSE ’02, pp 467–477

  • Jose M, Majumdar R (2011) Cause clue clauses: error localization using maximum satisfiability. In: Proceedings of the 32nd ACM SIGPLAN conference on programming language design and implementation, PLDI ’11. https://doi.org/10.1145/1993498.1993550. ACM, New York, pp 437–446

  • Kampmann A, Havrikov N, Soremekun EO, Zeller A (2020) When does my program do this? Learning circumstances of software behavior. In: Proceedings of the 28th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering, pp 1228–1239

  • Keller F, Grunske L, Heiden S, Filieri A, van Hoorn A, Lo D (2017) A critical evaluation of spectrum-based fault localization techniques on a large-scale software system. In: 2017 IEEE international conference on software quality, reliability and security (QRS). IEEE, pp 114–125

  • Kim D, Nam J, Song J, Kim S (2013) Automatic patch generation learned from human-written patches. In: Proceedings of the 2013 international conference on software engineering, ICSE ’13, pp 802–811

  • Kirschner L, Soremekun E, Zeller A (2020) Debugging inputs. In: 2020 IEEE/ACM 42nd international conference on software engineering (ICSE). IEEE

  • Kochhar PS, Xia X, Lo D, Li S (2016) Practitioners’ expectations on automated fault localization. In: Proceedings of the 25th international symposium on software testing and analysis. ACM, pp 165–176

  • Korel B, Laski J (1988) Dynamic program slicing. Inf Process Lett 29(3):155–163

    Article  Google Scholar 

  • Landsberg D (2016) Methods and measures for statistical fault localisation. PhD thesis, University of Oxford

  • Landsberg D, Chockler H, Kroening D, Lewis M (2015) Evaluation of measures for statistical fault localisation and an optimising scheme. In: International conference on fundamental approaches to software engineering. Springer, pp 115–129

  • Le TDB, Lo D, Le Goues C, Grunske L (2016) A learning-to-rank based fault localization approach using likely invariants. In: Proceedings of the 25th international symposium on software testing and analysis, pp 177–188

  • Le Goues C, Nguyen T, Forrest S, Weimer W (2012) Genprog: a generic method for automatic software repair. IEEE Trans Softw Eng 38(1):54–72

    Article  Google Scholar 

  • Le Goues C, Holtschulte N, Smith E K, Brun Y, Devanbu P, Forrest S, Weimer W (2015) The manybugs and introclass benchmarks for automated repair of c programs. IEEE Trans Softw Eng 41(12):1236– 1256

    Article  Google Scholar 

  • Lei Y, Mao X, Dai Z, Wang C (2012) Effective statistical fault localization using program slices. In: 2012 IEEE 36th annual computer software and applications conference. IEEE, pp 1–10

  • Li X, Orso A (2020) More accurate dynamic slicing for better supporting software debugging. In: 2020 IEEE 13th international conference on software testing, validation and verification (ICST). IEEE, pp 28–38

  • Liblit B, Naik M, Zheng AX, Aiken A, Jordan MI (2005) Scalable statistical bug isolation. In: Proceedings of the 2005 ACM SIGPLAN conference on programming language design and implementation, PLDI ’05, pp 15–26

  • Lin Y, Sun J, Tran L, Bai G, Wang H, Dong J (2018) Break the dead end of dynamic slicing: localizing data and control omission bug. In: Proceedings of the 33rd ACM/IEEE international conference on automated software engineering, pp 509–519

  • Liu B, Lucia Nejati S, Briand L C, Bruckmann T (2016) Simulink fault localization: an iterative statistical debugging approach. Softw Test Verif Reliab 26(6):431–459

    Article  Google Scholar 

  • Liu B, Nejati S, Briand L C, et al. (2017) Improving fault localization for simulink models using search-based testing and prediction models. In: 2017 IEEE 24th international conference on software analysis, evolution and reengineering (SANER). IEEE, pp 359–370

  • Mann H B, Whitney D R (1947) On a test of whether one of two random variables is stochastically larger than the other. Ann Math Stat 50–60

  • Naish L, Lee HJ (2013) Duals in spectral fault localization. In: 2013 22nd Australian software engineering conference. IEEE, pp 51–59

  • Naish L, Lee H J, Ramamohanarao K (2011) A model for spectra-based software diagnosis. ACM Trans Softw Eng Methodol (TOSEM) 20(3):11

    Article  Google Scholar 

  • Nguyen HDT, Qi D, Roychoudhury A, Chandra S (2013) Semfix: program repair via semantic analysis. In: Proceedings of the 2013 international conference on software engineering, ICSE ’13, pp 772–781

  • Parnin C, Orso A (2011) Are automated debugging techniques actually helping programmers?. In: Proceedings of the 2011 international symposium on software testing and analysis, ISSTA ’11. https://doi.org/10.1145/2001420.2001445. ACM, New York, pp 199–209

  • Pearson S, Campos J, Just R, Fraser G, Abreu R, Ernst M D, Pang D, Keller B (2017) Evaluating and improving fault localization. In: 2017 IEEE/ACM 39th International conference on software engineering (ICSE). IEEE, pp 609–620

  • Perez A, Abreu R, d’Amorim M (2017) Prevalence of single-fault fixes and its impact on fault localization. In: 2017 IEEE international conference on software testing, verification and validation (ICST). IEEE, pp 12–22

  • Qi D, Nguyen H D T, Roychoudhury A (2013) Path exploration based on symbolic output. ACM Trans Softw Eng Methodol 22(4):32:1–32:41

    Article  Google Scholar 

  • Qi Y, Mao X, Lei Y, Dai Z, Wang C (2014) The strength of random search on automated program repair. In: Proceedings of the 36th international conference on software engineering, ICSE 2014, pp 254– 265

  • Reis S, Abreu R, d’Amorim M (2019) Demystifying the combination of dynamic slicing and spectrum-based fault localization. In: IJCAI, pp 4760–4766

  • Renieris M, Reiss SP (2003) Fault localization with nearest neighbor queries. In: Proceedings of the 18th IEEE international conference on automated software engineering. IEEE, pp 30–39

  • Russel P F, Rao T R, et al. (1940) On habitat and association of species of anopheline larvae in south-eastern Madras. Journal of the Malaria Institute of India 3:153–178

    Google Scholar 

  • Shu T, Wang L, Xia J (2017) Fault localization using a failed execution slice. In: 2017 International conference on software analysis, testing and evolution (SATE). IEEE, pp 37–44

  • Steimann F, Frenkel M, Abreu R (2013) Threats to the validity and value of empirical assessments of the accuracy of coverage-based fault locators. In: Proceedings of the 2013 international symposium on software testing and analysis, ISSTA 2013, pp 314–324

  • Tan S H, Yi J, Mechtaev S, Roychoudhury A, et al. (2017) Codeflaws: a programming competition benchmark for evaluating automated program repair tools. In: 2017 IEEE/ACM 39th international conference on software engineering companion (ICSE-C). IEEE, pp 180–182

  • Tip F (1995) A survey of program slicing techniques. J Program Lang 3(3):121–189

    Google Scholar 

  • Weiser M (1981) Program slicing. In: Proceedings of the 5th international conference on software engineering, ICSE ’81, pp 439–449

  • Weiser M (1982) Programmers use slices when debugging. Commun ACM 25(7):446–452

    Article  Google Scholar 

  • Wong W E, Qi Y, Zhao L, Cai K Y (2007) Effective fault localization using code coverage. In: Computer software and applications conference, 2007. COMPSAC 2007. 31st annual international, vol 1. IEEE, pp 449–456

  • Wong WE, Debroy V, Li Y, Gao R (2012) Software fault localization using dstar (d*). In: 2012 IEEE sixth international conference on software security and reliability. IEEE, pp 21–30

  • Wong W E, Debroy V, Gao R, Li Y (2013) The dstar method for effective software fault localization. IEEE Trans Reliab 63(1):290–308

    Article  Google Scholar 

  • Wong W E, Gao R, Li Y, Abreu R, Wotawa F (2016) A survey on software fault localization. IEEE Trans Softw Eng p preprint

  • Wotawa F (2010) Fault localization based on dynamic slicing and hitting-set computation. In: 2010 10th international conference on quality software. IEEE, pp 161–170

  • Wotawa F, Stumptner M, Mayer W (2002) Model-based debugging or how to diagnose programs automatically. In: Proceedings of the 15th international conference on industrial and engineering applications of artificial intelligence and expert systems: developments in applied artificial intelligence, IEA/AIE ’02. http://dl.acm.org/citation.cfm?id=646864.708248. Springer, London, pp 746–757

  • Xie X, Chen T Y, Kuo F C, Xu B (2013a) A theoretical analysis of the risk evaluation formulas for spectrum-based fault localization. ACM Trans Softw Eng Methodol (TOSEM) 22(4):31

    Article  Google Scholar 

  • Xie X, Kuo FC, Chen TY, Yoo S, Harman M (2013b) Provably optimal and human-competitive results in sbse for spectrum based fault localisation. In: International symposium on search based software engineering. Springer, pp 224–238

  • Yang J, Zhikhartsev A, Liu Y, Tan L (2017) Better test cases for better automated program repair. In: Proceedings of the 2017 11th joint meeting on foundations of software engineering, pp 831–841

  • Yoo S (2012) Evolving human competitive spectra-based fault localisation techniques. In: International symposium on search based software engineering. Springer, pp 244–258

  • Yu Y, Jones J, Harrold MJ (2008) An empirical study of the effects of test-suite reduction on fault localization. In: 2008 ACM/IEEE 30th international conference on software engineering. IEEE, pp 201–210

  • Zeller A, Hildebrandt R (2002) Simplifying and isolating failure-inducing input. IEEE Trans Softw Eng 28(2):183–200. https://doi.org/10.1109/32.988498

    Article  Google Scholar 

  • Zhang X, He H, Gupta N, Gupta R (2005) Experimental evaluation of using dynamic slices for fault location. In: Proceedings of the sixth international symposium on automated analysis-driven debugging, pp 33–42

  • Zhang X, Gupta N, Gupta R (2007) A study of effectiveness of dynamic slicing in locating real faults. Empir Softw Eng 12(2):143–160

    Article  Google Scholar 

  • Zhang M, Li X, Zhang L, Khurshid S (2017) Boosting spectrum-based fault localization using pagerank. In: Proceedings of the 26th ACM SIGSOFT international symposium on software testing and analysis, pp 261–272

  • Zheng AX, Jordan MI, Liblit B, Aiken A (2003) Statistical debugging of sampled programs. In: Advances in neural information processing systems, pp 9–11

  • Zheng AX, Jordan MI, Liblit B, Naik M, Aiken A (2006) Statistical debugging: simultaneous identification of multiple bugs. In: Proceedings of the 23rd international conference on machine learning, pp 1105–1112

  • Zou D, Liang J, Xiong Y, Ernst MD, Zhang L (2019) An empirical study of fault localization families and their combinations. IEEE Trans Softw Eng 47(2, Feb. 1 2021):332–347

    Article  Google Scholar 

Download references

Acknowledgements

We thank the anonymous reviewers for their helpful comments. This work was funded by Deutsche Forschungsgemeinschaft, Project “Extracting and Mining of Probabilistic Event Structures from Software Systems (EMPRESS)”.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ezekiel Soremekun.

Additional information

Communicated by: Hadi Hemmati

Additional material

All of our scripts, tools, benchmarks, and results are freely available as an artifact, in order to support scrutiny, evaluation, reproduction, and extension: https://tinyurl.com/HybridFaultLocalization

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Soremekun, E., Kirschner, L., Böhme, M. et al. Locating faults with program slicing: an empirical analysis. Empir Software Eng 26, 51 (2021). https://doi.org/10.1007/s10664-020-09931-7

Download citation

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s10664-020-09931-7

Keywords

Navigation