Learning a graph-based classifier for fault localization

Abstract

Ever since software emerged, locating software faults has been intensively researched, culminating in various approaches and tools that have been applied in real development. Despite the success of these approaches, programmers still demand improved tools, and some are reluctant to use any tools at all when locating faults during development. The state of the art can naturally be improved by learning how programmers locate faults. The rapid development of open-source software has accumulated many bug fixes. A bug fix is a specific type of commit that contains a set of buggy files and their corresponding fixed files, and thus reveals how programmers repair bugs. In principle, an automatic model can learn fault locations from bug fixes, but prior attempts to realize this vision have been blocked by various technical challenges. For example, most bug fixes are not compilable after being checked out, which prevents most advanced static/dynamic tools from analyzing them. This paper proposes an approach called ClaFa that trains a graph-based fault classifier from bug fixes. ClaFa is built on a recent partial-code tool called Grapa, which enables the complete-code tool WALA to analyze partial programs. Once Grapa has built a program dependency graph from a bug fix, ClaFa compares the graph of the buggy code with the graph of the fixed code, locates the buggy nodes, and extracts various graph features of the buggy and clean nodes. Based on the extracted features, ClaFa trains a classifier that combines AdaBoost and decision tree learning. The trained ClaFa can predict whether a node of a program dependency graph is buggy or clean. We evaluate ClaFa on thousands of buggy files collected from four open-source projects: Aries, Mahout, Derby, and Cassandra. The f-scores that ClaFa achieves are approximately 80% on all four projects.
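The pipeline the abstract describes can be sketched end to end: diff the buggy and fixed dependency graphs to label nodes, then boost weak learners over per-node features. Below is a minimal pure-Python sketch under stated assumptions: the two-feature node encoding (in-degree, out-degree) and the decision-stump weak learner are illustrative stand-ins for ClaFa's richer graph features and full decision trees, and names such as `label_buggy_nodes` and `best_stump` are hypothetical, not the paper's implementation.

```python
import math

def label_buggy_nodes(buggy_graph, fixed_graph):
    """Label a node +1 (buggy) if the fix removed or changed it, else -1.
    Graphs are modeled as {node_id: feature_tuple} dicts for simplicity."""
    return {n: (+1 if fixed_graph.get(n) != feats else -1)
            for n, feats in buggy_graph.items()}

def best_stump(X, y, w):
    """Weak learner: the (feature, threshold, polarity) decision stump
    with the lowest weighted error under the current sample weights."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for pol in (+1, -1):
                preds = [pol if x[f] <= t else -pol for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, pol)
    return best[1:]

def adaboost(X, y, rounds=5):
    """Classic AdaBoost loop: reweight samples toward each round's mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        f, t, pol = best_stump(X, y, w)
        preds = [pol if x[f] <= t else -pol for x in X]
        err = max(sum(wi for wi, p, yi in zip(w, preds, y) if p != yi), 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        z = sum(w)
        w = [wi / z for wi in w]
        ensemble.append((alpha, f, t, pol))
    return ensemble

def predict(ensemble, x):
    s = sum(a * (pol if x[f] <= t else -pol) for a, f, t, pol in ensemble)
    return +1 if s > 0 else -1

# Toy node features (in-degree, out-degree); node n2 is removed by the fix.
buggy_graph = {"n1": (1, 2), "n2": (0, 5), "n3": (2, 1)}
fixed_graph = {"n1": (1, 2), "n3": (2, 1)}

labels = label_buggy_nodes(buggy_graph, fixed_graph)
nodes = sorted(buggy_graph)
X = [buggy_graph[n] for n in nodes]
y = [labels[n] for n in nodes]

model = adaboost(X, y, rounds=3)
print(predict(model, (0, 5)))  # the removed node's features -> 1 (buggy)
```

In the paper's setting each node carries many graph features and the weak learner is a full decision tree rather than a stump, but the diff-based labeling and the boosting loop play the same roles.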


Acknowledgements

This work was sponsored by the National Key R&D Program of China (Grant No. 2018YFC0830500), the National Natural Science Foundation of China (Grant No. 61572313), and the Science and Technology Commission of Shanghai Municipality (Grant No. 15DZ1100305). We thank the anonymous reviewers for their constructive comments.

Author information

Corresponding author

Correspondence to Hao Zhong.

About this article

Cite this article

Zhong, H., Mei, H. Learning a graph-based classifier for fault localization. Sci. China Inf. Sci. 63, 162101 (2020). https://doi.org/10.1007/s11432-019-2720-1


Keywords

  • fault classifier
  • partial code analysis
  • bug fix analysis