Automatic bug localization using a combination of deep learning and model transformation through node classification

Software Quality Journal

Abstract

Bug localization is the task of automatically locating suspicious commands in source code. Many automated bug localization approaches have been proposed to reduce costs and speed up the bug localization process. These approaches allow developers to focus on critical commands. In this paper, we propose treating bug localization as a node classification problem. Because existing training sets label whole graphs as buggy or bug-free, all nodes in each graph must first be labeled. To do this, we use the Gumtree algorithm, which labels nodes by comparing each buggy graph with its corresponding fixed graph. For classification, we propose using GraphSAGE, a type of graph neural network (GNN). The dataset used for training and testing consists of buggy JavaScript code and the corresponding fixed code. The results demonstrate that the proposed method outperforms other related methods.
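
The two steps named in the abstract, labeling nodes from a buggy/fixed pair and aggregating neighborhood information for node classification, can be illustrated with a minimal Python sketch. It is not the authors' implementation. `label_nodes` is a hypothetical position-by-position comparison standing in for Gumtree, which computes a proper tree matching. `sage_mean_layer` is a single untrained GraphSAGE-style layer with a mean aggregator, written in plain Python. The `Node` class, the toy `a == b` / `a === b` trees, the one-hot features, and the identity weights are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Toy AST node (hypothetical): a type, an optional token, and children."""
    kind: str
    value: str = ""
    children: list[Node] = field(default_factory=list)


def label_nodes(buggy: Node, fixed: Optional[Node], path=(), labels=None):
    """Label each buggy-tree node 1 (changed) or 0 (unchanged).

    Simplified stand-in for Gumtree: nodes are paired by position only,
    so this does not detect moved subtrees.
    """
    if labels is None:
        labels = {}
    changed = fixed is None or (buggy.kind, buggy.value) != (fixed.kind, fixed.value)
    labels[path] = 1 if changed else 0
    for i, child in enumerate(buggy.children):
        counterpart = None
        if fixed is not None and i < len(fixed.children):
            counterpart = fixed.children[i]
        label_nodes(child, counterpart, path + (i,), labels)
    return labels


def sage_mean_layer(features, neighbors, w_self, w_neigh):
    """One layer: h_v' = ReLU(W_self h_v + W_neigh mean(h_u for u in N(v)))."""
    def matvec(w, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

    out = {}
    for v, h_v in features.items():
        nbrs = neighbors.get(v, [])
        if nbrs:
            mean = [sum(features[u][k] for u in nbrs) / len(nbrs)
                    for k in range(len(h_v))]
        else:
            mean = [0.0] * len(h_v)  # isolated node: no neighbor signal
        z = [a + b for a, b in zip(matvec(w_self, h_v), matvec(w_neigh, mean))]
        out[v] = [max(0.0, x) for x in z]
    return out


# Toy buggy/fixed pair: `a == b` was fixed to `a === b`.
buggy = Node("BinaryExpression", "==", [Node("Identifier", "a"), Node("Identifier", "b")])
fixed = Node("BinaryExpression", "===", [Node("Identifier", "a"), Node("Identifier", "b")])
labels = label_nodes(buggy, fixed)

# The same tree as a graph: node 0 is the operator, nodes 1 and 2 its operands.
# Features are one-hot node kinds: [BinaryExpression, Identifier].
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 1.0]}
neighbors = {0: [1, 2], 1: [0], 2: [0]}
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder weights, not trained
hidden = sage_mean_layer(features, neighbors, identity, identity)
```

Here `labels` marks only the root operator node as buggy, and `hidden` holds one embedding per node that mixes the node's own features with the mean of its neighbors' features. In a trained model the weight matrices would be learned and a final layer would map each embedding to a buggy or bug-free prediction.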

Data availability

The dataset used in this research is included in Dinella et al. (2020).

References

  • Abreu, R., & van Gemund, A. J. C. (2009). A low-cost approximate minimal hitting set algorithm and its application to model-based diagnosis. In Proceedings of the Eighth Symposium on Abstraction, Reformulation, and Approximation.

  • Abreu, R., Zoeteweij, P., & van Gemund, A. J. C. (2007). On the accuracy of spectrum-based fault localization. In Academic and Industrial Conference Practice and Research Techniques - MUTATION (TAICPART-MUTATION).

  • Agrawal, H., De Millo, R. A., & Spafford, E. (1991). An execution backtracking approach to program debugging. IEEE Software, 8(5).

  • Agarwal, P., & Agrawal, A. (2014). Fault-localization techniques for software systems: A literature review. In SIGSOFT Software Engineering Notes.

  • Allamanis, M., Brockschmidt, M., & Khademi, M. (2018). Learning to represent programs with graphs. In International Conference on Learning Representations (ICLR).

  • Ascari, L. C., Araki, L. Y., Pozo, A. R., & Vergilio, S. R. (2009). Exploring machine learning techniques for fault localization. In Proceedings of 10th Latin American Test Workshop.

  • Baah, G. K., Podgurski, A., & Harrold, M. J. (2010). The probabilistic program dependence graph and its application to fault diagnosis. IEEE Transactions on Software Engineering, 36(4).

  • Chen, M., Kiciman, E., Fratkin, E., Fox, A., & Brewer, E. (2002). Pinpoint: Problem determination in large, dynamic internet services. In International Conference on Dependable Systems and Networks (DSN).

  • DiGiuseppe, N., & Jones, J. A. (2011). On the influence of multiple faults on coverage-based fault localization. In Proceedings of the 2011 International Symposium on Software Testing and Analysis (ISSTA).

  • Dinella, E., Dai, H., Li, Z., Naik, M., Song, L., & Wang, K. (2020). Hoppity: Learning graph transformations to detect and fix bugs in programs. In International Conference on Learning Representations (ICLR).

  • Falleri, J. R., Morandat, F., Blanc, X., Martinez, M., & Monperrus, M. (2014). Fine-grained and accurate source code differencing. In Proceedings of the 29th ACM/IEEE international conference on Automated software engineering.

  • Gazzola, L., Micucci, D., & Mariani, L. (2017). Automatic software repair: A survey. IEEE Transactions on Software Engineering, 45(1).

  • Hao, D., Xie, T., Zhang, L., Wang, X., Sun, J., & Mei, H. (2012). Test input reduction for result inspection to facilitate fault localization. Automated Software Engineering.

  • Hamilton, W. L., Ying, R., & Leskovec, J. (2017). Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems.

  • Hovemeyer, D., & Pugh, W. (2004). Finding bugs is easy. In Companion to the Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA). ACM.

  • https://github.com/AI-nstein/hoppity. [Online].

  • Hu, W., Fey, M., Zitnik, M., Dong, Y., Ren, H., Liu, B., Catasta, M., & Leskovec, J. (2020). Open graph benchmark: Datasets for machine learning on graphs. In Thirty-fifth Annual Conference on Neural Information Processing Systems. NeurIPS.

  • Jensen, S. H., Møller, A., & Thiemann, P. (2009). Type analysis for JavaScript. In Proceedings of the 16th International Symposium on Static Analysis.

  • Jones, J. A., & Harrold, M. J. (2005). Empirical evaluation of the Tarantula automatic fault-localization technique. In International Conference on Automated Software Engineering (ASE).

  • Kipf, T. N., & Welling, M. (2017). Semi-supervised classification with graph convolutional networks. In The International Conference on Learning Representations (ICLR).

  • Korel, B. (1998). PELAS – program error-locating assistant system. IEEE Transactions on Software Engineering, 14(9).

  • Kim, D., Tao, Y., Kim, S., & Zeller, A. (2013). Where should we fix this bug? A two-phase recommendation model. IEEE Transactions on Software Engineering, 39(11).

  • Lee, C.-C., Chung, P.-C., Tsai, J.-R., & Chang, C.-I. (1999). Robust radial basis function neural networks. IEEE Transactions on Systems, 29(6).

  • Le Goues, C. (2013). Automatic program repair using genetic programming. University of Virginia: Ph.D. dissertation.

  • Lukins, S. K., Kraft, N. A., & Etzkorn, L. H. (2012). Bug localization using latent Dirichlet allocation. Information and Software Technology, 52(9).

  • Mayer, W., & Stumptner, M. (2007). Model-based debugging: State of the art and future challenges. Electronic Notes in Theoretical Computer Science, 174(4).

  • Mateis, C., Stumptner, M., & Wotawa, F. (2000). Modeling Java Programs for Diagnosis. In Proceedings of European Conference on Artificial Intelligence.

  • Meyers, R. A. (2001). Encyclopedia of physical science and technology, third edition. Academic Press.

  • Mayer, W., & Stumptner, M. (2008). Evaluating models for model-based debugging. In Proceedings of ACM International Conference on Automated Software Engineering.

  • Mayer, W., Stumptner, M., Wieland, D., & Wotawa, F. (2002). Can AI help to improve debugging substantially? Debugging experiences with value-based models. In Proceedings of European Conference on Artificial Intelligence.

  • Naish, L., Lee, H., & Ramamohanarao, K. (2011). A model for spectra-based software diagnosis. Journal of the ACM Transactions on Software Engineering and Methodology, 20(3).

  • Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., & Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library.

  • Pmd. (2021). PMD 6.41.0.  https://pmd.github.io/.  Accessed 22 January 2022.

  • Qiong, G., Xian-Ming, W., Zhao, W., Bing, N., & Chun-Sheng, X. (2016). An improved SMOTE algorithm based on genetic algorithm for imbalanced data classification. Journal of Digital Information Management, 14(2).

  • Rao, S., & Kak, A. (2011). Retrieval from software libraries for bug localization: A comparative study of generic and composite text models. In MSR.

  • Renieris, M., & Reiss, S. (2003). Fault localization with nearest neighbor queries. In Proceedings of International Conference on Automated Software Engineering.

  • Saha, R. K., Lease, M., Khurshid, S., & Perry, D. E. (2013). Improving bug localization using structured information retrieval. In Proceedings of IEEE/ACM International Conference on Automated Software Engineering (ASE).

  • Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., & Monfardini, G. (2008). The graph neural network model. IEEE Transactions on Neural Networks, 61–80.

  • Sisman, B., & Kak, A. C. (2012). Incorporating version histories in information retrieval based bug localization. In Proceedings of 9th IEEE Working Conference on Mining Software Repositories.

  • State of the octoverse. (2021). [Online]. Available: https://octoverse.github.com/#top-languages-over-the-years

  • Sun, Y., Wong, A. K. C., & Kamel, M. S. (2009). Classification of imbalanced data: A review. International Journal of Pattern Recognition and Artificial Intelligence, 23(4).

  • Wang, S., Lo, D., & Lawall, J. (2014). Compositional vector space models for improved bug localization. In Proceedings of IEEE International Conference on Software Maintenance and Evolution (ICSME).

  • Wang, Q., Parnin, C., & Orso, A. (2015). Evaluating the usefulness of IR-based fault localization techniques. In Proceedings of International Symposium on Software Testing and Analysis (ISSTA).

  • Wang, M., Zheng, D., Ye, Z., Gan, Q., Li, M., Song, X., Zhou, J., Ma, C., Yu, L., Gai, Y., Xiao, T., He, T., Karypis, G., Li, J., & Zhang, Z. (2020). Deep graph library: A graph-centric, highly-performant package for graph neural networks. In arXiv:1909.01315

  • Wong, W. E., Debroy, V., & Choi, B. (2010). A family of code coverage-based heuristics for effective fault localization. The Journal of Systems and Software (JSS), 83(2), 188–208.


  • Wong, W. E., Debroy, V., & Xu, D. (2012). Towards better fault localization: A crosstab-based statistical approach. IEEE Trans, 42(3).

  • Wong, W. E., & Qi, Y. (2009). BP neural network-based effective fault localization. International Journal of Software Engineering and Knowledge Engineering, 19(4).

  • Wong, W. E., & Qi, Y. (2019). BP Neural Network-based Effective Fault Localization. International Journal of Software Engineering and Knowledge Engineering, 19(4).

  • Wotawa, F., Stumptner, M., & Mayer, W. (2002). Model-based debugging or how to diagnose programs automatically. In Proceedings of International Conference on Industrial and Engineering, Applications of Artificial Intelligence and Expert Systems.

  • Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., & Yu, P. S. (2020). A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 1–21.

  • Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., & Bengio, Y. (2018). Graph attention networks. In ICLR.

  • Vessey, I. (1985). Expertise in debugging computer programs: A process analysis. International Journal of Man-Machine Studies, 23(5).

  • Vinyals, O., Fortunato, M., & Jaitly, N. (2015). Pointer networks. In Advances in Neural Information Processing Systems.

  • Zakas, N. C. (2013). ESLint. https://eslint.org/

  • Zhang, M., Cui, Z., Neumann, M., & Chen, Y. (2018). An end-to-end deep learning architecture for graph classification. In Proceedings of AAAI.

  • Zhou, J., Cui, G., Hu, S., Zhang, Z., Yang, C., Liu, Z., Wang, L., Li, C., & Sun, M. (2020). Graph neural networks: A review of methods and applications. AI Open, 1, 57–81.


  • Zhong, H., & Mei, H. (2020). Learning a graph-based classifier for fault localization. Science China Information Sciences, 63.

  • Zhong, H., & Su, Z. (2015). An empirical study on real bug fixes. In Proceedings of the International Conference on Software Engineering (ICSE).

  • Zhao, T., Zhang, X., & Wang, S. (2021). GraphSMOTE: Imbalanced node classification on graphs with graph neural networks. In Proceedings of the Fourteenth ACM International Conference on Web Search and Data Mining (WSDM ’21).

Author information

Contributions

The initial idea was from V. R. The design, provision of resources, and data collection were performed by L. Y. Data analysis was done by L. Y., S. S., and V. R. The manuscript was written and revised by L. Y. and S. S.

Corresponding author

Correspondence to Seyfollah Soleimani.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Yousofvand, L., Soleimani, S. & Rafe, V. Automatic bug localization using a combination of deep learning and model transformation through node classification. Software Qual J 31, 1045–1063 (2023). https://doi.org/10.1007/s11219-023-09625-5

  • DOI: https://doi.org/10.1007/s11219-023-09625-5
