Automatic bug localization using a combination of deep learning and model transformation through node classification

Yousofvand, Leila; Soleimani, Seyfollah; Rafe, Vahid

doi:10.1007/s11219-023-09625-5

Published: 24 March 2023

Volume 31, pages 1045–1063 (2023)

Software Quality Journal

Leila Yousofvand¹,
Seyfollah Soleimani¹ &
Vahid Rafe¹,²

Abstract

Bug localization is the task of automatically locating suspicious commands in source code. Many automated bug localization approaches have been proposed to reduce cost and speed up the localization process, allowing developers to focus on the critical commands. In this paper, we propose to treat bug localization as a node classification problem. Because existing training sets label whole graphs as buggy or bug-free, every node in each graph must first be labeled. To do this, we use the Gumtree algorithm, which labels nodes by comparing each buggy graph with its corresponding fixed graph. For classification, we propose to use GraphSAGE, a type of graph neural network (GNN). The dataset used for training and testing consists of buggy JavaScript code and the corresponding fixed code. The results demonstrate that the proposed method outperforms other related methods.
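
To make the node classification framing concrete, the following is a minimal sketch of one GraphSAGE-style layer with a mean aggregator (Hamilton et al., 2017), followed by a per-node "buggy" score. It is an illustration under stated assumptions, not the authors' implementation: the three-node toy graph, the two-dimensional node features, all weight values, and the names sage_mean_layer and bug_score are hypothetical, and the weights are hand-picked rather than learned.

```python
import math


def sage_mean_layer(features, neighbors, w_self, w_neigh):
    """One GraphSAGE-style layer: mean-aggregate neighbors, transform, ReLU, L2-normalize.

    features:  dict mapping node id -> list of floats (input embedding)
    neighbors: dict mapping node id -> list of neighbor node ids
    w_self, w_neigh: weight matrices given as lists of rows (out_dim x in_dim).
    Using two matrices is equivalent to applying one matrix to the
    concatenation of the node embedding and the aggregated neighbor embedding.
    """
    def matvec(w, v):
        return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

    out = {}
    for node, h in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            mean = [sum(features[n][i] for n in nbrs) / len(nbrs)
                    for i in range(len(h))]
        else:
            mean = [0.0] * len(h)  # isolated node: nothing to aggregate
        z = [a + b for a, b in zip(matvec(w_self, h), matvec(w_neigh, mean))]
        z = [max(0.0, x) for x in z]  # ReLU
        norm = math.sqrt(sum(x * x for x in z)) or 1.0
        out[node] = [x / norm for x in z]  # L2 normalization
    return out


def bug_score(embedding, w, b):
    """Logistic score in (0, 1); higher means 'more likely buggy'."""
    logit = sum(wi * xi for wi, xi in zip(w, embedding)) + b
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical toy graph: node 0 is a parent, nodes 1 and 2 are its children.
# Features are one-hot node types (assumed, for illustration only).
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 1.0]}
neighbors = {0: [1, 2], 1: [0], 2: [0]}

# Hand-picked (untrained) weights, purely illustrative.
w_self = [[1.0, 0.0], [0.0, 1.0]]
w_neigh = [[0.5, 0.0], [0.0, 0.5]]

embeddings = sage_mean_layer(features, neighbors, w_self, w_neigh)
scores = {n: bug_score(h, w=[-2.0, 2.0], b=0.0) for n, h in embeddings.items()}

# Rank nodes from most to least suspicious.
ranking = sorted(scores, key=scores.get, reverse=True)
```

In a real setting the weights would be trained on node labels (here, labels derived by differencing buggy and fixed graphs), and a library implementation of GraphSAGE would replace the hand-written layer.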

Data availability

The dataset used in this research is included in Dinella et al. (2020).

References

Abreu, R., & van Gemund, A. J. C. (2009). A low-cost approximate minimal hitting set algorithm and its application to model-based diagnosis. In Proceedings of the Eighth Symposium on Abstraction, Reformulation, and Approximation.
Abreu, R., Zoeteweij, P., & van Gemund, A. J. C. (2007). On the accuracy of spectrum-based fault localization. In Academic and Industrial Conference Practice and Research Techniques - MUTATION (TAICPART-MUTATION).
Agrawal, H., De Millo, R. A., & Spafford, E. (1991). An execution backtracking approach to program debugging. IEEE Software, 8(5).
Agarwal, P., & Agrawal, A. (2014). Fault-localization techniques for software systems: A literature review. In SIGSOFT Software Engineering Notes.
Allamanis, M., Brockschmidt, M., & Khademi, M. (2018). Learning to represent programs with graphs. In International Conference on Learning Representations (ICLR).
Ascari, L. C., Araki, L. Y., Pozo, A. R., & Vergilio, S. R. (2009). Exploring machine learning techniques for fault localization. In Proceedings of 10th Latin American Test Workshop.
Baah, G. K., Podgurski, A., & Harrold, M. J. (2010). The probabilistic program dependence graph and its application to fault diagnosis. IEEE Transactions on Software Engineering, 36(4).
Chen, M., Kiciman, E., Fratkin, E., Fox, A., & Brewer, E. (2002). Pinpoint: Problem determination in large, dynamic internet services. In International Conference on Dependable Systems and Networks (DSN).
DiGiuseppe, N., & Jones, J. A. (2011). On the influence of multiple faults on coverage-based fault localization. In Proceedings of the 2011 International Symposium on Software Testing and Analysis (ISSTA).
Dinella, E., Dai, H., Li, Z., Naik, M., Song, L., & Wang, K. (2020). Hoppity: Learning graph transformations to detect and fix bugs in programs. In International Conference on Learning Representations (ICLR).
Falleri, J. R., Morandat, F., Blanc, X., Martinez, M., & Monperrus, M. (2014). Fine-grained and accurate source code differencing. In Proceedings of the 29th ACM/IEEE international conference on Automated software engineering.
Gazzola, L., Micucci, D., & Mariani, L. (2017). Automatic software repair: A survey. IEEE Transactions on Software Engineering, 45(1).
Hao, D., Xie, T., Zhang, L., Wang, X., Sun, J., & Mei, H. (2012). Test input reduction for result inspection to facilitate fault localization. Automated Software Engineering.
Hamilton, W. L., Ying, R., & Leskovec, J. (2017). Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems.
Hovemeyer, D., & Pugh, W. (2004). Finding bugs is easy. In Companion to the Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA). ACM.
Hoppity. [Online]. Available: https://github.com/AI-nstein/hoppity
Hu, W., Fey, M., Zitnik, M., Dong, Y., Ren, H., Liu, B., Catasta, M., & Leskovec, J. (2020). Open graph benchmark: Datasets for machine learning on graphs. In Thirty-fifth Annual Conference on Neural Information Processing Systems. NeurIPS.
Jensen, S. H., Møller, A., & Thiemann, P. (2009). Type analysis for javascript. In Proceedings of the 16th International Symposium on Static Analysis.
Jones, J. A., & Harrold, M. J. (2005). Empirical evaluation of the Tarantula automatic fault-localization technique. In International Conference on Automated Software Engineering (ASE).
Kipf, T. N., & Welling, M. (2017). Semi-supervised classification with graph convolutional networks. In The International Conference on Learning Representations (ICLR).
Korel, B. (1998). PELAS – program error-locating assistant system. IEEE Transactions on Software Engineering, 14(9).
Kim, D., Tao, Y., Kim, S., & Zeller, A. (2013). Where should we fix this bug? A two-phase recommendation model. IEEE Transactions on Software Engineering, 39(11).
Lee, C.-C., Chung, P.-C., Tsai, J.-R., & Chang, C.-I. (1999). Robust radial basis function neural networks. IEEE Transactions on Systems, 29(6).
Le Goues, C. (2013). Automatic program repair using genetic programming. University of Virginia: Ph.D. dissertation.
Lukins, S. K., Kraft, N. A., & Etzkorn, L. H. (2012). Bug localization using latent Dirichlet allocation. Information and Software Technology, 52(9).
Mayer, W., & Stumptner, M. (2007). Model-based debugging: State of the art and future challenges. Electronic Notes in Theoretical Computer Science, 174(4).
Mateis, C., Stumptner, M., & Wotawa, F. (2000). Modeling Java Programs for Diagnosis. In Proceedings of European Conference on Artificial Intelligence.
Meyers, R. A. (2001). Encyclopedia of physical science and technology, third edition. Academic Press.
Mayer, W., & Stumptner, M. (2008). Evaluating models for model-based debugging. In Proceedings of ACM International Conference on Automated Software Engineering.
Mayer, W., Stumptner, M., Wieland, D., & Wotawa, F. (2002). Can AI help to improve debugging substantially? Debugging experiences with value-based models. In Proceedings of European Conference on Artificial Intelligence.
Naish, L., Lee, H., & Ramamohanarao, K. (2011). A model for spectra-based software diagnosis. Journal of the ACM Transactions on Software Engineering and Methodology, 20(3).
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., & Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library.
Pmd. (2021). PMD 6.41.0. https://pmd.github.io/. Accessed 22 January 2022.
Qiong, G., Xian-Ming, W., Zhao, W., Bing, N., & Chun-Sheng, X. (2016). An improved SMOTE algorithm based on genetic algorithm for imbalanced data classification. Journal of Digital Information Management, 14(2).
Rao, S., & Kak, A. (2011). Retrieval from software libraries for bug localization: A comparative study of generic and composite text models. In MSR.
Renieris, M., & Reiss, S. (2003). Fault localization with nearest neighbor queries. In Proceedings of International Conference on Automated Software Engineering.
Saha, R. K., Lease, M., Khurshid, S., & Perry, D. E. (2013). Improving bug localization using structured information retrieval. In Proceedings of IEEE/ACM International Conference on Automated Software Engineering (ASE).
Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., & Monfardini, G. (2008). The graph neural network model. IEEE Transactions on Neural Networks, 61–80.
Sisman, B., & Kak, A. C. (2012). Incorporating version histories in information retrieval based bug localization. In Proceedings of 9th IEEE Working Conference on Mining Software Repositories.
State of the octoverse. (2021). [Online]. Available: https://octoverse.github.com/#top-languages-over-the-years
Sun, Y., Wong, A. K. C., & Kamel, M. S. (2009). Classification of imbalanced data: A review. International Journal of Pattern Recognition and Artificial Intelligence, 23(4).
Wang, S., Lo, D., & Lawall, J. (2014). Compositional vector space models for improved bug localization. In Proceedings of IEEE International Conference on Software Maintenance and Evolution (ICSME).
Wang, Q., Parnin, C., & Orso, A. (2015). Evaluating the usefulness of IR-based fault localization techniques. In Proceedings of International Symposium on Software Testing and Analysis (ISSTA).
Wang, M., Zheng, D., Ye, Z., Gan, Q., Li, M., Song, X., Zhou, J., Ma, C., Yu, L., Gai, Y., Xiao, T., He, T., Karypis, G., Li, J., & Zhang, Z. (2020). Deep graph library: A graph-centric, highly-performant package for graph neural networks. In arXiv:1909.01315
Wong, W. E., Debroy, V., & Choi, B. (2010). A family of code coverage-based heuristics for effective fault localization. The Journal of Systems and Software (JSS), 83(2), 188–208.
Wong, W. E., Debroy, V., & Xu, D. (2012). Towards better fault localization: A crosstab-based statistical approach. IEEE Trans, 42(3).
Wong, W. E., & Qi, Y. (2009). BP neural network-based effective fault localization. International Journal of Software Engineering and Knowledge Engineering, 19(4).
Wong, W. E., & Qi, Y. (2019). BP Neural Network-based Effective Fault Localization. International Journal of Software Engineering and Knowledge Engineering, 19(4).
Wotawa, F., Stumptner, M., & Mayer, W. (2002). Model-based debugging or how to diagnose programs automatically. In Proceedings of International Conference on Industrial and Engineering, Applications of Artificial Intelligence and Expert Systems.
Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., & Yu, P. S. (2020). A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 1–21.
Velickovic, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., & Bengio, Y. (2018). Graph attention networks. In ICLR.
Vessey, I. (1985). Expertise in debugging computer programs: A process analysis. International Journal of Man-Machine Studies, 23(5).
Vinyals, O., Fortunato, M., & Jaitly, N. (2015). Pointer networks. In Advances in Neural Information Processing Systems.
Zakas, N. C. (2013). ESLint. https://eslint.org/
Zhang, M., Cui, Z., Neumann, M., & Chen, Y. (2018). An end-to-end deep learning architecture for graph classification. In Proceedings of AAAI.
Zhou, J., Cui, G., Hu, S., Zhang, Z., Yang, C., Liu, Z., Wang, L., Li, C., & Sun, M. (2020). Graph neural networks: A review of methods and applications. AI Open, 1, 57–81.
Zhong, H., & Mei, H. (2020). Learning a graph-based classifier for fault localization. Science China Information Sciences, 63.
Zhong, H., & Su, Z. (2015). An empirical study on real bug fixes. In Proceedings of the International Conference on Software Engineering (ICSE).
Zhao, T., Zhang, X., & Wang, S. (2021). GraphSMOTE: Imbalanced node classification on graphs with graph neural networks. In Proceedings of the Fourteenth ACM International Conference on Web Search and Data Mining (WSDM ’21).

Author information

Authors and Affiliations

Department of Computer Engineering, Faculty of Engineering, Arak University, Arak, 38156-8-8349, Iran
Leila Yousofvand, Seyfollah Soleimani & Vahid Rafe
Department of Computing, Goldsmiths, University of London, London, UK
Vahid Rafe

Authors

Leila Yousofvand
Seyfollah Soleimani
Vahid Rafe

Contributions

The initial idea came from V. R. The design, provision of resources, and data collection were carried out by L. Y. Data analysis was performed by L. Y., S. S., and V. R. The manuscript was written and revised by L. Y. and S. S.

Corresponding author

Correspondence to Seyfollah Soleimani.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yousofvand, L., Soleimani, S. & Rafe, V. Automatic bug localization using a combination of deep learning and model transformation through node classification. Software Qual J 31, 1045–1063 (2023). https://doi.org/10.1007/s11219-023-09625-5

Accepted: 28 February 2023
Published: 24 March 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s11219-023-09625-5
