
A Survey of Software Defect Prediction Based on Deep Learning

Abstract

Software defect prediction (SDP) is the process of building a model that software practitioners and researchers can use in the early phases of the software development life cycle (SDLC) to identify defective modules or classes. With the increase in software complexity, defect prediction (DP) has become one of the software industry's essential processes, and researchers have shown growing interest in SDP over the past two decades. Several techniques have been applied to SDP. This paper systematically reviews the literature from the last six years (2015–2020) that applies deep learning (DL) techniques to SDP. The functional capabilities of different DL techniques, along with their pros and cons, are evaluated for SDP, and an extensive comparative study of DL techniques for file-level and change-level SDP is performed. The challenges and open issues of applying DL techniques to SDP are also highlighted. The extensive comparative analysis on benchmark open-source projects shows that DL techniques achieve better and more significant results than traditional machine learning approaches. However, the use of DL techniques in SDP is still limited, and a larger number of studies is needed to obtain well-founded and generalizable results.
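The DL-based predictors surveyed here are, at their core, binary classifiers trained on code representations or software metrics. As a rough illustration of the file-level setting described above, the following minimal sketch trains a small fully connected network on tabular static code metrics with Keras. The synthetic data, the number of features, and the hyper-parameters are illustrative assumptions only and are not taken from any of the surveyed studies.

```python
# Hypothetical file-level SDP sketch: a small dense network over static code
# metrics (e.g., PROMISE-style features) with a binary "defective" label.
# The data below is synthetic; real studies use mined project histories.
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20)).astype("float32")   # 20 metrics per module
y = (rng.random(1000) < 0.2).astype("int32")         # ~20% defective (synthetic)

model = keras.Sequential([
    keras.layers.Input(shape=(20,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dropout(0.3),                        # mild regularization
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),      # P(module is defective)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[keras.metrics.AUC(name="auc")])
model.fit(X, y, epochs=10, batch_size=32, validation_split=0.2, verbose=0)
print(model.evaluate(X, y, verbose=0))                # [loss, AUC]
```

In the surveyed work, the hand-crafted metrics in this sketch are typically replaced by learned representations (for example, token sequences or abstract syntax tree paths fed to CNNs, LSTMs, or DBNs), and evaluation is performed on benchmark open-source projects rather than synthetic data.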





Author information


Corresponding author

Correspondence to Meetesh Nevendra.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Nevendra, M., Singh, P. A Survey of Software Defect Prediction Based on Deep Learning. Arch Computat Methods Eng (2022). https://doi.org/10.1007/s11831-022-09787-8



  • DOI: https://doi.org/10.1007/s11831-022-09787-8