Linear and non-linear bayesian regression methods for software fault prediction

  • Original article
International Journal of System Assurance Engineering and Management

Abstract

Faults are most likely to occur during the coding phase of software development. If the parts of the code that are more prone to faults can be predicted before testing begins, a large amount of time and software cost can be saved, and the overall quality of the software can be improved. Researchers have previously applied numerous machine learning techniques to software fault prediction, mostly to classify software modules as fault-prone or not. Ranking software modules by their fault content has rarely been explored, and Bayesian methods have not previously been applied to this task. In this work, we investigate both linear and non-linear Bayesian regression methods for software fault prediction. We develop and evaluate fault prediction models for two scenarios: intra-release prediction and cross-release prediction. The experimental investigation is conducted on 46 different software project versions. We use mean absolute error, root mean square error, and fault percentage average as performance measures. The results show that Bayesian non-linear regression outperformed, or at least performed comparably to, linear regression and the other machine learning approaches considered. The Bayesian linear regression method performed moderately.
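The three performance measures named in the abstract can be illustrated with a minimal Python sketch. Mean absolute error and root mean square error follow their standard definitions. The abstract does not define fault percentage average, so the version below is an assumption based on a definition commonly used in the fault-ranking literature: modules are ranked by predicted fault count, and the fraction of all actual faults captured in the top m modules is averaged over every m. The function names and the sample data are hypothetical and are not taken from the article.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted fault counts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """Square root of the average squared prediction error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def fault_percentage_average(actual, predicted):
    """Assumed definition: rank modules by predicted faults (highest first),
    then average, over m = 1..k, the fraction of all actual faults that
    fall in the top m ranked modules."""
    total = sum(actual)
    if total == 0:
        raise ValueError("fault percentage average needs at least one actual fault")
    # Actual fault counts reordered by descending predicted fault count.
    ranked = [a for _, a in sorted(zip(predicted, actual),
                                   key=lambda pair: pair[0], reverse=True)]
    captured = 0
    running_sum = 0.0
    for faults in ranked:
        captured += faults
        running_sum += captured / total
    return running_sum / len(ranked)


# Hypothetical fault counts for four modules and a model's predictions.
actual = [0, 2, 1, 5]
predicted = [0.5, 1.5, 1.0, 4.0]

print(mean_absolute_error(actual, predicted))               # 0.5
print(round(root_mean_square_error(actual, predicted), 4))  # 0.6124
print(fault_percentage_average(actual, predicted))          # 0.875
```

Lower values are better for the two error measures. Under the assumed definition, a higher fault percentage average means the ranking places the most faulty modules nearer the top.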

Availability of data and material

Not applicable

Acknowledgements

The authors are very grateful to the editor and reviewers for providing valuable insights and remarks, which helped improve the manuscript.

Funding

Not applicable

Author information

Corresponding author

Correspondence to Santosh Singh Rathore.

Ethics declarations

Conflicts of interest/Competing interests

Not applicable

Code availability

Not applicable

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Singh, R., Rathore, S.S. Linear and non-linear bayesian regression methods for software fault prediction. Int J Syst Assur Eng Manag 13, 1864–1884 (2022). https://doi.org/10.1007/s13198-021-01582-1

  • DOI: https://doi.org/10.1007/s13198-021-01582-1
