Abstract
Faults are most likely to occur during the coding phase of software development. If the parts of the code that are more prone to faults can be predicted before testing, a large amount of time and cost could be saved, and the overall quality of the software could be improved. Researchers have previously applied numerous machine learning techniques to predict whether software modules are fault-prone or not. Ranking software modules by their fault content, however, has rarely been explored, and Bayesian methods have not previously been explored for this task. In this work, we investigate both linear and non-linear Bayesian regression methods for software fault prediction. We develop and evaluate fault prediction models for two scenarios: intra-release prediction and cross-release prediction. The experimental investigation is conducted on 46 different software project versions. We use mean absolute error, root mean square error, and fault percentage average as performance measures. The results show that Bayesian non-linear regression (NLR) outperformed linear regression and the other machine learning approaches used, or at least produced comparable performance. The Bayesian linear regression method performed moderately.
Availability of data and material
Not applicable
Acknowledgements
The authors are very grateful to the editor and reviewers for their valuable insights and remarks, which helped improve the manuscript.
Funding
Not applicable
Ethics declarations
Conflicts of interest/Competing interests
Not applicable
Code availability
Not applicable
About this article
Cite this article
Singh, R., Rathore, S.S. Linear and non-linear bayesian regression methods for software fault prediction. Int J Syst Assur Eng Manag 13, 1864–1884 (2022). https://doi.org/10.1007/s13198-021-01582-1
DOI: https://doi.org/10.1007/s13198-021-01582-1