Abstract
Software quality varies with the size and complexity of the software being developed. Poor quality is a concern because it erodes customers' faith in software companies. Quality can be improved if faults are predicted in the early phases of development, which reduces the resources needed in the testing phase. The rise of Object-Oriented technology in software development has paved the way for using Object-Oriented metrics in software fault prediction. Numerous machine learning and statistical techniques have been used to predict software defects, with these metrics as independent variables and bug proneness as the dependent variable. Our work aims to find the best category of classifiers, and hence the best classifier, for fault classification. It evaluates twenty-one classifiers belonging to five categories on five open source software systems described by Object-Oriented metrics. The Classification Learner app of MATLAB was used to evaluate the classification models. The work recommends Ensemble and SVM techniques over KNN, Regression, and Tree. Bagged trees (Ensemble) and cubic SVM are found to be the best predictors among the twenty-one classifiers.
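To make the bagging idea behind the best-performing ensemble concrete, the following is a minimal sketch and not the study's MATLAB setup. It assumes a tiny made-up dataset with two hypothetical class-level metrics (labelled here as WMC and CBO), and it swaps full decision trees for one-level decision stumps to stay short. Each stump is trained on a bootstrap sample, and the ensemble predicts by majority vote.

```python
import random
from collections import Counter

# Toy data (an assumption, not from the study): ((wmc, cbo), label),
# where label 1 means "faulty" and 0 means "not faulty".
rows = [
    ((5, 2), 0), ((7, 3), 0), ((9, 4), 0), ((12, 5), 0),
    ((30, 12), 1), ((35, 14), 1), ((41, 15), 1), ((50, 20), 1),
]


def fit_stump(data):
    """Find the single-feature threshold split with the fewest training errors."""
    best = None
    n_features = len(data[0][0])
    for f in range(n_features):
        for threshold in sorted({x[f] for x, _ in data}):
            left = [y for x, y in data if x[f] <= threshold]
            right = [y for x, y in data if x[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            # If nothing falls to the right, reuse the left label.
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]


def predict_stump(stump, x):
    """Classify one sample with a fitted stump."""
    f, threshold, left_label, right_label = stump
    return left_label if x[f] <= threshold else right_label


def fit_bagged(data, n_estimators, rng):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    models = []
    for _ in range(n_estimators):
        sample = [rng.choice(data) for _ in data]
        models.append(fit_stump(sample))
    return models


def predict_bagged(models, x):
    """Majority vote over all stumps in the ensemble."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
models = fit_bagged(rows, n_estimators=25, rng=rng)

print(predict_bagged(models, (45, 18)))  # a large, highly coupled class
print(predict_bagged(models, (4, 1)))    # a small, loosely coupled class
```

The vote smooths out the variation between individual stumps, each of which sees a slightly different sample. A realistic comparison would use real metric data, full trees, and cross-validation across all classifier categories.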
Notes
LR: Logistic Regression; SVM: Support Vector Machine; NB: Naive Bayes; DT: Decision Tree; DS: Decision Stump; RF: Random Forest; MLP: Multilayer Perceptron; NN: Neural Networks; BN: Bayes Net; VFI: Voting Feature Intervals
Cite this article
Kaur, I., Kaur, A. Comparative analysis of software fault prediction using various categories of classifiers. Int J Syst Assur Eng Manag 12, 520–535 (2021). https://doi.org/10.1007/s13198-021-01110-1
DOI: https://doi.org/10.1007/s13198-021-01110-1