
Comparative analysis of software fault prediction using various categories of classifiers

  • Original Article
  • Published:
International Journal of System Assurance Engineering and Management

Abstract

The quality of software varies with its size and complexity, and poor quality is a matter of concern in software development because it undermines customers' faith in software companies. Quality can be improved if faults and flaws are predicted in the early phases of development, reducing the resources spent in the testing phase. The rise of Object-Oriented technology for developing software has paved the way for using Object-Oriented metrics in software fault prediction. Numerous machine learning and statistical techniques have been applied to predict software defects, using these metrics as independent variables and bug proneness as the dependent variable. Our work aims to find the best category of classifiers, and hence the best classifier, for fault classification. It evaluates twenty-one classifiers belonging to five categories (Ensemble, SVM, KNN, Regression, and Tree) on five open-source software systems described by Object-Oriented metrics. The Classification Learner app of MATLAB has been used to evaluate the classification models. The results favour the Ensemble and SVM categories over KNN, Regression, and Tree, with bagged trees (Ensemble) and cubic SVM found to be the best predictors among the twenty-one classifiers.
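The study itself used MATLAB's Classification Learner app; as a rough illustration only, the winning comparison (a bagged-trees ensemble versus a cubic, i.e. degree-3 polynomial kernel, SVM) can be sketched in Python with scikit-learn. The data below is synthetic: the feature columns merely stand in for Object-Oriented metrics such as the CK suite (WMC, DIT, NOC, CBO, RFC, LCOM), and the labels stand in for bug proneness; none of it reflects the study's actual datasets or settings.

```python
# Illustrative sketch (not the paper's MATLAB workflow): compare a
# bagged-trees ensemble with a cubic SVM under cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for one project's metric table: six numeric
# "metrics" per class, binary fault-proneness label.
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           random_state=0)

models = {
    "bagged trees": BaggingClassifier(DecisionTreeClassifier(),
                                      n_estimators=30, random_state=0),
    "cubic SVM": SVC(kernel="poly", degree=3),  # degree-3 polynomial kernel
}

# Mean 5-fold cross-validated accuracy for each model.
results = {name: cross_val_score(model, X, y, cv=5).mean()
           for name, model in models.items()}
for name, acc in results.items():
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```

Cross-validated accuracy is used here because it mirrors the per-model validation accuracy that the Classification Learner app reports when comparing classifier categories.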


Notes

  1. LR: Logistic Regression; SVM: Support Vector Machine; NB: Naive Bayes; DT: Decision Tree; DS: Decision Stump; RF: Random Forest; MLP: Multilayer Perceptron; NN: Neural Networks; BN: Bayes Net; VFI: Voting Feature Intervals

  2. https://drive.google.com/drive/folders/12Gs6nJwfWr8_crPCiFKeeI0elpmDsZtp?usp=sharing




Corresponding author

Correspondence to Inderpreet Kaur.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Kaur, I., Kaur, A. Comparative analysis of software fault prediction using various categories of classifiers. Int J Syst Assur Eng Manag 12, 520–535 (2021). https://doi.org/10.1007/s13198-021-01110-1

