Comparative analysis of software fault prediction using various categories of classifiers

Kaur, Inderpreet; Kaur, Arvinder

doi:10.1007/s13198-021-01110-1

Original Article
Published: 10 May 2021

Volume 12, pages 520–535, (2021)

International Journal of System Assurance Engineering and Management

Abstract

The quality of software under development varies with its size and complexity. Poor quality is a concern in software development because it erodes customers' faith in software companies. Quality can be improved if faults and flaws are predicted in the early phases of development, which also reduces the resources needed in the testing phase. The rise of Object-Oriented technology for developing software has paved the way for using Object-Oriented metrics in software fault prediction. Numerous machine learning and statistical techniques have been used to predict defects, with these software metrics as independent variables and bug proneness as the dependent variable. Our work aims to find the best category of classifiers, and hence the best classifier, for classifying faults. It evaluates twenty-one classifiers belonging to five categories of classification on five open-source software projects that have Object-Oriented metrics. The Classification Learner app of MATLAB was used to evaluate the classification models. The work recommends Ensemble and SVM techniques over KNN, Regression, and Tree. Bagged trees (ensemble) and cubic SVM are found to be the best predictors among the twenty-one classifiers.
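As a rough illustration of the bagging idea behind the bagged-trees ensemble named in the abstract, the following Python sketch trains several one-split trees (decision stumps) on bootstrap resamples and combines them by majority vote. It is not the authors' MATLAB workflow: the stump base learner is a simplification of full decision trees, the data are synthetic and generated inside the script, and the two features (imagined here as a complexity metric and a coupling metric) and the labeling rule are assumptions made only for the example.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Fit a one-split tree: the feature/threshold pair with the fewest training errors."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            # If nothing falls to the right, reuse the left label.
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda r: left_label if r[f] <= t else right_label


def fit_bagged(rows, labels, n_estimators=25, seed=0):
    """Bagging: fit each stump on a bootstrap resample, predict by majority vote."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        models.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))

    def predict(r):
        return Counter(m(r) for m in models).most_common(1)[0][0]

    return predict


# Synthetic stand-in for a metrics dataset (assumption, not the paper's data):
# feature 0 plays the role of a complexity metric, feature 1 a coupling metric,
# and a class is labeled faulty (1) when feature 0 exceeds 25.
data_rng = random.Random(42)
rows = [[data_rng.uniform(0, 50), data_rng.uniform(0, 20)] for _ in range(200)]
labels = [1 if r[0] > 25.0 else 0 for r in rows]
train_rows, test_rows = rows[:150], rows[150:]
train_labels, test_labels = labels[:150], labels[150:]

predict = fit_bagged(train_rows, train_labels)
accuracy = sum(predict(r) == y for r, y in zip(test_rows, test_labels)) / len(test_rows)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because the synthetic labels depend on a single threshold, this toy problem is far easier than real fault data; the sketch only shows the mechanics of resampling and voting, not the relative performance reported in the article.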

Notes

LR: Logistic Regression; SVM: Support Vector Machine; NB: Naive Bayes; DT: Decision Tree; DS: Decision Stump; RF: Random Forest; MLP: Multilayer Perceptron; NN: Neural Networks; BN: Bayes Net; VFI: Voting Feature Intervals
https://drive.google.com/drive/folders/12Gs6nJwfWr8_crPCiFKeeI0elpmDsZtp?usp=sharing


Author information

Authors and Affiliations

USICT, GGSIPU, Dwarka, India
Inderpreet Kaur & Arvinder Kaur

Corresponding author

Correspondence to Inderpreet Kaur.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kaur, I., Kaur, A. Comparative analysis of software fault prediction using various categories of classifiers. Int J Syst Assur Eng Manag 12, 520–535 (2021). https://doi.org/10.1007/s13198-021-01110-1

Received: 01 June 2019
Revised: 09 March 2021
Accepted: 16 April 2021
Published: 10 May 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s13198-021-01110-1
