Soft Computing, Volume 21, Issue 24, pp 7417–7434

An empirical study of some software fault prediction techniques for the number of faults prediction

Methodologies and Application


During software development, predicting the number of faults in each software module can be more helpful than predicting only whether a module is faulty or non-faulty. Such an approach may help focus the software testing process and may enhance the reliability of the software system. Most earlier work on software fault prediction has used classification techniques to classify software modules into faulty and non-faulty categories. Techniques such as Poisson regression, negative binomial regression, genetic programming, decision tree regression, and multilayer perceptron can be used to predict the number of faults. In this paper, we present an experimental study that evaluates and compares the capability of six fault prediction techniques (genetic programming, multilayer perceptron, linear regression, decision tree regression, zero-inflated Poisson regression, and negative binomial regression) for predicting the number of faults. The experimental investigation is carried out on eighteen software project datasets collected from the PROMISE data repository. The results are evaluated using average absolute error, average relative error, measure of completeness, and prediction at level l. We also perform the Kruskal–Wallis test and Dunn’s multiple comparison test to compare the relative performance of the considered fault prediction techniques.
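As a rough illustration of three of the evaluation measures named above, the following Python sketch computes average absolute error, average relative error, and prediction at level l for a toy set of modules. The abstract does not give the formulas, so the definitions here are assumptions: in particular, the "+1" in the relative-error denominator (used to avoid division by zero for fault-free modules) and the default level of 0.3 are illustrative choices, and all function names and data values are hypothetical. The measure of completeness is left out because its definition cannot be recovered from the abstract.

```python
from typing import List, Sequence


def average_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |predicted - actual| over all modules."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def relative_errors(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Per-module relative error.

    Assumption: the denominator is (actual + 1) so that modules with
    zero faults do not cause a division by zero.
    """
    return [abs(p - a) / (a + 1) for a, p in zip(actual, predicted)]


def average_relative_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of the per-module relative errors."""
    errs = relative_errors(actual, predicted)
    return sum(errs) / len(errs)


def prediction_at_level(actual: Sequence[float], predicted: Sequence[float],
                        level: float = 0.3) -> float:
    """Fraction of modules whose relative error is at most `level`."""
    errs = relative_errors(actual, predicted)
    return sum(1 for e in errs if e <= level) / len(errs)


# Made-up fault counts for four modules (not data from the study).
actual_faults = [0, 2, 1, 5]
predicted_faults = [0.5, 1.0, 1.0, 3.0]

print(average_absolute_error(actual_faults, predicted_faults))
print(average_relative_error(actual_faults, predicted_faults))
print(prediction_at_level(actual_faults, predicted_faults, level=0.3))
```

Lower values are better for the two error measures, while a higher prediction-at-level value means more modules were predicted within the chosen tolerance.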


Software fault prediction; Zero-inflated Poisson regression; Genetic programming; Multilayer perceptron; Kruskal–Wallis test; Dunn’s multiple comparison test


Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

This article does not contain any studies with human participants.



Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, Roorkee, India
