Software Defect Prediction Using Principal Component Analysis and Naïve Bayes Algorithm

  • N. Dhamayanthi
  • B. Lavanya
Conference paper
Part of the Lecture Notes on Data Engineering and Communications Technologies book series (LNDECT, volume 28)


How can I deliver defect-free software? Can I achieve more with fewer resources? How can I reduce the time, effort, and cost involved in developing software? Software defect prediction is an important area of research that can help software development teams answer these questions effectively. Even a small increase in prediction accuracy can go a long way toward improving a team's efficiency. In this paper, we propose a framework that uses principal component analysis (PCA) for dimensionality reduction and the Naïve Bayes classification algorithm for building the prediction model. We conducted experiments on seven projects from the NASA Metrics Data Program and observed an average increase of 10.3% in prediction accuracy when the learning algorithm was applied to the key features extracted from the datasets.
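The pipeline described in the abstract (PCA for dimensionality reduction, followed by Naïve Bayes classification) can be sketched as follows. This is a minimal, dependency-free Python illustration under stated assumptions, not the authors' implementation: the abstract does not specify the Naïve Bayes variant, whether metrics are scaled before PCA, or how many components are kept, so Gaussian Naïve Bayes, z-score standardization, and a caller-chosen `k` are all assumptions, and every function name is hypothetical. Principal components are found by power iteration with deflation on the covariance matrix, which is adequate for a few dozen metrics.

```python
import math
import random
from collections import defaultdict


def standardize_params(X):
    """Column means and population standard deviations.

    Assumption: the abstract does not say whether metrics are scaled before PCA.
    """
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
            for j in range(d)]  # constant columns get std 1.0 to avoid dividing by 0
    return means, stds


def covariance(Z):
    """Sample covariance matrix of already-centred rows Z."""
    n, d = len(Z), len(Z[0])
    return [[sum(r[i] * r[j] for r in Z) / (n - 1) for j in range(d)] for i in range(d)]


def top_components(C, k, iters=500, seed=0):
    """Leading k eigenvectors of a symmetric PSD matrix via power iteration + deflation."""
    rng = random.Random(seed)
    d = len(C)
    C = [row[:] for row in C]
    comps = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for this unit-length vector.
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflate: remove this component so the next pass finds the next one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def fit_gaussian_nb(X, y):
    """Per-class log prior, feature means and variances (Gaussian NB is an assumption)."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        n, d = len(rows), len(rows[0])
        means = [sum(r[j] for r in rows) / n for j in range(d)]
        # Small variance floor (assumed value) keeps the likelihood finite.
        variances = [sum((r[j] - means[j]) ** 2 for r in rows) / n + 1e-9
                     for j in range(d)]
        model[label] = (math.log(n / len(y)), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the label with the highest log posterior (up to a shared constant)."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances))
    return max(model, key=log_posterior)


def fit_pipeline(X, y, k):
    """Standardize -> project onto the top-k principal components -> Gaussian NB."""
    means, stds = standardize_params(X)
    Z = [[(row[j] - means[j]) / stds[j] for j in range(len(row))] for row in X]
    comps = top_components(covariance(Z), k)

    def project(row):
        z = [(row[j] - means[j]) / stds[j] for j in range(len(row))]
        return [sum(zj * cj for zj, cj in zip(z, c)) for c in comps]

    model = fit_gaussian_nb([project(row) for row in X], y)
    return lambda row: predict_gaussian_nb(model, project(row))
```

For example, `predict = fit_pipeline(X_train, y_train, k=5)` returns a function that maps a raw metric row to a predicted label; `k=5` is an arbitrary illustrative value, not one reported in the paper. Scaling parameters and components are learned from the training rows only, so the same projection is reused unchanged on unseen modules.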


Keywords: Software defect prediction · Fault proneness · Classification · Feature selection · Naïve Bayes classification algorithm · Principal component analysis · Software quality · Machine learning algorithms · Fault prediction · Dimensionality reduction · Data mining · Machine learning techniques · NASA Metrics Data Program · Stratified tenfold cross-validation · Reliable software · Prediction modeling
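The keywords name stratified tenfold cross-validation as the evaluation protocol, but the page does not say how the folds were formed. A minimal sketch, assuming a seeded shuffle followed by round-robin assignment within each class (one common way to stratify, not necessarily the authors'), is shown below in stdlib Python. The names `stratified_folds` and `cross_val_accuracy`, and the convention that `fit` returns a `predict(row)` callable, are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(y, k=10, seed=0):
    """Split row indices into k folds that each keep roughly the class ratio of y."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)  # seeded shuffle so the split is reproducible
        for pos, i in enumerate(idx):
            # Deal this class's rows round-robin, continuing where the previous
            # class stopped so that total fold sizes stay balanced.
            folds[(offset + pos) % k].append(i)
        offset += len(idx)
    return folds


def cross_val_accuracy(X, y, fit, k=10, seed=0):
    """Mean accuracy over k stratified folds.

    Assumption: fit(X_train, y_train) returns a callable predict(row) -> label.
    """
    scores = []
    for fold in stratified_folds(y, k, seed):
        if not fold:  # fewer rows than folds: skip empty folds
            continue
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)
```

Because each class is dealt across the folds in turn, a dataset in which 10% of modules are defective yields ten folds that each hold about 10% defective modules, which keeps per-fold accuracy comparable on imbalanced defect data.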



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Department of Computer Science, University of Madras, Chennai, India
