Effect of Feature Selection in Software Fault Detection

  • Shamse Tasnim Cynthia
  • Md. Golam Rasul
  • Shamim Ripon
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11909)


The quality of software is greatly affected by the faults associated with it. Detecting faults at the proper stage of software development is a challenging task and plays a vital role in the quality of the software. Machine learning is nowadays a commonly used technique for fault detection and prediction. However, the effectiveness of a fault detection mechanism is affected by the number of attributes in the publicly available datasets. Feature selection, the process of selecting the subset of features that are most influential to the classification, is itself a challenging task. This paper thoroughly investigates the effect of various feature selection techniques on software fault classification using several of NASA's publicly available benchmark datasets. Various metrics are used to analyze the performance of the feature selection techniques. The experiments show that the adopted feature selection techniques can select the most important and relevant features without sacrificing fault detection performance.
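The abstract does not name the individual techniques, so the following is only an illustration of one common filter-style feature selection method, not necessarily one used in this paper. It is a minimal sketch assuming each software metric has already been discretized (here into hypothetical binary low/high indicators) and each module carries a binary faulty/non-faulty label. Each feature is scored by the Pearson chi-squared statistic of its contingency table against the label, and the k highest-scoring features are kept. The feature names and the eight-module dataset are synthetic and hypothetical, not drawn from the NASA datasets.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def chi_square_score(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Pearson chi-squared statistic between one discrete feature and the class label."""
    n = len(labels)
    observed = Counter(zip(feature, labels))  # joint cell counts (missing cells count as 0)
    feature_totals = Counter(feature)         # row totals
    label_totals = Counter(labels)            # column totals
    score = 0.0
    for f_val, f_n in feature_totals.items():
        for y_val, y_n in label_totals.items():
            expected = f_n * y_n / n          # count expected if feature and label were independent
            score += (observed[(f_val, y_val)] - expected) ** 2 / expected
    return score


def select_top_k(
    rows: Sequence[Sequence[int]], labels: Sequence[int], names: Sequence[str], k: int
) -> Tuple[List[str], Dict[str, float]]:
    """Rank features by chi-squared score and keep the k highest (a filter method)."""
    scores = {
        name: chi_square_score([row[j] for row in rows], labels)
        for j, name in enumerate(names)
    }
    ranked = sorted(names, key=lambda name: scores[name], reverse=True)
    return ranked[:k], scores


# Synthetic toy data: 8 modules, 3 hypothetical binary metric indicators.
names = ["loc_high", "complexity_high", "noise"]
rows = [
    [1, 1, 1], [1, 1, 0], [1, 1, 1], [1, 0, 0],  # faulty modules
    [0, 0, 1], [0, 0, 0], [0, 0, 1], [0, 1, 0],  # non-faulty modules
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

selected, scores = select_top_k(rows, labels, names, k=2)
print(selected)  # ['loc_high', 'complexity_high']
print(scores)
```

Here `loc_high` separates the classes perfectly and scores highest, `complexity_high` is only partly associated with faults, and `noise` is independent of the label and scores zero, so it is discarded. A filter like this scores each feature without consulting the classifier; wrapper methods instead evaluate candidate subsets with the classifier itself, at a higher computational cost.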


Keywords: Fault detection; Feature selection; Feature classification


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Shamse Tasnim Cynthia (1)
  • Md. Golam Rasul (1)
  • Shamim Ripon (1), corresponding author
  1. Department of Computer Science and Engineering, East West University, Dhaka, Bangladesh
