Towards Benchmarking Feature Subset Selection Methods for Software Fault Prediction

  • Wasif Afzal
  • Richard Torkar
Part of the Studies in Computational Intelligence book series (SCI, volume 617)


Despite the general acceptance that software engineering datasets often contain noisy, irrelevant or redundant variables, very few benchmark studies of feature subset selection (FSS) methods on real-life data from software projects have been conducted. This paper provides an empirical comparison of state-of-the-art FSS methods on five fault prediction datasets from the PROMISE data repository: information gain attribute ranking (IG); Relief (RLF); principal component analysis (PCA); correlation-based feature selection (CFS); consistency-based subset evaluation (CNS); wrapper subset evaluation (WRP); and an evolutionary computation method, genetic programming (GP). For every FSS method-dataset combination, the area under the receiver operating characteristic curve (AUC), averaged over 10-fold cross-validation runs, was calculated before and after FSS. Two diverse learning algorithms, C4.5 and naïve Bayes (NB), are used to test the attribute sets given by each FSS method. The results show that although there are no statistically significant differences between the AUC values of the different FSS methods for either C4.5 or NB, a smaller set of FSS methods (IG, RLF, GP) consistently selects fewer attributes without degrading classification accuracy. We conclude that, in general, FSS is beneficial as it helps improve the classification accuracy of NB and C4.5. There is no single best FSS method for all datasets, but IG, RLF and GP consistently select fewer attributes without degrading classification accuracy within statistically significant boundaries.
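The before/after evaluation described above can be illustrated with a minimal sketch. It is not the authors' setup: it assumes scikit-learn, uses synthetic data in place of the PROMISE datasets, substitutes a CART decision tree for C4.5, and uses mutual-information ranking as a stand-in for IG. The helper names `mean_auc` and `compare_before_after` and the choice of keeping `k=5` attributes are hypothetical.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier


def mean_auc(model, X, y, seed=0):
    """AUC averaged over stratified 10-fold cross-validation."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()


def compare_before_after(X, y, k=5, seed=0):
    """Return {learner: (AUC on all attributes, AUC on top-k attributes)}."""
    learners = {
        "NB": GaussianNB(),
        # Assumption: CART stands in for C4.5, which scikit-learn lacks.
        "tree": DecisionTreeClassifier(random_state=seed),
    }
    # Assumption: mutual information approximates information gain ranking.
    score_fn = partial(mutual_info_classif, random_state=seed)
    results = {}
    for name, learner in learners.items():
        before = mean_auc(learner, X, y, seed)
        # Selection sits inside the pipeline, so it is refit on each
        # training fold and never sees the held-out fold.
        selected = make_pipeline(SelectKBest(score_fn, k=k), learner)
        after = mean_auc(selected, X, y, seed)
        results[name] = (before, after)
    return results


# Synthetic, imbalanced stand-in for a fault prediction dataset.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=4, n_redundant=6,
    weights=[0.8, 0.2], random_state=0,
)
results = compare_before_after(X, y)
for name, (before, after) in results.items():
    print(f"{name}: AUC before FSS = {before:.3f}, after FSS = {after:.3f}")
```

Fitting the selector inside each training fold is a choice made for this sketch to avoid leaking test information into the attribute ranking; the abstract does not state how the study handled this.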


Feature subset selection · Fault prediction · Empirical



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. School of Innovation, Design & Engineering, Mälardalen University, Västerås, Sweden
  2. Blekinge Institute of Technology, Karlskrona, Sweden
  3. Chalmers University of Technology, Gothenburg, Sweden
  4. University of Gothenburg, Gothenburg, Sweden
  5. Department of Computer Science, Bahria University, Islamabad, Pakistan
