Abstract
The interpretation of defect models relies heavily on the software metrics used to construct them. Prior work often applies feature selection techniques to remove correlated and irrelevant metrics in order to improve model performance. Yet, conclusions derived from defect models may be unreliable if the selected metrics are inconsistent and correlated. In this paper, we systematically investigate 12 automated feature selection techniques along five dimensions: consistency, correlation, performance, computational cost, and impact on interpretation. Through an empirical investigation of 14 publicly-available defect datasets, we find that (1) 94–100% of the selected metrics are inconsistent among the studied techniques; (2) 37–90% of the selected metrics are inconsistent among training samples; (3) 0–68% of the selected metrics are inconsistent when the feature selection techniques are applied repeatedly; (4) 5–100% of the produced subsets of metrics contain highly correlated metrics; and (5) while the most important metrics are inconsistent among correlation threshold values, such inconsistent most important metrics are highly correlated, with Spearman correlations of 0.85–1. Since we find that the subsets of metrics produced by the commonly-used feature selection techniques (except for AutoSpearman) are often inconsistent and correlated, these techniques should be avoided when interpreting defect models. In addition to introducing AutoSpearman, which mitigates correlated metrics better than commonly-used feature selection techniques, this paper opens up new research avenues in the automated selection of features for defect models to optimise for interpretability as well as performance.
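The core idea behind Spearman-based mitigation of correlated metrics can be illustrated with a small sketch. AutoSpearman itself is distributed as an R package (Rnalytica); the Python code below is only an illustrative translation of the general approach, not the authors' implementation: compute pairwise Spearman rank correlations among metrics and greedily drop one metric from each pair whose absolute correlation exceeds a threshold. The `mitigate_correlated` function, the threshold value of 0.7, and the "drop the later metric" tie-break are all assumptions for illustration; AutoSpearman's actual selection rule differs.

```python
from statistics import mean

def _ranks(xs):
    """Assign average ranks (1-based), averaging over ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        # extend j over the run of tied values
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) *
           sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den if den else 0.0

def mitigate_correlated(metrics, threshold=0.7):
    """Greedily drop one metric from each highly correlated pair.

    metrics: dict mapping metric name -> list of observed values.
    Returns the names of the surviving metrics.
    """
    kept = list(metrics)
    changed = True
    while changed:
        changed = False
        for i in range(len(kept)):
            for j in range(i + 1, len(kept)):
                a, b = kept[i], kept[j]
                if abs(spearman(metrics[a], metrics[b])) >= threshold:
                    kept.remove(b)  # illustrative tie-break: drop the later one
                    changed = True
                    break
            if changed:
                break
    return kept
```

For example, given a `loc` metric, a `loc_doubled` metric (a perfect rank duplicate), and an unrelated `noise` metric, the sketch removes `loc_doubled` and keeps the other two, which is the behaviour a correlation-mitigating selector should exhibit.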
Acknowledgements
C. Tantithamthavorn is supported by the Australian Research Council’s Discovery Early Career Researcher Award (DECRA) funding scheme (DE200100941). C. Treude is supported by the Australian Research Council’s Discovery Early Career Researcher Award (DECRA) funding scheme (DE180100153).
Additional information
Communicated by: Tim Menzies
Cite this article
Jiarpakdee, J., Tantithamthavorn, C. & Treude, C. The impact of automated feature selection techniques on the interpretation of defect models. Empir Software Eng 25, 3590–3638 (2020). https://doi.org/10.1007/s10664-020-09848-1