
Employing an Adjusted Stability Measure for Multi-criteria Model Fitting on Data Sets with Similar Features

  • Conference paper
  • In: Machine Learning, Optimization, and Data Science (LOD 2021)

Abstract

Fitting models with high predictive accuracy that include all relevant but no irrelevant or redundant features is a challenging task on data sets with similar (e.g. highly correlated) features. We propose the approach of tuning the hyperparameters of a predictive model in a multi-criteria fashion with respect to predictive accuracy and feature selection stability. We evaluate this approach based on both simulated and real data sets and compare it to the standard approach of single-criteria tuning of the hyperparameters as well as to the state-of-the-art technique “stability selection”. We conclude that our approach achieves the same or better predictive performance compared to the two established approaches. Considering the stability during tuning does not decrease the predictive accuracy of the resulting models. Our approach succeeds at selecting the relevant features while avoiding irrelevant or redundant features. The single-criteria approach fails at avoiding irrelevant or redundant features, and the stability selection approach fails at selecting enough relevant features for achieving acceptable predictive accuracy. For our approach, for data sets with many similar features, the feature selection stability must be evaluated with an adjusted stability measure, that is, a measure that considers similarities between features. For data sets with only a few similar features, an unadjusted stability measure suffices and is faster to compute.
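The multi-criteria tuning described in the abstract can be illustrated with a minimal sketch. The paper's actual analyses are implemented in R; the Python snippet below, with made-up configuration names and scores, only illustrates the core idea of evaluating each hyperparameter configuration on both criteria and keeping the Pareto-optimal ones.

```python
def pareto_front(configs):
    """Keep configurations not dominated in (accuracy, stability).

    A configuration dominates another if it is at least as good in
    both criteria and strictly better in at least one.
    """
    front = []
    for name, acc, stab in configs:
        dominated = any(
            (a >= acc and s >= stab) and (a > acc or s > stab)
            for _, a, s in configs
        )
        if not dominated:
            front.append((name, acc, stab))
    return front

# Hypothetical tuning results: (configuration, accuracy, stability)
results = [
    ("lambda=0.01", 0.92, 0.55),
    ("lambda=0.10", 0.90, 0.80),
    ("lambda=1.00", 0.85, 0.95),
    ("lambda=10.0", 0.70, 0.60),  # dominated by lambda=0.10
]
print(pareto_front(results))  # first three configurations remain
```

From the resulting Pareto front, a final model can then be chosen according to the trade-off the analyst is willing to make between accuracy and stability.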

The R source code for all analyses presented in this paper is publicly available at https://github.com/bommert/model-fitting-similar-features.
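As a rough illustration of the difference between unadjusted and adjusted stability: an unadjusted measure, such as the mean pairwise Jaccard index over the feature sets selected on repeated resamples, counts only identical features as agreements, while an adjusted measure also credits pairs of similar features. The toy adjusted variant below (which simply maps similar features to a common representative) is a hypothetical stand-in, not the adjusted measures actually used in the paper.

```python
from itertools import combinations

def jaccard_stability(selections):
    """Unadjusted stability: mean pairwise Jaccard index of the
    feature sets selected on repeated resamples of the data."""
    pairs = list(combinations(selections, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

def adjusted_stability(selections, similar_groups):
    """Toy 'adjusted' variant: map every feature onto a
    representative of its similarity group before comparing, so
    exchanging one feature for a similar one does not count as
    instability. (Illustration only; the paper's adjusted
    measures are defined differently.)"""
    rep = {f: group[0] for group in similar_groups for f in group}
    mapped = [{rep.get(f, f) for f in s} for s in selections]
    return jaccard_stability(mapped)

# Two resamples pick different but highly correlated features x1/x2.
sel = [{"x1", "x3"}, {"x2", "x3"}]
print(jaccard_stability(sel))                    # low: x1 != x2
print(adjusted_stability(sel, [["x1", "x2"]]))   # high: x1 ~ x2
```

This also shows why the adjustment matters on data sets with many similar features: swapping between correlated features looks like instability to the unadjusted measure, even though the selected information is essentially the same.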



Acknowledgments

This work was supported by the German Research Foundation (DFG), Project RA 870/7-1 and Collaborative Research Center SFB 876, A3. We acknowledge the computing time provided on the Linux HPC cluster at TU Dortmund University (LiDO3), partially funded in the course of the Large-Scale Equipment Initiative by the German Research Foundation (DFG) as Project 271512359.

Author information

Correspondence to Andrea Bommert.


Copyright information

© 2022 Springer Nature Switzerland AG


Cite this paper

Bommert, A., Rahnenführer, J., Lang, M. (2022). Employing an Adjusted Stability Measure for Multi-criteria Model Fitting on Data Sets with Similar Features. In: Nicosia, G., et al. (eds.) Machine Learning, Optimization, and Data Science. LOD 2021. Lecture Notes in Computer Science, vol. 13163. Springer, Cham. https://doi.org/10.1007/978-3-030-95467-3_6


  • DOI: https://doi.org/10.1007/978-3-030-95467-3_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-95466-6

  • Online ISBN: 978-3-030-95467-3

