Agnostic Feature Selection

  • Conference paper
  • First Online:
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11906)

Abstract

Unsupervised feature selection is mostly assessed in a supervised learning setting, based on whether the selected features allow accurate prediction of the (unknown) target variable. This paper proposes another setting: the selected features should efficiently recover the whole dataset. The proposed algorithm, called AgnoS, combines an AutoEncoder with structural regularizations to sidestep the combinatorial optimization problem at the core of feature selection. An extensive experimental validation of AgnoS on the scikit-feature benchmark suite demonstrates its merits compared to the state of the art, in terms of both supervised learning and data compression.
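To make the recipe concrete, the sketch below illustrates the general idea stated above: an AutoEncoder whose encoder weights carry a group-Lasso-style structural penalty, so that each input feature can be scored by the norm of its encoder weight group and the combinatorial subset search is sidestepped. This is an illustrative sketch, not the authors' AgnoS implementation; the FeatureSelectingAE class, layer sizes, penalty weight lam and the PyTorch training loop are all assumptions.

import torch
import torch.nn as nn

class FeatureSelectingAE(nn.Module):
    """Illustrative AutoEncoder with a per-feature (group) weight penalty."""

    def __init__(self, n_features, n_hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.Tanh())
        self.decoder = nn.Linear(n_hidden, n_features)

    def forward(self, x):
        return self.decoder(self.encoder(x))

    def feature_scores(self):
        # Score of input feature i = L2 norm of the encoder weights leaving i.
        w = self.encoder[0].weight          # shape (n_hidden, n_features)
        return w.norm(dim=0)                # one non-negative score per feature

def train(model, X, lam=1e-3, epochs=200, lr=1e-3):
    # X: torch.FloatTensor of shape (n_samples, n_features)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        # Reconstruction loss + group-Lasso penalty (sum of per-feature norms):
        # irrelevant features see their whole weight group driven towards zero.
        loss = nn.functional.mse_loss(model(X), X) + lam * model.feature_scores().sum()
        loss.backward()
        opt.step()
    return model

After training, one would keep the k features with the largest scores; the point of a per-feature group penalty, rather than a plain L1 over individual weights, is that sparsity is induced on whole input features rather than on isolated connections.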

Notes

  1. That is, assuming without loss of generality that \(\mu_i < \mu_{i+1}\), one approximates the curve \((\log(1 - i/n), \log(\mu_i))\) with a linear function, the slope of which is taken as the approximation of d (see the sketch after these notes).

  2. A question, however, is whether all latent variables are equally important. It might be that some latent variables are more important than others; if an initial variable \(f_i\) matters a lot for an unimportant latent variable, the relevance of \(f_i\) might be low. Addressing this concern is left for further work.

  3. The estimator from Facco et al. (2017) was used, as it is empirically less computationally expensive, requires fewer datapoints to be accurate, and is more resilient to high-dimensional noise than other ID estimators (Sect. 2.2).
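As a complement to notes 1 and 3, here is a minimal sketch of the TWO-NN intrinsic-dimension (ID) estimator of Facco et al. (2017): for each point, \(\mu\) is the ratio of the distances to its second and first nearest neighbours, and after sorting the \(\mu_i\) the slope of the linear relation between \(\log \mu_i\) and \(-\log(1 - i/n)\) yields the ID estimate d. The use of scikit-learn's NearestNeighbors, the Euclidean metric, and the fraction of discarded large ratios are assumptions of this sketch, not details taken from the paper's setup.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def two_nn_dimension(X, discard_fraction=0.1):
    # X: array of shape (n_samples, n_features); assumes no duplicate points.
    n = X.shape[0]
    # Distances to the two nearest neighbours (column 0 is the point itself).
    dist, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    mu = np.sort(dist[:, 2] / dist[:, 1])   # ratios r2/r1, sorted increasingly
    # Empirical CDF; the largest ratios are discarded to limit the influence of noise.
    keep = int(n * (1.0 - discard_fraction))
    i = np.arange(1, keep + 1)
    x = np.log(mu[:keep])
    y = -np.log(1.0 - i / n)
    # Least-squares fit of a line through the origin; its slope estimates the ID.
    return float(np.dot(x, y) / np.dot(x, x))

As a quick sanity check, two_nn_dimension(np.random.rand(2000, 10)) should return a value close to 10, while data lying on a low-dimensional manifold embedded in a higher-dimensional space yields a much smaller estimate.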

References

  • Alemu, H., Wu, W., Zhao, J.: Feedforward neural networks with a hidden layer regularization method. Symmetry 10(10), 525 (2018)

  • Cai, D., Zhang, C., He, X.: Unsupervised feature selection for multi-cluster data. In: International Conference on Knowledge Discovery and Data Mining, pp. 333–342 (2010)

  • Camastra, F., Staiano, A.: Intrinsic dimension estimation: advances and open problems. Inf. Sci. 328, 26–41 (2016)

  • Chen, J., Stern, M., Wainwright, M.J., Jordan, M.I.: Kernel feature selection via conditional covariance minimization. In: Advances in Neural Information Processing Systems, pp. 6946–6955 (2017)

  • Cox, T.F., Cox, M.A.: Multidimensional Scaling. Chapman and Hall/CRC, Boca Raton (2000)

  • Doquet, G.: Unsupervised feature selection. Ph.D. thesis, Université Paris-Sud (2019, to appear)

  • Doshi-Velez, F., Kim, B.: Towards a rigorous science of interpretable machine learning. arXiv:1702.08608 (2017)

  • Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley, Hoboken (2012)

  • Facco, E., d’Errico, M., Rodriguez, A., Laio, A.: Estimating the intrinsic dimension of datasets by a minimal neighborhood information. Sci. Rep. 7(1), 1–8 (2017)

  • Falconer, K.: Fractal Geometry: Mathematical Foundations and Applications. Wiley, Hoboken (2004)

  • Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: International Conference on Artificial Intelligence and Statistics, pp. 249–256 (2010)

  • Gneiting, T., Ševčíková, H., Percival, D.B.: Estimators of fractal dimension: assessing the roughness of time series and spatial data. Stat. Sci. 27, 247–277 (2012)

  • Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: International Conference on Learning Representations (2015)

  • Goudet, O., Kalainathan, D., Caillou, P., Guyon, I., Lopez-Paz, D., Sebag, M.: Learning functional causal models with generative neural networks. In: Escalante, H.J., et al. (eds.) Explainable and Interpretable Models in Computer Vision and Machine Learning. TSSCML, pp. 39–80. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98131-4_3

  • He, X., Cai, D., Niyogi, P.: Laplacian score for feature selection. In: Advances in Neural Information Processing Systems, pp. 507–514 (2005)

  • Ivanoff, S., Picard, F., Rivoirard, V.: Adaptive Lasso and group-lasso for functional poisson regression. J. Mach. Learn. Res. 17(1), 1903–1948 (2016)

  • Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (2015)

  • Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. arXiv:1312.6114 (2013)

  • LeCun, Y.: The next frontier in AI: unsupervised learning. https://www.youtube.com/watch?v=IbjF5VjniVE (2016)

  • Leray, P., Gallinari, P.: Feature selection with neural networks. Behaviormetrika 26(1), 145–166 (1999). https://doi.org/10.2333/bhmk.26.145

  • Levina, E., Bickel, P.J.: Maximum likelihood estimation of intrinsic dimension. In: Advances in Neural Information Processing Systems, pp. 777–784 (2005)

  • Li, J., et al.: Feature selection: a data perspective. ACM Comput. Surv. (CSUR) 50(6), 1–45 (2018)

  • Li, Z., Yang, Y., Liu, Y., Zhou, X., Lu, H.: Unsupervised feature selection using non-negative spectral analysis. In: AAAI (2012)

  • Von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)

  • Maaten, L.V.D., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(Nov), 2579–2605 (2008)

  • McInnes, L., Healy, J., Melville, J.: UMAP: uniform manifold approximation and projection for dimension reduction. arXiv:1802.03426 (2018)

  • Meier, L., Van De Geer, S., Bühlmann, P.: The group Lasso for logistic regression. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 70(1), 53–71 (2008)

  • Ng, A.Y.: Feature selection, \(l_1\) vs. \(l_2\) regularization, and rotational invariance. In: International Conference on Machine Learning (2004)

  • Nie, F., Zhu, W., Li, X.: Unsupervised feature selection with structured graph optimization. In: AAAI, pp. 1302–1308 (2016)

  • Sadeghyan, S.: A new robust feature selection method using variance-based sensitivity analysis. arXiv:1804.05092 (2018)

  • Saul, L.K., Roweis, S.T.: Think globally, fit locally: unsupervised learning of low dimensional manifolds. J. Mach. Learn. Res. 4(Jun), 119–155 (2003)

  • Simon, N., Friedman, J., Hastie, T., Tibshirani, R.: A sparse-group Lasso. J. Comput. Graph. Stat. 22(2), 231–245 (2013)

  • Tenenbaum, J.B., De Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319–2323 (2000)

  • Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58, 267–288 (1996)

  • Varga, D., Csiszárik, A., Zombori, Z.: Gradient regularization improves accuracy of discriminative models. arXiv:1712.09936 (2017)

  • Wiatowski, T., Bölcskei, H.: A mathematical theory of deep convolutional neural networks for feature extraction. IEEE Trans. Inf. Theory 64(3), 1845–1866 (2018)

  • Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 68(1), 49–67 (2006)

  • Zhao, Z., Liu, H.: Spectral feature selection for supervised and unsupervised learning. In: International Conference on Machine Learning (2007)

Acknowledgments

We wish to thank Diviyan Kalainathan for many enjoyable discussions. We also thank the anonymous reviewers, whose comments helped to improve the experimental setting and the assessment of the method.

Author information

Correspondence to Guillaume Doquet.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Doquet, G., Sebag, M. (2020). Agnostic Feature Selection. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science (LNAI), vol 11906. Springer, Cham. https://doi.org/10.1007/978-3-030-46150-8_21

  • DOI: https://doi.org/10.1007/978-3-030-46150-8_21

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-46149-2

  • Online ISBN: 978-3-030-46150-8

  • eBook Packages: Computer Science, Computer Science (R0)
