Agnostic Feature Selection

  • Conference paper
  • First Online:
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11906)

Abstract

Unsupervised feature selection is mostly assessed in a supervised learning setting, based on whether the selected features allow accurate prediction of the (unknown) target variable. This paper proposes another setting: the selected features should efficiently recover the whole dataset. The proposed algorithm, called AgnoS, combines an AutoEncoder with structural regularizations to sidestep the combinatorial optimization problem at the core of feature selection. An extensive experimental validation of AgnoS on the scikit-feature benchmark suite demonstrates its merits compared to the state of the art, in terms of both supervised learning and data compression.
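To make the recipe concrete, the sketch below illustrates the general idea stated above: an AutoEncoder whose encoder weights carry a group-Lasso-style structural penalty, so that each input feature can be scored by the norm of its encoder weight group and the combinatorial subset search is sidestepped. This is an illustrative sketch, not the authors' AgnoS implementation; the FeatureSelectingAE class, layer sizes, penalty weight lam and the PyTorch training loop are all assumptions.

import torch
import torch.nn as nn

class FeatureSelectingAE(nn.Module):
    """Illustrative AutoEncoder with a per-feature (group) weight penalty."""

    def __init__(self, n_features, n_hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.Tanh())
        self.decoder = nn.Linear(n_hidden, n_features)

    def forward(self, x):
        return self.decoder(self.encoder(x))

    def feature_scores(self):
        # Score of input feature i = L2 norm of the encoder weights leaving i.
        w = self.encoder[0].weight          # shape (n_hidden, n_features)
        return w.norm(dim=0)                # one non-negative score per feature

def train(model, X, lam=1e-3, epochs=200, lr=1e-3):
    # X: torch.FloatTensor of shape (n_samples, n_features)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        # Reconstruction loss + group-Lasso penalty (sum of per-feature norms):
        # irrelevant features see their whole weight group driven towards zero.
        loss = nn.functional.mse_loss(model(X), X) + lam * model.feature_scores().sum()
        loss.backward()
        opt.step()
    return model

After training, one would keep the k features with the largest scores; the point of a per-feature group penalty, rather than a plain L1 over individual weights, is that sparsity is induced on whole input features rather than on isolated connections.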

Notes

  1. That is, assuming without loss of generality that \(\mu_i < \mu_{i+1}\), one approximates the curve \((\log(1 - i/n), \log(\mu_i))\) with a linear function, the slope of which is taken as the approximation of d (see the sketch after these notes).

  2. A question, however, is whether all latent variables are equally important. It might be that some latent variables are more important than others; if an initial variable \(f_i\) matters a lot for an unimportant latent variable, the relevance of \(f_i\) might be low. Addressing this concern is left for further work.

  3. The estimator from Facco et al. (2017) was used, as it is empirically less computationally expensive, requires fewer datapoints to be accurate, and is more resilient to high-dimensional noise than other ID estimators (Sect. 2.2).
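As a complement to notes 1 and 3, here is a minimal sketch of the TWO-NN intrinsic-dimension (ID) estimator of Facco et al. (2017): for each point, \(\mu\) is the ratio of the distances to its second and first nearest neighbours, and after sorting the \(\mu_i\) the slope of the linear relation between \(\log \mu_i\) and \(-\log(1 - i/n)\) yields the ID estimate d. The use of scikit-learn's NearestNeighbors, the Euclidean metric, and the fraction of discarded large ratios are assumptions of this sketch, not details taken from the paper's setup.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def two_nn_dimension(X, discard_fraction=0.1):
    # X: array of shape (n_samples, n_features); assumes no duplicate points.
    n = X.shape[0]
    # Distances to the two nearest neighbours (column 0 is the point itself).
    dist, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    mu = np.sort(dist[:, 2] / dist[:, 1])   # ratios r2/r1, sorted increasingly
    # Empirical CDF; the largest ratios are discarded to limit the influence of noise.
    keep = int(n * (1.0 - discard_fraction))
    i = np.arange(1, keep + 1)
    x = np.log(mu[:keep])
    y = -np.log(1.0 - i / n)
    # Least-squares fit of a line through the origin; its slope estimates the ID.
    return float(np.dot(x, y) / np.dot(x, x))

As a quick sanity check, two_nn_dimension(np.random.rand(2000, 10)) should return a value close to 10, while data lying on a low-dimensional manifold embedded in a higher-dimensional space yields a much smaller estimate.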

References

  • Alemu, H., Wu, W., Zhao, J.: Feedforward neural networks with a hidden layer regularization method. Symmetry 10(10), 525 (2018)

  • Cai, D., Zhang, C., He, X.: Unsupervised feature selection for multi-cluster data. In: International Conference on Knowledge Discovery and Data Mining, pp. 333–342 (2010)

  • Camastra, F., Staiano, A.: Intrinsic dimension estimation: advances and open problems. Inf. Sci. 328, 26–41 (2016)

  • Chen, J., Stern, M., Wainwright, M.J., Jordan, M.I.: Kernel feature selection via conditional covariance minimization. In: Advances in Neural Information Processing Systems, pp. 6946–6955 (2017)

  • Cox, T.F., Cox, M.A.: Multidimensional Scaling. Chapman and Hall/CRC, Boca Raton (2000)

  • Doquet, G.: Unsupervised feature selection. Ph.D. thesis, Université Paris-Sud (2019, to appear)

  • Doshi-Velez, F., Kim, B.: Towards a rigorous science of interpretable machine learning. arXiv:1702.08608 (2017)

  • Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley, Hoboken (2012)

  • Facco, E., d’Errico, M., Rodriguez, A., Laio, A.: Estimating the intrinsic dimension of datasets by a minimal neighborhood information. Sci. Rep. 7(1), 1–8 (2017)

  • Falconer, K.: Fractal Geometry: Mathematical Foundations and Applications. Wiley, Hoboken (2004)

  • Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: International Conference on Artificial Intelligence and Statistics, pp. 249–256 (2010)

  • Gneiting, T., Ševčíková, H., Percival, D.B.: Estimators of fractal dimension: assessing the roughness of time series and spatial data. Stat. Sci. 27, 247–277 (2012)

  • Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: International Conference on Learning Representations (2015)

  • Goudet, O., Kalainathan, D., Caillou, P., Guyon, I., Lopez-Paz, D., Sebag, M.: Learning functional causal models with generative neural networks. In: Escalante, H.J., et al. (eds.) Explainable and Interpretable Models in Computer Vision and Machine Learning. TSSCML, pp. 39–80. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98131-4_3

  • He, X., Cai, D., Niyogi, P.: Laplacian score for feature selection. In: Advances in Neural Information Processing Systems, pp. 507–514 (2005)

  • Ivanoff, S., Picard, F., Rivoirard, V.: Adaptive Lasso and group-lasso for functional poisson regression. J. Mach. Learn. Res. 17(1), 1903–1948 (2016)

  • Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (2015)

  • Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. arXiv:1312.6114 (2013)

  • LeCun, Y.: The next frontier in AI: unsupervised learning. https://www.youtube.com/watch?v=IbjF5VjniVE (2016)

  • Leray, P., Gallinari, P.: Feature selection with neural networks. Behaviormetrika 26(1), 145–166 (1999). https://doi.org/10.2333/bhmk.26.145

  • Levina, E., Bickel, P.J.: Maximum likelihood estimation of intrinsic dimension. In: Advances in Neural Information Processing Systems, pp. 777–784 (2005)

  • Li, J., et al.: Feature selection: a data perspective. ACM Comput. Surv. (CSUR) 50(6), 1–45 (2018)

  • Li, Z., Yang, Y., Liu, Y., Zhou, X., Lu, H.: Unsupervised feature selection using non-negative spectral analysis. In: AAAI (2012)

  • Von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)

  • Maaten, L.V.D., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(Nov), 2579–2605 (2008)

  • McInnes, L., Healy, J., Melville, J.: UMAP: uniform manifold approximation and projection for dimension reduction. arXiv:1802.03426 (2018)

  • Meier, L., Van De Geer, S., Bühlmann, P.: The group Lasso for logistic regression. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 70(1), 53–71 (2008)

  • Ng, A.Y.: Feature selection, \(l_1\) vs. \(l_2\) regularization, and rotational invariance. In: International Conference on Machine Learning (2004)

  • Nie, F., Zhu, W., Li, X.: Unsupervised feature selection with structured graph optimization. In: AAAI, pp. 1302–1308 (2016)

  • Sadeghyan, S.: A new robust feature selection method using variance-based sensitivity analysis. arXiv:1804.05092 (2018)

  • Saul, L.K., Roweis, S.T.: Think globally, fit locally: unsupervised learning of low dimensional manifolds. J. Mach. Learn. Res. 4(Jun), 119–155 (2003)

  • Simon, N., Friedman, J., Hastie, T., Tibshirani, R.: A sparse-group Lasso. J. Comput. Graph. Stat. 22(2), 231–245 (2013)

  • Tenenbaum, J.B., De Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319–2323 (2000)

  • Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58, 267–288 (1996)

  • Varga, D., Csiszárik, A., Zombori, Z.: Gradient regularization improves accuracy of discriminative models. arXiv:1712.09936 (2017)

  • Wiatowski, T., Bölcskei, H.: A mathematical theory of deep convolutional neural networks for feature extraction. IEEE Trans. Inf. Theory 64(3), 1845–1866 (2018)

  • Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 68(1), 49–67 (2006)

  • Zhao, Z., Liu, H.: Spectral feature selection for supervised and unsupervised learning. In: International Conference on Machine Learning (2007)

Acknowledgments

We wish to thank Diviyan Kalainathan for many enjoyable discussions. We also thank the anonymous reviewers, whose comments helped to improve the experimental setting and the assessment of the method.

Author information

Correspondence to Guillaume Doquet.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Doquet, G., Sebag, M. (2020). Agnostic Feature Selection. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science (LNAI), vol 11906. Springer, Cham. https://doi.org/10.1007/978-3-030-46150-8_21

  • DOI: https://doi.org/10.1007/978-3-030-46150-8_21

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-46149-2

  • Online ISBN: 978-3-030-46150-8

  • eBook Packages: Computer Science, Computer Science (R0)
