Abstract
Unsupervised feature selection is mostly assessed in a supervised learning setting, according to whether the selected features allow accurate prediction of the (unknown) target variable. This paper proposes another setting: the selected features should allow an efficient recovery of the whole dataset. The proposed algorithm, called AgnoS, combines an AutoEncoder with structural regularizations to sidestep the combinatorial optimization problem at the core of feature selection. An extensive experimental validation of AgnoS on the scikit-feature benchmark suite demonstrates its merits compared to the state of the art, in terms of both supervised learning and data compression.
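To make the idea concrete, here is a minimal, self-contained sketch of one possible structural regularization of this kind: a learnable slack coefficient per input feature, trained jointly with the AutoEncoder under an L1 penalty so that the coefficients of redundant features shrink toward zero. This is an illustrative assumption about the mechanism, not the authors' implementation; the class name, architecture and hyper-parameters are invented for the example.

```python
# Illustrative sketch (not the authors' code): an AutoEncoder whose inputs
# pass through a learnable per-feature gate; an L1 penalty on the gate
# drives redundant features toward zero, yielding a feature ranking.
import torch
import torch.nn as nn

class GatedAutoEncoder(nn.Module):
    def __init__(self, n_features: int, n_hidden: int):
        super().__init__()
        # one slack coefficient per initial feature (all names are assumptions)
        self.gate = nn.Parameter(torch.ones(n_features))
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.Tanh())
        self.decoder = nn.Linear(n_hidden, n_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x * self.gate))

def rank_features(X: torch.Tensor, n_hidden: int = 32,
                  lam: float = 1e-3, epochs: int = 200) -> torch.Tensor:
    model = GatedAutoEncoder(X.shape[1], n_hidden)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        # reconstruction loss + L1 structural regularization on the gates
        loss = ((model(X) - X) ** 2).mean() + lam * model.gate.abs().sum()
        loss.backward()
        opt.step()
    # features sorted by decreasing gate magnitude give the selection order
    return model.gate.detach().abs().argsort(descending=True)
```

Selecting the first k indices returned by rank_features then plays the role of the feature selection step, with the reconstruction loss standing in for the goal of recovering the whole dataset.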
Notes
- 1.
That is, assuming without loss of generality that \(\mu_i < \mu_{i+1}\), one approximates the set of points \((\log \mu_i,\; -\log(1 - i/n))\) by a linear function through the origin, whose slope is taken as the approximation of \(d\) (see the first sketch after these notes).
- 2.
A question, however, is whether all latent variables are equally important. It might be that some latent variables matter more than others, and if an initial variable \(f_i\) matters a lot only for an unimportant latent variable, the relevance of \(f_i\) might actually be low. Addressing this concern is left for further work (a speculative sketch of such a weighting is given after these notes).
- 3.
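The estimator described in Note 1 can be written in a few lines. The numpy sketch below follows the two-nearest-neighbour construction, where \(\mu_i\) is the ratio of the distances of each point to its second and first nearest neighbours; it assumes no duplicate points and is illustrative rather than the paper's implementation.

```python
# Sketch of the intrinsic-dimension estimator described in Note 1
# (two-nearest-neighbour construction; assumes no duplicate points).
import numpy as np
from scipy.spatial import cKDTree

def intrinsic_dimension(X: np.ndarray) -> float:
    # distances to the two nearest neighbours of each point
    # (column 0 is the point itself, at distance zero)
    dist, _ = cKDTree(X).query(X, k=3)
    mu = np.sort(dist[:, 2] / dist[:, 1])   # mu_i, sorted increasingly
    n = len(mu)
    i = np.arange(1, n)                     # drop i = n, where log(1 - i/n) is undefined
    x = np.log(mu[:-1])                     # log(mu_i)
    y = -np.log(1.0 - i / n)                # -log(1 - i/n) ~ d * log(mu_i)
    return float((x @ y) / (x @ x))         # least-squares slope through the origin

# e.g. intrinsic_dimension(np.random.randn(2000, 5)) returns a value close to 5
```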
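Should latent variables be weighted, the aggregation contemplated in Note 2 could look like the following speculative sketch. Both inputs are hypothetical quantities, not outputs of the paper's method: R would hold per-latent feature relevances (e.g. gradient norms) and w the latent importances (e.g. their shares of the reconstruction variance).

```python
# Speculative sketch of the weighting left for further work in Note 2;
# R and w are hypothetical quantities, not outputs of the paper's method.
import numpy as np

def weighted_feature_scores(R: np.ndarray, w: np.ndarray) -> np.ndarray:
    """R[i, k]: relevance of initial feature f_i for latent variable z_k;
    w[k]: importance of z_k. Returns one aggregated score per feature."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                # normalise latent importances
    return np.asarray(R) @ w       # features serving only unimportant latents score low
```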
Acknowledgments
We wish to thank Diviyan Kalainathan for many enjoyable discussions. We also thank the anonymous reviewers, whose comments helped to improve the experimental setting and the assessment of the method.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Doquet, G., Sebag, M. (2020). Agnostic Feature Selection. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science, vol. 11906. Springer, Cham. https://doi.org/10.1007/978-3-030-46150-8_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46149-2
Online ISBN: 978-3-030-46150-8
eBook Packages: Computer Science, Computer Science (R0)