Abstract
Feature selection is one of the major challenges in machine learning. In this paper we focus on methods based on mutual information, which have attracted significant attention in recent years. A clear limitation of most existing methods is that they account only for low-order interactions between features (up to order 3). We propose a novel criterion that takes both 3-way and 4-way interactions into account and extends naturally to higher-order terms. The basic component of our criterion is interaction information, a measure of interaction strength derived from information theory. We show that our method is able to find interactions that remain undetected by standard methods, and we prove several theoretical properties of the introduced criterion and of interaction information.
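For readers unfamiliar with the quantity named above, the following is the standard definition of interaction information in McGill's convention (sign conventions vary in the literature); it is the building block the abstract refers to, not the paper's novel criterion itself. For three variables,

$$
II(X_1; X_2; Y) \;=\; I(X_1; X_2 \mid Y) \;-\; I(X_1; X_2),
$$

a quantity symmetric in its arguments, and higher orders follow Han's recursion

$$
II(X_1; \dots; X_k) \;=\; II(X_1; \dots; X_{k-1} \mid X_k) \;-\; II(X_1; \dots; X_{k-1}).
$$

A positive value indicates synergy: the variables jointly carry information about each other that no proper subset does. The sketch below is a minimal plug-in estimator of the 3-way quantity for discrete data, written in terms of joint entropies; it is an illustration under these standard definitions, not the authors' implementation.

```python
import numpy as np
from collections import Counter

def entropy(*cols):
    """Plug-in (empirical) Shannon entropy, in bits, of the joint
    distribution of the given discrete columns."""
    joint = list(zip(*cols))
    counts = np.array(list(Counter(joint).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def interaction_information(x1, x2, y):
    """3-way interaction information II(X1; X2; Y) in McGill's
    convention, II = I(X1; X2 | Y) - I(X1; X2), expanded into
    joint entropies so a single estimator suffices."""
    return (entropy(x1, x2) + entropy(x1, y) + entropy(x2, y)
            - entropy(x1) - entropy(x2) - entropy(y)
            - entropy(x1, x2, y))

# Toy check: Y = XOR(X1, X2) with independent fair bits is the
# canonical purely synergetic example -- each feature alone is
# useless, yet the pair determines Y, so II is close to +1 bit.
rng = np.random.default_rng(0)
x1 = rng.integers(0, 2, 100_000)
x2 = rng.integers(0, 2, 100_000)
print(interaction_information(x1, x2, x1 ^ x2))  # ~1.0
```

The XOR example is exactly the kind of interaction the abstract refers to: a criterion scoring features only by marginal relevance $I(X_i; Y)$ assigns both features a score of zero, while the 3-way term exposes their joint relevance.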