Abstract
Feature selection is a fundamental preprocessing step for many machine learning and pattern recognition systems. Notably, some mutual-information-based and correlation-based feature selection problems can be formulated as fractional programs with a single ratio of polynomial 0–1 functions. In this paper, we study approaches that ensure globally optimal solutions for these feature selection problems. We conduct computational experiments with several real datasets and report encouraging results. The solution methods considered perform well for medium-sized and reasonably large datasets, where the existing mixed-integer linear programming formulations from the literature fail.
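To make the single-ratio formulation concrete, the sketch below maximizes a toy relevance-to-redundancy merit function over binary feature-selection vectors using Dinkelbach's classical parametric method, in which each iteration solves max_x f(x) − λ·g(x) and updates λ to the resulting ratio. The relevance scores `a`, the redundancy matrix `B`, and the brute-force inner solver are purely illustrative assumptions, not the formulations or algorithms studied in the paper.

```python
import itertools

def dinkelbach_max_ratio(f, g, n, tol=1e-9, max_iter=100):
    """Maximize f(x)/g(x) over nonzero x in {0,1}^n via Dinkelbach's
    parametric method. The inner problem max_x f(x) - lam*g(x) is
    solved by brute-force enumeration, so this is for toy sizes only."""
    candidates = [x for x in itertools.product((0, 1), repeat=n) if any(x)]
    lam = 0.0
    best = candidates[0]
    for _ in range(max_iter):
        # Inner parametric subproblem for the current lambda.
        best = max(candidates, key=lambda x: f(x) - lam * g(x))
        if abs(f(best) - lam * g(best)) <= tol:
            break  # optimal: no x attains a strictly positive value
        lam = f(best) / g(best)  # update lambda to the achieved ratio
    return best, lam

# Hypothetical relevance scores and pairwise redundancy matrix.
a = [3.0, 2.0, 1.0]
B = [[0.0, 2.5, 0.1],
     [2.5, 0.0, 0.1],
     [0.1, 0.1, 0.0]]

def f(x):  # numerator: total relevance of the selected features
    return sum(ai * xi for ai, xi in zip(a, x))

def g(x):  # denominator: 1 + total pairwise redundancy
    n = len(x)
    return 1.0 + sum(B[i][j] * x[i] * x[j]
                     for i in range(n) for j in range(n))

x_star, ratio = dinkelbach_max_ratio(f, g, 3)
```

On this instance the method selects features 1 and 3, which keep most of the relevance while avoiding the strong redundancy between features 1 and 2; in practice the inner subproblem is what a mixed-integer solver would handle.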
Acknowledgements
The authors thank the Review Team for their helpful and constructive comments. This paper is based upon work supported by the National Science Foundation under Grant No. 1818700.
Mehmanchi, E., Gómez, A. & Prokopyev, O.A. Solving a class of feature selection problems via fractional 0–1 programming. Ann Oper Res 303, 265–295 (2021). https://doi.org/10.1007/s10479-020-03917-w