Abstract
In this paper we consider two related optimization problems: selecting the best sparse linear regression model and optimally identifying the parameters of autoregressive models from time series data. These problems, although different, are closely connected; both are usually solved through a sequence of separate steps that alternate between choosing a subset of features and computing a best-fit regression. We propose instead to formulate both problems as mixed integer nonlinear optimization problems and develop numerical procedures, based on state-of-the-art optimization tools, to solve them. The proposed approach has the advantage of treating model selection and parameter estimation as a single optimization problem. Numerical experiments performed on widely available datasets as well as on synthetic ones confirm the high quality of our approach, both in terms of the resulting models and in terms of CPU time.
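To make the two problems concrete, the sketch below illustrates best subset selection by exhaustive enumeration under a BIC-type criterion, and shows how autoregressive lag selection reduces to the same problem via a lagged design matrix. This brute-force search is not the paper's mixed integer nonlinear formulation (which scales far beyond small feature counts); the function names, the choice of BIC, and the synthetic data are illustrative assumptions only.

```python
# Illustrative sketch only: best subset selection by exhaustive enumeration
# under a Gaussian BIC criterion. The paper instead casts this as a mixed
# integer nonlinear program; enumeration is feasible only for small p.
from itertools import combinations

import numpy as np


def best_subset_bic(X, y):
    """Return (support, coefficients) minimizing BIC over all feature subsets."""
    n, p = X.shape
    best_bic, best_support, best_beta = np.inf, None, None
    for k in range(1, p + 1):
        for support in combinations(range(p), k):
            Xs = X[:, support]
            beta, res, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            # lstsq returns residuals only for full-rank overdetermined systems
            rss = float(res[0]) if res.size else float(np.sum((y - Xs @ beta) ** 2))
            bic = n * np.log(rss / n) + k * np.log(n)  # Gaussian BIC
            if bic < best_bic:
                best_bic, best_support, best_beta = bic, support, beta
    return best_support, best_beta


def lag_matrix(series, max_lag):
    """Design matrix whose column l-1 holds the series lagged by l steps."""
    n = len(series)
    X = np.column_stack([series[max_lag - l:n - l] for l in range(1, max_lag + 1)])
    return X, series[max_lag:]


rng = np.random.default_rng(0)

# Sparse linear regression: only features 0 and 2 are truly active.
X = rng.normal(size=(100, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=100)
support, beta = best_subset_bic(X, y)

# AR lag selection as a special case: regress y_t on its own lags.
T = 500
eps = rng.normal(scale=0.1, size=T)
ts = np.zeros(T)
for t in range(2, T):
    ts[t] = 0.5 * ts[t - 1] - 0.3 * ts[t - 2] + eps[t]
Xl, yl = lag_matrix(ts, 5)
lags, phi = best_subset_bic(Xl, yl)
```

The same enumeration selects both the features of a sparse regression and the active lags of an AR process; the paper's contribution is to replace this combinatorial search, and the subsequent refit, with a single optimization model.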
Acknowledgements
We are deeply indebted to the Associate Editor for having pointed out an important reference during the first review of this paper. This suggestion led us to totally revise and significantly expand the scope of this paper. We are also grateful to the reviewers for their useful and constructive comments.
Cite this article
Gangi, L.D., Lapucci, M., Schoen, F. et al. An efficient optimization approach for best subset selection in linear regression, with application to model selection and fitting in autoregressive time-series. Comput Optim Appl 74, 919–948 (2019). https://doi.org/10.1007/s10589-019-00134-5