
An efficient optimization approach for best subset selection in linear regression, with application to model selection and fitting in autoregressive time-series

Published in Computational Optimization and Applications.

Abstract

In this paper we consider two related optimization problems: selecting the best sparse linear regression model, and optimally identifying the parameters of autoregressive models from time-series data. These problems, although different, are closely connected, and they are usually solved through a sequence of separate steps, alternating between choosing a subset of features and then finding a best-fit regression. We propose instead to formulate both problems as mixed-integer nonlinear optimization problems, and we develop numerical procedures based on state-of-the-art optimization tools to solve them. The proposed approach has the advantage of treating model selection and parameter estimation as a single optimization problem. Numerical experiments performed on widely available datasets as well as on synthetic ones confirm the high quality of our approach, both in terms of the resulting models and in terms of CPU time.
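To make the combinatorial problem concrete, the sketch below solves best subset selection in its simplest exact form: exhaustive search over all feature subsets of size at most k, fitting ordinary least squares on each and keeping the subset with the smallest residual sum of squares. This is purely illustrative of the problem the paper addresses, not the authors' mixed-integer formulation; the function name `best_subset_ols` and the no-intercept convention are assumptions made here.

```python
# Illustrative sketch (not the paper's MINLP method): exact best subset
# selection by exhaustive enumeration, feasible only for small p.
from itertools import combinations

import numpy as np


def best_subset_ols(X, y, k):
    """Return (subset, rss): the set of at most k feature indices whose
    OLS fit (no intercept) minimizes the residual sum of squares."""
    n, p = X.shape
    # Baseline: the empty model predicts zero everywhere.
    best, best_rss = (), float(y @ y)
    for size in range(1, k + 1):
        for subset in combinations(range(p), size):
            Xs = X[:, subset]
            beta = np.linalg.lstsq(Xs, y, rcond=None)[0]
            r = y - Xs @ beta
            rss = float(r @ r)
            if rss < best_rss:
                best, best_rss = subset, rss
    return best, best_rss
```

The search visits all sums of binomial(p, size) subsets, which grows exponentially in p; this is exactly why exact approaches based on modern mixed-integer solvers, such as the one proposed in the paper, are of practical interest.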



Acknowledgements

We are deeply indebted to the Associate Editor for having pointed out an important reference during the first review of this paper. This suggestion led us to totally revise and significantly expand the scope of this paper. We are also grateful to the reviewers for their useful and constructive comments.

Author information

Corresponding author

Correspondence to F. Schoen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Gangi, L.D., Lapucci, M., Schoen, F. et al. An efficient optimization approach for best subset selection in linear regression, with application to model selection and fitting in autoregressive time-series. Comput Optim Appl 74, 919–948 (2019). https://doi.org/10.1007/s10589-019-00134-5
