
Forward variable selection for ultra-high dimensional quantile regression models

Published in: Annals of the Institute of Statistical Mathematics

Abstract

We propose forward variable selection procedures with a stopping rule for feature screening in ultra-high-dimensional quantile regression models. For such very large models, penalized methods do not work, and some preliminary feature screening is necessary. We establish the desirable theoretical properties of our forward procedures by properly handling uniformity with respect to subsets of covariates; the necessity of such uniformity is often overlooked in the literature. Our stopping rule suitably incorporates the model size at each stage. We also present simulation studies and a real data application that demonstrate good finite-sample performance.
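As a rough illustration of the kind of procedure the abstract describes (not the authors' actual algorithm), the sketch below runs greedy forward selection under the quantile check loss with a BIC-type stopping rule that penalizes model size at each stage. The function names (`forward_qr`, `check_loss`), the use of Nelder-Mead for the fits, and the exact form of the criterion are our own simplifications of the paper's procedure.

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(beta, X, y, tau):
    # Quantile (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0}).
    u = y - X @ beta
    return np.mean(u * (tau - (u < 0)))

def forward_qr(X, y, tau=0.5, max_steps=5):
    """Greedy forward selection for quantile regression: at each step add the
    covariate that most reduces the empirical check loss, and stop when a
    BIC-type criterion penalizing model size no longer improves."""
    n, p = X.shape
    selected = []
    # Intercept-only fit: the tau-th sample quantile of y.
    loss0 = check_loss(np.array([np.quantile(y, tau)]), np.ones((n, 1)), y, tau)
    best_crit = n * np.log(loss0) + np.log(n)  # one parameter (intercept)
    for _ in range(max_steps):
        cand = {}
        for j in range(p):
            if j in selected:
                continue
            cols = np.column_stack(
                [np.ones(n)] + [X[:, k] for k in selected + [j]])
            res = minimize(check_loss, np.zeros(cols.shape[1]),
                           args=(cols, y, tau), method="Nelder-Mead")
            cand[j] = res.fun
        j_best = min(cand, key=cand.get)
        k = len(selected) + 2  # intercept + current covariates + candidate
        crit = n * np.log(cand[j_best]) + k * np.log(n)
        if crit >= best_crit:  # stopping rule: size penalty outweighs the gain
            break
        selected.append(j_best)
        best_crit = crit
    return selected

# Demo on synthetic data: y depends on covariates 0 and 3 only.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = 2.0 * X[:, 0] + 1.5 * X[:, 3] + 0.1 * rng.standard_normal(200)
print(forward_qr(X, y, tau=0.5))
```

On this toy example the procedure should recover the two active covariates and then stop, since adding a noise covariate barely lowers the check loss while the log(n) penalty grows with the model size.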



Acknowledgements

We are very grateful to the two reviewers for their valuable comments, and to Prof. Ching-Kang Ing for his comments and help. Honda's research was supported in part by JSPS KAKENHI Grant Number JP 20K11705, Japan. Lin's research was supported by grant 111-2118-M-035-007-MY2 from the National Science and Technology Council, Taiwan.

Author information

Corresponding author

Correspondence to Toshio Honda.



Cite this article

Honda, T., Lin, CT. Forward variable selection for ultra-high dimensional quantile regression models. Ann Inst Stat Math 75, 393–424 (2023). https://doi.org/10.1007/s10463-022-00849-z
