Abstract
We propose forward variable selection procedures with a stopping rule for feature screening in ultra-high-dimensional quantile regression models. For such very large models, penalized methods do not work well and some preliminary feature screening is necessary. We establish the desirable theoretical properties of our forward procedures by properly handling uniformity with respect to subsets of covariates; the necessity of such uniformity is often overlooked in the literature. Our stopping rule suitably incorporates the model size at each stage. Simulation studies and a real data application demonstrate the good finite-sample performance of the proposed procedures.
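To make the general idea concrete, the following is a minimal schematic sketch of forward variable selection for quantile regression, not the authors' exact procedure: at each stage, the covariate giving the largest reduction in the empirical check loss is added, and a generic EBIC-style criterion that penalizes the current model size (an assumed form, standing in for the paper's stopping rule) decides when to stop. The quantile fits are computed exactly via the standard linear-programming formulation; the function names and the specific penalty `log(n)*log(p)` are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def qr_fit_loss(X, y, tau):
    """Fit quantile regression exactly via its LP formulation and return
    the average check (pinball) loss:
        min_{beta,u,v} tau*sum(u) + (1-tau)*sum(v)
        s.t. X beta + u - v = y,  u >= 0,  v >= 0.
    """
    n, d = X.shape
    c = np.concatenate([np.zeros(d), np.full(n, tau), np.full(n, 1.0 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * d + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return max(res.fun / n, 1e-10)  # floor avoids log(0) in the criterion

def forward_qr(X, y, tau=0.5, penalty=None):
    """Greedy forward selection with a size-aware, EBIC-style stopping rule
    (an assumed illustrative criterion, not the paper's exact rule)."""
    n, p = X.shape
    if penalty is None:
        penalty = np.log(n) * np.log(p)   # per-covariate penalty
    ones = np.ones((n, 1))                # intercept is always included
    active = []
    best_crit = n * np.log(qr_fit_loss(ones, y, tau))
    while len(active) < min(n // 4, p):
        cands = [j for j in range(p) if j not in active]
        losses = [qr_fit_loss(np.hstack([ones, X[:, active + [j]]]), y, tau)
                  for j in cands]
        j_best = cands[int(np.argmin(losses))]
        crit = n * np.log(min(losses)) + (len(active) + 1) * penalty
        if crit >= best_crit:
            break                          # stopping rule: no sufficient gain
        active.append(j_best)
        best_crit = crit
    return active
```

On simulated data with two truly active covariates, such a sketch typically recovers the active set after a few forward steps and then stops, since the penalty term grows with the model size while the check loss barely improves.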
Acknowledgements
We are grateful to the two reviewers for their valuable comments, and to Prof. Ching-Kang Ing for his comments and help. Honda’s research was supported in part by JSPS KAKENHI Grant Number JP 20K11705, Japan. Lin’s research was supported by grant 111-2118-M-035-007-MY2 from the National Science and Technology Council, Taiwan.
About this article
Cite this article
Honda, T., Lin, CT. Forward variable selection for ultra-high dimensional quantile regression models. Ann Inst Stat Math 75, 393–424 (2023). https://doi.org/10.1007/s10463-022-00849-z