Testing covariates in high-dimensional regression

Annals of the Institute of Statistical Mathematics

Abstract

In a high-dimensional linear regression model, we propose a new procedure for testing statistical significance of a subset of regression coefficients. Specifically, we employ the partial covariances between the response variable and the tested covariates to obtain a test statistic. The resulting test is applicable even if the predictor dimension is much larger than the sample size. Under the null hypothesis, together with boundedness and moment conditions on the predictors, we show that the proposed test statistic is asymptotically standard normal, which is further supported by Monte Carlo experiments. A similar test can be extended to generalized linear models. The practical usefulness of the test is illustrated via an empirical example on paid search advertising.
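
As a rough illustration of the quantity the abstract refers to, the sketch below computes sample partial covariances between a response and a block of tested covariates after regressing out the remaining covariates. It assumes the covariates that are not tested (called `X_a` here, a hypothetical name) are few enough to be handled by ordinary least squares, while the tested block `X_b` (also a hypothetical name) may have more columns than there are observations. This is not the authors' test statistic: the aggregation and standardization that give the asymptotically standard normal statistic are in the full article and are not reproduced here.

```python
import numpy as np


def partial_covariances(y, X_a, X_b):
    """Sample partial covariances between y and each column of X_b given X_a.

    y   : (n,) response vector
    X_a : (n, q) covariates to control for (assumed q + 1 < n)
    X_b : (n, p) tested covariates (p may exceed n)
    """
    n = y.shape[0]
    # Controls plus an intercept column.
    Z = np.column_stack([np.ones(n), X_a])
    # Regress y and every column of X_b on the controls.
    coef_y, *_ = np.linalg.lstsq(Z, y, rcond=None)
    coef_b, *_ = np.linalg.lstsq(Z, X_b, rcond=None)
    # Residuals are the parts not explained by the controls.
    r_y = y - Z @ coef_y
    R_b = X_b - Z @ coef_b
    # Covariance of the residuals, with a degrees-of-freedom correction.
    dof = n - Z.shape[1]
    return R_b.T @ r_y / dof


# Illustrative simulated data (all values here are assumptions).
rng = np.random.default_rng(0)
n, q, p = 200, 3, 500
X_a = rng.standard_normal((n, q))
X_b = rng.standard_normal((n, p))
beta_a = np.array([1.0, -1.0, 0.5])

# Under the null, y depends on the controls only.
y_null = X_a @ beta_a + rng.standard_normal(n)
# Under this alternative, the first tested covariate also matters.
y_alt = y_null + 2.0 * X_b[:, 0]

pc_null = partial_covariances(y_null, X_a, X_b)
pc_alt = partial_covariances(y_alt, X_a, X_b)
```

Under the simulated null, every entry of `pc_null` should be close to zero, whereas the first entry of `pc_alt` should be clearly away from zero. A formal test needs a properly standardized summary of these values, as developed in the article.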

Acknowledgments

The authors are grateful to the Editor, the Associate Editor, and two referees for their helpful comments and advice.

Author information

Corresponding author

Correspondence to Hansheng Wang.

Additional information

The research of Wang and Lan was supported in part by the National Natural Science Foundation of China (NSFC, 11131002, 11271032), the Fox Ying Tong Education Foundation, the Business Intelligence Research Center at Peking University, and the Center for Statistical Science at Peking University.

About this article

Cite this article

Lan, W., Wang, H. & Tsai, CL. Testing covariates in high-dimensional regression. Ann Inst Stat Math 66, 279–301 (2014). https://doi.org/10.1007/s10463-013-0414-0

  • DOI: https://doi.org/10.1007/s10463-013-0414-0
