Abstract
In a high-dimensional linear regression model, we propose a new procedure for testing the statistical significance of a subset of regression coefficients. Specifically, we employ the partial covariances between the response variable and the tested covariates to obtain a test statistic. The resulting test is applicable even if the predictor dimension is much larger than the sample size. Under the null hypothesis and under boundedness and moment conditions on the predictors, we show that the proposed test statistic is asymptotically standard normal, a result further supported by Monte Carlo experiments. The test can be extended to generalized linear models. The practical usefulness of the test is illustrated via an empirical example on paid search advertising.
Acknowledgments
The authors are grateful to the Editor, the Associate Editor, and two referees for their helpful comments and advice.
Additional information
The research of Wang and Lan was supported in part by the National Natural Science Foundation of China (NSFC, 11131002, 11271032), the Fok Ying Tong Education Foundation, the Business Intelligence Research Center at Peking University, and the Center for Statistical Science at Peking University.
About this article
Cite this article
Lan, W., Wang, H. & Tsai, CL. Testing covariates in high-dimensional regression. Ann Inst Stat Math 66, 279–301 (2014). https://doi.org/10.1007/s10463-013-0414-0
DOI: https://doi.org/10.1007/s10463-013-0414-0