Abstract
This paper considers testing regression coefficients in a high-dimensional linear model with a fixed design matrix. This problem is highly irregular from the frequentist point of view. In fact, we prove that no test can guarantee nontrivial power even when the true model deviates greatly from the null hypothesis. Nevertheless, Bayesian methods can still produce tests with good average power behavior. We propose a new test statistic, which is the limit of Bayes factors under the normal distribution. The null distribution of the proposed test statistic is approximated via Lindeberg’s replacement trick. Under certain conditions, the global asymptotic power function of the proposed test is also derived. The finite-sample performance of the proposed test is demonstrated via simulation studies.
References
Arboretti R, Ceccato R, Corain L, Ronchi F, Salmaso L (2018) Multivariate small sample tests for two-way designs with applications to industrial statistics. Stat Pap 59(4):1483–1503
Bai Z, Pan G, Yin Y (2018) A central limit theorem for sums of functions of residuals in a high-dimensional regression model with an application to variance homoscedasticity test. TEST 27(4):896–920
Baltagi BH, Kao C, Na S (2013) Testing for cross-sectional dependence in a panel factor model using the wild bootstrap \(F\) test. Stat Pap 54(4):1067–1094
Bentkus V, Götze F (1996) Optimal rates of convergence in the CLT for quadratic forms. Ann Probab 24(1):466–490
Bühlmann P (2013) Statistical significance in high-dimensional linear models. Bernoulli 19(4):1212–1242
Casella G, Moreno E (2006) Objective Bayesian variable selection. J Am Stat Assoc 101(473):157–167
Chatterjee S (2006) A generalization of the Lindeberg principle. Ann Probab 34(6):2061–2076
Chatterjee S (2008) A new method of normal approximation. Ann Probab 36(4):1584–1610
Chen SX, Zhang LX, Zhong PS (2010) Tests for high-dimensional covariance matrices. J Am Stat Assoc 105(490):810–819
Cohn DL (2013) Measure theory, 2nd edn. Birkhäuser, New York
Cui H, Guo W, Zhong W (2018) Test for high-dimensional regression coefficients using refitted cross-validation variance estimation. Ann Stat 46(3):958–988
DasGupta A (2008) Asymptotic theory of statistics and probability, 1st edn. Springer, New York
Dezeure R, Bühlmann P, Zhang CH (2017) High-dimensional simultaneous inference with the bootstrap. TEST 26(4):685–719
Dicker LH, Erdogdu MA (2017) Flexible results for quadratic forms with applications to variance components estimation. Ann Stat 45(1):386–414
Draper NR, Pukelsheim F (1996) An overview of design of experiments. Stat Pap 37(1):1–32
Fan J, Yuan L, Mincheva M (2013) Large covariance estimation by thresholding principal orthogonal complements. J R Stat Soc B 75(4):603–680
Feng L, Zou C, Wang Z, Chen B (2013) Rank-based score tests for high-dimensional regression coefficients. Electron J Stat 7:2131–2149
Goddard SD, Johnson VE (2016) Restricted most powerful Bayesian tests for linear models. Scand J Stat 43(4):1162–1177
Goeman JJ, van de Geer SA, van Houwelingen HC (2006) Testing against a high dimensional alternative. J R Stat Soc B 68(3):477–493
Goeman JJ, van Houwelingen HC, Finos L (2011) Testing against a high-dimensional alternative in the generalized linear model: asymptotic type I error control. Biometrika 98(2):381–390
Götze F, Tikhomirov A (2002) Asymptotic distribution of quadratic forms and applications. J Theor Probab 15(2):423–475
Horn RA, Johnson CR (1991) Topics in matrix analysis, 1st edn. Cambridge University Press, New York
Ingster YI, Tsybakov AB, Verzelen N (2010) Detection boundary in sparse regression. Electron J Stat 4:1476–1526
Janson L, Barber RF, Candès E (2016) EigenPrism: inference for high dimensional signal-to-noise ratios. J R Stat Soc B 79(4):1037–1065
Girón FJ, Martínez ML, Moreno E, Torres F (2006) Objective testing procedures in linear models: calibration of the \(p\)-values. Scand J Stat 33(4):765–784
Jiang J (1996) REML estimation: asymptotic behavior and related topics. Ann Stat 24(1):255–286
de Jong P (1987) A central limit theorem for generalized quadratic forms. Probab Theory Relat Fields 75(2):261–277
Kass RE, Raftery AE (1995) Bayes factors. J Am Stat Assoc 90(430):773–795
Lan W, Wang H, Tsai CL (2014) Testing covariates in high-dimensional regression. Ann Inst Stat Math 66(2):279–301
Lan W, Ding Y, Fang Z, Fang K (2016a) Testing covariates in high dimension linear regression with latent factors. J Multivar Anal 144:25–37
Lan W, Zhong PS, Li R, Wang H, Tsai CL (2016b) Testing a single regression coefficient in high dimensional linear models. J Econom 195(1):154–168
Lehmann EL, Romano JP (2005) Testing statistical hypotheses, 3rd edn. Springer, New York
Lei L, Bickel PJ, Karoui NE (2018) Asymptotics for high dimensional regression M-estimates: fixed design results. Probab Theory Relat Fields 172(3–4):983–1079
Liang F, Paulo R, Molina G, Clyde MA, Berger JO (2008) Mixtures of g priors for Bayesian variable selection. J Am Stat Assoc 103(481):410–423
Pollard D (1984) Convergence of stochastic processes, 1st edn. Springer, New York
Sevast’yanov BA (1961) A class of limit distributions for quadratic forms of stochastic random variables. Theory Probab Appl 6(3):337–340
Wang S, Cui H (2015) A new test for part of high dimensional regression coefficients. J Multivar Anal 137:187–203
Xu K (2016) A new nonparametric test for high-dimensional regression coefficients. J Stat Comput Simul 87(5):855–869
Zhang X, Cheng G (2017) Simultaneous inference for high-dimensional linear models. J Am Stat Assoc 112(518):757–768
Zhang CH, Zhang SS (2014) Confidence intervals for low dimensional parameters in high dimensional linear models. J R Stat Soc B 76(1):217–242
Zhong PS, Chen SX (2011) Tests for high-dimensional regression coefficients with factorial designs. J Am Stat Assoc 106(493):260–274
Zhou Q, Guan Y (2018) On the null distribution of Bayes factors in linear regression. J Am Stat Assoc 113(523):1362–1371
Acknowledgements
The authors would like to thank the editor and two anonymous referees for their helpful comments and suggestions which helped to improve the paper significantly. This work was supported by the National Natural Science Foundation of China under Grant No. 11471035.
Appendix
Lemma 1
Under the assumptions of Theorem 1, if there exists a Borel set \(G\subset {\mathbb {R}}^n\) and a number \(M\ge 0\) such that
then \(\varphi (\mathbf {y}){\mathbf {1}}_{G}(\mathbf {y})\ge \alpha {\mathbf {1}}_{G}(\mathbf {y})\) a.e. \(\lambda \).
Proof
We prove the claim by contradiction. Suppose \(\lambda (\{\mathbf {y}:\varphi (\mathbf {y}) <\alpha \} \cap G )>0\). Then there exists a sufficiently small \(\eta \) with \(0< \eta <\alpha \) such that \(\lambda (\{\mathbf {y}:\varphi (\mathbf {y}) <\alpha -\eta \} \cap G)>0\). We denote \( E:=\{\mathbf {y}:\varphi (\mathbf {y}) <\alpha -\eta \} \cap G\). By the Lebesgue density theorem (Cohn 2013, Corollary 6.2.6), there exists a point \(z\in E\) such that, for any \(\varepsilon >0\), there is a \(\delta _{\varepsilon }>0\) such that for any \(0< \delta ' <\delta _\varepsilon \),
where \(C_{\delta '}=\prod _{i=1}^n [z_i-{\delta '}, z_i + {\delta '}]\). We put
Then for any \(\phi >M\) and \(0<\delta ' <\delta _\varepsilon \),
In the last inequality, we choose \(\delta '\) small enough that
and put
Then we obtain the contradiction \(\alpha \le \alpha -(2/3)\eta \). This completes the proof. \(\square \)
Proof
(Proof of Theorem 1) We prove the claim by contradiction. Suppose there exists an \(M\ge 0\) such that
for every \(\varvec{\beta }_a\in {\mathbb {R}}^q\), \(\varvec{\beta }_b \in {\mathbb {R}}^p\), \(\phi >0\) satisfying \(\phi \varvec{\beta }_b^\top \mathbf {X}_b^\top {\tilde{\mathbf {P}}}_a \mathbf {X}_b \varvec{\beta }_b >M\). Note that for any \(h>0\),
is a subset of
Then Lemma 1 implies that for any \(h>0\), \(\varphi (\mathbf {y}) {\mathbf {1}}_{G_h}(\mathbf {y})\ge \alpha {\mathbf {1}}_{G_h}(\mathbf {y})\) a.e. \(\lambda \), where
It can be seen that \(\lambda (\{ \cup _{n=1}^\infty G_{1/n}\}^\complement )= 0\). Hence \(\varphi (\mathbf {y}) \ge \alpha \) a.e. \(\lambda \). On the other hand, since \(\varphi (\mathbf {y})\) is a level \(\alpha \) test, for every \(\phi >0\),
Note that the integrand of (10) is nonnegative. It follows that \(\varphi (\mathbf {y})=\alpha \) a.e. \(\lambda \), a contradiction. This completes the proof. \(\square \)
Proof
(Proof of Theorem 2) We note that \({{\,\mathrm{Var}\,}}(\varvec{\xi }^\top \mathbf {A}\varvec{\xi })= 2{{\,\mathrm{tr}\,}}(\mathbf {A}^2) + ({{\,\mathrm{E}\,}}(\xi _1^4)-3){{\,\mathrm{tr}\,}}(\mathbf {A}^{\circ 2} ) \). Let
Then
For \(l=1,\ldots , n\), define
It can be seen that for \(l=2,\ldots , n\), \(S_{l-1}+ h_{l-1} =S_{l} + g_{l} \), and \(S=S_n + h_n\), \(S_1 + g_1=S_\tau ^*\).
Thus, for any \(f \in {\mathscr {C}}^4 ({\mathbb {R}})\),
Applying Taylor’s theorem, for \(l=1,\ldots ,n\),
where \(\theta _1,\theta _2\in [0,1]\). Thus,
where \({{\,\mathrm{E}\,}}_l\) denotes taking expectation with respect to \(\xi _l, z_l ,{\check{z}}_l\). It is straightforward to show that
Thus,
Now we bound \({{\,\mathrm{E}\,}}(h_l^4)\) and \({{\,\mathrm{E}\,}}(g_l^4)\). By direct calculation,
To upper bound the above quantity, we use the facts \( 24 {{\,\mathrm{E}\,}}[ \xi _1^2(\xi _1^2 -1)^2] \le 2(16{{\,\mathrm{E}\,}}(\xi _1^2 -1)^4 + (9/4) {{\,\mathrm{E}\,}}(\xi _1^4) ) \), \({{\,\mathrm{E}\,}}(\xi _1^2 - 1)^4\le {{\,\mathrm{E}\,}}(\xi _1^8)\) and
Then we obtain the bound
Similarly, we have
Combining (11), (12) and (13) yields
This completes the proof. \(\square \)
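The variance identity invoked at the start of this proof, \({{\,\mathrm{Var}\,}}(\varvec{\xi }^\top \mathbf {A}\varvec{\xi })= 2{{\,\mathrm{tr}\,}}(\mathbf {A}^2) + ({{\,\mathrm{E}\,}}(\xi _1^4)-3){{\,\mathrm{tr}\,}}(\mathbf {A}^{\circ 2})\), can be checked numerically. The sketch below is illustrative only and not part of the proof: the \(3\times 3\) symmetric matrix and the Rademacher distribution for \(\varvec{\xi }\) (mean 0, variance 1, \({{\,\mathrm{E}\,}}(\xi _1^4)=1\)) are our own choices, picked so that the variance can be computed exactly by enumerating all \(2^n\) sign vectors.

```python
from itertools import product

# Symmetric matrix A, chosen arbitrarily for illustration.
A = [[1.0, 0.5, -0.3],
     [0.5, 2.0, 0.7],
     [-0.3, 0.7, -1.0]]
n = len(A)

def quad(xi):
    # Quadratic form xi^T A xi.
    return sum(A[i][j] * xi[i] * xi[j] for i in range(n) for j in range(n))

# Exact distribution of xi^T A xi under iid Rademacher xi:
# enumerate all 2^n equally likely sign vectors.
vals = [quad(xi) for xi in product((-1.0, 1.0), repeat=n)]
mean = sum(vals) / len(vals)
var_exact = sum((v - mean) ** 2 for v in vals) / len(vals)

# Right-hand side of the identity: 2 tr(A^2) + (E xi^4 - 3) tr(A∘A),
# where tr(A∘A) = sum_i a_ii^2 is the trace of the Hadamard square.
tr_A2 = sum(A[i][j] * A[j][i] for i in range(n) for j in range(n))
tr_hadamard = sum(A[i][i] ** 2 for i in range(n))
fourth_moment = 1.0  # E(xi^4) for Rademacher variables
var_formula = 2 * tr_A2 + (fourth_moment - 3) * tr_hadamard
```

Since the enumeration is exhaustive, `var_exact` and `var_formula` agree up to floating-point rounding; for this matrix both equal 3.32.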
Proof
(Proof of Theorem 3) Throughout the proof, we use the same notation as in the proof of Theorem 2 and define
and
where \(z_1,\ldots , z_n, {\check{z}}_1,\ldots , {\check{z}}_n\) are iid \({\mathcal {N}}(0,1)\) random variables and are independent of \({\hat{\tau }}^2\).
By a standard subsequence argument, we only need to prove the theorem along a subsequence of \(\{n\}\). Hence, without loss of generality, we assume \({\hat{\tau }}^2 \xrightarrow {a.s.} \phi ^2 {{\,\mathrm{E}\,}}(\epsilon _1^4)-1\). Write
Note that \(S_{\hat{\tau },1}^*\) is independent of \({\hat{\tau }}\). Since \({{\,\mathrm{E}\,}}( S_{{\hat{\tau }},1}^{*2} )=1\), the distributions \({\mathcal {L}}(S_{{\hat{\tau }},1}^{*}) \) are tight as \(n\rightarrow \infty \). Hence, without loss of generality, we assume that \({\mathcal {L}} (S_{{\hat{\tau }},1}^*)\) converges weakly to a limit distribution with distribution function \(F^\dagger (x)\). Let \(S^\dagger \) be a random variable with distribution function \(F^\dagger (x)\). By some algebra (see, e.g., Chen et al. (2010), Proposition A.1.(iii)), it can be shown that \({{\,\mathrm{E}\,}}(S^{*4}_{{\hat{\tau }},1})\) is uniformly bounded. Then \({\mathcal {L}} ( S_{{\hat{\tau }},1}^{*2} )\) is uniformly integrable. Hence \({{\,\mathrm{E}\,}}(S^{\dagger 2})=1\), and \(F^\dagger (x)\) cannot be concentrated at a single point. Consequently, \(F^\dagger (x)\) is continuous and strictly increasing on \(\{x:0<F^\dagger (x)<1\}\); see Sevast’yanov (1961) as well as the remark made by A. N. Kolmogorov on that paper.
The condition (8) implies that \({{\,\mathrm{E}\,}}[S_{{\hat{\tau }},2}^{*2}|{\hat{\tau }}]\rightarrow 0\) almost surely. Then almost surely, \({\mathcal {L}} (S^*_{{\hat{\tau }}}|{\hat{\tau }}) \rightsquigarrow {\mathcal {L}}(S^\dagger )\). Consequently, for every \(f\in {\mathscr {C}}^4 ({\mathbb {R}})\), we have \(| {{\,\mathrm{E}\,}}[f(S^*_{{\hat{\tau }}}) |{\hat{\tau }}] - {{\,\mathrm{E}\,}}f(S^\dagger ) |\rightarrow 0\) almost surely. On the other hand, Theorem 2 and the condition (8) imply \(|{{\,\mathrm{E}\,}}f(S)- {{\,\mathrm{E}\,}}[f(S^*_{{\hat{\tau }}})|{\hat{\tau }}] |\rightarrow 0\) almost surely. Thus, \(|{{\,\mathrm{E}\,}}f(S)- {{\,\mathrm{E}\,}}f(S^\dagger ) |\rightarrow 0\). That is, \({\mathcal {L}} (S)\rightsquigarrow {\mathcal {L}} (S^\dagger )\).
Note that
We need to deal with \(F^{-1} (1-\alpha ;\mathbf {A},{\hat{\tau }})\). Since \({\mathcal {L}} (S^*_{{\hat{\tau }}}|{\hat{\tau }}) \rightsquigarrow {\mathcal {L}}(S^\dagger )\) almost surely, the fact
implies that almost surely,
We also need the fact that
which is a consequence of
The fact \(S\rightsquigarrow S^\dagger \), Eqs. (14), (15) and Slutsky’s theorem lead to
This proves the theorem. \(\square \)
Proof
(Proof of Proposition 1) From Bai et al. (2018), Theorem 2.1, one can obtain the explicit forms of \({{\,\mathrm{Var}\,}}\left( {\tilde{\varvec{\epsilon }}}^\top \left( {\tilde{\mathbf {P}}}_a \right) {\tilde{\varvec{\epsilon }}} \right) \) and \({{\,\mathrm{Var}\,}}\left( \sum _{i=1}^n {\tilde{\epsilon }}_i^4 \right) \), which involve the traces of certain matrices. By Horn and Johnson (1991), Theorem 5.5.1, the eigenvalues of these matrices are all bounded. Hence it can be deduced that \({{\,\mathrm{Var}\,}}( {\tilde{\varvec{\epsilon }}}^\top {\tilde{\mathbf {P}}}_a {\tilde{\varvec{\epsilon }}} )=O(n)\) and \({{\,\mathrm{Var}\,}}\left( \sum _{i=1}^n {\tilde{\epsilon }}_i^4 \right) =O(n)\). Thus,
It follows that
Let \(\delta _{i,j}=1\) if \(i=j\) and 0 if \(i\ne j\). We have
Then
This completes the proof. \(\square \)
Proof
(Proof of Proposition 2) Without loss of generality, we assume that \(\mathbf {A}\) is a diagonal matrix and \(|b_1|\ge \cdots \ge |b_n|\). By a standard subsequence argument, we only need to prove the result along a subsequence. Hence we can assume \(\lim _{n\rightarrow \infty }\Vert \mathbf {b}\Vert ^2/{{\,\mathrm{tr}\,}}(\mathbf {A}^2) =c \in [0,+\infty ]\). If \( c=0\), the Lyapunov central limit theorem implies that
If \(c=+\infty \),
In what follows, we assume \(c\in (0,+\infty )\). By the Helly selection theorem, we can assume \(\lim _{n\rightarrow \infty } |b_i|/\Vert \mathbf {b}\Vert = b_i^*\in [0,1]\), \(i=1,2,\ldots \). By Fatou’s lemma, we have \(\sum _{i=1}^{\infty } (b_i^{*})^2\le 1\). Consequently, \(\lim _{i\rightarrow \infty } b_i^* =0\).
Note that we have assumed that \(\lambda _1(\mathbf {A}^2)/{{\,\mathrm{tr}\,}}(\mathbf {A}^2)\rightarrow 0\). Then for every fixed integer \(r>0\),
Hence there exists a sequence of positive integers \(r(n)\rightarrow \infty \) such that \({(\sum _{i=1}^{r(n)} a_{i,i}^2) }/{( \sum _{i=1}^n a_{i,i}^2) } \rightarrow 0\) and \(r(n)/n\rightarrow 0\). Write
which is a sum of independent random variables. The first term is negligible since \({{\,\mathrm{Var}\,}}( \sum _{i=1}^{r(n)} a_{i,i}(z_i^2-1) )=o(\sum _{i=1}^n a_{i,i}^2)\). Now we deal with the third term. By the Berry–Esseen inequality (see, e.g., DasGupta 2008, Theorem 11.2), there exists an absolute constant \(C^*>0\) such that
By some simple algebra, there exist absolute constants \(C_1^*,C_2^*>0\) such that for sufficiently large \(n\),
Since the right-hand side tends to 0, we have
Note that \( \sum _{i=r(n)+1}^n \left( a_{i,i}(z_i^2-1) + b_i z_i \right) \) is independent of \(\sum _{i=1}^{r(n)} b_{i} z_i\) and \(\sum _{i=1}^{r(n)} b_i z_i\sim {\mathcal {N}}(0,\sum _{i=1}^{r(n)}b_i^2)\). Thus,
This completes the proof. \(\square \)
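The decomposition underlying this proof writes the statistic as a sum of independent summands \(a_{i,i}(z_i^2-1) + b_i z_i\) with \(z_i\sim {\mathcal {N}}(0,1)\), so its variance is \(2\sum _i a_{i,i}^2 + \sum _i b_i^2\): each \(a_{i,i}(z_i^2-1)\) has variance \(2a_{i,i}^2\), each \(b_i z_i\) has variance \(b_i^2\), and the cross term vanishes because \({{\,\mathrm{E}\,}}[z(z^2-1)]=0\). A small Monte Carlo sketch (not part of the proof; the coefficient vectors are arbitrary illustrative values):

```python
import random
import statistics

random.seed(0)
a = [0.5, -0.2, 0.8]   # arbitrary diagonal entries a_ii of A
b = [1.0, 0.3, -0.6]   # arbitrary coefficients b_i

# Theoretical variance: 2 * sum a_ii^2 + sum b_i^2.
var_theory = 2 * sum(x * x for x in a) + sum(x * x for x in b)

def draw_S():
    # One draw of S = sum_i a_ii (z_i^2 - 1) + sum_i b_i z_i, z_i iid N(0,1).
    z = [random.gauss(0.0, 1.0) for _ in a]
    return (sum(ai * (zi * zi - 1.0) for ai, zi in zip(a, z))
            + sum(bi * zi for bi, zi in zip(b, z)))

samples = [draw_S() for _ in range(200_000)]
var_mc = statistics.pvariance(samples)
```

With 200,000 draws the relative Monte Carlo error of the variance estimate is well under one percent, so `var_mc` lands close to `var_theory` (here 3.31).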
Proof
(Proof of Theorem 4) We note that
where
Since \(\mathbf {y}^{*\top } \mathbf {B}\mathbf {y}^*= \varvec{\epsilon }^\top \tilde{\mathbf {U}}_a \mathbf {B}\tilde{\mathbf {U}}_a^\top \varvec{\epsilon }+ 2\varvec{\epsilon }^\top \tilde{\mathbf {U}}_a \mathbf {B}\mathbf {X}_b^* \varvec{\beta }_b + \varvec{\beta }_b^{\top } \mathbf {X}_b^{*\top } \mathbf {B}\mathbf {X}_b^* \varvec{\beta }_b\), we have
To apply Proposition 2, we need to verify the condition \(\lambda _1\left( \mathbf {B}^2 \right) /{{\,\mathrm{tr}\,}}\left( \mathbf {B}^2 \right) \rightarrow 0\). It is straightforward to show that \({{\,\mathrm{tr}\,}}(\mathbf {B}^2) = ( n-q +2x^2 ) {{\,\mathrm{Var}\,}}( \gamma _I^k )\). On the other hand,
Thus,
which tends to 0 by the condition (9). Hence Proposition 2 implies that
Then the conclusion follows from (16), (17) and the following facts
\(\square \)
Proof
(Proof of Proposition 3) Fix an \(x\in {\mathbb {R}}\). In view of Theorem 4, we only need to show that \({{\,\mathrm{E}\,}}(\gamma _I w_I^2) = o(1)\) and
The former is a consequence of the assumption \({{\,\mathrm{E}\,}}(\gamma _I w_I^2) = O((n-q)^{-1/2})\). For the latter, we have
This completes the proof. \(\square \)
Wang, R., Xu, X. A Bayesian-motivated test for high-dimensional linear regression models with fixed design matrix. Stat Papers 62, 1821–1852 (2021). https://doi.org/10.1007/s00362-020-01157-5