Abstract
Multiple testing has received much attention in high-dimensional statistical theory and applications, and the problem of variable selection, which aims to identify the important variables among many candidates, can be regarded as a generalization of multiple testing. Performing variable selection in high-dimensional linear models with measurement errors is challenging: both the influence of the high-dimensional parameters and the measurement errors must be accounted for to avoid severe bias. We consider the problem of variable selection in error-in-variables models and introduce a new variable selection method, the DCoCoLasso-FDP procedure. By constructing consistent estimators of the false discovery proportion (FDP) and the false discovery rate (FDR), our method can prioritize the important variables and control the FDP and FDR at a prespecified level in error-in-variables models. An extensive simulation study compares the DCoCoLasso-FDP procedure with existing methods in various settings, and numerical results are provided to illustrate the efficiency of our method.
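To make the selection step concrete, the following is a minimal numerical sketch of how an estimated FDP curve can drive variable selection; it is an illustration under simplifying assumptions, not the authors' implementation. The helper names (`fdp_hat`, `select_threshold`) and the synthetic data are ours; the estimator \(\widehat{\mathrm{FDP}}(k)=2p\Phi (-k)/\{R_w(k)\vee 1\}\) and the choice of the smallest feasible threshold follow the definitions used in Appendix B.

```python
import numpy as np
from scipy.stats import norm

def fdp_hat(w, k):
    """Estimated FDP at threshold k: 2 * p * Phi(-k) / (R(k) v 1)."""
    p = w.size
    R = np.sum(np.abs(w) > k)          # number of selected variables
    return 2 * p * norm.cdf(-k) / max(R, 1)

def select_threshold(w, alpha, n_grid=1000):
    """Smallest grid threshold whose estimated FDP is at most alpha."""
    for k in np.linspace(0.0, np.abs(w).max(), n_grid):
        if fdp_hat(w, k) <= alpha:
            return k
    return np.inf                      # no feasible threshold on the grid

# Toy usage: p0 null coordinates and s0 signals on the standardized scale.
rng = np.random.default_rng(0)
p0, s0 = 950, 50
w = np.concatenate([rng.standard_normal(p0), 4.0 + rng.standard_normal(s0)])
k_alpha = select_threshold(w, alpha=0.1)
selected = np.flatnonzero(np.abs(w) > k_alpha)   # indices declared important
```

In the full procedure, \(w_j\) would be the standardized debiased CoCoLasso statistics constructed in the Appendix rather than synthetic draws.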
References
Barber, R.F., Candes, E.J.: Controlling the false discovery rate via knockoffs. Ann. Stat. 43(5), 2055–2085 (2015)
Belloni, A., Chernozhukov, V., Kaul, A.: Confidence bands for coefficients in high dimensional linear models with error-in-variables. arXiv: Statistics Theory (2017)
Belloni, A., Rosenbaum, M., Tsybakov, A.B.: Linear and conic programming approaches to high-dimensional errors-in-variables models. J. R. Stat. Soc. Ser. B (2014, forthcoming)
Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Methodol. 57(1), 289–300 (1995)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2010)
Chen, X., Doerge, R.W.: A strong law of large numbers related to multiple testing normal means. arXiv: Statistics Theory (2014)
Datta, A., Zou, H.: Cocolasso for high-dimensional error-in-variables regression. Ann. Stat. 45(6), 2400–2426 (2017)
Fan, J., Han, X., Gu, W.: Estimating false discovery proportion under arbitrary covariance dependence. J. Am. Stat. Assoc. 107(499), 1019–1035 (2012)
G'Sell, M.G., Wager, S., Chouldechova, A., Tibshirani, R.: Sequential selection procedures and false discovery rate control. J. R. Stat. Soc. Ser. B Stat. Methodol. 78(2), 423–444 (2016)
Hartigan, J.A.: Bounding the maximum of dependent random variables. Electron. J. Stat. 8(2), 3126–3140 (2014)
Jeng, X.J., Chen, X.: Predictor ranking and false discovery proportion control in high-dimensional regression. J. Multivar. Anal. 171, 163–175 (2019)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001)
Loh, P., Wainwright, M.J.: High-dimensional regression with noisy and missing data: provable guarantees with nonconvexity. Ann. Stat. 40(3), 1637–1664 (2012)
Meinshausen, N., Bühlmann, P.: Stability selection. J. R. Stat. Soc. Ser. B Stat. Methodol. 72(4), 417–473 (2010)
Rosenbaum, M., Tsybakov, A.B.: Sparse recovery under matrix uncertainty. Ann. Stat. 38(5), 2620–2651 (2010)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Methodol. 58(1), 267–288 (1996)
Van de Geer, S., Bühlmann, P., Ritov, Y., Dezeure, R.: On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Stat. 42(3), 1166–1202 (2014)
Wang, Z., Xue, L.: Inference for high dimensional linear models with error-in-variables. Commun. Stat. Simul. Comput. 13, 1–10 (2019). https://doi.org/10.1080/03610918.2018.1554108
Zhao, P., Yu, B.: On model selection consistency of lasso. J. Mach. Learn. Res. 7(12), 2541–2563 (2006)
Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101(476), 1418–1429 (2006)
Additional information
This research was supported by Grant No. 11901006 from the National Natural Science Foundation of China and by Grant Nos. 1908085MA20 and 1908085QA06 from the Natural Science Foundation of Anhui Province.
Appendix: Proof of the Main Results
1.1 Appendix A: Proof of Theorem 3.7
We first prove the following lemma, which is essential for the proof of the theorem. The convergence rates in Lemmas 2.1 and 2.2 have been established in the literature (see, e.g., [2, 7]); the proofs are similar and we omit them.
Proof of Lemma 3.5:
We focus first on the case where the covariance matrix \({\varvec{\Sigma }}\) is unknown; the results for known \({\varvec{\Sigma }}\) then follow by taking \(\widehat{\varvec{\Theta }}={\varvec{\Sigma }}_0^{-1}\). Recalling Eq. (2.9) and letting \(\varvec{t}=\varvec{\widehat{\Theta }\zeta }_n\), we decompose \(\sqrt{n}(\varvec{\hat{b}-\beta ^0})\) into the following two terms, \(\sqrt{n}(\varvec{\hat{b}-\beta ^0})=\mathbf{t}-\varvec{\delta }\),
where \(\varvec{\widehat{\Sigma }}=n^{-1}\mathbf{Z}^T\mathbf{Z}-\varvec{\Sigma }_{\mathbf{w}}\), \(\varvec{\zeta }_n=\sqrt{n}\{(n^{-1}\mathbf{Z}^T\mathbf{X}-\varvec{\widehat{\Sigma })\beta ^0}+n^{-1}\mathbf{Z}^T\varvec{ \varepsilon }\} =n^{-1/2}\sum _{i=1}^n\{\mathbf{Z}_i(\varepsilon _i-\mathbf{W}_i^T\varvec{\beta }^{\varvec{0}})+\varvec{\Sigma _w\beta ^0}\} =n^{-1/2}\sum _{i=1}^n\varvec{\zeta }_{ni}\).
Because \(\varvec{ \varepsilon }\perp \mathbf{X}\), \(\mathbf{W}\perp \mathbf{X}\), \(\mathbf{W}\perp \varvec{\varepsilon }\), and the zero-mean conditions hold, we have \(E(\varvec{\zeta }_n|\mathbf{X})=0\). Observing that \(\{\varvec{\zeta }_{ni}\}_{i=1}^n\) are independent and identically distributed random vectors, we apply the Lindeberg–Lévy central limit theorem to obtain \(n^{-1/2}\sum _{i=1}^n\varvec{\zeta }_{ni}|\mathbf{X}\xrightarrow {d}\mathcal {N}_p(\varvec{0},{\varvec{\Gamma }})\),
where \( { {\varvec{\Gamma }}}=\text {var}(\varvec{\zeta }_{ni}|\mathbf{X})=E[\{\mathbf{Z}_i(\varepsilon _i-\mathbf{W}_i^T\varvec{\beta }^{\varvec{0}})+\varvec{\Sigma _w\beta ^0}\}\{\mathbf{Z}_i(\varepsilon _i-\mathbf{W}_i^T\varvec{\beta }^{\varvec{0}})+\varvec{\Sigma _w\beta ^0}\}^{T}|\mathbf{X}]. \) Consequently, when \({\varvec{\Sigma }}\) is assumed to be known, the conditional asymptotic covariance of \(\mathbf{t}\) given \(\mathbf{X}\) is \({\varvec{\Sigma }}_0^{-1}{\varvec{\Gamma }}{\varvec{\Sigma }}_0^{-1}\). When \({\varvec{\Sigma }}\) is unavailable, we plug in \(\widehat{\varvec{\Theta }}\widehat{{\varvec{\Gamma }}}\widehat{\varvec{\Theta }}^T\) as an estimator of \({\varvec{\Sigma }}_0^{-1}{\varvec{\Gamma }}{\varvec{\Sigma }}_0^{-1}\). Then \(\mathbf{t}|\mathbf{X}\sim \mathcal {N}_p(0,\widehat{\varvec{\Omega }})\), where \(\widehat{\varvec{\Omega }}=\widehat{\varvec{\Theta }}\widehat{{\varvec{\Gamma }}}\widehat{\varvec{\Theta }}^T\).
\(\square \)
Proof of Lemma 3.6:
We need only prove that \(\varvec{\delta }\) satisfies the conclusion of Lemma 3.6; the proof for \(\varvec{\delta _0}\) is similar. For \(\varvec{\delta }\), applying Hölder's inequality,
Lemma 5.5 of [18] implies
with probability at least \(1-c_1\exp (-c_2\log p)\). Combining this with (2.13), we have
with probability at least \(1-c_1\exp (-c_2\log p)\). This completes the proof. \(\square \)
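A sketch of this chain of bounds, assuming \(\varvec{\delta }\) has the usual debiased-Lasso remainder form \(\varvec{\delta }=\sqrt{n}(\varvec{\widehat{\Theta }\widehat{\Sigma }}-\mathbf{I})(\varvec{\hat{\beta }}^\mathbf{coco}-\varvec{\beta }^0)\) and that (2.13) supplies the rate \(\Vert \varvec{\hat{\beta }}^\mathbf{coco}-\varvec{\beta }^0\Vert _1=O_{\mathbb {P}}(s_0\sqrt{\log p/n})\) (both are our reading of the surrounding argument, not quotations):
$$\begin{aligned} \Vert \varvec{\delta }\Vert _\infty \le \Vert \varvec{\widehat{\Theta }\widehat{\Sigma }}-\mathbf{I}\Vert _\infty \cdot \sqrt{n}\Vert \varvec{\hat{\beta }}^\mathbf{coco}-\varvec{\beta }^0\Vert _1 =O_{\mathbb {P}}\left( \sqrt{\frac{\log p}{n}}\right) \cdot O_{\mathbb {P}}\left( s_0\sqrt{\log p}\right) =O_{\mathbb {P}}\left( \frac{s_0\log p}{\sqrt{n}}\right) , \end{aligned}$$
which is consistent with the requirement \(Q_p(s_0,n)=o(1)\) invoked in the proof of Theorem 3.7.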
Lemma A.1
Consider the estimator \(\widehat{\varvec{\Theta }}\) from the modified node-wise regression, and suppose that Assumptions 3.1 and 3.3 hold. Then, for suitable tuning parameters \(\lambda _j\), \(j\in \{1,\ldots ,p\}\), we have
Furthermore,
Proof
Let us simplify \({\widehat{{\varvec{\Gamma }}}}\) first; we have
where \(b^2=(\varvec{\hat{\beta }}^\mathbf{coco})^{T}\varvec{\hat{\beta }^{coco}}\) and \(\varvec{\widehat{\Gamma }_1=(\Sigma +\Sigma _w)}+nb^2\sigma _\varepsilon ^{-2}\varvec{\Sigma _w\Sigma }\). Thus, we see that \(\varvec{\widehat{\Omega }=\widehat{\Theta }\widehat{\Gamma }\widehat{\Theta }}^T=\sigma _\varepsilon ^2\varvec{\widehat{\Theta }\widehat{\Gamma }_1\widehat{\Theta }}^T.\)
We now briefly justify the steps above.
(1) Recall \(\varvec{\varepsilon \perp X, W\perp X, W\perp \varepsilon }\) and the zero-mean conditions, so \(\text {cov}(\mathbf{Z}^T\varvec{\varepsilon },\mathbf{Z}^T\mathbf{W}\varvec{\hat{\beta }^{coco}})=0\).
(2) We define \({{\varvec{\phi }}}_n=\mathbf{Z}^T\varvec{\varepsilon }=(\mathbf{X}+\mathbf{W})^T\varvec{\varepsilon }\). For any \(j\in \{1,\ldots ,p\}\),
$$\begin{aligned}{}[\phi _n]_j=\sum \limits _{i=1}^n\underbrace{(x_{ij}+w_{ij})\varepsilon _i}_{\nu _{ij}}. \end{aligned}$$
Since \(\mathbf{X}\), \(\mathbf{W}\), and \(\varvec{ \varepsilon }\) are mutually independent, for each \(j\in \{1,\ldots ,p\}\) we have
and for \(j\ne k\),
Therefore, we obtain \(\text {var}(n^{-1/2}\mathbf{Z}^T\varvec{\varepsilon })=({\varvec{\Sigma }}+\varvec{\Sigma _\mathbf{w}})\sigma _\varepsilon ^2\). Also, \(\text {var}(n^{-1/2}\mathbf{X}^T\mathbf{W}\varvec{\hat{\beta }^{coco}})=nb^2{\varvec{\Sigma }}\varvec{\Sigma _\mathbf{w}}\) can be obtained in a similar way.
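For concreteness, the coordinate-level computation behind (2) can be written out as follows; this is a reconstruction from the definitions above, with the cross term between the two pieces vanishing by (1):
$$\begin{aligned} \text {var}(\nu _{ij})=E\{(x_{ij}+w_{ij})^2\}E(\varepsilon _i^2)=({\varvec{\Sigma }}_{jj}+({\varvec{\Sigma }}_{\mathbf{w}})_{jj})\sigma _\varepsilon ^2,\qquad \text {cov}(\nu _{ij},\nu _{ik})=({\varvec{\Sigma }}_{jk}+({\varvec{\Sigma }}_{\mathbf{w}})_{jk})\sigma _\varepsilon ^2, \end{aligned}$$
so that adding the two variance contributions reproduces the factorization used above:
$$\begin{aligned} \widehat{{\varvec{\Gamma }}}=\sigma _\varepsilon ^2({\varvec{\Sigma }}+\varvec{\Sigma _\mathbf{w}})+nb^2\varvec{\Sigma _\mathbf{w}}{\varvec{\Sigma }}=\sigma _\varepsilon ^2\widehat{{\varvec{\Gamma }}}_1. \end{aligned}$$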
When Assumption 3.3 holds, Lemma 5.5 of [18] gives
From the discussion above, together with \(nb^2\sqrt{\max _j s_j}=O_\mathbb {P}(1)\), we next prove the upper bound for \(\Vert \varvec{\widehat{\Theta }\widehat{\Gamma }_1-I}\Vert _\infty \). By the triangle inequality and Hölder's inequality, we get
because \(n\gg s_j\log p\), and thus the first equality in Lemma A.1 holds.
Next, we prove the second equality. For convenience, we first record the following results from Lemma 5.5 of [18].
Combining this with Eq. (2.13), we obtain
As a result,
With the above conclusions, we can prove the last equality.
Given that \(\Vert { \widehat{{\varvec{\Omega }}}}\Vert _1=O(\Vert \sigma _\varepsilon ^2\widehat{{\varvec{\Theta }}}{ \widehat{{\varvec{\Gamma }}}_{\mathbf{1}}}\widehat{{\varvec{\Theta }}}^T\Vert )\) and \(p\sqrt{\log (p)}\gg \sqrt{n}\), we have
These results will be applied in the proof of Theorem 3.7 below. \(\square \)
Proof of Theorem 3.7:
Recall that \( \sqrt{n}(\varvec{\hat{b}-\beta ^0})=\mathbf{t}-\varvec{\delta }\), where \(\mathbf{t}|\mathbf{X}\sim \mathcal {N}_p(0,{\widehat{\varvec{\Omega }}})\). Standardizing each coordinate of \(\varvec{\hat{b}}\) by its estimated standard deviation, for arbitrary \(j\in \{1,\ldots ,p\}\) we obtain \(\tilde{\beta ^0_j},\tilde{t}_j,\tilde{\delta }_j\). Here, \(\tilde{\beta ^0_j}=\sqrt{n}\beta ^0_j/\widehat{\varvec{\Omega }}_{jj}^{1/2}\), \(\tilde{t}_j=t_j/\widehat{\varvec{\Omega }}_{jj}^{1/2}\) and \(\tilde{\delta }_j=\delta _j/\widehat{\varvec{\Omega }}_{jj}^{1/2}\). Then, we have \(w_j=\sqrt{n}\hat{b}_j/\widehat{\varvec{\Omega }}_{jj}^{1/2}=\tilde{\beta ^0_j}+\tilde{t}_j-\tilde{\delta }_j\), and each \(\tilde{t}_j\) has unit variance. Let \(\varvec{\tilde{\beta ^0}}=(\tilde{\beta ^0_1},\ldots ,\tilde{\beta ^0_p})^T, \varvec{\tilde{t}}=(\tilde{t}_1,\ldots ,\tilde{t}_p)^T\) and \(\varvec{\tilde{\delta }}=(\tilde{\delta }_1,\ldots ,\tilde{\delta }_p)^T\). Recall \({\varvec{\Theta }}={\varvec{\Sigma }}^{-1}\). Since \(\Vert \widehat{{\varvec{\Theta }}}{ \widehat{{\varvec{\Gamma }}}_{\mathbf{1}}}\widehat{{\varvec{\Theta }}}^T-{\varvec{\Theta }}\Vert _\infty =o_{\mathbb {P}}\left( 1\right) \) by Lemma A.1, the largest and smallest of the diagonal entries \((\widehat{{\varvec{\Theta }}}{ \widehat{{\varvec{\Gamma }}}_{\mathbf{1}}}\widehat{{\varvec{\Theta }}}^T)_{jj}\) lie between \(C_{\min }\) and \(C_{\max }\). Consequently, using Lemma 3.6, we obtain
Note that, under Assumption 3.3, \(Q_p(s_0,n)=o(1)\); thus \(\Vert \varvec{\tilde{\delta }}\Vert _\infty =o_{\mathbb {P}}(1)\).
Subsequently, we divide the proof into two steps. Step 1 establishes an upper bound for \(\max \limits _{j\in P_0}|w_j|\), and Step 2 a lower bound for \(\min \limits _{j\in S_0}|w_j|\).
Step 1: Applying Theorem 3.3 in [10] to the unit-variance Gaussian random variables \(\tilde{t}_j\) and an exponential variable \(\kappa \) with expectation 1, we have
as \(p_0\rightarrow \infty \). Combining the bounds of \(\tilde{t}_j\) and \(\tilde{\delta }_j\), as \(p_0\rightarrow \infty \), we get
Step 2: We proceed as for \(\max \limits _{j\in P_0}|w_j|\) in Step 1. Note that \(s_0\le p_0\); when \(s_0\rightarrow \infty \), we see that
Furthermore, we obtain
For convenience of exposition, let \(L_{p_0}=\sqrt{\log (p_0^2/2\pi )+\log \log (p_0^2/2\pi )+2\kappa }\). The final part of the proof considers the probability
This probability converges to 1 as \(s_0\rightarrow \infty \) if, for some positive constant \(n_0\), the following inequality
holds. By the definition of \(|\tilde{\beta ^0_j}|\) and some algebra, the above condition is equivalent to
The proof of Theorem 3.7 is now complete. \(\square \)
1.2 Appendix B: Proof of Theorem 3.8
In this section, we turn to the proof of Theorem 3.8. We first state a key lemma.
Lemma A.2
Consider the standardized DCoCoLasso estimator \(w_j\) and the definition of \(\tilde{t}_j\). Suppose Assumptions 3.1 and 3.3 hold; then
Furthermore,
Proof
Recall that \(w_j=\tilde{\beta ^0_j}+\tilde{t}_j-\tilde{\delta }_j\). For \(j\in P_0\), let \(F_{j}\) and \(\Phi _{j}\) be the cumulative distribution functions of \(w_j\) and \(\tilde{t}_j\), respectively. Then \(\tilde{\beta ^0_j}=0\), and each \(\tilde{t}_j\) has unit variance for all \(j\in P_0\). From the proof of Theorem 3.7, we have \(\Vert \varvec{\tilde{\delta }}\Vert _\infty =o_{\mathbb {P}}(1)\). By Lemma A.1, \(\Vert \widehat{{\varvec{\Theta }}}{ \widehat{{\varvec{\Gamma }}}_{\mathbf{1}}}\widehat{{\varvec{\Theta }}}^T-{\varvec{\Sigma }}^{-1}\Vert _\infty =o_{\mathbb {P}}(1)\), so \(\varvec{\tilde{t}}\) has a nondegenerate multivariate normal distribution, and \(\Phi _{j}\) is absolutely continuous with respect to Lebesgue measure on \(\mathbb {R}\). Therefore, for any \(x\in \mathbb {R}\), this yields
Further, the above equality implies
By the definitions of \(V_w(k)\) and \(V_{\tilde{t}}(k)\), we have
We next prove the last equality in Lemma A.2. For each distinct pair \(i,j\in \{1,\ldots ,p\}\), define \(F_{i,j}\) as the joint CDF of \((w_i,w_j)\) and \(\Phi _{i,j}\) as that of \((\tilde{t}_i,\tilde{t}_j)\). Hence, for any \(x,y\in \mathbb {R}\), we get
Then we have
Therefore,
Here, \(V_1\) is o(1) as \(p_0\rightarrow \infty \). This completes the proof. \(\square \)
Lemma A.3
If Assumptions 3.1 and 3.3 hold, then
Proof
For \(i\ne j\), let \(\rho _{ij}\) be the correlation between \(\tilde{t}_i\) and \(\tilde{t}_j\), and let \(\xi _j=\mathbb {I}(|\tilde{t}_j|\le k)\). Define the following sets:
Namely, \(I_{2,p}\) records distinct pairs \((\xi _i,\xi _j)\) for \(i\ne j\) such that \(\tilde{t}_i\) and \(\tilde{t}_j\) are linearly dependent. Then
The term \(\sum \limits _{j=1}^p\text {var}(\xi _j)=O(p)\). Letting R be the correlation matrix of \(\varvec{\tilde{t}}\), we have
Inequality (A.17) of [6] implies
Consequently, by Lemma A.1, we have that
For the last equality of Lemma A.3, we can obtain the result by replacing \(p_0\) with p. This completes the proof of the lemma. \(\square \)
Proof of Theorem 3.8:
To derive the theorem compactly, we introduce the following additional definitions.
Define the marginal FDR as \(\mathrm{mFDR}_w(k)=\mathbb {E}[V_w(k)]/\{\mathbb {E}[R_w(k)]\vee 1\}\).
Recall that \(R_w(k)=\sum \limits _{j=1}^p\mathbb {I}(|w_j|>k)\) and \(V_w(k)=\sum \limits _{j\in P_0}\mathbb {I}(|w_j|>k)\); then define \(\widehat{\mathrm{FDP}}(k)=2p\Phi (-k)/\{R_w(k)\vee 1\}\).
In addition, we see \(R_w(k)\vee 1=R_w(k)\) for all p large enough.
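For later use in the proof of Corollary 3.9, the data-driven threshold invoked there can be written as follows; this display is our reconstruction from how \(k_\alpha \) is used below:
$$\begin{aligned} k_\alpha =\inf \left\{ k>0:\widehat{\mathrm{FDP}}(k)\le \alpha \right\} . \end{aligned}$$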
Next, we divide the proof into three steps.
Step A. By Lemma A.3, under Assumption 3.3, we can apply Chebyshev's inequality to \(p_0^{-1}V_{\tilde{t}}(k)\); then we have
Combining this with \(p_0^{-1}|V_w(k)-V_{\tilde{t}}(k)|\rightarrow 0 \quad a.s.\) as \(p_0\rightarrow \infty \) from Lemma A.2, we get
Subsequently, we derive a lower bound for \(R_w(k)\). Because \(R_w(k)\ge V_w(k)\) and \(\mathbb {E}[V_{\tilde{t}}(k)]=\sum \limits _{j\in P_0}\mathbb {P}(|\tilde{t}_j|>k)=2p_0\Phi (-k)\), we have
Finally, we have
Since \(p=p_0+s_0\) and \(s_0/p=o(1)\), we have
Consequently, we have proved \(\lim \limits _{p\rightarrow \infty }\left| \mathrm{FDP}_w(k)-\widehat{\mathrm{FDP}}(k)\right| =0 \quad a.s.\)
Step B. From Step A, we know that \(R_w(k)\) is bounded away from 0 uniformly in p with probability tending to 1. Therefore, it suffices to show
Given that \(p_0^{-1}\left| \mathbb {E}[V_w(k)]-\mathbb {E}[V_{\tilde{t}}(k)]\right| \rightarrow 0\) in Lemma A.2, it suffices to show \(p^{-2}\text {var}[R_w(k)]\rightarrow 0\) as \(p\rightarrow \infty \). To accomplish this, we will prove \( p_0^{-2}\text {var}[V_w(k)]\rightarrow 0\) and \(p^{-2}\text {var}[R_w(k)]-p_0^{-2}\text {var}[V_w(k)]\rightarrow 0\).
For \( p_0^{-2}\text {var}[V_w(k)]\rightarrow 0\): by Lemma A.3, when \(n\gg s_j\log p\), we have
Combining
in Lemma A.2, we obtain the result.
Next, for \(p^{-2}\text {var}[R_w(k)]-p_0^{-2}\text {var}[V_w(k)]\rightarrow 0\), observe that
then we have \(p^{-2}\text {var}[R_w(k)]-p_0^{-2}\text {var}[V_w(k)]\rightarrow 0\) because \(p_0/p\rightarrow 1\) and \(s_0/p=o(1)\).
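To spell out this step, write \(S_w(k)=\sum \nolimits _{j\in S_0}\mathbb {I}(|w_j|>k)\le s_0\) for the number of selected signals (notation introduced here for illustration only). Then one such decomposition is
$$\begin{aligned} \text {var}[R_w(k)]=\text {var}[V_w(k)]+\text {var}[S_w(k)]+2\,\text {cov}\{V_w(k),S_w(k)\}, \end{aligned}$$
with \(\text {var}[S_w(k)]\le s_0^2\) and \(|\text {cov}\{V_w(k),S_w(k)\}|\le s_0\sqrt{\text {var}[V_w(k)]}\) by the Cauchy–Schwarz inequality, so that
$$\begin{aligned} \left| p^{-2}\text {var}[R_w(k)]-p_0^{-2}\text {var}[V_w(k)]\right| \le \left( \frac{1}{p_0^2}-\frac{1}{p^2}\right) \text {var}[V_w(k)]+\frac{s_0^2+2s_0\sqrt{\text {var}[V_w(k)]}}{p^2}\rightarrow 0. \end{aligned}$$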
Step C. Following the results in Step A and Step B, we have
Since \(R_w(k)\) is bounded away from 0 uniformly in p, then
Applying the dominated convergence theorem, we obtain
Finally, we decompose \(\widehat{\mathrm{FDP}}(k)-\mathrm{FDR}_w(k)\) using the above results:
Therefore, we have \(\lim \limits _{p\rightarrow \infty }\left| \widehat{\mathrm{FDP}}(k)-\mathrm{FDR}_w(k)\right| =0\). \(\square \)
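As a quick numerical illustration of the consistency statement (on toy independent Gaussian nulls; the theorem itself allows dependence, so this sketch only exercises the simplest case), one can compare the realized and estimated FDP at a fixed threshold:

```python
import numpy as np
from scipy.stats import norm

# Toy check of Theorem 3.8: the estimated FDP should track the realized FDP.
rng = np.random.default_rng(1)
p0, s0, k = 10_000, 100, 2.5
w = np.concatenate([rng.standard_normal(p0), 4.0 + rng.standard_normal(s0)])
p = p0 + s0
R = max(np.sum(np.abs(w) > k), 1)      # total discoveries
V = np.sum(np.abs(w[:p0]) > k)         # false discoveries (first p0 are null)
print(f"realized FDP = {V / R:.3f}, estimated FDP = {2 * p * norm.cdf(-k) / R:.3f}")
```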
Proof of Corollary 3.9:
By the definition of \(k_\alpha \) and \(\widehat{\mathrm{FDP}}(k_\alpha )\), we have
Further, for a small fixed constant \(\alpha \), we see that \(\mathbb {P}\{2p\Phi (-k_\alpha )\le p\alpha \}=1\). This implies that \(k_\alpha \) does not approach 0 as \(p\rightarrow \infty \), so we need only consider positive constant values of \(k_\alpha \) that meet the conditions of the theorem.
Theorem 3.8 tells us that, when \(p\rightarrow \infty \),
Then, for any positive constant t, using Lebesgue's dominated convergence theorem, we have
Therefore, \(\lim \limits _{p\rightarrow \infty }\mathbb {P}\{\mathrm{FDP}_w(k_\alpha )\le \alpha \}=1\).
The claim \(\lim \limits _{p\rightarrow \infty }\mathbb {P}\{\mathrm{FDR}_w(k_\alpha )\le \alpha \}=1\) can be proved by a similar method. Combining this with \(\left| \widehat{\mathrm{FDP}}(k_\alpha )-\mathrm{FDR}_w(k_\alpha )\right| \rightarrow 0\) from Theorem 3.8, we obtain
Together with \(\mathbb {P}\{\widehat{\mathrm{FDP}}(k_\alpha )\le \alpha \}=1\), this completes the proof. \(\square \)
Keywords
- Multiple testing
- High-dimensional inference
- False discovery proportion
- Measurement error models
- Variable selection