Abstract
Multiple testing has received much attention in high-dimensional statistical theory and applications, and the problem of variable selection, which aims to identify the important variables among many candidates, can be regarded as a generalization of multiple testing. Performing variable selection in high-dimensional linear models with measurement errors is challenging: both the influence of the high-dimensional parameters and the measurement errors must be accounted for to avoid severe bias. We consider the problem of variable selection in error-in-variables models and introduce a new variable selection method, the DCoCoLasso-FDP procedure. By constructing consistent estimators of the false discovery proportion (FDP) and the false discovery rate (FDR), our method can prioritize the important variables and control the FDP and FDR at a prespecified level in error-in-variables models. An extensive simulation study compares the DCoCoLasso-FDP procedure with existing methods in various settings, and numerical results are provided to illustrate the efficiency of our method.
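To make the selection step concrete, the following is a minimal numerical sketch of how an estimated FDP curve can drive variable selection; it is an illustration under simplifying assumptions, not the authors' implementation. The helper names (`fdp_hat`, `select_threshold`) and the synthetic data are ours; the estimator \(\widehat{\mathrm{FDP}}(k)=2p\Phi (-k)/\{R_w(k)\vee 1\}\) and the choice of the smallest feasible threshold follow the definitions used in Appendix B.

```python
import numpy as np
from scipy.stats import norm

def fdp_hat(w, k):
    """Estimated FDP at threshold k: 2 * p * Phi(-k) / (R(k) v 1)."""
    p = w.size
    R = np.sum(np.abs(w) > k)          # number of selected variables
    return 2 * p * norm.cdf(-k) / max(R, 1)

def select_threshold(w, alpha, n_grid=1000):
    """Smallest grid threshold whose estimated FDP is at most alpha."""
    for k in np.linspace(0.0, np.abs(w).max(), n_grid):
        if fdp_hat(w, k) <= alpha:
            return k
    return np.inf                      # no feasible threshold on the grid

# Toy usage: p0 null coordinates and s0 signals on the standardized scale.
rng = np.random.default_rng(0)
p0, s0 = 950, 50
w = np.concatenate([rng.standard_normal(p0), 4.0 + rng.standard_normal(s0)])
k_alpha = select_threshold(w, alpha=0.1)
selected = np.flatnonzero(np.abs(w) > k_alpha)   # indices declared important
```

In the full procedure, \(w_j\) would be the standardized debiased CoCoLasso statistics constructed in the Appendix rather than synthetic draws.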
References
Barber, R.F., Candes, E.J.: Controlling the false discovery rate via knockoffs. Ann. Stat. 43(5), 2055–2085 (2015)
Belloni, A., Chernozhukov, V., Kaul, A.: Confidence bands for coefficients in high dimensional linear models with error-in-variables. arXiv: Statistics Theory (2017)
Belloni, A., Rosenbaum, M., Tsybakov, A.B.: Linear and conic programming approaches to high-dimensional errors-in-variables models. J. R. Stat. Soc. Ser. B (2014, forthcoming)
Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Methodol. 57(1), 289–300 (1995)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2010)
Chen, X., Doerge, R.W.: A strong law of large numbers related to multiple testing normal means. arXiv: Statistics Theory (2014)
Datta, A., Zou, H.: Cocolasso for high-dimensional error-in-variables regression. Ann. Stat. 45(6), 2400–2426 (2017)
Fan, J., Han, X., Gu, W.: Estimating false discovery proportion under arbitrary covariance dependence. J. Am. Stat. Assoc. 107(499), 1019–1035 (2012)
G'Sell, M.G., Wager, S., Chouldechova, A., Tibshirani, R.: Sequential selection procedures and false discovery rate control. J. R. Stat. Soc. Ser. B Stat. Methodol. 78(2), 423–444 (2016)
Hartigan, J.A.: Bounding the maximum of dependent random variables. Electron. J. Stat. 8(2), 3126–3140 (2014)
Jeng, X.J., Chen, X.: Predictor ranking and false discovery proportion control in high-dimensional regression. J. Multivar. Anal. 171, 163–175 (2019)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001)
Loh, P., Wainwright, M.J.: High-dimensional regression with noisy and missing data: provable guarantees with nonconvexity. Ann. Stat. 40(3), 1637–1664 (2012)
Meinshausen, N., Bühlmann, P.: Stability selection. J. R. Stat. Soc. Ser. B Stat. Methodol. 72(4), 417–473 (2010)
Rosenbaum, M., Tsybakov, A.B.: Sparse recovery under matrix uncertainty. Ann. Stat. 38(5), 2620–2651 (2010)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Methodol. 58(1), 267–288 (1996)
Van de Geer, S., Bühlmann, P., Ritov, Y., Dezeure, R.: On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Stat. 42(3), 1166–1202 (2014)
Wang, Z., Xue, L.: Inference for high dimensional linear models with error-in-variables. Commun. Stat. Simul. Comput. 13, 1–10 (2019). https://doi.org/10.1080/03610918.2018.1554108
Zhao, P., Yu, B.: On model selection consistency of lasso. J. Mach. Learn. Res. 7(12), 2541–2563 (2006)
Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101(476), 1418–1429 (2006)
Additional information
This research was supported by Grant No. 11901006 from the National Natural Science Foundation of China and by Grant Nos. 1908085MA20 and 1908085QA06 from the Natural Science Foundation of Anhui Province.
Appendix: Proof of the Main Results
1.1 Appendix A: Proof of Theorem 3.7
We first prove the following lemma, which is essential for the proof of the theorem. The convergence rates in Lemmas 2.1 and 2.2 have been established in the literature (see, e.g., [2, 7]); the proofs are similar and we omit them.
Proof of Lemma 3.5:
We focus first on the case where the covariance matrix \({\varvec{\Sigma }}\) is unknown; the results for known \({\varvec{\Sigma }}\) then follow by taking \(\widehat{\varvec{\Theta }}={\varvec{\Sigma }}_0^{-1}\). Recalling Eq. (2.9) and letting \(\varvec{t}=\varvec{\widehat{\Theta }\zeta }_n\), we decompose \(\sqrt{n}(\varvec{\hat{b}-\beta ^0})\) into the following two terms, \(\sqrt{n}(\varvec{\hat{b}-\beta ^0})=\mathbf{t}-\varvec{\delta }\),
where \(\varvec{\widehat{\Sigma }}=n^{-1}\mathbf{Z}^T\mathbf{Z}-\varvec{\Sigma }_{\mathbf{w}}\), \(\varvec{\zeta }_n=\sqrt{n}\{(n^{-1}\mathbf{Z}^T\mathbf{X}-\varvec{\widehat{\Sigma })\beta ^0}+n^{-1}\mathbf{Z}^T\varvec{ \varepsilon }\} =n^{-1/2}\sum _{i=1}^n\{\mathbf{Z}_i(\varepsilon _i-\mathbf{W}_i^T\varvec{\beta }^{\varvec{0}})+\varvec{\Sigma _w\beta ^0}\} =n^{-1/2}\sum _{i=1}^n\varvec{\zeta }_{ni}\).
Because \(\varvec{ \varepsilon }\perp \mathbf{X}\), \(\mathbf{W}\perp \mathbf{X}\), \(\mathbf{W}\perp \varvec{\varepsilon }\), and the zero-mean conditions hold, we have \(E(\varvec{\zeta }_n|\mathbf{X})=0\). Observing that \(\{\varvec{\zeta }_{ni}\}_{i=1}^n\) are independent and identically distributed random vectors, we apply the Lindeberg–Lévy central limit theorem to obtain \(n^{-1/2}\sum _{i=1}^n\varvec{\zeta }_{ni}|\mathbf{X}\xrightarrow {d}\mathcal {N}_p(\varvec{0},{\varvec{\Gamma }})\),
where \( { {\varvec{\Gamma }}}=\text {var}(\varvec{\zeta }_{ni}|\mathbf{X})=E[\{\mathbf{Z}_i(\varepsilon _i-\mathbf{W}_i^T\varvec{\beta }^{\varvec{0}})+\varvec{\Sigma _w\beta ^0}\}\{\mathbf{Z}_i(\varepsilon _i-\mathbf{W}_i^T\varvec{\beta }^{\varvec{0}})+\varvec{\Sigma _w\beta ^0}\}^{T}|\mathbf{X}]. \) Consequently, when \({\varvec{\Sigma }}\) is assumed to be known, the conditional asymptotic covariance of \(\mathbf{t}\) given \(\mathbf{X}\) is \({\varvec{\Sigma }}_0^{-1}{\varvec{\Gamma }}{\varvec{\Sigma }}_0^{-1}\). When \({\varvec{\Sigma }}\) is unavailable, we plug in \(\widehat{\varvec{\Theta }}\widehat{{\varvec{\Gamma }}}\widehat{\varvec{\Theta }}^T\) as an estimator of \({\varvec{\Sigma }}_0^{-1}{\varvec{\Gamma }}{\varvec{\Sigma }}_0^{-1}\). Then \(\mathbf{t}|\mathbf{X}\sim \mathcal {N}_p(0,\widehat{\varvec{\Omega }})\), where \(\widehat{\varvec{\Omega }}=\widehat{\varvec{\Theta }}\widehat{{\varvec{\Gamma }}}\widehat{\varvec{\Theta }}^T\).
\(\square \)
Proof of Lemma 3.6:
We need only prove that \(\varvec{\delta }\) satisfies the conclusion of Lemma 3.6; the proof for \(\varvec{\delta _0}\) is similar. For \(\varvec{\delta }\), applying Hölder's inequality,
Lemma 5.5 of [18] implies
with probability at least \(1-c_1\exp (-c_2\log p)\). Combining this with (2.13), we have
with probability at least \(1-c_1\exp (-c_2\log p)\). This completes the proof. \(\square \)
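A sketch of this chain of bounds, assuming \(\varvec{\delta }\) has the usual debiased-Lasso remainder form \(\varvec{\delta }=\sqrt{n}(\varvec{\widehat{\Theta }\widehat{\Sigma }}-\mathbf{I})(\varvec{\hat{\beta }}^\mathbf{coco}-\varvec{\beta }^0)\) and that (2.13) supplies the rate \(\Vert \varvec{\hat{\beta }}^\mathbf{coco}-\varvec{\beta }^0\Vert _1=O_{\mathbb {P}}(s_0\sqrt{\log p/n})\) (both are our reading of the surrounding argument, not quotations):
$$\begin{aligned} \Vert \varvec{\delta }\Vert _\infty \le \Vert \varvec{\widehat{\Theta }\widehat{\Sigma }}-\mathbf{I}\Vert _\infty \cdot \sqrt{n}\Vert \varvec{\hat{\beta }}^\mathbf{coco}-\varvec{\beta }^0\Vert _1 =O_{\mathbb {P}}\left( \sqrt{\frac{\log p}{n}}\right) \cdot O_{\mathbb {P}}\left( s_0\sqrt{\log p}\right) =O_{\mathbb {P}}\left( \frac{s_0\log p}{\sqrt{n}}\right) , \end{aligned}$$
which is consistent with the requirement \(Q_p(s_0,n)=o(1)\) invoked in the proof of Theorem 3.7.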
Lemma A.1
Consider the estimator \(\widehat{\varvec{\Theta }}\) from the modified node-wise regression, and suppose that Assumptions 3.1 and 3.3 hold. Then, for suitable tuning parameters \(\lambda _j\), \(j\in \{1,\ldots ,p\}\), we have
Furthermore,
Proof
Let us simplify \({\widehat{{\varvec{\Gamma }}}}\) first; we have
where \(b^2=(\varvec{\hat{\beta }}^\mathbf{coco})^{T}\varvec{\hat{\beta }^{coco}}\) and \(\varvec{\widehat{\Gamma }_1=(\Sigma +\Sigma _w)}+nb^2\sigma _\varepsilon ^{-2}\varvec{\Sigma _w\Sigma }\). Thus, we see that \(\varvec{\widehat{\Omega }=\widehat{\Theta }\widehat{\Gamma }\widehat{\Theta }}^T=\sigma _\varepsilon ^2\varvec{\widehat{\Theta }\widehat{\Gamma }_1\widehat{\Theta }}^T.\)
We now briefly justify the steps above.
(1) Recall \(\varvec{\varepsilon \perp X, W\perp X, W\perp \varepsilon }\) and the zero-mean conditions, so \(\text {cov}(\mathbf{Z}^T\varvec{\varepsilon },\mathbf{Z}^T\mathbf{W}\varvec{\hat{\beta }^{coco}})=0\).
(2) We define \({{\varvec{\phi }}}_n=\mathbf{Z}^T\varvec{\varepsilon }=(\mathbf{X}+\mathbf{W})^T\varvec{\varepsilon }\). For any \(j\in \{1,\ldots ,p\}\),
$$\begin{aligned}{}[\phi _n]_j=\sum \limits _{i=1}^n\underbrace{(x_{ij}+w_{ij})\varepsilon _i}_{\nu _{ij}}. \end{aligned}$$
Since \(\mathbf{X}\), \(\mathbf{W}\), and \(\varvec{ \varepsilon }\) are mutually independent, for each \(j\in \{1,\ldots ,p\}\) we have
and for \(j\ne k\),
Therefore, we obtain \(\text {var}(n^{-1/2}\mathbf{Z}^T\varvec{\varepsilon })=({\varvec{\Sigma }}+\varvec{\Sigma _\mathbf{w}})\sigma _\varepsilon ^2\). Also, \(\text {var}(n^{-1/2}\mathbf{X}^T\mathbf{W}\varvec{\hat{\beta }^{coco}})=nb^2{\varvec{\Sigma }}\varvec{\Sigma _\mathbf{w}}\) can be obtained in a similar way.
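For concreteness, the coordinate-level computation behind (2) can be written out as follows; this is a reconstruction from the definitions above, with the cross term between the two pieces vanishing by (1):
$$\begin{aligned} \text {var}(\nu _{ij})=E\{(x_{ij}+w_{ij})^2\}E(\varepsilon _i^2)=({\varvec{\Sigma }}_{jj}+({\varvec{\Sigma }}_{\mathbf{w}})_{jj})\sigma _\varepsilon ^2,\qquad \text {cov}(\nu _{ij},\nu _{ik})=({\varvec{\Sigma }}_{jk}+({\varvec{\Sigma }}_{\mathbf{w}})_{jk})\sigma _\varepsilon ^2, \end{aligned}$$
so that adding the two variance contributions reproduces the factorization used above:
$$\begin{aligned} \widehat{{\varvec{\Gamma }}}=\sigma _\varepsilon ^2({\varvec{\Sigma }}+\varvec{\Sigma _\mathbf{w}})+nb^2\varvec{\Sigma _\mathbf{w}}{\varvec{\Sigma }}=\sigma _\varepsilon ^2\widehat{{\varvec{\Gamma }}}_1. \end{aligned}$$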
When Assumption 3.3 holds, Lemma 5.5 of [18] gives
From the discussion above, together with \(nb^2\sqrt{\max _j s_j}=O_\mathbb {P}(1)\), we next prove the upper bound for \(\Vert \varvec{\widehat{\Theta }\widehat{\Gamma }_1-I}\Vert _\infty \). By the triangle inequality and Hölder's inequality, we get
because \(n\gg s_j\log p\), and thus the first equality in Lemma A.1 holds.
Next, we prove the second equality. For convenience, we first record the following results from Lemma 5.5 of [18].
Combining this with Eq. (2.13), we obtain
As a result,
With the above conclusions, we can prove the last equality.
Given that \(\Vert { \widehat{{\varvec{\Omega }}}}\Vert _1=O(\Vert \sigma _\varepsilon ^2\widehat{{\varvec{\Theta }}}{ \widehat{{\varvec{\Gamma }}}_{\mathbf{1}}}\widehat{{\varvec{\Theta }}}^T\Vert )\) and \(p\sqrt{\log (p)}\gg \sqrt{n}\), we have
These results will be applied in the proof of Theorem 3.7 below. \(\square \)
Proof of Theorem 3.7:
Recall that \( \sqrt{n}(\varvec{\hat{b}-\beta ^0})=\mathbf{t}-\varvec{\delta }\), where \(\mathbf{t}|\mathbf{X}\sim \mathcal {N}_p(0,{\widehat{\varvec{\Omega }}})\). Standardizing each coordinate of \(\varvec{\hat{b}}\) by its estimated standard deviation, for arbitrary \(j\in \{1,\ldots ,p\}\) we obtain \(\tilde{\beta ^0_j},\tilde{t}_j,\tilde{\delta }_j\). Here, \(\tilde{\beta ^0_j}=\sqrt{n}\beta ^0_j/\widehat{\varvec{\Omega }}_{jj}^{1/2}\), \(\tilde{t}_j=t_j/\widehat{\varvec{\Omega }}_{jj}^{1/2}\) and \(\tilde{\delta }_j=\delta _j/\widehat{\varvec{\Omega }}_{jj}^{1/2}\). Then, we have \(w_j=\sqrt{n}\hat{b}_j/\widehat{\varvec{\Omega }}_{jj}^{1/2}=\tilde{\beta ^0_j}+\tilde{t}_j-\tilde{\delta }_j\), and each \(\tilde{t}_j\) has unit variance. Let \(\varvec{\tilde{\beta ^0}}=(\tilde{\beta ^0_1},\ldots ,\tilde{\beta ^0_p})^T, \varvec{\tilde{t}}=(\tilde{t}_1,\ldots ,\tilde{t}_p)^T\) and \(\varvec{\tilde{\delta }}=(\tilde{\delta }_1,\ldots ,\tilde{\delta }_p)^T\). Recall \({\varvec{\Theta }}={\varvec{\Sigma }}^{-1}\). Since \(\Vert \widehat{{\varvec{\Theta }}}{ \widehat{{\varvec{\Gamma }}}_{\mathbf{1}}}\widehat{{\varvec{\Theta }}}^T-{\varvec{\Theta }}\Vert _\infty =o_{\mathbb {P}}\left( 1\right) \) by Lemma A.1, the largest and smallest of the diagonal entries \((\widehat{{\varvec{\Theta }}}{ \widehat{{\varvec{\Gamma }}}_{\mathbf{1}}}\widehat{{\varvec{\Theta }}}^T)_{jj}\) lie between \(C_{\min }\) and \(C_{\max }\). Consequently, using Lemma 3.6, we obtain
Note that, under Assumption 3.3, \(Q_p(s_0,n)=o(1)\); thus \(\Vert \varvec{\tilde{\delta }}\Vert _\infty =o_{\mathbb {P}}(1)\).
Subsequently, we divide the proof into two steps. Step 1 establishes an upper bound for \(\max \limits _{j\in P_0}|w_j|\), and Step 2 a lower bound for \(\min \limits _{j\in S_0}|w_j|\).
Step 1: Applying Theorem 3.3 in [10] to the unit-variance Gaussian random variables \(\tilde{t}_j\) and an exponential variable \(\kappa \) with expectation 1, we have
as \(p_0\rightarrow \infty \). Combining the bounds of \(\tilde{t}_j\) and \(\tilde{\delta }_j\), as \(p_0\rightarrow \infty \), we get
Step 2: We proceed as for \(\max \limits _{j\in P_0}|w_j|\) in Step 1. Note that \(s_0\le p_0\); when \(s_0\rightarrow \infty \), we see that
Furthermore, we obtain
For convenience of exposition, let \(L_{p_0}=\sqrt{\log (p_0^2/2\pi )+\log \log (p_0^2/2\pi )+2\kappa }\). The final part of the proof considers the probability
This probability converges to 1 as \(s_0\rightarrow \infty \) if, for some positive constant \(n_0\), the following inequality
holds. By the definition of \(|\tilde{\beta ^0_j}|\) and some algebra, the above condition is equivalent to
The proof of Theorem 3.7 is now complete. \(\square \)
1.2 Appendix B: Proof of Theorem 3.8
In this section, we turn to the proof of Theorem 3.8. We first state a key lemma.
Lemma A.2
Consider the standardized DCoCoLasso estimator \(w_j\) and the definition of \(\tilde{t}_j\). Suppose Assumptions 3.1 and 3.3 hold; then
Furthermore,
Proof
Recall that \(w_j=\tilde{\beta ^0_j}+\tilde{t}_j-\tilde{\delta }_j\). For \(j\in P_0\), let \(F_{j}\) and \(\Phi _{j}\) be the cumulative distribution functions of \(w_j\) and \(\tilde{t}_j\), respectively. Then \(\tilde{\beta ^0_j}=0\), and each \(\tilde{t}_j\) has unit variance for all \(j\in P_0\). From the proof of Theorem 3.7, we have \(\Vert \varvec{\tilde{\delta }}\Vert _\infty =o_{\mathbb {P}}(1)\). By Lemma A.1, \(\Vert \widehat{{\varvec{\Theta }}}{ \widehat{{\varvec{\Gamma }}}_{\mathbf{1}}}\widehat{{\varvec{\Theta }}}^T-{\varvec{\Sigma }}^{-1}\Vert _\infty =o_{\mathbb {P}}(1)\), so \(\varvec{\tilde{t}}\) has a nondegenerate multivariate normal distribution, and \(\Phi _{j}\) is absolutely continuous with respect to Lebesgue measure on \(\mathbb {R}\). Therefore, for any \(x\in \mathbb {R}\), this yields
Further, the above equality implies
By the definitions of \(V_w(k)\) and \(V_{\tilde{t}}(k)\), we have
We next prove the last equality in Lemma A.2. For each distinct pair \(i,j\in \{1,\ldots ,p\}\), define \(F_{i,j}\) as the joint CDF of \((w_i,w_j)\) and \(\Phi _{i,j}\) as that of \((\tilde{t}_i,\tilde{t}_j)\). Hence, for any \(x,y\in \mathbb {R}\), we get
Then we have
Therefore,
Here, \(V_1\) is o(1) as \(p_0\rightarrow \infty \). This completes the proof. \(\square \)
Lemma A.3
If Assumptions 3.1 and 3.3 hold, then
Proof
For \(i\ne j\), let \(\rho _{ij}\) be the correlation between \(\tilde{t}_i\) and \(\tilde{t}_j\), and let \(\xi _j=\mathbb {I}(|\tilde{t}_j|\le k)\). Define the following sets:
Namely, \(I_{2,p}\) records distinct pairs \((\xi _i,\xi _j)\) for \(i\ne j\) such that \(\tilde{t}_i\) and \(\tilde{t}_j\) are linearly dependent. Then
The term \(\sum \limits _{j=1}^p\text {var}(\xi _j)=O(p)\). Letting R be the correlation matrix of \(\varvec{\tilde{t}}\), we have
Inequality (A.17) of [6] implies
Consequently, by Lemma A.1, we have that
For the last equality of Lemma A.3, we can obtain the result by replacing \(p_0\) with p. This completes the proof of the lemma. \(\square \)
Proof of Theorem 3.8:
To derive the theorem compactly, we introduce the following additional definitions.
Define the marginal FDR as \(\mathrm{mFDR}_w(k)=\mathbb {E}[V_w(k)]/\{\mathbb {E}[R_w(k)]\vee 1\}\).
Recall that \(R_w(k)=\sum \limits _{j=1}^p\mathbb {I}(|w_j|>k)\) and \(V_w(k)=\sum \limits _{j\in P_0}\mathbb {I}(|w_j|>k)\); then define \(\widehat{\mathrm{FDP}}(k)=2p\Phi (-k)/\{R_w(k)\vee 1\}\).
In addition, we see \(R_w(k)\vee 1=R_w(k)\) for all p large enough.
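For later use in the proof of Corollary 3.9, the data-driven threshold invoked there can be written as follows; this display is our reconstruction from how \(k_\alpha \) is used below:
$$\begin{aligned} k_\alpha =\inf \left\{ k>0:\widehat{\mathrm{FDP}}(k)\le \alpha \right\} . \end{aligned}$$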
Next, we divide the proof into three steps.
Step A. By Lemma A.3, under Assumption 3.3, we can apply Chebyshev's inequality to \(p_0^{-1}V_{\tilde{t}}(k)\); then we have
Combining this with \(p_0^{-1}|V_w(k)-V_{\tilde{t}}(k)|\rightarrow 0 \quad a.s.\) as \(p_0\rightarrow \infty \) from Lemma A.2, we get
Subsequently, we derive a lower bound for \(R_w(k)\). Because \(R_w(k)\ge V_w(k)\) and \(\mathbb {E}[V_{\tilde{t}}(k)]=\sum \limits _{j\in P_0}\mathbb {P}(|\tilde{t}_j|>k)=2p_0\Phi (-k)\), we have
Finally, we have
Since \(p=p_0+s_0\) and \(s_0/p=o(1)\), we have
Consequently, we have proved \(\lim \limits _{p\rightarrow \infty }\left| \mathrm{FDP}_w(k)-\widehat{\mathrm{FDP}}(k)\right| =0 \quad a.s.\)
Step B. From Step A, we know that \(R_w(k)\) is bounded away from 0 uniformly in p with probability tending to 1. Therefore, it suffices to show
Given that \(p_0^{-1}\left| \mathbb {E}[V_w(k)]-\mathbb {E}[V_{\tilde{t}}(k)]\right| \rightarrow 0\) in Lemma A.2, it suffices to show \(p^{-2}\text {var}[R_w(k)]\rightarrow 0\) as \(p\rightarrow \infty \). To accomplish this, we will prove \( p_0^{-2}\text {var}[V_w(k)]\rightarrow 0\) and \(p^{-2}\text {var}[R_w(k)]-p_0^{-2}\text {var}[V_w(k)]\rightarrow 0\).
For \( p_0^{-2}\text {var}[V_w(k)]\rightarrow 0\): by Lemma A.3, when \(n\gg s_j\log p\), we have
Combining
in Lemma A.2, we obtain the result.
Next, for \(p^{-2}\text {var}[R_w(k)]-p_0^{-2}\text {var}[V_w(k)]\rightarrow 0\), observe that
then we have \(p^{-2}\text {var}[R_w(k)]-p_0^{-2}\text {var}[V_w(k)]\rightarrow 0\) because \(p_0/p\rightarrow 1\) and \(s_0/p=o(1)\).
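To spell out this step, write \(S_w(k)=\sum \nolimits _{j\in S_0}\mathbb {I}(|w_j|>k)\le s_0\) for the number of selected signals (notation introduced here for illustration only). Then one such decomposition is
$$\begin{aligned} \text {var}[R_w(k)]=\text {var}[V_w(k)]+\text {var}[S_w(k)]+2\,\text {cov}\{V_w(k),S_w(k)\}, \end{aligned}$$
with \(\text {var}[S_w(k)]\le s_0^2\) and \(|\text {cov}\{V_w(k),S_w(k)\}|\le s_0\sqrt{\text {var}[V_w(k)]}\) by the Cauchy–Schwarz inequality, so that
$$\begin{aligned} \left| p^{-2}\text {var}[R_w(k)]-p_0^{-2}\text {var}[V_w(k)]\right| \le \left( \frac{1}{p_0^2}-\frac{1}{p^2}\right) \text {var}[V_w(k)]+\frac{s_0^2+2s_0\sqrt{\text {var}[V_w(k)]}}{p^2}\rightarrow 0. \end{aligned}$$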
Step C. Following the results in Step A and Step B, we have
Since \(R_w(k)\) is bounded away from 0 uniformly in p, then
Applying the dominated convergence theorem, we obtain
Finally, we decompose \(\widehat{\mathrm{FDP}}(k)-\mathrm{FDR}_w(k)\) using the above results:
Therefore, we have \(\lim \limits _{p\rightarrow \infty }\left| \widehat{\mathrm{FDP}}(k)-\mathrm{FDR}_w(k)\right| =0\). \(\square \)
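As a quick numerical illustration of the consistency statement (on toy independent Gaussian nulls; the theorem itself allows dependence, so this sketch only exercises the simplest case), one can compare the realized and estimated FDP at a fixed threshold:

```python
import numpy as np
from scipy.stats import norm

# Toy check of Theorem 3.8: the estimated FDP should track the realized FDP.
rng = np.random.default_rng(1)
p0, s0, k = 10_000, 100, 2.5
w = np.concatenate([rng.standard_normal(p0), 4.0 + rng.standard_normal(s0)])
p = p0 + s0
R = max(np.sum(np.abs(w) > k), 1)      # total discoveries
V = np.sum(np.abs(w[:p0]) > k)         # false discoveries (first p0 are null)
print(f"realized FDP = {V / R:.3f}, estimated FDP = {2 * p * norm.cdf(-k) / R:.3f}")
```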
Proof of Corollary 3.9:
By the definition of \(k_\alpha \) and \(\widehat{\mathrm{FDP}}(k_\alpha )\), we have
Further, for a small fixed constant \(\alpha \), we see that \(\mathbb {P}\{2p\Phi (-k_\alpha )\le p\alpha \}=1\). This implies that \(k_\alpha \) does not approach 0 as \(p\rightarrow \infty \), so we need only consider positive constant values of \(k_\alpha \) that meet the conditions of the theorem.
Theorem 3.8 tells us that, when \(p\rightarrow \infty \),
Then, for any positive constant t, using Lebesgue's dominated convergence theorem, we have
Therefore, \(\lim \limits _{p\rightarrow \infty }\mathbb {P}\{\mathrm{FDP}_w(k_\alpha )\le \alpha \}=1\).
The claim \(\lim \limits _{p\rightarrow \infty }\mathbb {P}\{\mathrm{FDR}_w(k_\alpha )\le \alpha \}=1\) can be proved by a similar method. Combining this with \(\left| \widehat{\mathrm{FDP}}(k_\alpha )-\mathrm{FDR}_w(k_\alpha )\right| \rightarrow 0\) from Theorem 3.8, we obtain
Together with \(\mathbb {P}\{\widehat{\mathrm{FDP}}(k_\alpha )\le \alpha \}=1\), this completes the proof. \(\square \)
Keywords
- Multiple testing
- High-dimensional inference
- False discovery proportion
- Measurement error models
- Variable selection