Abstract
Feature screening is commonly used to handle ultrahigh-dimensional data prior to conducting a formal data analysis. While various feature screening methods have been developed in the literature, research gaps still exist. The existing methods usually make an implicit assumption that data are accurately measured. This requirement, however, is frequently violated in applications. In this chapter, we consider error-prone ultrahigh-dimensional survival data and propose a robust feature screening method. We develop an iteration algorithm to improve the performance of retaining all informative covariates, and we establish theoretical results for the proposed method. Simulation studies assess its performance, and an application to a mantle cell lymphoma microarray dataset illustrates its use.
References
Carroll, R. J., Ruppert, D., Stefanski, L. A., & Crainiceanu, C. M. (2006). Measurement Error in Nonlinear Models. New York: CRC Press.
Chen, L.-P. (2019). Iterated feature screening based on distance correlation for ultrahigh-dimensional censored data with covariates measurement error. arXiv:1901.01610.
Chen, J., & Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95, 759–771.
Chen, L.-P., & Yi, G. Y. (2020). Model selection and model averaging for analysis of truncated and censored data with measurement error. Electronic Journal of Statistics, 14, 4054–4109.
Chen, L.-P., & Yi, G. Y. (2021a). Analysis of noisy survival data with graphical proportional hazards measurement error models. Biometrics. https://doi.org/10.1111/biom.13331
Chen, L.-P., & Yi, G. Y. (2021b). Semiparametric methods for left-truncated and right-censored survival data with covariate measurement error. Annals of the Institute of Statistical Mathematics, 73, 481–517. https://doi.org/10.1007/s10463-020-00755-2
Chen, X., Chen, X., & Wang, H. (2018). Robust feature screening for ultra-high dimensional right censored data via distance correlation. Computational Statistics and Data Analysis, 119, 118–138.
Cui, H., Li, R., & Zhong, W. (2015). Model-free feature screening for ultrahigh dimensional discriminant analysis. Journal of the American Statistical Association, 110, 630–641.
Dreier, I., & Kotz, S. (2002). A note on the characteristic function of the t-distribution. Statistics and Probability Letters, 57, 221–224.
Fan, J., & Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space (with discussion). Journal of the Royal Statistical Society, Series B, 70, 849–911.
Fan, J., & Song, R. (2010). Sure independence screening in generalized linear models with NP-dimensionality. The Annals of Statistics, 38, 3567–3604.
Fan, J., Samworth, R., & Wu, Y. (2009). Ultrahigh dimensional feature selection: beyond the linear model. Journal of Machine Learning Research, 10, 1829–1853.
Fan, J., Feng, Y., & Wu, Y. (2010). Ultrahigh dimensional variable selection for Cox’s proportional hazards model. IMS Collect, 6, 70–86.
Földes, A., & Rejtö, L. (1981). A LIL type result for the product limit estimator. Z. Wahrscheinlichkeitstheorie verw. Gebiete, 56, 75–86.
Hall, P., & Miller, H. (2009). Using generalized correlation to effect variable selection in very high dimensional problems. Journal of Computational and Graphical Statistics, 18, 533–550.
Hao, M., Lin, Y., Liu, X., & Tang, W. (2019). Robust feature screening for high-dimensional survival data. Journal of Applied Statistics, 46, 979–994.
Isaev, M., & McKay, B. D. (2016). On a bound of Hoeffding in the complex case. Electronic Communications in Probability, 21, 1–7.
Li, R., Zhong, W., & Zhu, L. (2012). Feature screening via distance correlation learning. Journal of the American Statistical Association, 107, 1129–1139.
Marsden, J. E., & Hoffman, M. J. (1999). Basic complex analysis. New York: W. H. Freeman.
Rosenwald, A., Wright, G., Chan, W. C., Connors, J. M., Campo, E., Fisher, R. I., Gascoyne, R. D., Muller-Hermelink, H. K., Smeland, E. B., & Staudt, L. M. (2003). The proliferation gene expression signature is a quantitative integrator of oncogenic events that predicts survival in mantle cell lymphoma. Cancer Cell, 3, 185–197.
Song, R., Lu, W., Ma, S., & Jeng, X. (2014). Censored rank independence screening for high-dimensional survival data. Biometrika, 101, 799–814.
Székely, G. J., Rizzo, M. L., & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35, 2769–2794.
Wand, M. P., & Jones, M. C. (1995). Kernel Smoothing. London: Chapman & Hall.
Xue, J., & Liang, F. (2017). A robust model free feature screening method for ultrahigh dimensional data. Journal of Computational and Graphical Statistics, 26, 803–813.
Yan, X., Tang, N., & Zhao, X. (2017). The Spearman rank correlation screening for ultrahigh dimensional censored data. arXiv:1702.02708v1.
Yi, G. Y. (2017). Statistical Analysis with Measurement Error and Misclassification: Strategy, Method and Application. Springer.
Yi, G. Y., He, W., & Carroll, R. J. (2021). Feature screening with large-scale and high-dimensional survival data. Biometrics. https://doi.org/10.1111/biom.13479
Yi, G. Y., Ma, Y., Spiegelman, D., & Carroll, R. J. (2015). Functional and structural methods with mixed measurement error and misclassification in covariates. Journal of the American Statistical Association, 110, 681–696.
Zhang, J., Liu, Y., & Cui, H. (2020). Model-free feature screening via distance correlation for ultrahigh dimensional survival data. Statistical Papers. https://doi.org/10.1007/s00362-020-01210-3
Zhong, W., & Zhu, L. (2015). An iterative approach to distance correlation-based sure independence screening. Journal of Statistical Computation and Simulation, 85, 2331–2345.
Zhu, L., Li, L., Li, R., & Zhu, L. (2011). Model-free feature screening for ultrahigh-dimensional data. Journal of the American Statistical Association, 106, 1464–1475.
Acknowledgements
The authors thank the co-editors and a referee for their helpful comments on the initial version. This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC). Yi is Canada Research Chair in Data Science (Tier 1). Her research was undertaken, in part, thanks to funding from the Canada Research Chairs program.
Appendix
1.1 A. Technical Lemmas
In this appendix, we provide some lemmas that are useful to derive the main theorems. The first lemma is the probabilistic bound of the estimated survivor function.
Lemma 1
Let \(H(t) = P(Y_i > t)\) denote the survivor function of \(Y_i\), where \(Y_i = \min \{T_i, C_i\}\). Suppose that there is a finite time point τ such that H(τ) > η for a positive constant η. Then for \(\xi > 27 n^{-1} \eta^{-2}\), there exist positive constants \(\kappa_1\) and \(\kappa_2\) such that
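The display (A.1) did not survive this rendering. For orientation, exponential bounds of the Földes–Rejtö type for the product-limit estimator take the shape sketched below; the exact constants and form here are our hedged reconstruction, not necessarily the authors' statement:

```latex
P\left( \sup_{t \le \tau} \left| \widehat{H}(t) - H(t) \right| > \xi \right)
  \le \kappa_1 \exp\!\left( - \kappa_2 \, n \xi^2 \right).
```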
This lemma is Theorem 2 of Földes & Rejtö (1981). The second lemma is about the probabilistic bound of the estimator (16).
Lemma 2
Under regularity conditions (C1) and (C2), for any \(\xi^\ast > 0\), we have
for some positive constants G and \(\kappa_3\).
Proof
We first write
because \(f_{adj,j}(x)\) is merely the inverse-Fourier-transform representation of the density \(f_{X_{(j)}}(x)\), i.e., \(f_{adj,j}(x) - f_{X_{(j)}}(x) = 0\). Therefore, the remaining task is to examine \(\widehat {f}_{adj,j}(x) - f_{adj,j}(x)\). By (11) and (15), we have
Note that \(\phi _{X_{(j)}^\ast }(u) = E\left \{ \exp \left ( \mathbf {i} u X_{i(j)}^\ast \right ) \right \}\) and \(\widehat {\phi }_{X_{(j)}^\ast }(u)\) is given by (14); then
By Conditions (C1) and (C2) and the finiteness of \(\int _{-\infty }^\infty u^r K(u) du\) for all \(r \in \mathbb {N}\), applying the Taylor series expansion of the exponential function gives that \(\int _{-\infty }^\infty \exp \left ( \mathbf {i} u h z \right ) K(u) du = 1 + o( n^{-\frac {1}{5}} )\). Combining with (A.5) gives
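To spell out the Taylor-expansion step: assuming the kernel K integrates to one, is symmetric (so its odd moments vanish), and the bandwidth satisfies \(h \asymp n^{-1/5}\) (assumptions we read into Conditions (C1) and (C2)), expanding the exponential termwise gives

```latex
\int_{-\infty}^{\infty} \exp(\mathbf{i} u h z) K(u)\, du
  = \sum_{r=0}^{\infty} \frac{(\mathbf{i} h z)^{r}}{r!} \int_{-\infty}^{\infty} u^{r} K(u)\, du
  = 1 - \frac{h^{2} z^{2}}{2} \int_{-\infty}^{\infty} u^{2} K(u)\, du + \cdots
  = 1 + O(h^{2})
  = 1 + o\!\left( n^{-\frac{1}{5}} \right).
```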
Let \(Z_i = \exp \left ( \mathbf {i} u X_{i(j)}^\ast \right )\), which is a complex random variable. By Theorem 1.2 of Isaev and McKay (2016), we have
where G is some constant with \(G > \operatorname{diam} Z \triangleq \inf \left \{c \in \mathbb {R}^+ : P\left ( |Z_1 - Z_2| > c \right ) = 0 \right \}\). Note that \(\widehat {\phi }_{X_{(j)}^\ast }(u) = \frac {1}{n} \sum \limits _{i=1}^n Z_i\); then by (A.6), for any \(\xi_2 > 0\) and ν > 0,
where the third step is due to Markov’s inequality, the fourth step is by the independence of the \(X_i^\ast \), and the last step comes from (A.7) with \(Z_i\) replaced by \(\nu Z_i\), so that with a constant νG satisfying \(\nu G > \inf \{\nu c : P\left ( \nu |Z_1 - Z_2| > \nu c \right ) = 0 \}\), we have \( E \left \{ \exp \left (\nu | Z_i - E(Z_i) | \right ) \right \} \leq \exp \left ( \frac {\nu ^2 G^2}{8} \right )\).
To get the best upper bound, we take the right-hand side of (A.8) as a function of ν and then minimize it. Specifically, let \(\varphi (\nu ) = \frac {n \nu ^2 G^2}{8} -\nu n \xi _2 + o(n^{-\frac {1}{5}})\). Since φ(ν) is a quadratic function, it is easy to check that \(\nu ^\ast \triangleq \mathop {\operatorname {argmin}} \limits _\nu \varphi (\nu ) = \frac {4\xi _2}{G^2}\). Then replacing ν by ν ∗ in (A.8) yields
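The minimization can be checked directly: φ is a convex quadratic in ν, so setting its derivative to zero gives

```latex
\varphi'(\nu) = \frac{n \nu G^{2}}{4} - n \xi_2 = 0
  \;\Longrightarrow\;
  \nu^\ast = \frac{4 \xi_2}{G^{2}},
\qquad
\varphi(\nu^\ast)
  = \frac{n G^{2}}{8} \cdot \frac{16 \xi_2^{2}}{G^{4}}
    - \frac{4 \xi_2}{G^{2}}\, n \xi_2 + o\!\left( n^{-\frac{1}{5}} \right)
  = - \frac{2 n \xi_2^{2}}{G^{2}} + o\!\left( n^{-\frac{1}{5}} \right),
```

which matches the exponent \(-\frac{2n\xi_2^2}{G^2}\) appearing in the probability statement that follows.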
Moreover, by (A.4) and (A.9), we observe that with probability greater than \(1-\exp \left \{- \frac {2n \xi _2^2}{G^2} + o(n^{-\frac {1}{5}}) \right \}\),
where the first equality is due to (A.3), the last step comes from (A.9), and the improper integral \(\int _{-\infty }^\infty \frac {\exp \left ( -\mathbf {i}u x \right )}{\phi _{\epsilon _{(j)}}(u)} du\) is shown to converge to a finite value (e.g., Marsden & Hoffman 1999, Proposition 4.3.9).
In other words,
Specifying \(\xi ^\ast = \left ( \sup \limits _x \frac {1}{2\pi } \int _{-\infty }^\infty \frac {\exp \left ( -\mathbf {i}u x \right )}{\phi _{\epsilon _{(j)}}(u)} du \right ) \xi _2\) gives that
where \(\kappa _3 \triangleq \exp \left \{ \frac {2n \xi ^{\ast 2}}{G^2} - \frac {2n \xi ^2}{G^2} \right \}\), which is positive. Thus, by the definition of the cumulative distribution function and (16), we conclude the desired result (A.2). □
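The estimator analyzed in Lemma 2 is a deconvoluting kernel density estimator built from the empirical characteristic function of the error-prone observations. As a concrete illustration of this construction (not the authors' code: the Laplace error law, the kernel with Fourier transform \((1-t^2)^3\) on \([-1,1]\), and the bandwidth are our assumptions), a minimal sketch:

```python
import numpy as np

def deconv_density(w, x_grid, h, b):
    """Deconvoluting kernel density estimate of f_X from W = X + eps,
    where eps ~ Laplace(0, b) has characteristic function 1 / (1 + b^2 u^2).
    The kernel is specified through its Fourier transform (1 - t^2)^3 on
    [-1, 1], so the inversion integral is effectively over |u| <= 1/h."""
    u = np.linspace(-1.0 / h, 1.0 / h, 801)
    du = u[1] - u[0]
    phi_K = (1.0 - (h * u) ** 2) ** 3                    # FT of the kernel
    phi_hat = np.exp(1j * np.outer(u, w)).mean(axis=1)   # empirical char. function
    phi_eps = 1.0 / (1.0 + (b * u) ** 2)                 # Laplace error char. function
    integrand = np.exp(-1j * np.outer(x_grid, u)) * (phi_K * phi_hat / phi_eps)
    return np.real(integrand.sum(axis=1)) * du / (2.0 * np.pi)

rng = np.random.default_rng(1)
n, b, h = 500, 0.3, 0.4
x_true = rng.normal(size=n)                  # latent covariate
w = x_true + rng.laplace(scale=b, size=n)    # error-prone surrogate
grid = np.linspace(-6.0, 6.0, 241)
f_hat = deconv_density(w, grid, h, b)        # smoothed estimate of the N(0,1) density
```

Integrating f_hat over the grid should be close to one, since the estimator inherits ∫K = 1; the corresponding estimated distribution function, as in (16), follows by cumulative integration.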
1.2 B. Proofs of Main Theorems
1.2.1 B.1 Proof of Theorem 1
Part 1
We prove (19).
Since ω j and \(\widehat {\omega }_j\) are formulated in terms of dcov(⋅, ⋅) and the associated estimates, to show the desired result, it suffices to examine dcov(⋅, ⋅) and its estimates.
Let \(\omega _j^\ast \triangleq \widehat {\text{dcov}}(F_{X_{(j)}}(X_{(j)}),F(Y)) = \widetilde {M}_{j,1} + \widetilde {M}_{j,2} - 2 \widetilde {M}_{j,3}\), where \(\widetilde {M}_{j,k}\) with k = 1, 2, 3 has the same form as \(\widehat {M}_{j,k}\) in (4) with \(\widehat {F}_{X_{(j)}}(X_{(j)})\) and \(\widehat {F}(Y)\) replaced by \(F_{X_{(j)}}(X_{(j)})\) and F(Y ), respectively. Therefore, the difference between \(\widehat {\omega }_j\) and ω j can be expressed as
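For intuition about the three moment terms, the sample distance covariance of Székely et al. (2007) can be computed directly. The sketch below is a generic univariate implementation with placeholder inputs u and v (standing in for \(F_{X_{(j)}}(X_{(j)})\) and F(Y)); the function name and simulated data are ours, not the authors'.

```python
import numpy as np

def dcov2_moments(u, v):
    """Squared sample distance covariance written as M1 + M2 - 2*M3,
    the three moment terms of Szekely, Rizzo & Bakirov (2007)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    a = np.abs(u[:, None] - u[None, :])               # pairwise |u_i - u_k|
    b = np.abs(v[:, None] - v[None, :])               # pairwise |v_i - v_k|
    m1 = (a * b).mean()                               # M_{j,1}
    m2 = a.mean() * b.mean()                          # M_{j,2}
    m3 = (a.mean(axis=1) * b.mean(axis=1)).mean()     # M_{j,3}
    return m1 + m2 - 2.0 * m3

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)       # y depends on x
omega = dcov2_moments(x, y)              # large values flag an informative feature
```

The same quantity equals the mean of the elementwise product of the two doubly centered distance matrices, which provides a convenient correctness check on the moment decomposition.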
Similar to the derivation of Li et al. (2012), we can show that
for some positive constants \(\widetilde {c}_1\) and ξ.
On the other hand, we examine \( \widehat {\omega }_j - \omega _j^\ast \) by writing
Since the derivations of \( \widehat {M}_{j,2} - \widetilde {M}_{j,2}\) and \( \widehat {M}_{j,3} - \widetilde {M}_{j,3}\) are similar to those of \( \widehat {M}_{j,1} - \widetilde {M}_{j,1}\), we present the argument only for \( \widehat {M}_{j,1} - \widetilde {M}_{j,1}\).
By adding and subtracting \(\frac {1}{n^2} \sum \limits _{i=1}^n \sum \limits _{k=1}^n \Big \{\left | \widehat {F}_{adj,j}(X_{i(j)}) - \widehat {F}_{adj,j}(X_{k(j)}) \right | \left |F(Y_i) - \right .\) \(\left . F(Y_k) \right | \Big \}\), we obtain that
First, we examine \(S_1\). Since \(\widehat {F}_{adj,j}(x)\) is the estimated cumulative distribution function with \(0 \leq \widehat {F}_{adj,j}(x) \leq 1\), for any i and k we have that
By the triangle inequality, we have that
Then by (B.6), we have that
where the last step is due to Lemma 1 with ξ in the right-hand side of (A.1) replaced by \(\frac {\xi }{2}\). Therefore, combining (B.5) and (B.7) gives
Next, we examine \(S_2\) in the same manner. Parallel to (B.5), we have that
Similar to the arguments for (B.7), we obtain that
Therefore, combining (B.9) and (B.10) yields
Finally, combining (B.4), (B.8), and (B.11), the probabilistic bound of \(\widehat {M}_{j,1} - \widetilde {M}_{j,1}\) is given by
Furthermore, similar derivations show that
and
Noting that the upper bounds in (B.12)–(B.14) are dominated by \( \exp \left (- c^\ast n \xi ^2 \right )\) for some constant \(c^\ast\), we apply (B.12)–(B.14) to (B.3) and obtain that
for some \(\widetilde {c}_2 >0\). Thus, combining (B.2) and (B.15) with (B.1) and specifying \(\xi = c n^{-\zeta}\) for the constants c and ζ described in Condition (C4) yield the desired result.
Part 2
We prove (20).
Let \(J = \min \limits _{j \in \mathcal {I}} \big |\omega _j\big | - \max \limits _{j \in \mathcal {I}^c} \big |\omega _j \big | \). The left-hand side of (20) can be expressed as
where the last step comes from the result in Part 1 and Condition (C5). \(\hfill \square \)
1.2.2 B.2 Proof of Theorem 2
Similar to the derivations of Li et al. (2012), one can obtain that
It gives
where the last step comes from Theorem 1. □
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Chen, LP., Yi, G.Y. (2022). Robust Feature Screening for Ultrahigh-Dimensional Censored Data Subject to Measurement Error. In: He, W., Wang, L., Chen, J., Lin, C.D. (eds) Advances and Innovations in Statistics and Data Science. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-031-08329-7_2
DOI: https://doi.org/10.1007/978-3-031-08329-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08328-0
Online ISBN: 978-3-031-08329-7
eBook Packages: Mathematics and Statistics (R0)