Abstract
Technological advances allow scientists to collect high dimensional data sets in which the number of variables is much larger than the sample size; genomics is a representative example. Many classic statistical methods lose accuracy or power when applied to such data. In this chapter, we propose an empirical likelihood (EL) method to test regression coefficients in high dimensional generalized linear models. Under the null hypothesis, the EL test statistic has an asymptotic chi-squared distribution with two degrees of freedom, and this result does not depend on the number of covariates. Moreover, we extend the proposed method to test a subset of the regression coefficients in the presence of nuisance parameters. Simulation studies show that the EL tests control the type-I error rate well under moderate sample sizes and, in most scenarios, are more powerful than the direct competitor under the alternative hypothesis. The proposed tests are employed to analyze the association between rheumatoid arthritis (RA) and single nucleotide polymorphisms (SNPs) on chromosome 6. The resulting p-value is 0.019, indicating that chromosome 6 has an influence on RA. With the partial test and logistic modeling, we also find that the SNPs eliminated by the sure independence screening and Lasso methods have no significant influence on RA.
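To illustrate how a p-value is read off a chi-squared limit with two degrees of freedom, the following minimal sketch uses the closed form of the upper-tail probability, \(P(\chi^{2}_{2} > x) = e^{-x/2}\). The function names are hypothetical, and the statistic value is obtained by inverting the reported p-value of 0.019; the chapter itself does not report the statistic, so this number is an implied value and not a quoted result.

```python
import math


def chi2_df2_pvalue(stat):
    """Upper-tail probability of a chi-squared variable with 2 degrees of freedom."""
    if stat < 0:
        raise ValueError("a chi-squared statistic cannot be negative")
    return math.exp(-stat / 2.0)


def chi2_df2_critical(alpha):
    """Value x such that P(chi2_2 > x) = alpha, i.e. the inverse of the tail above."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return -2.0 * math.log(alpha)


# Statistic implied by the reported p-value (an inversion, not a reported number).
implied_stat = chi2_df2_critical(0.019)

# Critical value of the 5% level test based on the chi-squared(2) limit.
crit_05 = chi2_df2_critical(0.05)

print(f"implied statistic: {implied_stat:.3f}")
print(f"5% critical value: {crit_05:.3f}")
print(f"reject at 5%: {implied_stat > crit_05}")
```

Because the limiting distribution has a fixed number of degrees of freedom, the same critical value applies whatever the number of covariates.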
References
Bai, Z.D., Saranadasa, H.: Effect of high dimension: by an example of a two sample problem. Stat. Sin. 6, 311–329 (1996)
Bühlmann, P., et al.: Statistical significance in high-dimensional linear models. Bernoulli 19 (4), 1212–1242 (2013)
Chapman, J., Whittaker, J.: Analysis of multiple SNPs in a candidate gene or region. Genet. Epidemiol. 32, 560–566 (2008)
Chapman, J.M., Cooper, J.D., Todd, J.A., Clayton, D.G.: Detecting disease associations due to linkage disequilibrium using haplotype tags: a class of tests and the determinants of statistical power. Hum. Hered. 56, 18–31 (2003)
Chen, S.X., Guo, B.: Tests for high dimensional generalized linear models. arXiv preprint. arXiv:1402.4882 (2014)
Chen, S.X., Hall, P.: Smoothed empirical likelihood confidence intervals for quantiles. Ann. Stat. 21, 1166–1181 (1993)
Chen, S.X., Van Keilegom, I.: A review on empirical likelihood methods for regression. Test 18 (3), 415–447 (2009)
Chen, S.X., Peng, L., Qin, Y.L.: Effects of data dimension on empirical likelihood. Biometrika 96, 711–722 (2009)
Chen, S.X., Zhang, L.X., Zhong, P.S.: Tests for high-dimensional covariance matrices. J. Am. Stat. Assoc. 105, 810–819 (2010)
Donoho, D.L., et al.: High-dimensional data analysis: the curses and blessings of dimensionality. In: AMS Math Challenges Lecture, pp. 1–32 (2000)
Ellinghaus, E., Stuart, P.E., Ellinghaus, D., Nair, R.P., Debrus, S., Raelson, J.V., Belouchi, M., Tejasvi, T., Li, Y., Tsoi, L.C., et al.: Genome-wide meta-analysis of psoriatic arthritis identifies susceptibility locus at REL. J. Invest. Dermatol. 132, 1133–1140 (2012)
Fan, J., Song, R., et al.: Sure independence screening in generalized linear models with NP-dimensionality. Ann. Stat. 38, 3567–3604 (2010)
Goeman, J.J., Van De Geer, S.A., Van Houwelingen, H.C.: Testing against a high dimensional alternative. J. R. Stat. Soc. Ser. B (Stat Methodol.) 68, 477–493 (2006)
Huang, J., Ma, S., Zhang, C.H.: The iterated lasso for high-dimensional logistic regression. The University of Iowa Department of Statistical and Actuarial Science Technical Report (392) (2008)
Kolaczyk, E.D.: Empirical likelihood for generalized linear models. Stat. Sin. 4, 199–218 (1994)
Li, Q., Hu, J., Ding, J., Zheng, G.: Fisher’s method of combining dependent statistics using generalizations of the gamma distribution with applications to genetic pleiotropic associations. Biostatistics 15, 284–295 (2013)
Meinshausen, N., Meier, L., Bühlmann, P.: P-values for high-dimensional regression. J. Am. Stat. Assoc. 104 (488), 1671–1681 (2009)
Newey, W.K., Smith, R.J.: Higher order properties of GMM and generalized empirical likelihood estimators. Econometrica 72, 219–255 (2004)
Owen, A.B.: Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75, 237–249 (1988)
Owen, A.B.: Empirical likelihood for linear models. Ann. Stat. 19, 1725–1747 (1991)
Owen, A.: Empirical Likelihood. Chapman and Hall/CRC, Boca Raton (2001)
Peng, L., Qi, Y., Wang, R.: Empirical likelihood test for high dimensional linear models. Stat. Probab. Lett. 86, 74–79 (2014)
Plenge, R.M., Seielstad, M., Padyukov, L., Lee, A.T., Remmers, E.F., Ding, B., Liew, A., Khalili, H., Chandrasekaran, A., Davies, L.R., et al.: TRAF1-C5 as a risk locus for rheumatoid arthritis: a genomewide study. N. Engl. J. Med. 357 (12), 1199–1209 (2007)
Qin, J., Lawless, J.: Empirical likelihood and general estimating equations. Ann. Stat. 22, 300–325 (1994)
Wang, T., Elston, R.C.: Improved power by use of a weighted score test for linkage disequilibrium mapping. Am. J. Hum. Genet. 80, 353–360 (2007)
Wang, R., Peng, L., Qi, Y.: Jackknife empirical likelihood test for equality of two high dimensional means. Stat. Sin. 23, 667–690 (2013)
Zhang, R., Peng, L., Wang, R., et al.: Tests for covariance matrix with fixed or divergent dimension. Ann. Stat. 41, 2075–2096 (2013)
Zhong, P.S., Chen, S.X.: Tests for high-dimensional regression coefficients with factorial designs. J. Am. Stat. Assoc. 106, 260–274 (2011)
Acknowledgements
We thank the organizers and participants of “The Fourth International Workshop on the Perspectives on High-dimensional Data Analysis.” Q. Zhang was partly supported by the China Postdoctoral Science Foundation (Grant No. 2014M550799) and the National Science Foundation of China (11401561). Q. Li was supported in part by the National Science Foundation of China (11371353, 61134013) and the Strategic Priority Research Program of the Chinese Academy of Sciences. S. Ma was supported by the National Social Science Foundation of China (13CTJ001, 13&ZD148).
Appendix
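Recall that \(\mu _{0i} = g(X_{i}^{\top }\beta _{0})\), \(\psi (X_{i},\beta _{0}) = g'(X_{i}^{\top }\beta _{0})/V \{g(X_{i}^{\top }\beta _{0})\}\), \(\varepsilon _{i} = Y _{i} - g(X_{i}^{\top }\beta _{0})\),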
and
Without loss of generality, we assume \(\mu _{0i} = 0\). Now we prove Theorem 1.
Proof
According to Theorem 3.2 in [21], it suffices to prove that under the assumptions of Theorem 1, Conditions 1–2, and the null hypothesis, we have that as n → ∞,
and
where
and
Notice that
According to Conditions 1–2 and (5), we have
Based on the Lyapunov central limit theorem, we can immediately get \(\sum _{i=1}^{m}T_{i}/\sqrt{m}\sigma _{1}\stackrel{\mathrm{d}}{\rightarrow }N(0,1)\). Similarly we can obtain \(\sum _{i=1}^{m}S_{i}/\sqrt{m}\sigma _{2}\stackrel{\mathrm{d}}{\rightarrow }N(0,1)\). To show (12), we still need to prove that for any constants a and b,
Notice that under the null hypothesis,
Then it is easy to obtain that
By the Lyapunov central limit theorem, we conclude that (14) holds. That is, we prove (12).
To show the first result in (13), it is obvious that
Therefore the first result in (13) holds. Similarly, we can obtain the remaining two results in (13). □
To prove Theorem 2, we first establish Lemma 1.
Lemma 1
For any δ > 0,
and
Proof
The proof of Lemma 1 is similar to that of Lemma 6 in [26]. □
Proof
[Proof of Theorem 2] It suffices to verify that (5) and (6) in Theorem 1 hold. Consider Example 1. Let \(Q_{1} = O\Sigma ^{-1/2}X_{1}\) and \(Q_{1+m} = O\Sigma ^{-1/2}X_{1+m}\), where \(O\) is an orthogonal matrix such that \(O\Sigma O^{\top }\) is diagonal. Then \(X_{1}^{\top }X_{1+m} = Q_{1}^{\top }O\Sigma O^{\top }Q_{1+m} =\sum _{j=1}^{p}\phi _{j}Q_{1j}Q_{1+m,j}\), where the \(\phi _{j}\)'s are the eigenvalues of \(\Sigma\). Therefore
Thus we obtain that \(E(X_{1}^{\top }X_{1+m})^{4}/[\mathrm{tr}\{\Sigma ^{2}\}]^{2} = O(1)\) uniformly in \(p\), i.e., (5) holds. Equation (6) can be verified in the same way.
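The diagonalization identity \(X_{1}^{\top }X_{1+m} =\sum _{j=1}^{p}\phi _{j}Q_{1j}Q_{1+m,j}\) used for Example 1 can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes \(p = 2\), builds \(\Sigma = O^{\top }\mathrm{diag}(\phi )O\) from a hypothetical rotation matrix and hypothetical eigenvalues, and uses two arbitrary fixed vectors in place of the random covariates. None of these numbers come from the chapter.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def matvec(A, x):
    """Product of a matrix with a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(row) for row in zip(*A)]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


# Hypothetical orthogonal matrix O (a rotation) and eigenvalues phi of Sigma.
theta = 0.7
O = [[math.cos(theta), math.sin(theta)],
     [-math.sin(theta), math.cos(theta)]]
phi = [3.0, 0.5]

# Sigma = O^T diag(phi) O, so that O Sigma O^T = diag(phi) is diagonal.
Sigma = matmul(matmul(transpose(O), [[phi[0], 0.0], [0.0, phi[1]]]), O)

# Sigma^{-1/2} = O^T diag(phi^{-1/2}) O.
inv_sqrt = [[phi[0] ** -0.5, 0.0], [0.0, phi[1] ** -0.5]]
Sigma_inv_half = matmul(matmul(transpose(O), inv_sqrt), O)

# Two arbitrary vectors standing in for X_1 and X_{1+m}.
X1 = [1.0, -2.0]
X2 = [0.5, 4.0]

# Q = O Sigma^{-1/2} X, as in the proof.
Q1 = matvec(O, matvec(Sigma_inv_half, X1))
Q2 = matvec(O, matvec(Sigma_inv_half, X2))

lhs = dot(X1, X2)                                        # X_1^T X_{1+m}
rhs = sum(phi[j] * Q1[j] * Q2[j] for j in range(2))      # sum_j phi_j Q_1j Q_{1+m,j}
diagonalized = matmul(matmul(O, Sigma), transpose(O))    # should equal diag(phi)

print(lhs, rhs)
```

The two printed numbers agree up to floating-point error, which is the content of the identity: the inner product of the original vectors becomes an eigenvalue-weighted sum over the coordinates of the standardized, rotated vectors.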
As for Example 2, we define \(\Sigma ' = \Gamma ^{\top }\Gamma = (\sigma '_{i,j})_{1\leq i,j\leq m}\) and \(\alpha ^{\top }\Gamma = (a_{1},\ldots,a_{m})\). Since \(X_{i} = \Gamma F_{i}\),
where \(F_{(1+m)j}\) denotes the \(j\)th element of \(F_{1+m}\), and
Denote \(\delta _{j_{1},\ldots,j_{8}} = E\left (\prod _{k=1}^{8}F_{1j_{k}}\right )\). The other cases of \(\sum _{v=1}^{d}l_{v} \leq 8\) can be proved in the same way. Notice that
\(\delta _{j_{1},\ldots,j_{8}}\neq 0\) only when \(\{j_{1},\ldots,j_{8}\}\) form pairs of integers. Denote by \(\sum ^{{\ast}}\) the summation over the cases in which \(\delta _{j_{1},\ldots,j_{8}}\delta _{j_{1}^{{\prime}},\ldots,j_{8}^{{\prime}}}\neq 0\). By Lemma 1 we have
Similarly we have
Then according to Theorem 1, we can prove Theorem 2. □
Proof
[Proof of Theorem 3] Similar to the proof of Theorem 1, we only need to show that under Conditions 1, 3–5, and the null hypothesis, as n → ∞,
where
and
To prove (17), it suffices to prove the following three asymptotic results:
Notice that under the null hypothesis \(\tilde{H}_{0}\), we have
where
Through proper calculation and according to Conditions 3–5, we have
Then by applying the Markov inequality, we have
Therefore \(\frac{\sum _{i=1}^{m}\tilde{T}_{ i}} {\sqrt{m}\tilde{\sigma }_{1}}\) can be written as the sum of independent statistics and an \(o_{p}(1)\) term, namely
Therefore similar to the proof of (12) in Theorem 1, we can prove (17).
To show the first result in (18), it is obvious that
By applying Conditions 3–5 and with proper computation, we can obtain
According to the Markov inequality, we obtain \(\frac{1} {m\tilde{\sigma }_{1}^{2}} \sum _{i=1}^{m}(h_{1i}^{2}(\hat{\beta }_{0}) - h_{1i}^{2}(\beta _{0})) = o_{p}(1)\). Therefore we have
Using a method similar to the proof of (13) in Theorem 1, we can obtain the first result in (18). Similarly, we can prove the other two results in (18). □
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Zang, Y., Zhang, Q., Zhang, S., Li, Q., Ma, S. (2017). Empirical Likelihood Test for High Dimensional Generalized Linear Models. In: Ahmed, S. (eds) Big and Complex Data Analysis. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-41573-4_2
DOI: https://doi.org/10.1007/978-3-319-41573-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41572-7
Online ISBN: 978-3-319-41573-4
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)