The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled Chi-square

• Pragya Sur
• Yuxin Chen
• Emmanuel J. Candès

Abstract

Logistic regression is used thousands of times a day to fit data, predict future outcomes, and assess the statistical significance of explanatory variables. When used for the purpose of statistical inference, logistic models produce p-values for the regression coefficients by using an approximation to the distribution of the likelihood-ratio test (LRT). Indeed, Wilks' theorem asserts that whenever we have a fixed number p of variables, twice the log-likelihood ratio (LLR) $2\Lambda$ is distributed as a $\chi^2_k$ variable in the limit of large sample sizes n; here, $\chi^2_k$ is a Chi-square with k degrees of freedom, k being the number of variables being tested. In this paper, we prove that when p is not negligible compared to n, Wilks' theorem does not hold and the Chi-square approximation is grossly incorrect; in fact, this approximation produces p-values that are far too small (under the null hypothesis). Assume that n and p grow large in such a way that $p/n \rightarrow \kappa$ for some constant $\kappa < 1/2$. (For $\kappa > 1/2$, $2\Lambda \xrightarrow{\mathbb{P}} 0$, so that the LRT is not interesting in this regime.) We prove that for a class of logistic models, the LLR converges to a rescaled Chi-square, namely, $2\Lambda \xrightarrow{\mathrm{d}} \alpha(\kappa)\,\chi^2_k$, where the scaling factor $\alpha(\kappa)$ is greater than one as soon as the dimensionality ratio $\kappa$ is positive. Hence, the LLR is larger than classically assumed. For instance, when $\kappa = 0.3$, $\alpha(\kappa) \approx 1.5$. In general, we show how to compute the scaling factor by solving a nonlinear system of two equations with two unknowns. Our mathematical arguments are involved and use techniques from approximate message passing theory, from non-asymptotic random matrix theory, and from convex geometry. We also complement our mathematical study by showing that the new limiting distribution is accurate for finite sample sizes. Finally, all the results from this paper extend to some other regression models, such as the probit regression model.
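To make the abstract's claim concrete, the following sketch shows how the rescaling affects a p-value when testing a single coefficient (k = 1). The observed statistic $2\Lambda = 6.0$ is hypothetical; the value $\alpha(0.3) \approx 1.5$ is the one quoted in the abstract. For one degree of freedom, the chi-square survival function has the closed form $\mathrm{erfc}(\sqrt{x/2})$, so no external libraries are needed.

```python
# Illustrative sketch (not code from the paper): comparing the classical
# Wilks p-value with the rescaled one proposed in the abstract, for k = 1.
from math import erfc, sqrt

def chi2_sf_df1(x):
    """Survival function P(chi^2_1 > x), via the closed form erfc(sqrt(x/2))."""
    return erfc(sqrt(x / 2.0))

two_lambda = 6.0  # hypothetical observed LLR statistic, 2*Lambda
alpha = 1.5       # scaling factor alpha(kappa) at kappa = 0.3, per the abstract

p_classical = chi2_sf_df1(two_lambda)          # Wilks: compare 2*Lambda to chi^2_1
p_rescaled = chi2_sf_df1(two_lambda / alpha)   # compare 2*Lambda to alpha * chi^2_1

print(f"classical p-value: {p_classical:.4f}")  # ~0.014
print(f"rescaled  p-value: {p_rescaled:.4f}")   # ~0.046
```

As the abstract states, the classical approximation understates the p-value: here the rescaled p-value is roughly three times larger, which can change a significance call at the 5% level.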

Keywords

Logistic regression · Likelihood-ratio tests · Wilks' theorem · High-dimensionality · Goodness of fit · Approximate message passing · Concentration inequalities · Convex geometry · Leave-one-out analysis

Mathematics Subject Classification: 62Fxx

Notes

Acknowledgements

E. C. was partially supported by the Office of Naval Research under grant N00014-16-1-2712, and by the Math + X Award from the Simons Foundation. P. S. was partially supported by the Ric Weiland Graduate Fellowship in the School of Humanities and Sciences, Stanford University. Y. C. is supported in part by the AFOSR YIP award FA9550-19-1-0030, by the ARO grant W911NF-18-1-0303, and by the Princeton SEAS innovation award. P. S. and Y. C. are grateful to Andrea Montanari for his help in understanding AMP and [22]. Y. C. thanks Kaizheng Wang and Cong Ma for helpful discussion about [25], and P. S. thanks Subhabrata Sen for several helpful discussions regarding this project. E. C. would like to thank Iain Johnstone for a helpful discussion as well.

Supplementary material

440_2018_896_MOESM1_ESM.pdf: Supplementary material 1 (PDF, 490 KB)

References

1. Agresti, A., Kateri, M.: Categorical Data Analysis. Springer, Berlin (2011)
2. Alon, N., Spencer, J.H.: The Probabilistic Method, 3rd edn. Wiley, Hoboken (2008)
3. Amelunxen, D., Lotz, M., McCoy, M.B., Tropp, J.A.: Living on the edge: phase transitions in convex programs with random data. Inf. Inference 3, 224–294 (2014)
4. Baricz, Á.: Mills' ratio: monotonicity patterns and functional inequalities. J. Math. Anal. Appl. 340(2), 1362–1370 (2008)
5. Bartlett, M.S.: Properties of sufficiency and statistical tests. Proc. R. Soc. Lond. Ser. A Math. Phys. Sci. 160, 268–282 (1937)
6. Bayati, M., Lelarge, M., Montanari, A., et al.: Universality in polytope phase transitions and message passing algorithms. Ann. Appl. Probab. 25(2), 753–822 (2015)
7. Bayati, M., Montanari, A.: The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Trans. Inf. Theory 57(2), 764–785 (2011)
8. Bayati, M., Montanari, A.: The LASSO risk for Gaussian matrices. IEEE Trans. Inf. Theory 58(4), 1997–2017 (2012)
9. Bickel, P.J., Ghosh, J.K.: A decomposition for the likelihood ratio statistic and the Bartlett correction—a Bayesian argument. Ann. Stat. 18, 1070–1090 (1990)
10. Boucheron, S., Massart, P.: A high-dimensional Wilks phenomenon. Probab. Theory Relat. Fields 150(3–4), 405–433 (2011)
11. Box, G.: A general distribution theory for a class of likelihood criteria. Biometrika 36(3/4), 317–346 (1949)
12. Candès, E., Fan, Y., Janson, L., Lv, J.: Panning for gold: model-free knockoffs for high-dimensional controlled variable selection (2016). arXiv preprint arXiv:1610.02351
13. Chernoff, H.: On the distribution of the likelihood ratio. Ann. Math. Stat. 25, 573–578 (1954)
14. Cordeiro, G.M.: Improved likelihood ratio statistics for generalized linear models. J. R. Stat. Soc. Ser. B (Methodol.) 25, 404–413 (1983)
15. Cordeiro, G.M., Cribari-Neto, F.: An Introduction to Bartlett Correction and Bias Reduction. Springer, New York (2014)
16. Cordeiro, G.M., Cribari-Neto, F., Aubin, E.C.Q., Ferrari, S.L.P.: Bartlett corrections for one-parameter exponential family models. J. Stat. Comput. Simul. 53(3–4), 211–231 (1995)
17. Cover, T.M.: Geometrical and statistical properties of linear threshold devices. Ph.D. thesis (1964)
18. Cover, T.M.: Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans. Electron. Comput. 3, 326–334 (1965)
19. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, Hoboken (2012)
20. Cribari-Neto, F., Cordeiro, G.M.: On Bartlett and Bartlett-type corrections. Econom. Rev. 15(4), 339–367 (1996)
21. Deshpande, Y., Montanari, A.: Finding hidden cliques of size $\sqrt{N/e}$ in nearly linear time. Found. Comput. Math. 15(4), 1069–1128 (2015)
22. Donoho, D., Montanari, A.: High dimensional robust M-estimation: asymptotic variance via approximate message passing. Probab. Theory Relat. Fields 3, 935–969 (2013)
23. Donoho, D., Montanari, A.: Variance breakdown of Huber (M)-estimators: $n/p \rightarrow m \in (1,\infty)$. Technical report (2015)
24. El Karoui, N.: Asymptotic behavior of unregularized and ridge-regularized high-dimensional robust regression estimators: rigorous results (2013). arXiv preprint arXiv:1311.2445
25. El Karoui, N.: On the impact of predictor geometry on the performance on high-dimensional ridge-regularized generalized robust regression estimators. Probab. Theory Relat. Fields 170, 95–175 (2017)
26. El Karoui, N., Bean, D., Bickel, P.J., Lim, C., Yu, B.: On robust regression with high-dimensional predictors. Proc. Natl. Acad. Sci. 110(36), 14557–14562 (2013)
27. Fan, J., Jiang, J.: Nonparametric inference with generalized likelihood ratio tests. Test 16(3), 409–444 (2007)
28. Fan, J., Lv, J.: Nonconcave penalized likelihood with NP-dimensionality. IEEE Trans. Inf. Theory 57(8), 5467–5484 (2011)
29. Fan, J., Zhang, C., Zhang, J.: Generalized likelihood ratio statistics and Wilks phenomenon. Ann. Stat. 29, 153–193 (2001)
30. Fan, Y., Demirkaya, E., Lv, J.: Nonuniformity of p-values can occur early in diverging dimensions (2017). arXiv preprint arXiv:1705.03604
31. Hager, W.W.: Updating the inverse of a matrix. SIAM Rev. 31(2), 221–239 (1989)
32. Hanson, D.L., Wright, F.T.: A bound on tail probabilities for quadratic forms in independent random variables. Ann. Math. Stat. 42(3), 1079–1083 (1971)
33. He, X., Shao, Q.-M.: On parameters of increasing dimensions. J. Multivar. Anal. 73(1), 120–135 (2000)
34. Hsu, D., Kakade, S., Zhang, T.: A tail inequality for quadratic forms of subgaussian random vectors. Electron. Commun. Probab. 17(52), 1–6 (2012)
35. Huber, P.J.: Robust regression: asymptotics, conjectures and Monte Carlo. Ann. Stat. 1, 799–821 (1973)
36. Huber, P.J.: Robust Statistics. Springer, Berlin (2011)
37. Janková, J., Van De Geer, S.: Confidence regions for high-dimensional generalized linear models under sparsity (2016). arXiv preprint arXiv:1610.01353
38. Javanmard, A., Montanari, A.: State evolution for general approximate message passing algorithms, with applications to spatial coupling. Inf. Inference 2, 115–144 (2013)
39. Javanmard, A., Montanari, A.: De-biasing the lasso: optimal sample size for Gaussian designs (2015). arXiv preprint arXiv:1508.02757
40. Lawley, D.N.: A general method for approximating to the distribution of likelihood ratio criteria. Biometrika 43(3/4), 295–303 (1956)
41. Lehmann, E.L., Romano, J.P.: Testing Statistical Hypotheses. Springer, Berlin (2006)
42. Liang, H., Pang, D., et al.: Maximum likelihood estimation in logistic regression models with a diverging number of covariates. Electron. J. Stat. 6, 1838–1846 (2012)
43. Mammen, E.: Asymptotics with increasing dimension for robust regression with applications to the bootstrap. Ann. Stat. 17, 382–400 (1989)
44. McCullagh, P., Nelder, J.A.: Generalized Linear Models. Monograph on Statistics and Applied Probability. Chapman & Hall, London (1989)
45. Moulton, L.H., Weissfeld, L.A., Laurent, R.T.S.: Bartlett correction factors in logistic regression models. Comput. Stat. Data Anal. 15(1), 1–11 (1993)
46. Oymak, S., Tropp, J.A.: Universality laws for randomized dimension reduction, with applications. Inf. Inference J. IMA 7, 337–446 (2015)
47. Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1(3), 127–239 (2014)
48. Portnoy, S.: Asymptotic behavior of M-estimators of $p$ regression parameters when $p^2/n$ is large. I. Consistency. Ann. Stat. 12, 1298–1309 (1984)
49. Portnoy, S.: Asymptotic behavior of M-estimators of $p$ regression parameters when $p^2/n$ is large; II. Normal approximation. Ann. Stat. 13, 1403–1417 (1985)
50. Portnoy, S.: Asymptotic behavior of the empiric distribution of M-estimated residuals from a regression model with many parameters. Ann. Stat. 14, 1152–1170 (1986)
51. Portnoy, S., et al.: Asymptotic behavior of likelihood methods for exponential families when the number of parameters tends to infinity. Ann. Stat. 16(1), 356–366 (1988)
52. Rudelson, M., Vershynin, R., et al.: Hanson-Wright inequality and sub-Gaussian concentration. Electron. Commun. Probab. 18(82), 1–9 (2013)
53. Sampford, M.R.: Some inequalities on Mill's ratio and related functions. Ann. Math. Stat. 24(1), 130–132 (1953)
54. Spokoiny, V.: Penalized maximum likelihood estimation and effective dimension (2012). arXiv preprint arXiv:1205.0498
55. Su, W., Bogdan, M., Candes, E.: False discoveries occur early on the Lasso path. Ann. Stat. 45, 2133–2150 (2015)
56. Sur, P., Candès, E.J.: Additional supplementary materials for: a modern maximum-likelihood theory for high-dimensional logistic regression. https://statweb.stanford.edu/~candes/papers/proofs_LogisticAMP.pdf (2018)
57. Sur, P., Candès, E.J.: A modern maximum-likelihood theory for high-dimensional logistic regression (2018). arXiv preprint arXiv:1803.06964
58. Sur, P., Chen, Y., Candès, E.: Supplemental materials for "the likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled chi-square". http://statweb.stanford.edu/~candes/papers/supplement_LRT.pdf (2017)
59. Tang, C.Y., Leng, C.: Penalized high-dimensional empirical likelihood. Biometrika 97, 905–919 (2010)
60. Tao, T.: Topics in Random Matrix Theory, vol. 132. American Mathematical Society, Providence (2012)
61. Thrampoulidis, C., Abbasi, E., Hassibi, B.: Precise error analysis of regularized M-estimators in high-dimensions (2016). arXiv preprint arXiv:1601.06233
62. Van de Geer, S., Bühlmann, P., Ritov, Y., Dezeure, R., et al.: On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Stat. 42(3), 1166–1202 (2014)
63. Van de Geer, S.A., et al.: High-dimensional generalized linear models and the lasso. Ann. Stat. 36(2), 614–645 (2008)
64. Van der Vaart, A.W.: Asymptotic Statistics, vol. 3. Cambridge University Press, Cambridge (2000)
65. Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices. In: Compressed Sensing: Theory and Applications, pp. 210–268 (2012)
66. Wilks, S.S.: The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann. Math. Stat. 9(1), 60–62 (1938)
67. Yan, T., Li, Y., Xu, J., Yang, Y., Zhu, J.: High-dimensional Wilks phenomena in some exponential random graph models (2012). arXiv preprint arXiv:1201.0058