
The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled Chi-square

  • Pragya Sur
  • Yuxin Chen
  • Emmanuel J. Candès
Article

Abstract

Logistic regression is used thousands of times a day to fit data, predict future outcomes, and assess the statistical significance of explanatory variables. When used for the purpose of statistical inference, logistic models produce p-values for the regression coefficients by using an approximation to the distribution of the likelihood-ratio test (LRT). Indeed, Wilks’ theorem asserts that whenever we have a fixed number p of variables, twice the log-likelihood ratio (LLR) \(2 \Lambda \) is distributed as a \(\chi ^2_k\) variable in the limit of large sample sizes n; here, \(\chi ^2_k\) is a Chi-square with k degrees of freedom, where k is the number of variables being tested. In this paper, we prove that when p is not negligible compared to n, Wilks’ theorem does not hold and the Chi-square approximation is grossly incorrect; in fact, this approximation produces p-values that are far too small (under the null hypothesis). Assume that n and p grow large in such a way that \(p/n \rightarrow \kappa \) for some constant \(\kappa < 1/2\). (For \(\kappa > 1/2\), \(2\Lambda {\mathop {\rightarrow }\limits ^{{\mathbb {P}}}}0\), so the LRT is not interesting in this regime.) We prove that for a class of logistic models, the LLR converges to a rescaled Chi-square, namely, \(2\Lambda ~{\mathop {\rightarrow }\limits ^{\mathrm {d}}}~ \alpha (\kappa ) \chi _k^2\), where the scaling factor \(\alpha (\kappa )\) is greater than one as soon as the dimensionality ratio \(\kappa \) is positive. Hence, the LLR is larger than classically assumed. For instance, when \(\kappa = 0.3\), \(\alpha (\kappa ) \approx 1.5\). In general, we show how to compute the scaling factor by solving a nonlinear system of two equations in two unknowns. Our mathematical arguments are involved and use techniques from approximate message passing theory, from non-asymptotic random matrix theory, and from convex geometry. We also complement our mathematical study by showing that the new limiting distribution is accurate for finite sample sizes. Finally, all the results from this paper extend to some other regression models, such as the probit regression model.
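To illustrate the correction the abstract describes, the sketch below (ours, not the authors’ code) compares a classical Wilks p-value with one computed from the rescaled limit \(2\Lambda \rightarrow \alpha (\kappa ) \chi ^2_k\) for a single tested coefficient (k = 1), using the value \(\alpha (\kappa ) \approx 1.5\) at \(\kappa = 0.3\) quoted above; the observed statistic 5.0 is a hypothetical example. The survival function of \(\chi ^2_1\) has the closed form \(\mathrm{erfc}(\sqrt{x/2})\), so no statistics library is needed.

```python
import math

def chi2_1_sf(x):
    # Survival function of a chi-square with 1 degree of freedom:
    # P(chi2_1 > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(x / 2.0))

def rescaled_pvalue(llr_stat, alpha):
    # p-value under the rescaled limit 2*Lambda -> alpha(kappa) * chi2_1:
    # P(alpha * chi2_1 >= llr_stat) = P(chi2_1 >= llr_stat / alpha)
    return chi2_1_sf(llr_stat / alpha)

# Hypothetical observed statistic 2*Lambda = 5.0 at kappa = 0.3,
# where the paper reports alpha(kappa) ~ 1.5.
stat, alpha = 5.0, 1.5
p_classical = chi2_1_sf(stat)              # Wilks' chi-square approximation
p_corrected = rescaled_pvalue(stat, alpha) # rescaled chi-square limit
# Since alpha > 1, p_corrected > p_classical: the classical approximation
# is anti-conservative, i.e. its p-values are too small.
```

Dividing the statistic by \(\alpha (\kappa )\) before consulting the \(\chi ^2_1\) tail makes the p-value strictly larger whenever \(\alpha (\kappa ) > 1\), which is exactly the sense in which the classical p-values are "far too small" under the null.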

Keywords

Logistic regression · Likelihood-ratio tests · Wilks’ theorem · High-dimensionality · Goodness of fit · Approximate message passing · Concentration inequalities · Convex geometry · Leave-one-out analysis

Mathematics Subject Classification

62Fxx 

Notes

Acknowledgements

E. C. was partially supported by the Office of Naval Research under grant N00014-16-1-2712, and by the Math + X Award from the Simons Foundation. P. S. was partially supported by the Ric Weiland Graduate Fellowship in the School of Humanities and Sciences, Stanford University. Y. C. is supported in part by the AFOSR YIP award FA9550-19-1-0030, by the ARO grant W911NF-18-1-0303, and by the Princeton SEAS innovation award. P. S. and Y. C. are grateful to Andrea Montanari for his help in understanding AMP and [22]. Y. C. thanks Kaizheng Wang and Cong Ma for helpful discussion about [25], and P. S. thanks Subhabrata Sen for several helpful discussions regarding this project. E. C. would like to thank Iain Johnstone for a helpful discussion as well.

Supplementary material

Supplementary material 1: 440_2018_896_MOESM1_ESM.pdf (PDF, 490 KB)

References

  1. Agresti, A., Kateri, M.: Categorical Data Analysis. Springer, Berlin (2011)
  2. Alon, N., Spencer, J.H.: The Probabilistic Method, 3rd edn. Wiley, Hoboken (2008)
  3. Amelunxen, D., Lotz, M., McCoy, M.B., Tropp, J.A.: Living on the edge: phase transitions in convex programs with random data. Inf. Inference 3, 224–294 (2014)
  4. Baricz, Á.: Mills’ ratio: monotonicity patterns and functional inequalities. J. Math. Anal. Appl. 340(2), 1362–1370 (2008)
  5. Bartlett, M.S.: Properties of sufficiency and statistical tests. Proc. R. Soc. Lond. Ser. A Math. Phys. Sci. 160, 268–282 (1937)
  6. Bayati, M., Lelarge, M., Montanari, A.: Universality in polytope phase transitions and message passing algorithms. Ann. Appl. Probab. 25(2), 753–822 (2015)
  7. Bayati, M., Montanari, A.: The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Trans. Inf. Theory 57(2), 764–785 (2011)
  8. Bayati, M., Montanari, A.: The LASSO risk for Gaussian matrices. IEEE Trans. Inf. Theory 58(4), 1997–2017 (2012)
  9. Bickel, P.J., Ghosh, J.K.: A decomposition for the likelihood ratio statistic and the Bartlett correction—a Bayesian argument. Ann. Stat. 18, 1070–1090 (1990)
  10. Boucheron, S., Massart, P.: A high-dimensional Wilks phenomenon. Probab. Theory Relat. Fields 150(3–4), 405–433 (2011)
  11. Box, G.: A general distribution theory for a class of likelihood criteria. Biometrika 36(3/4), 317–346 (1949)
  12. Candès, E., Fan, Y., Janson, L., Lv, J.: Panning for gold: model-free knockoffs for high-dimensional controlled variable selection (2016). arXiv preprint arXiv:1610.02351
  13. Chernoff, H.: On the distribution of the likelihood ratio. Ann. Math. Stat. 25, 573–578 (1954)
  14. Cordeiro, G.M.: Improved likelihood ratio statistics for generalized linear models. J. R. Stat. Soc. Ser. B (Methodol.) 25, 404–413 (1983)
  15. Cordeiro, G.M., Cribari-Neto, F.: An Introduction to Bartlett Correction and Bias Reduction. Springer, New York (2014)
  16. Cordeiro, G.M., Cribari-Neto, F., Aubin, E.C.Q., Ferrari, S.L.P.: Bartlett corrections for one-parameter exponential family models. J. Stat. Comput. Simul. 53(3–4), 211–231 (1995)
  17. Cover, T.M.: Geometrical and statistical properties of linear threshold devices. Ph.D. thesis (1964)
  18. Cover, T.M.: Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans. Electron. Comput. 3, 326–334 (1965)
  19. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, Hoboken (2012)
  20. Cribari-Neto, F., Cordeiro, G.M.: On Bartlett and Bartlett-type corrections. Econom. Rev. 15(4), 339–367 (1996)
  21. Deshpande, Y., Montanari, A.: Finding hidden cliques of size \(\sqrt{N/e}\) in nearly linear time. Found. Comput. Math. 15(4), 1069–1128 (2015)
  22. Donoho, D., Montanari, A.: High dimensional robust M-estimation: asymptotic variance via approximate message passing. Probab. Theory Relat. Fields 3, 935–969 (2013)
  23. Donoho, D., Montanari, A.: Variance breakdown of Huber (M)-estimators: \(n/p \rightarrow m \in (1,\infty )\). Technical report (2015)
  24. El Karoui, N.: Asymptotic behavior of unregularized and ridge-regularized high-dimensional robust regression estimators: rigorous results (2013). arXiv preprint arXiv:1311.2445
  25. El Karoui, N.: On the impact of predictor geometry on the performance on high-dimensional ridge-regularized generalized robust regression estimators. Probab. Theory Relat. Fields 170, 95–175 (2017)
  26. El Karoui, N., Bean, D., Bickel, P.J., Lim, C., Yu, B.: On robust regression with high-dimensional predictors. Proc. Natl. Acad. Sci. 110(36), 14557–14562 (2013)
  27. Fan, J., Jiang, J.: Nonparametric inference with generalized likelihood ratio tests. Test 16(3), 409–444 (2007)
  28. Fan, J., Lv, J.: Nonconcave penalized likelihood with NP-dimensionality. IEEE Trans. Inf. Theory 57(8), 5467–5484 (2011)
  29. Fan, J., Zhang, C., Zhang, J.: Generalized likelihood ratio statistics and Wilks phenomenon. Ann. Stat. 29, 153–193 (2001)
  30. Fan, Y., Demirkaya, E., Lv, J.: Nonuniformity of p-values can occur early in diverging dimensions (2017). arXiv preprint arXiv:1705.03604
  31. Hager, W.W.: Updating the inverse of a matrix. SIAM Rev. 31(2), 221–239 (1989)
  32. Hanson, D.L., Wright, F.T.: A bound on tail probabilities for quadratic forms in independent random variables. Ann. Math. Stat. 42(3), 1079–1083 (1971)
  33. He, X., Shao, Q.-M.: On parameters of increasing dimensions. J. Multivar. Anal. 73(1), 120–135 (2000)
  34. Hsu, D., Kakade, S., Zhang, T.: A tail inequality for quadratic forms of subgaussian random vectors. Electron. Commun. Probab. 17(52), 1–6 (2012)
  35. Huber, P.J.: Robust regression: asymptotics, conjectures and Monte Carlo. Ann. Stat. 1, 799–821 (1973)
  36. Huber, P.J.: Robust Statistics. Springer, Berlin (2011)
  37. Janková, J., Van De Geer, S.: Confidence regions for high-dimensional generalized linear models under sparsity (2016). arXiv preprint arXiv:1610.01353
  38. Javanmard, A., Montanari, A.: State evolution for general approximate message passing algorithms, with applications to spatial coupling. Inf. Inference 2, 115–144 (2013)
  39. Javanmard, A., Montanari, A.: De-biasing the lasso: optimal sample size for Gaussian designs (2015). arXiv preprint arXiv:1508.02757
  40. Lawley, D.N.: A general method for approximating to the distribution of likelihood ratio criteria. Biometrika 43(3/4), 295–303 (1956)
  41. Lehmann, E.L., Romano, J.P.: Testing Statistical Hypotheses. Springer, Berlin (2006)
  42. Liang, H., Pang, D.: Maximum likelihood estimation in logistic regression models with a diverging number of covariates. Electron. J. Stat. 6, 1838–1846 (2012)
  43. Mammen, E.: Asymptotics with increasing dimension for robust regression with applications to the bootstrap. Ann. Stat. 17, 382–400 (1989)
  44. McCullagh, P., Nelder, J.A.: Generalized Linear Models. Monographs on Statistics and Applied Probability. Chapman & Hall, London (1989)
  45. Moulton, L.H., Weissfeld, L.A., Laurent, R.T.S.: Bartlett correction factors in logistic regression models. Comput. Stat. Data Anal. 15(1), 1–11 (1993)
  46. Oymak, S., Tropp, J.A.: Universality laws for randomized dimension reduction, with applications. Inf. Inference J. IMA 7, 337–446 (2015)
  47. Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1(3), 127–239 (2014)
  48. Portnoy, S.: Asymptotic behavior of M-estimators of \(p\) regression parameters when \(p^2/n\) is large. I. Consistency. Ann. Stat. 12, 1298–1309 (1984)
  49. Portnoy, S.: Asymptotic behavior of M-estimators of \(p\) regression parameters when \(p^2/n\) is large; II. Normal approximation. Ann. Stat. 13, 1403–1417 (1985)
  50. Portnoy, S.: Asymptotic behavior of the empiric distribution of M-estimated residuals from a regression model with many parameters. Ann. Stat. 14, 1152–1170 (1986)
  51. Portnoy, S.: Asymptotic behavior of likelihood methods for exponential families when the number of parameters tends to infinity. Ann. Stat. 16(1), 356–366 (1988)
  52. Rudelson, M., Vershynin, R.: Hanson–Wright inequality and sub-Gaussian concentration. Electron. Commun. Probab. 18(82), 1–9 (2013)
  53. Sampford, M.R.: Some inequalities on Mill’s ratio and related functions. Ann. Math. Stat. 24(1), 130–132 (1953)
  54. Spokoiny, V.: Penalized maximum likelihood estimation and effective dimension (2012). arXiv preprint arXiv:1205.0498
  55. Su, W., Bogdan, M., Candès, E.: False discoveries occur early on the Lasso path. Ann. Stat. 45, 2133–2150 (2015)
  56. Sur, P., Candès, E.J.: Additional supplementary materials for: a modern maximum-likelihood theory for high-dimensional logistic regression. https://statweb.stanford.edu/~candes/papers/proofs_LogisticAMP.pdf (2018)
  57. Sur, P., Candès, E.J.: A modern maximum-likelihood theory for high-dimensional logistic regression (2018). arXiv preprint arXiv:1803.06964
  58. Sur, P., Chen, Y., Candès, E.: Supplemental materials for “The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled chi-square”. http://statweb.stanford.edu/~candes/papers/supplement_LRT.pdf (2017)
  59. Tang, C.Y., Leng, C.: Penalized high-dimensional empirical likelihood. Biometrika 97, 905–919 (2010)
  60. Tao, T.: Topics in Random Matrix Theory, vol. 132. American Mathematical Society, Providence (2012)
  61. Thrampoulidis, C., Abbasi, E., Hassibi, B.: Precise error analysis of regularized M-estimators in high dimensions (2016). arXiv preprint arXiv:1601.06233
  62. Van de Geer, S., Bühlmann, P., Ritov, Y., Dezeure, R.: On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Stat. 42(3), 1166–1202 (2014)
  63. Van de Geer, S.A.: High-dimensional generalized linear models and the lasso. Ann. Stat. 36(2), 614–645 (2008)
  64. Van der Vaart, A.W.: Asymptotic Statistics, vol. 3. Cambridge University Press, Cambridge (2000)
  65. Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices. In: Compressed Sensing: Theory and Applications, pp. 210–268 (2012)
  66. Wilks, S.S.: The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann. Math. Stat. 9(1), 60–62 (1938)
  67. Yan, T., Li, Y., Xu, J., Yang, Y., Zhu, J.: High-dimensional Wilks phenomena in some exponential random graph models (2012). arXiv preprint arXiv:1201.0058

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Statistics, Stanford University, Stanford, USA
  2. Department of Electrical Engineering, Princeton University, Princeton, USA
  3. Department of Mathematics, Stanford University, Stanford, USA
