Abstract
We propose a semiparametric Wald statistic to test the validity of logistic regression models based on case-control data. The test statistic is constructed from a semiparametric ROC curve estimator and a nonparametric ROC curve estimator. The statistic has an asymptotic chi-squared distribution and is an alternative to the Kolmogorov-Smirnov-type statistic proposed by Qin and Zhang in 1997, the chi-squared-type statistic proposed by Zhang in 1999 and the information matrix test statistic proposed by Zhang in 2001. The statistic is easy to compute: it requires no bootstrap to find its critical values, no partitioning of the sample data and no inversion of a high-dimensional matrix. We present simulation results and analyses of two real examples. Moreover, we discuss how to extend our statistic to a family of statistics and how to construct its Kolmogorov-Smirnov counterpart.
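To make the two ingredients concrete, the following is a minimal sketch of a nonparametric (empirical) ROC curve and a semiparametric ROC curve under the density ratio model g(x) = exp(alpha + beta x) f(x), along the lines of the logistic-regression approach of Qin and Zhang (2003) in the reference list. It is not the Wald statistic itself, whose exact form the abstract does not give. The function names, the scalar covariate, the Newton-Raphson fit and the simulated normal data are all assumptions made for illustration.

```python
import numpy as np

def fit_density_ratio(controls, cases, n_iter=50):
    """Fit g(x) = exp(alpha + beta*x) f(x) by logistic regression on the pooled sample.
    (Hypothetical helper; plain Newton-Raphson, scalar covariate assumed.)"""
    t = np.concatenate([controls, cases])
    y = np.concatenate([np.zeros(len(controls)), np.ones(len(cases))])
    X = np.column_stack([np.ones_like(t), t])
    theta = np.zeros(2)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (y - prob)
        hess = (X * (prob * (1.0 - prob))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        theta = theta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    # The prospective intercept absorbs log(n1/n0); remove it to get alpha.
    alpha = theta[0] - np.log(len(cases) / len(controls))
    return t, alpha, theta[1]

def semiparametric_roc(controls, cases, s_grid):
    """ROC(s) = 1 - G(F^{-1}(1 - s)) with F, G estimated under the density ratio model."""
    s_grid = np.asarray(s_grid, dtype=float)
    t, alpha, beta = fit_density_ratio(controls, cases)
    n0, n1 = len(controls), len(cases)
    w = np.exp(alpha + beta * t)
    p = 1.0 / (n0 * (1.0 + (n1 / n0) * w))  # mass of the control CDF at each pooled point
    order = np.argsort(t)
    F = np.cumsum(p[order])                  # semiparametric control CDF
    G = np.cumsum((p * w)[order])            # semiparametric case CDF
    # First pooled point where F reaches 1 - s (generalized inverse of F).
    idx = np.clip(np.searchsorted(F, 1.0 - s_grid, side="left"), 0, len(t) - 1)
    return np.clip(1.0 - G[idx], 0.0, 1.0), p

def empirical_roc(controls, cases, s_grid):
    """Nonparametric ROC: fraction of cases above the empirical (1 - s)-quantile of controls."""
    s_grid = np.asarray(s_grid, dtype=float)
    xs = np.sort(controls)
    n0 = len(xs)
    k = np.clip(np.ceil(n0 * (1.0 - s_grid)).astype(int) - 1, 0, n0 - 1)
    thresholds = xs[k]
    return (np.asarray(cases)[:, None] > thresholds[None, :]).mean(axis=0)

# Simulated example (assumption): N(0,1) controls and N(1,1) cases satisfy the model.
rng = np.random.default_rng(0)
controls = rng.normal(0.0, 1.0, 300)
cases = rng.normal(1.0, 1.0, 300)
s = np.linspace(0.05, 0.95, 19)
roc_semi, _ = semiparametric_roc(controls, cases, s)
roc_emp = empirical_roc(controls, cases, s)
print(np.max(np.abs(roc_semi - roc_emp)))  # largest gap between the two curves on the grid
```

When the logistic model holds, the two curves should be close, so a large discrepancy between them is evidence against the model; a test of the kind described in the abstract would standardize such a discrepancy rather than use the raw gap printed here.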
References
Breslow N, Day N E. Statistical Methods in Cancer Research, Vol. 1: The Analysis of Case-Control Studies. Lyon: IARC Press, 1980
Prentice R L, Pyke R. Logistic disease incidence models and case-control studies. Biometrika, 66: 403–411 (1979)
Wang C Y, Carroll R J. On robust estimation in logistic case-control studies. Biometrika, 80: 237–241 (1993)
Qin J, Zhang B. A goodness of fit test for logistic regression models based on case-control data. Biometrika, 84: 609–618 (1997)
Vardi Y. Nonparametric estimation in the presence of length bias. Ann Statist, 10: 616–620 (1982)
Vardi Y. Empirical distributions in selection bias models. Ann Statist, 13: 178–203 (1985)
Gill R D, Vardi Y, Wellner J A. Large sample theory of empirical distributions in biased sampling models. Ann Statist, 16: 1069–1112 (1988)
Qin J. Empirical likelihood in biased sample problems. Ann Statist, 21: 1182–1196 (1993)
Kay R, Little S. Transformations of the explanatory variables in the logistic regression model for binary data. Biometrika, 74: 495–501 (1987)
Zhang B. A chi-squared goodness-of-fit test for logistic regression models based on case-control data. Biometrika, 86: 531–539 (1999)
Zhang B. An information matrix test for logistic regression models based on case-control data. Biometrika, 88: 921–932 (2001)
White H. Maximum likelihood estimation of misspecified models. Econometrica, 50: 1–25 (1982)
Zhou X H, McClish D K, Obuchowski N A. Statistical Methods in Diagnostic Medicine. New York: Wiley, 2002
Pepe M S. The Statistical Evaluation of Medical Tests for Classification and Prediction. New York: Oxford University Press, 2003
Qin J, Zhang B. Using logistic regression procedures for estimating receiver operating characteristic curves. Biometrika, 90: 585–596 (2003)
Wan S W, Zhang B. Smooth semiparametric receiver operating characteristic curves for continuous diagnostic tests. Stat Med, 26: 2565–2586 (2007)
Day N E, Kerridge D F. A general maximum likelihood discriminant. Biometrics, 23: 313–323 (1967)
Kac M, Kiefer J, Wolfowitz J. On tests of normality and other tests of goodness of fit based on distance methods. Ann Math Statist, 26: 189–211 (1955)
Glovsky L, Rigrodsky S. A developmental analysis of mentally deficient children with early histories of aphasia. Training School Bull, 61: 76–96 (1964)
Hosmer D W, Lemeshow S. Applied Logistic Regression. New York: John Wiley, 1989
Wieand S, Gail M H, James B R, et al. A family of nonparametric statistics for comparing diagnostic markers with paired or unpaired data. Biometrika, 76: 585–592 (1989)
Bahadur R R. A note on quantiles in large samples. Ann Math Statist, 37: 577–580 (1966)
van der Vaart A W, Wellner J A. Weak Convergence and Empirical Processes with Applications to Statistics. New York: Springer, 1996
Billingsley P. Convergence of Probability Measures. New York: John Wiley, 1968
Additional information
This work was supported by the 11.5 Natural Scientific Plan (Grant No. 2006BAD09A04) and the Nanjing University Start Fund (Grant No. 020822410110).
About this article
Cite this article
Wan, S. A semiparametric Wald statistic for testing logistic regression models based on case-control data. Sci. China Ser. A-Math. 51, 2020–2032 (2008). https://doi.org/10.1007/s11425-008-0086-z
DOI: https://doi.org/10.1007/s11425-008-0086-z
Keywords
- case-control data
- density ratio model
- empirical likelihood
- Kolmogorov-Smirnov statistic
- logistic regression
- ROC curve