Abstract
We obtain an information bound for estimates of parameters in general regression models where data are collected under a variety of response-selective sampling schemes, together with a simple formula for the asymptotic variance of the semi-parametric maximum likelihood estimate. This variance is compared with the bound, and the estimate is found to be fully efficient in a variety of settings. A small simulation study is reported to illustrate the small-sample efficiency of the semi-parametric estimator.
Cite this article
Lee, A., Hirose, Y. Semi-parametric efficiency bounds for regression models under response-selective sampling: the profile likelihood approach. Ann Inst Stat Math 62, 1023–1052 (2010). https://doi.org/10.1007/s10463-008-0205-1