Semi-parametric efficiency bounds for regression models under response-selective sampling: the profile likelihood approach

Annals of the Institute of Statistical Mathematics

Abstract

We obtain an information bound for estimates of parameters in general regression models where data are collected under a variety of response-selective sampling schemes, together with a simple formula for the asymptotic variance of the semi-parametric maximum likelihood estimate. The asymptotic variance is compared with the bound, and the estimate is found to be fully efficient in a variety of settings. A small simulation study illustrates the small-sample efficiency of the semi-parametric estimator.

Author information

Corresponding author

Correspondence to Alan Lee.

About this article

Cite this article

Lee, A., Hirose, Y. Semi-parametric efficiency bounds for regression models under response-selective sampling: the profile likelihood approach. Ann Inst Stat Math 62, 1023–1052 (2010). https://doi.org/10.1007/s10463-008-0205-1

  • DOI: https://doi.org/10.1007/s10463-008-0205-1
