Abstract
A cost-effective sampling design is desirable in large cohort studies where the budget is limited and the primary exposure variables are expensive to measure. Outcome-dependent sampling (ODS) designs enrich the observed sample by oversampling the regions of the underlying population that convey the most information about the exposure-response relationship. Generalized linear models (GLMs) are widely used in many fields; however, far less work has been done on GLMs for data from ODS designs. We study how to fit GLMs to data obtained under the original ODS design and under the two-phase ODS design. The asymptotic properties of the proposed estimators are derived. A series of simulations is conducted to assess the finite-sample performance of the proposed estimators. Applications to a Wilms tumor study and an air quality study demonstrate the practicability of the proposed methods.
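To make the oversampling idea concrete, the following is a minimal sketch of how an ODS sample might be drawn from a simulated cohort. It is an illustration under stated assumptions, not the authors' procedure: the sample is assumed to consist of a simple random sample plus supplemental draws from the lower and upper tails of a continuous outcome, and the cutpoints (one standard deviation from the mean), the sample sizes, the toy linear outcome model, and the function names `simulate_population` and `draw_ods_sample` are all hypothetical choices made for this example.

```python
import random
import statistics


def simulate_population(size, beta0, beta1, rng):
    """Toy cohort (assumed model): x ~ N(0, 1), y = beta0 + beta1 * x + N(0, 1) noise."""
    pop = []
    for _ in range(size):
        x = rng.gauss(0.0, 1.0)
        y = beta0 + beta1 * x + rng.gauss(0.0, 1.0)
        pop.append((x, y))
    return pop


def draw_ods_sample(population, n_srs, n_low, n_high, low_cut, high_cut, rng):
    """Return index lists (srs, low, high) for an illustrative ODS sample.

    population is a list of (x, y) pairs; selection depends only on the outcome y.
    """
    indices = list(range(len(population)))
    srs = rng.sample(indices, n_srs)  # simple random sample component
    chosen = set(srs)
    # Units not yet selected whose outcome falls in the lower or upper tail.
    low_pool = [i for i in indices
                if i not in chosen and population[i][1] < low_cut]
    high_pool = [i for i in indices
                 if i not in chosen and population[i][1] > high_cut]
    # Supplemental samples that oversample the tails of the outcome.
    low = rng.sample(low_pool, min(n_low, len(low_pool)))
    high = rng.sample(high_pool, min(n_high, len(high_pool)))
    return srs, low, high


rng = random.Random(0)
population = simulate_population(5000, 1.0, 0.5, rng)
ys = [y for _, y in population]
mu, sd = statistics.mean(ys), statistics.stdev(ys)

# Hypothetical cutpoints: one standard deviation below and above the mean outcome.
srs, low, high = draw_ods_sample(population, 200, 50, 50, mu - sd, mu + sd, rng)
ods_indices = srs + low + high

# Exposure x would be measured only for the units in ods_indices.
ods_data = [population[i] for i in ods_indices]
```

Because selection depends on the outcome, the combined sample is no longer representative of the cohort. Treating it as a simple random sample when fitting a regression model would generally give misleading estimates, which is why estimators tailored to the design are needed.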
Acknowledgements
This work was supported by National Natural Science Foundation of China (Grant Nos. 11571263, 11371299 and 11101314).
Cite this article
Yan, S., Ding, J. & Liu, Y. Statistical inference methods and applications of outcome-dependent sampling designs under generalized linear models. Sci. China Math. 60, 1219–1238 (2017). https://doi.org/10.1007/s11425-016-0152-4
DOI: https://doi.org/10.1007/s11425-016-0152-4