Response-Dependent Sampling with Clustered and Longitudinal Data
Prospective cohort studies typically involve repeated assessment of individuals to determine whether they have a particular health condition. The usual goal in such studies is to relate the presence of the condition to disease markers or exposure variables. Disease markers are often too difficult or costly to measure for all individuals in a sample. In such settings, two- and multi-phase sampling designs are routinely adopted to enable researchers to select individuals on whom these expensive markers are to be assessed. In this article we review the rationale and format of two-phase sampling designs in retrospective and cross-sectional studies. We then develop frameworks for multi-phase designs in the context of studies with clustered or longitudinal responses. Model-based and semi-parametric methods are discussed for estimation and inference.
KeywordsAugmented inverse probability weighting asymptotic efficiency inverse probability weighting maximum likelihood response-dependent sampling two-phase sampling; clustering longitudinal data.
Michael McIsaac’s research was supported by an Alexander Graham Bell Canada Graduate Scholarship from the Natural Sciences and Engineering Research Council of Canada (NSERC) and Discovery Grants to Richard Cook from NSERC (RGPIN 155849) and the Canadian Institutes for Health Research (FRN 13887). Richard Cook is a Canada Research Chair in Statistical Methods for Health Research. The authors thank Dr. Dafna Gladman and Dr. Vinod Chandran for collaboration and helpful discussions regarding the research at the Centre for Prognosis Studies in Rheumatic Disease at the University of Toronto. The authors gratefully acknowledge the careful review and comments from a referee and Dr. Brajendra Sutradhar.