Multistage sampling for latent variable models
I consider the design of multistage sampling schemes for epidemiologic studies involving latent variable models, in which surrogate measurements of the latent variables are obtained on a subset of subjects. Such models arise in various situations: when detailed exposure measurements are combined with variables that can be used to assign exposures to unmeasured subjects; when biomarkers are obtained to assess an unobserved pathophysiologic process; or when additional information is to be obtained on confounding or modifying variables. In such situations, it may be possible to stratify the subsample on data available for all subjects in the main study, such as outcomes, exposure predictors, or geographic locations. I consider three circumstances in which the optimal design can be calculated analytically: (i) when all variables are binary; (ii) when all are normally distributed; and (iii) when the latent variable and its measurement are normally distributed but the outcome is binary. In each of these cases, appropriate selection of the sampling fractions can often improve the cost efficiency of the design considerably. More complex situations arise when the data are spatially distributed: the spatial correlation can then be exploited to improve exposure assignment for unmeasured locations using available measurements at neighboring locations. I also consider some approaches for informative selection of the measurement sample using location and/or exposure predictor data.
Keywords: Study design; Latent variable models; Multistage sampling; Spatial correlation; Biomarkers; Exposure measurement error
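The abstract's central design lever, choosing stratum-specific sampling fractions for the measured subsample when strata are defined by data available on everyone in the main study, can be illustrated with classical cost-weighted Neyman allocation. This is a generic survey-sampling technique swapped in for illustration, not the paper's derivation for cases (i) to (iii). It assumes the target is a stratified mean of the surrogate measurement, that the within-stratum standard deviations S_h and per-subject measurement costs c_h are known in advance, and that the total budget is fixed; the function name `neyman_allocation` is hypothetical. Under these assumptions the variance-minimizing subsample size is proportional to N_h S_h / sqrt(c_h), with any stratum whose allocation exceeds its size measured completely.

```python
import math


def neyman_allocation(sizes, sds, costs, budget):
    """Cost-constrained Neyman allocation across phase-1 strata (a sketch).

    sizes  -- N_h, number of main-study subjects in each stratum
    sds    -- S_h, assumed within-stratum SD of the surrogate measurement
    costs  -- c_h, assumed per-subject cost of measuring a subject in stratum h
    budget -- total measurement budget C

    Returns real-valued subsample sizes n_h that minimize
    sum_h (N_h * S_h)**2 / n_h subject to sum_h c_h * n_h = C and n_h <= N_h.
    """
    n = [0.0] * len(sizes)
    free = set(range(len(sizes)))  # strata not yet fully measured
    remaining = budget
    while free:
        # Unconstrained optimum: n_h proportional to N_h * S_h / sqrt(c_h).
        denom = sum(sizes[h] * sds[h] * math.sqrt(costs[h]) for h in free)
        trial = {
            h: remaining * sizes[h] * sds[h] / (math.sqrt(costs[h]) * denom)
            for h in free
        }
        over = [h for h in free if trial[h] > sizes[h]]
        if not over:
            for h in free:
                n[h] = trial[h]
            break
        # Measure over-allocated strata completely, then re-spread the leftover budget.
        for h in over:
            n[h] = float(sizes[h])
            remaining -= costs[h] * sizes[h]
            free.remove(h)
    return n


# Hypothetical example: 900 controls and 100 cases, with the surrogate assumed
# five times more variable among cases and equal measurement costs.
print(neyman_allocation([900, 100], [1.0, 5.0], [1.0, 1.0], 300.0))
```

In this hypothetical example the case stratum is measured completely (100 of 100), and the remaining budget goes to 200 of the 900 controls, giving sampling fractions of 1.0 and about 0.22. Because proportional allocation would measure only 30 cases, the sketch shows how stratifying on a variable known for everyone lets the design oversample where measurements are most informative per unit cost.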