Methods for Analyzing Secondary Outcomes in Public Health Case–Control Studies

  • Elizabeth D. Schifano
  • Haim Bar
  • Ofer Harel
Part of the ICSA Book Series in Statistics (ICSABSS)


Case–control studies are common in public health research. In these studies, participants are selected on the basis of the primary outcome (cases and controls), but many other related variables are usually collected as well. Although the association between the primary outcome and the exposure variables is generally the main focus of the study, the association between secondary outcomes and the exposure variables may also be of interest. Because the study was designed for the analysis of the primary outcome, analyses of secondary outcomes that ignore the sampling design may suffer from selection bias. In this chapter we introduce the problem and describe the biased inference that can result from ignoring the sampling design. We discuss and compare a design-based and a model-based approach to account for this bias, and demonstrate the methods using a public health data set.
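The selection bias described above can be seen in a small simulation. The Python sketch below is illustrative only: the data-generating model, its parameter values, the helper `wls_slope`, and the assumption that the population numbers of cases and controls (and hence the sampling fractions) are known are all hypothetical choices, not details taken from the chapter. It compares a naive regression of a secondary outcome `y` on an exposure `x` in a 1:1 case–control sample with inverse probability weighting (IPW), one common design-based correction in which each sampled subject is weighted by the inverse of its probability of being sampled. The chapter's own design-based and model-based estimators may differ.

```python
import numpy as np

TRUE_SLOPE = 0.5  # hypothetical population slope of y on x


def wls_slope(x, y, w):
    """Slope from a weighted least-squares fit of y on an intercept and x."""
    design = np.column_stack([np.ones_like(x), x])
    xtw = design.T * w  # scale column j of design.T by weight w[j]: X'W
    beta = np.linalg.solve(xtw @ design, xtw @ y)  # solve X'WX b = X'Wy
    return beta[1]


rng = np.random.default_rng(2015)
N = 200_000  # hypothetical source population size

# Hypothetical data-generating model (assumption, not from the chapter):
x = rng.normal(size=N)                   # exposure
y = TRUE_SLOPE * x + rng.normal(size=N)  # secondary outcome
logit = -4.0 + 0.5 * x + 1.0 * y         # primary outcome depends on x AND y
d = rng.random(N) < 1.0 / (1.0 + np.exp(-logit))  # True = case

# Case-control sampling on the primary outcome: equal numbers of each.
n_per_group = 2000
cases = rng.choice(np.flatnonzero(d), n_per_group, replace=False)
controls = rng.choice(np.flatnonzero(~d), n_per_group, replace=False)
idx = np.concatenate([cases, controls])

# Sampling fractions, assumed known from population case/control counts.
pi_case = n_per_group / d.sum()
pi_control = n_per_group / (~d).sum()
w = np.where(d[idx], 1.0 / pi_case, 1.0 / pi_control)  # IPW weights

naive_slope = wls_slope(x[idx], y[idx], np.ones(idx.size))  # ignores design
ipw_slope = wls_slope(x[idx], y[idx], w)                    # accounts for design
print(f"naive: {naive_slope:.3f}  IPW: {ipw_slope:.3f}  truth: {TRUE_SLOPE}")
```

Because the primary outcome depends on both `x` and `y` in this hypothetical model, cases have systematically higher `y` and make up a larger share of the sample at larger `x`, so the naive slope is typically pulled away from `TRUE_SLOPE`. The IPW slope should land close to it, at the cost of a larger standard error driven by the heavily weighted controls.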


Keywords (machine-generated, not supplied by the authors): Secondary Outcome; Propensity Score; Smoking Behavior; Propensity Score Match; Inverse Probability Weighting



The authors wish to thank Dr. David C. Christiani for generously sharing his data. This research was supported in part by the National Institute of Mental Health, Award Number K01MH087219. The content of this paper is solely the responsibility of the authors, and it does not represent the official views of the National Institute of Mental Health or the National Institutes of Health.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Department of Statistics, University of Connecticut, Storrs, USA
