Lifetime Data Analysis, Volume 17, Issue 2, pp 175–194

Missing genetic information in case-control family data with general semi-parametric shared frailty model

  • Anna Graber-Naidich
  • Malka Gorfine
  • Kathleen E. Malone
  • Li Hsu


Case-control family data are now widely used to examine the role of gene-environment interactions in the etiology of complex diseases. In these types of studies, exposure levels are obtained retrospectively and, frequently, information on most risk factors of interest is available on the probands but not on their relatives. In this work we consider correlated failure time data arising from population-based case-control family studies with missing genotypes of relatives. We present a new method for estimating the age-dependent marginalized hazard function. The proposed technique has two major advantages: (1) it is based on the pseudo full likelihood function rather than a pseudo composite likelihood function, which usually suffers from substantial efficiency loss; (2) the cumulative baseline hazard function is estimated using a two-stage estimator instead of an iterative process. We assess the performance of the proposed methodology with simulation studies, and illustrate its utility on a real data example.
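The abstract's target, the age-dependent marginalized hazard, is the hazard of the survival function you get after integrating the shared frailty out of a conditional model. The sketch below illustrates only that marginalization step. It does not show the authors' pseudo full likelihood or their two-stage estimator. It assumes a conditional proportional hazards model λ(t | Z, w) = w λ0(t) exp(β'Z). To get a closed form, it also assumes a gamma frailty with mean 1 and variance θ. The general semi-parametric model in the paper allows other frailty distributions. All function names and numeric values are hypothetical.

```python
import math


def _check_theta(theta: float) -> None:
    # Gamma frailty variance must be positive (theta -> 0 recovers the Cox model).
    if theta <= 0:
        raise ValueError("theta (frailty variance) must be > 0")


def marginal_survival(cum_base_hazard: float, beta_z: float, theta: float) -> float:
    """Marginal S(t | Z) under an assumed gamma frailty (mean 1, variance theta).

    Integrating w out of exp(-w * Lambda0(t) * exp(beta'Z)) with the gamma
    Laplace transform gives (1 + theta * Lambda0(t) * exp(beta'Z)) ** (-1/theta).
    """
    _check_theta(theta)
    return (1.0 + theta * cum_base_hazard * math.exp(beta_z)) ** (-1.0 / theta)


def marginal_hazard(base_hazard: float, cum_base_hazard: float,
                    beta_z: float, theta: float) -> float:
    """Marginalized hazard -d/dt log S(t | Z) under the same gamma assumption."""
    _check_theta(theta)
    r = math.exp(beta_z)
    return base_hazard * r / (1.0 + theta * cum_base_hazard * r)


def marginal_hazard_ratio(cum_base_hazard: float, beta_z: float, theta: float) -> float:
    """Marginal hazard for covariate value Z divided by that for Z = 0.

    lambda0(t) cancels, leaving r * (1 + theta*Lambda0) / (1 + theta*Lambda0*r)
    with r = exp(beta'Z): it equals r at Lambda0 = 0 and moves toward 1 as
    Lambda0(t) grows, i.e. the marginal effect is attenuated with age.
    """
    _check_theta(theta)
    r = math.exp(beta_z)
    return r * (1.0 + theta * cum_base_hazard) / (1.0 + theta * cum_base_hazard * r)


if __name__ == "__main__":
    lam0 = 0.01            # hypothetical constant baseline hazard per year
    beta_z = math.log(2.0)  # hypothetical conditional hazard ratio of 2
    theta = 1.0             # hypothetical frailty variance
    for age in (20, 50, 80):
        cum = lam0 * age    # Lambda0(age) for a constant baseline hazard
        print(age, round(marginal_hazard_ratio(cum, beta_z, theta), 3))
```

For example, with Λ0(t) = 0.5, exp(β'Z) = 2 and θ = 1, the marginal survival is 0.5 and the marginal hazard ratio drops from 2 at Λ0 = 0 to 1.5. This illustrates why, under frailty, conditional and marginal covariate effects differ and must be kept distinct.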


Keywords: Case-control family study; Missing genotypes; Multivariate survival analysis; Frailty model





Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Anna Graber-Naidich (1)
  • Malka Gorfine (1)
  • Kathleen E. Malone (2)
  • Li Hsu (2)

  1. Faculty of Industrial Engineering and Management, Technion City, Haifa, Israel
  2. Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, USA
