Modeling the Dependence Structure in Genome Wide Association Studies of Binary Phenotypes in Family Data

  • Original Research
  • Published in: Behavior Genetics

Abstract

Genome-wide association studies (GWASs) are a popular tool for detecting associations between genetic variants, such as single nucleotide polymorphisms (SNPs), and complex traits. Family data add complexity because family members are not independent. Methods for non-independent data are well established, but when a GWAS contains distinct family types, explicitly modeling between-family-type differences in the dependence structure substantially increases the computational burden. The situation is exacerbated with binary traits. In this paper, we perform several simulation studies to compare candidate methods for single-SNP association analysis with binary traits: generalized estimating equations (GEE), generalized linear mixed models (GLMMs), and generalized least squares (GLS). We study how different working correlation structures for GEE influence GWAS findings, and how the different analysis methods perform when conducting a GWAS with binary trait data in families. We discuss the merits of each approach with attention to its applicability in a GWAS. We also compare the performance of the methods on the alcoholism data from the Minnesota Center for Twin and Family Research (MCTFR) study.
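To make the GEE idea concrete, the following is a minimal sketch of a single-SNP logistic GEE fit with an independence working correlation and a family-clustered sandwich (robust) variance. It is not the authors' implementation: the function names (`fit_gee_independence`, `_sums`), the additive 0/1/2 genotype coding, the absence of covariates, and the toy data are all assumptions made for illustration. The independence structure is chosen only because it keeps the estimating equations simple; other working correlation structures would change the weighting of the within-family residuals.

```python
import math


def _sums(families, beta):
    """Return per-family score vectors U_f = X_f^T (y_f - mu_f) and the
    summed information matrix A = sum X^T W X as (a00, a01, a11)."""
    scores, a00, a01, a11 = [], 0.0, 0.0, 0.0
    for fam in families:
        u0 = u1 = 0.0
        for g, y in fam:
            mu = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * g)))
            w = mu * (1.0 - mu)  # logistic variance function
            u0 += y - mu
            u1 += (y - mu) * g
            a00 += w
            a01 += w * g
            a11 += w * g * g
        scores.append((u0, u1))
    return scores, (a00, a01, a11)


def fit_gee_independence(families, n_iter=50, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1 * genotype by Newton steps on the
    independence-working-correlation estimating equations.

    families: list of families, each a list of (genotype, phenotype) pairs.
    Returns (beta, robust_se), both ordered as [intercept, SNP effect].
    """
    beta = [0.0, 0.0]
    for _ in range(n_iter):
        scores, (a00, a01, a11) = _sums(families, beta)
        u0 = sum(s[0] for s in scores)
        u1 = sum(s[1] for s in scores)
        det = a00 * a11 - a01 * a01
        # Newton step: beta <- beta + A^{-1} U
        d0 = (a11 * u0 - a01 * u1) / det
        d1 = (a00 * u1 - a01 * u0) / det
        beta[0] += d0
        beta[1] += d1
        if abs(d0) + abs(d1) < tol:
            break

    # Sandwich variance A^{-1} B A^{-1}, where B sums the outer products of
    # the family-level scores, so within-family dependence enters through B.
    scores, (a00, a01, a11) = _sums(families, beta)
    det = a00 * a11 - a01 * a01
    a_inv = ((a11 / det, -a01 / det), (-a01 / det, a00 / det))
    b00 = sum(s[0] * s[0] for s in scores)
    b01 = sum(s[0] * s[1] for s in scores)
    b11 = sum(s[1] * s[1] for s in scores)
    robust_se = []
    for r0, r1 in a_inv:
        var = r0 * r0 * b00 + 2.0 * r0 * r1 * b01 + r1 * r1 * b11
        robust_se.append(math.sqrt(var))
    return beta, robust_se


# Invented toy data: four families of two members, (genotype, affected).
toy_families = [
    [(0, 0), (0, 0)],
    [(0, 0), (0, 1)],
    [(1, 1), (1, 1)],
    [(1, 1), (1, 0)],
]

beta, robust_se = fit_gee_independence(toy_families)
print("SNP log-odds ratio:", beta[1])
print("robust SE:", robust_se[1])
print("Wald z:", beta[1] / robust_se[1])
```

In this sketch the point estimate is the same as in ordinary logistic regression; only the standard error accounts for family clustering. In a real GWAS the same fit would be repeated for every SNP, which is why the computational cost of richer dependence models matters.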

Acknowledgements

This research was supported by NIH Grant Nos. R01-DA033958 and R21-DA046188 (PI: Saonli Basu).

Author information

Corresponding author

Correspondence to Souvik Seal.

Ethics declarations

Conflict of interest

Souvik Seal, Jeffrey A. Boatman, Saonli Basu, and Matt McGue have no conflicts of interest to declare.

Human and Animal Rights

As part of the Genes, Environment and Development Initiative (GEDI), the Minnesota Center for Twin and Family Research (MCTFR) undertook a genome-wide association study (GWAS). The procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national).

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Edited by Stacey Cherny.

About this article

Cite this article

Seal, S., Boatman, J.A., McGue, M. et al. Modeling the Dependence Structure in Genome Wide Association Studies of Binary Phenotypes in Family Data. Behav Genet 50, 423–439 (2020). https://doi.org/10.1007/s10519-020-10010-2

  • DOI: https://doi.org/10.1007/s10519-020-10010-2
