Abstract
Genome-wide association studies (GWASs) are a popular tool for detecting associations between genetic variants, such as single nucleotide polymorphisms (SNPs), and complex traits. Family data introduce complexity because family members are not independent. Methods for non-independent data are well established, but when a GWAS contains distinct family types, explicitly modeling between-family-type differences in the dependence structure comes at the cost of a significantly increased computational burden. The situation is exacerbated with binary traits. In this paper, we perform several simulation studies to compare multiple candidate methods for single-SNP association analysis with binary traits. We consider generalized estimating equations (GEE), generalized linear mixed models (GLMMs), and generalized least squares (GLS) approaches. We study the influence of different working correlation structures for GEE on the GWAS findings, as well as the performance of different analysis methods for conducting a GWAS with binary trait data in families. We discuss the merits of each approach with attention to its applicability in a GWAS. We also compare the performance of the methods on the alcoholism data from the Minnesota Center for Twin and Family Research (MCTFR) study.
Acknowledgements
This research was supported by NIH Grant Nos. R01-DA033958 and R21-DA046188 (PI: Saonli Basu).
Ethics declarations
Conflict of interest
Souvik Seal, Jeffrey A. Boatman, Saonli Basu, and Matt McGue have no conflicts of interest to declare.
Human and Animal Rights
As part of the Genes, Environment and Development Initiative (GEDI), the Minnesota Center for Twin and Family Research (MCTFR) undertook a genome-wide association study (GWAS). The procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national).
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Edited by Stacey Cherny.
About this article
Cite this article
Seal, S., Boatman, J.A., McGue, M. et al. Modeling the Dependence Structure in Genome Wide Association Studies of Binary Phenotypes in Family Data. Behav Genet 50, 423–439 (2020). https://doi.org/10.1007/s10519-020-10010-2
DOI: https://doi.org/10.1007/s10519-020-10010-2