Modeling the Dependence Structure in Genome Wide Association Studies of Binary Phenotypes in Family Data

Seal, Souvik; Boatman, Jeffrey A.; McGue, Matt; Basu, Saonli

doi:10.1007/s10519-020-10010-2

Original Research
Published: 17 August 2020

Volume 50, pages 423–439, (2020)

Behavior Genetics

Souvik Seal ORCID: orcid.org/0000-0003-3268-610X¹,
Jeffrey A. Boatman¹,
Matt McGue² &
Saonli Basu¹

Abstract

Genome-wide association studies (GWASs) are a popular tool for detecting association between genetic variants or single nucleotide polymorphisms (SNPs) and complex traits. Family data introduce complexity because family members are not independent. Methods for non-independent data are well established, but when the GWAS contains distinct family types, explicitly modeling between-family-type differences in the dependence structure comes at the cost of a significantly increased computational burden. The situation is exacerbated with binary traits. In this paper, we perform several simulation studies to compare multiple candidate methods for single-SNP association analysis with binary traits. We consider generalized estimating equations (GEE), generalized linear mixed models (GLMMs), and generalized least squares (GLS) approaches. We study how the choice of working correlation structure for GEE influences the GWAS findings, and we compare the performance of the different analysis methods for conducting a GWAS with binary trait data in families. We discuss the merits of each approach with attention to their applicability in a GWAS. We also compare the performance of the methods on the alcoholism data from the Minnesota Center for Twin and Family Research (MCTFR) study.
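
To make the GEE idea in the abstract concrete, the following is a minimal sketch of a single-SNP test for a binary trait in family data. It is not the authors' code or simulation design. It assumes a logistic mean model with an intercept and one SNP, an independence working correlation, and a cluster-robust (sandwich) variance computed per family. The function names, the toy simulation (a shared family random intercept, genotypes drawn independently rather than by Mendelian transmission), and all parameter values are illustrative assumptions.

```python
import math
import random


def simulate_families(n_fam=500, fam_size=4, maf=0.3,
                      beta0=-1.0, beta1=0.8, sd_fam=1.0, seed=1):
    """Return (family_id, genotype, phenotype) rows for a toy family sample."""
    rng = random.Random(seed)
    rows = []
    for fam in range(n_fam):
        u = rng.gauss(0.0, sd_fam)  # shared family effect induces dependence
        for _ in range(fam_size):
            # Assumption: genotypes are drawn independently within a family
            # (no Mendelian transmission), purely to keep the sketch short.
            g = int(rng.random() < maf) + int(rng.random() < maf)
            p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * g + u)))
            rows.append((fam, g, int(rng.random() < p)))
    return rows


def gee_independence(rows, n_iter=25):
    """Logistic GEE (intercept + SNP), independence working correlation."""
    b0 = b1 = 0.0
    for _ in range(n_iter):  # Fisher scoring on the estimating equations
        s0 = s1 = a00 = a01 = a11 = 0.0
        for _, g, y in rows:
            mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))
            w = mu * (1.0 - mu)
            s0 += y - mu
            s1 += (y - mu) * g
            a00 += w
            a01 += w * g
            a11 += w * g * g
        det = a00 * a11 - a01 * a01
        b0 += (a11 * s0 - a01 * s1) / det
        b1 += (a00 * s1 - a01 * s0) / det

    # Recompute the "bread" A = X'WX and the per-family scores at the estimate.
    a00 = a01 = a11 = 0.0
    fam_score = {}
    for fam, g, y in rows:
        mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))
        w = mu * (1.0 - mu)
        a00 += w
        a01 += w * g
        a11 += w * g * g
        s = fam_score.setdefault(fam, [0.0, 0.0])
        s[0] += y - mu
        s[1] += (y - mu) * g
    det = a00 * a11 - a01 * a01
    i01, i11 = -a01 / det, a00 / det  # second row of A^{-1}

    # "Meat" B = sum over families of (family score)(family score)'.
    m00 = sum(s[0] * s[0] for s in fam_score.values())
    m01 = sum(s[0] * s[1] for s in fam_score.values())
    m11 = sum(s[1] * s[1] for s in fam_score.values())

    # SNP entry of the sandwich A^{-1} B A^{-1}, and the model-based variance.
    var_robust = i01 * i01 * m00 + 2.0 * i01 * i11 * m01 + i11 * i11 * m11
    se_robust = math.sqrt(var_robust)
    z = b1 / se_robust
    return {
        "beta_snp": b1,
        "se_robust": se_robust,
        "se_naive": math.sqrt(i11),
        "z": z,
        "p_value": math.erfc(abs(z) / math.sqrt(2.0)),  # two-sided Wald test
    }


result = gee_independence(simulate_families())
print(result)
```

Comparing `se_robust` with `se_naive` gives a rough sense of how much the within-family dependence matters for the SNP test. A non-independence working correlation (for example, exchangeable, or one that differs by family type) would additionally require estimating correlation parameters, which this sketch deliberately leaves out.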

Acknowledgements

This research was supported by NIH Grant Nos. R01-DA033958 and R21-DA046188 (PI: Saonli Basu).

Author information

Authors and Affiliations

Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA
Souvik Seal, Jeffrey A. Boatman & Saonli Basu
Department of Psychology, University of Minnesota, Minneapolis, MN, USA
Matt McGue

Corresponding author

Correspondence to Souvik Seal.

Ethics declarations

Conflict of interest

Souvik Seal, Jeffrey A. Boatman, Saonli Basu, and Matt McGue have no conflicts of interest to declare.

Human and Animal Rights

As part of the Genes, Environment and Development Initiative (GEDI), the Minnesota Center for Twin and Family Research (MCTFR) undertook a genome-wide association study (GWAS). The procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national).

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Edited by Stacey Cherny.

About this article

Cite this article

Seal, S., Boatman, J.A., McGue, M. et al. Modeling the Dependence Structure in Genome Wide Association Studies of Binary Phenotypes in Family Data. Behav Genet 50, 423–439 (2020). https://doi.org/10.1007/s10519-020-10010-2

Received: 10 November 2019
Accepted: 27 July 2020
Published: 17 August 2020
Issue Date: November 2020
DOI: https://doi.org/10.1007/s10519-020-10010-2
