The impact of disregarding family structure on genome-wide association analysis of complex diseases in cohorts with simple pedigrees

  • Alireza Nazarian
  • Konstantin G. Arbeev
  • Alexander M. Kulminski
Human Genetics • Original Paper


Generalized linear mixed models (GLMMs) are the standard framework for genome-wide association studies (GWAS) of complex diseases in family-based cohorts. Fitting GLMMs in very large cohorts, however, can be computationally demanding, and modified versions of GLMMs that use faster algorithms may underperform, for instance when a single nucleotide polymorphism (SNP) is correlated with fixed-effects covariates. We investigated the extent to which disregarding family structure may compromise GWAS in cohorts with simple pedigrees by contrasting logistic regression models (i.e., with no family structure) with three LMMs-based models. Our analyses showed that the logistic regression models generally yielded smaller P values than the LMMs-based models; however, the differences in P values were mostly minor. Disregarding family structure had little impact on identifying disease-associated SNPs at the genome-wide level of significance (i.e., P < 5E-08), as the four P values obtained from the tested methods for any given SNP were either all below or all above 5E-08. Nevertheless, larger discrepancies were detected between logistic regression and LMMs-based models at the suggestive level of significance (i.e., 5E-08 ≤ P < 5E-06). The SNP effects estimated by the logistic regression models were not statistically different from those estimated by GLMMs that implemented Wald’s test. However, several SNP effects were significantly different from their counterparts in LMMs analyses. We suggest that fitting GLMMs with Wald’s test on a pre-selected subset of SNPs obtained from logistic regression models can balance the speed of analyses against the accuracy of parameter estimates.
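The two significance levels and the suggested pre-selection step can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the SNP identifiers, the P values, and the pre-selection cutoff are all hypothetical and chosen only for illustration. Only the two thresholds (5E-08 and 5E-06) come from the abstract.

```python
GENOME_WIDE = 5e-08   # genome-wide threshold stated in the abstract
SUGGESTIVE = 5e-06    # upper bound of the suggestive band stated in the abstract


def significance_tier(p):
    """Classify one P value into the tiers described in the abstract."""
    if p < GENOME_WIDE:
        return "genome-wide"
    if p < SUGGESTIVE:
        return "suggestive"
    return "none"


def concordant(p_values, threshold=GENOME_WIDE):
    """True when all methods place the SNP on the same side of the threshold."""
    below = [p < threshold for p in p_values]
    return all(below) or not any(below)


def preselect(logistic_p, cutoff=SUGGESTIVE):
    """Return SNP IDs whose logistic-regression P value is below a cutoff.

    The cutoff is an assumption; the abstract does not specify one.
    """
    return sorted(snp for snp, p in logistic_p.items() if p < cutoff)


# Hypothetical P values per SNP: logistic regression first, then three
# mixed-model methods. These are made-up numbers, not study results.
example = {
    "snp_a": [1e-10, 3e-10, 2e-09, 4e-09],
    "snp_b": [2e-08, 6e-08, 9e-08, 7e-08],
    "snp_c": [3e-07, 8e-07, 6e-06, 9e-07],
}

for snp, pvals in example.items():
    # Tier of the logistic-regression P value, and agreement across methods.
    print(snp, significance_tier(pvals[0]), concordant(pvals))

# SNPs that would be passed on to the slower GLMM fit in a two-stage design.
shortlist = preselect({snp: pvals[0] for snp, pvals in example.items()})
print(shortlist)
```

In a two-stage design of this kind, only the SNPs in `shortlist` would be refitted with a GLMM and Wald’s test, which is where the speed saving would come from. A looser cutoff than the final significance level lowers the chance of discarding a SNP whose mixed-model P value would have been smaller.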


Complex diseases • Family-based GWAS • Logistic regression • GLMMs framework



Funding support for the Late Onset Alzheimer’s Disease Family Study (LOADFS) was provided through the Division of Neuroscience, NIA. The LOADFS includes a genome-wide association study funded as part of the Division of Neuroscience, NIA. Assistance with phenotype harmonization and genotype cleaning, as well as with general study coordination, was provided by the Genetic Consortium for Late Onset Alzheimer’s Disease.

The Framingham Heart Study (FHS) is conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with Boston University (Contract No. N01-HC-25195 and HHSN268201500001I). Funding for SHARe Affymetrix genotyping was provided by NHLBI Contract N02-HL-64278. SHARe Illumina genotyping was provided under an agreement between Illumina and Boston University. Funding for CARe genotyping was provided by NHLBI Contract N01-HC-65226. Funding support for the Framingham Dementia dataset was provided by NIH/NIA grant R01 AG08122. Funding support for the Framingham Inflammatory Markers was provided by NIH grants R01 HL064753, R01 HL076784, and R01 AG028321. Funding support for the Framingham C-reactive protein dataset was provided by NIH grants R01 HL064753, R01 HL076784, and R01 AG028321. Funding support for the Framingham Adiponectin dataset was provided by NIH/NHLBI grant R01-DK-080739. Funding support for the Framingham Interleukin-6 dataset was provided by NIH grants R01 HL064753, R01 HL076784, and R01 AG028321.

Authors’ contributions

The authors’ responsibilities were as follows: A.N. and A.M.K. designed the study; K.G.A. and A.N. prepared and analyzed the data; A.N. and A.M.K. wrote the manuscript; and all authors read and approved the final manuscript.

Funding information

This research was supported by grants from the National Institute on Aging (P01AG043352 and R01AG047310). The funders had no role in study design, data collection and analysis, decision to publish, or manuscript preparation.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval and consent to participate

This study is a secondary analysis of data obtained from dbGaP upon approval by the local Institutional Review Board (IRB) and does not involve gathering data from human subjects directly. All procedures performed were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.


This manuscript was not prepared in collaboration with LOADFS investigators and does not necessarily reflect the opinions or views of LOADFS. This manuscript was not prepared in collaboration with investigators of the FHS and does not necessarily reflect the opinions or views of the FHS, Boston University, or NHLBI. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Supplementary material

ESM 1: 13353_2019_526_MOESM1_ESM.docx (DOCX, 2.83 MB)



Copyright information

© Institute of Plant Genetics, Polish Academy of Sciences, Poznan 2019

Authors and Affiliations

  1. Biodemography of Aging Research Unit, Social Science Research Institute, Duke University, Durham, USA
