Single Marker Family-Based Association Analysis Not Conditional on Parental Information

Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1666)

Abstract

Family-based association analysis that is unconditional on parental genotypes models the effects of the observed genotypes directly. This approach has been shown to have greater power than conditional methods. In this chapter, we review popular association analysis methods that account for familial correlations: the marginal model using generalized estimating equations (GEE), the mixed model with a polygenic random component, and methods for genome-wide association analysis. The marginal approach does not explicitly model familial correlations but uses the information to improve the efficiency of parameter estimates. This model, fitted with GEE, is useful when the correlation structure is not of interest; the correlations are treated as nuisance parameters. In the mixed model, familial correlations are modeled as random effects; for example, the polygenic inheritance model accounts for correlations originating from genomic components shared within a family. These unconditional methods provide a flexible modeling framework for general pedigree data and accommodate traits with various distributions and many types of covariate effects. Genome-wide association studies usually test more than 10,000 SNPs, so traditional statistical methods that account for familial correlations often carry a heavy computational burden. We review several recently proposed approaches that avoid this computational issue. The single-marker analysis procedures are demonstrated using the R package gee and the ASSOC program in the S.A.G.E. package, including how to prepare input data, conduct the analysis, and interpret the output. ASSOC allows models to include random components for additional familial correlations that may not be sufficiently explained by a polygenic effect, and it addresses nonnormality of response variables through transformation methods. With its ease of use, ASSOC provides a useful tool for association analysis of large pedigree data.
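
As a rough sketch of the two model classes named above, the standard textbook forms can be written as follows. The notation is an assumption introduced here for illustration and is not taken from the chapter: family i has trait vector y_i and design matrix X_i (covariates plus the coded marker genotype), g is a link function, R(α) is a working correlation matrix, Φ_i is the kinship matrix of family i, and σ_g², σ_e² are the polygenic and residual variances.

```latex
% Marginal model fitted by GEE: familial correlation enters only
% through the working covariance V_i (a nuisance structure).
\begin{align}
  g(\mu_i) &= X_i \beta, \qquad \mu_i = E(y_i \mid X_i), \\
  \sum_{i} D_i^{\top} V_i^{-1} (y_i - \mu_i) &= 0, \qquad
  D_i = \frac{\partial \mu_i}{\partial \beta}, \quad
  V_i = \phi\, A_i^{1/2} R(\alpha) A_i^{1/2}.
\end{align}

% Linear mixed model with a polygenic random component:
% correlation is modeled explicitly through the kinship matrix.
\begin{align}
  y_i &= X_i \beta + g_i + e_i, \qquad
  g_i \sim N\!\left(0,\; 2\Phi_i \sigma_g^2\right), \quad
  e_i \sim N\!\left(0,\; I \sigma_e^2\right), \\
  \operatorname{Var}(y_i) &= 2\Phi_i \sigma_g^2 + I \sigma_e^2, \qquad
  h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}.
\end{align}
```

Here A_i is the diagonal matrix of marginal variance functions and φ a dispersion parameter. In both forms the marker effect is tested through the corresponding element of β; the models differ in whether the within-family covariance is a working assumption (GEE) or an estimated variance component (mixed model).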

Key Words

Family-based association test; Unconditional method; Observed genotype; Polygenic inheritance; Linear mixed model; Marginal model; Generalized estimating equations; GEE; Generalized linear mixed model; Variance components; ASSOC; S.A.G.E.; R package gee; Working correlation; Heritability; Genome-wide association studies; GWAS

Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  1. Molecular Diagnostics Team, IVD Business Unit, SK Telecom, Seoul, South Korea
  2. Department of Public Health Science, Graduate School of Public Health, Seoul National University, Seoul, South Korea
