Statistical Analysis of Genomic Data

  • Protocol
Genome-Wide Association Studies and Genomic Prediction

Part of the book series: Methods in Molecular Biology (MIMB, volume 1019)

Abstract

In this chapter we describe methods for statistical analysis of GWAS data with the goal of quantifying evidence for genomic effects associated with trait variation, while avoiding spurious associations caused by poorly quantified evidence or by population structure.

Single marker analysis and imputation are discussed in Sect. 1, and a Bayesian multi-locus analysis using the BayesQTLBIC R package (1, 2) is described in Sect. 2. The multi-locus analysis, applied in a genomic window, enables local inference of the QTL genetic architecture and is an alternative to imputation. Multi-locus analysis with BayesQTLBIC is demonstrated for simulated QTL data, including calculation of posterior probabilities for alternative models, posterior probabilities for the number of QTL, marginal probabilities for markers, and Bayes factors for individual chromosomes. Methods for correcting for population structure, and the possible effects of population structure on power, are discussed in Sect. 3. Section 4 considers analysis combining information from linkage and linkage disequilibrium when sampling from a pedigree. Section 5 considers combining information from two different studies, showing that data from an existing QTL mapping family can be profitably used in combination with an association study: prior odds are higher for candidate genes that map into a QTL region identified in the QTL mapping family and, optionally, the number of markers genotyped in the association study can be reduced. Examples using R and the R packages BayesQTLBIC and ncdf are given.
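
The multi-locus quantities listed above (posterior probabilities for alternative models, for the number of QTL, and marginal probabilities for markers) can be illustrated with a minimal sketch of BIC-based model selection in a small marker window. This is not the BayesQTLBIC implementation or its interface. It is written in Python with NumPy for illustration only, and the simulated data, the function names `bic_linear` and `posterior_model_probs`, the additive 0/1/2 genotype coding, and the independent per-marker prior of 0.1 are all assumptions made here.

```python
import itertools
import numpy as np

def bic_linear(y, X):
    """BIC (up to an additive constant) of a least-squares fit of y on X."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + p * np.log(n)

def posterior_model_probs(y, G, max_qtl=2, prior_per_marker=0.1):
    """Enumerate marker subsets of size <= max_qtl and return a dict
    {subset: approximate posterior probability}, using
    exp(-BIC/2) * prior as the unnormalised model weight."""
    n, m = G.shape
    subsets, log_w = [], []
    for k in range(max_qtl + 1):
        for s in itertools.combinations(range(m), k):
            cols = G[:, np.array(s, dtype=int)]          # (n, k), possibly k = 0
            X = np.hstack([np.ones((n, 1)), cols])       # intercept + markers
            # Assumed prior: each marker included independently.
            log_prior = (k * np.log(prior_per_marker)
                         + (m - k) * np.log1p(-prior_per_marker))
            log_w.append(-0.5 * bic_linear(y, X) + log_prior)
            subsets.append(s)
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())                      # stabilise before normalising
    return dict(zip(subsets, w / w.sum()))

# Hypothetical simulated window: 6 markers, one true QTL at marker index 2.
rng = np.random.default_rng(1)
n, m = 400, 6
G = rng.binomial(2, 0.5, size=(n, m)).astype(float)
y = 0.8 * G[:, 2] + rng.normal(size=n)

probs = posterior_model_probs(y, G)

# Marginal probability per marker and posterior for the number of QTL.
marginal = np.zeros(m)
n_qtl_post = {}
for s, p in probs.items():
    n_qtl_post[len(s)] = n_qtl_post.get(len(s), 0.0) + p
    for j in s:
        marginal[j] += p

print("marginal marker probabilities:", np.round(marginal, 3))
print("posterior for number of QTL:", n_qtl_post)
```

Exhaustive enumeration is only practical for a small window and a small maximum number of QTL, which is one reason a windowed analysis is a natural fit for this kind of approximation.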

References

  1. Ball, R. D. 2001: Bayesian methods for quantitative trait loci mapping based on model selection: approximate analysis using the Bayesian Information Criterion. Genetics 159: 1351–1364. http://www.genetics.org/cgi/content/abstract/159/3/1351 Accessed 29/5/2012.

  2. Ball, R. D. 2009: BayesQTLBIC—Bayesian multi-locus QTL analysis based on the BIC criterion. http://cran.r-project.org/web/packages/BayesQTLBIC/index.html Accessed 29/5/2012.

  3. Sen, S. and Churchill, G. A. 2001: A statistical framework for quantitative trait mapping. Genetics 159: 371–387.

  4. Marchini, J., Howie, B., Myers, S., McVean, G. and Donnelly, P. 2007: A new multipoint method for genome-wide association studies by imputation of genotypes. Nature Genetics 39: 906–913.

  5. Servin B. and Stephens, M. 2007: Imputation-based analysis of association studies: Candidate regions and quantitative traits. PLoS Genet 3(7): 1296–1308. e114. doi: 10.1371/journal.pgen.0030114

  6. Stephens, M. and Balding, D. J. 2009: Bayesian statistical methods for association studies. Nat. Rev. Genet 10: 681–690.

  7. HapMap project 2012: http://hapmap.ncbi.nlm.nih.gov/ Accessed 31/5/2012.

  8. Raftery, A. E. 1995: Bayesian model selection in social research (with Discussion). Sociological Methodology 1995 (Peter V. Marsden, ed.), pp. 111–196, Cambridge, Mass.: Blackwells.

  9. Ball, R. D. 2007b: Quantifying evidence for candidate gene polymorphisms: Bayesian analysis combining sequence-specific and quantitative trait loci colocation information. Genetics 177: 2399–2416. http://www.genetics.org/cgi/content/abstract/177/4/2399 Accessed 29/5/2012.

  10. Sillanpää, M. J. and Bhattacharjee, M. 2005: Bayesian association-based fine mapping in small chromosomal segments. Genetics 169: 427–439.

  11. Astle, W. and Balding, D. 2009: Population structure and cryptic relatedness in genetic association studies. Statistical Science 24: 451–471.

  12. Devlin, B. and Roeder, K. 1999: Genomic control for association studies. Biometrics 55: 997–1004.

  13. Pritchard, J. K., Stephens, M. and Donnelly, P. 2000a: Inference of population structure using multilocus genotype data. Genetics 155: 945–959.

  14. Pritchard, J. K., Stephens, M., Rosenberg, N. A., and Donnelly, P. 2000b: Association mapping in structured populations. Am. J. Hum. Genet. 67: 170–181.

  15. Falush, D., Stephens, M. and Pritchard, J. K. 2003: Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164: 1567–1587.

  16. Setakis, E., Stirnadel, H. and Balding, D. J. 2006: Logistic regression protects against population structure in genetic association studies. Genome Res. 16: 290–296.

  17. Zhang, S., Zhu, X. and Zhao, H. 2003: On a semiparametric test to detect associations between quantitative traits and candidate genes using unrelated individuals. Genet. Epidemiol. 24: 44–56.

  18. Price, A. L., Patterson, N. J., Plenge, R. M., Weinblatt, M. E., Shadick, N. A. and Reich, D. 2006: Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38: 904–909.

  19. Ritland, K. 1996: Estimators for pairwise relatedness and individual inbreeding coefficients. Genetical Research 67: 175–185.

  20. Zhang, Z., Ersoz, E., Lai, C.-Q., … and Buckler, E. S. 2010: Mixed linear model approach adapted for genome-wide association studies. Nature Genetics 42: 355–360.

  21. Spencer, C. C. A., Su, Z., Donnelly, P., Marchini J. 2009: Designing Genome-Wide Association Studies: Sample Size, Power, Imputation, and the Choice of Genotyping Chip. PLoS Genet 5(5).

  22. Weir, B. S., Anderson, A. D., and Hepler, A. B. 2006: Genetic relatedness analysis: modern data and new challenges. Nature Reviews Genetics 7: 771–780.

  23. Meuwissen, T. H. E., and Goddard, M. E. 2000: Fine mapping of quantitative trait loci using linkage disequilibria with closely linked marker loci. Genetics 155: 421–430.

  24. Meuwissen, T. H. E., and Goddard, M. E. 2001: Prediction of identity-by-descent probabilities from marker haplotypes. Genet. Sel. Evol. 33: 605–634.

  25. Meuwissen, T. H. E., Karlsen, A., Lien, S., Oldsaker, I., and Goddard, M. 2002: Fine mapping of a quantitative trait locus for twinning rate using combined linkage and linkage disequilibrium mapping. Genetics 161: 373–379.

  26. Falconer, D. S., and Mackay, T. F. C. 1996: Introduction to Quantitative Genetics. Addison-Wesley Longman, Harlow, England.

  27. Lander, E. S. and Botstein, D. 1989: Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.

  28. Gilmour, A. R., Gogel, B. J., Cullis, B. R., and Thompson, R. 2009: ASReml User Guide Release 3.0 VSN International Ltd, Hemel Hempstead, HP1 1ES, UK. www.vsni.co.uk

  29. Ball, R. D. 2003: lmeSplines—an R package for fitting smoothing spline terms in LME models. R News 3/3 p 24–28. http://cran.r-project.org/web/packages/lmeSplines/index.html. Accessed 29/5/2012.

  30. Ball, R. D. 2007: Statistical analysis and experimental design Chapter 8, In: Association mapping in plants. N. C. Oraguzie et al. editors, Springer Verlag, ISBN 0387358447 (69pp).

  31. Liu, J. S., Sabatti, C., Teng, J., Keats, B. J. B. and Risch, N. 2001: Bayesian analysis of haplotypes for linkage disequilibrium mapping. Genome Research 11: 1716–1724.

  32. Wu, R. and Zeng, Z.-B. 2001: Joint linkage and linkage disequilibrium mapping in natural populations. Genetics 157: 899–909.

  33. Wu, R., Ma, C. X. and Casella, G. 2002: Joint linkage and linkage disequilibrium mapping of quantitative trait loci in natural populations. Genetics 160: 779–792.

  34. Farnir, F., Grisart, B., Coppieters, W., Riquet, J., Berzi, P., et al. 2002: Simultaneous mining of linkage and linkage disequilibrium to fine map quantitative trait loci in outbred half-sib pedigrees: revisiting the location of a quantitative trait locus with major effect on milk production on bovine chromosome 14. Genetics 161: 275–287.

  35. Perez-Enciso, M. 2003: Fine mapping of complex trait genes combining pedigree and linkage disequilibrium information: a Bayesian unified framework. Genetics 163: 1497–1510.

  36. Fan, R. and Jung, J. 2002: Association Studies of QTL for multi-allele Markers by mixed models. Hum. Hered. 54: 132–150.

  37. Lund, M. S., Sorensen, P., Guldbrandtsen, P., and Sorensen, D. A. 2003: Multitrait fine mapping of quantitative trait loci using combined linkage disequilibria and linkage analysis. Genetics 163: 405–410.

  38. Meuwissen, T. H. E., Hayes, B. J., and Goddard, M. E. 2001: Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819–1829.

  39. Meuwissen, T. H. E., and Goddard, M. E. 2004: Mapping multiple QTL using linkage disequilibrium and linkage analysis information and multitrait data. Genet. Sel. Evol. 36: 261–279.

  40. Lee, S. H. and van der Werf, J. H. J. 2005: The role of pedigree information in combined linkage disequilibrium and linkage mapping of quantitative trait loci in a general complex pedigree. Genetics 169: 455–466.

  41. Heath, S. C. 1997: Markov chain Monte Carlo segregation and linkage analysis for oligogenic models. Am. J. Hum. Genet. 61: 748–760.

  42. Heath, S. 2003: Loki 2.4.5—A package for multipoint linkage analysis on large pedigrees using reversible jump Markov chain Monte Carlo. Centre National de Génotypage, Evry Cedex, France. http://www.stat.washington.edu/thompson/Genepi/Loki.shtml Accessed 31/5/2012.

  43. Gao, G. and Hoeschele, I. 2005: Approximating identity-by-descent matrices using multiple haplotype configurations on pedigrees. Genetics 171: 365–376.

  44. Ball, R. D. 2004, 2011 : ldDesign — design of experiments for genome-wide association studies version 2 incorporating quantitative traits and case-control studies. http://cran.r-project.org/web/packages/ldDesign/index.html Accessed 29/5/2012.

  45. Ball, R. D. 2005: Experimental designs for reliable detection of linkage disequilibrium in unstructured random population association studies. Genetics 170: 859–873. http://www.genetics.org/cgi/content/abstract/170/2/859 Accessed 29/5/2012.

Copyright information

© 2013 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Ball, R.D. (2013). Statistical Analysis of Genomic Data. In: Gondro, C., van der Werf, J., Hayes, B. (eds) Genome-Wide Association Studies and Genomic Prediction. Methods in Molecular Biology, vol 1019. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-447-0_7

  • DOI: https://doi.org/10.1007/978-1-62703-447-0_7

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-62703-446-3

  • Online ISBN: 978-1-62703-447-0

  • eBook Packages: Springer Protocols
