Abstract
With increasing marker density, estimation of recombination rate between a marker and a causal mutation using linkage analysis becomes less important. Instead, linkage disequilibrium (LD) becomes the major indicator for gene mapping through genome-wide association studies (GWAS). In addition to the linkage between the marker and the causal mutation, many other factors may contribute to the LD, including population structure and cryptic relationships among individuals. As statistical methods and software evolve to improve statistical power and computing speed in GWAS, the corresponding outputs must also evolve to facilitate the interpretation of input data, the analytical process, and final association results. In this chapter, our descriptions focus on (1) considerations in creating a Manhattan plot displaying the strength of LD and locations of markers across a genome; (2) criteria for genome-wide significance threshold and the different appearance of Manhattan plots in single-locus and multiple-locus models; (3) exploration of population structure and kinship among individuals; (4) quantile–quantile (QQ) plot; (5) LD decay across the genome and LD between the associated markers and their neighbors; (6) exploration of individual and marker information on Manhattan and QQ plots via interactive visualization using HTML. The ultimate objective of this chapter is to help users to connect input data to GWAS outputs to balance power and false positives, and connect GWAS outputs to the selection of candidate genes using LD extent.
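As a minimal illustration of two of the outputs listed above, the sketch below computes a Bonferroni-style genome-wide significance threshold and the expected versus observed -log10(P) pairs that a QQ plot displays. It is not the chapter's implementation: the function names and the P-values are hypothetical, and the expected quantiles assume P-values are uniformly distributed under the null hypothesis.

```python
import math

def bonferroni_threshold(alpha, n_markers):
    """Per-marker P-value cutoff that holds the family-wise error rate at alpha."""
    return alpha / n_markers

def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs, sorted ascending.

    Expected quantiles assume uniform P-values under the null,
    using the plotting positions i / (n + 1).
    """
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    expected = sorted(-math.log10(i / (n + 1)) for i in range(1, n + 1))
    return list(zip(expected, observed))

# Made-up P-values for five markers (illustration only).
p_values = [0.5, 0.04, 0.2, 1e-9, 0.8]

threshold = bonferroni_threshold(0.05, len(p_values))
significant = [p for p in p_values if p < threshold]
points = qq_points(p_values)
```

On a Manhattan plot the threshold would be drawn as a horizontal line at -log10(threshold). On a QQ plot, points rising above the diagonal only in the upper tail suggest true associations, whereas a departure along the whole range suggests inflation from sources such as population structure or cryptic relatedness.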
Acknowledgments
This project was partially funded by the National Science Foundation of the United States (Award # DBI 1661348 and ISO 2029933), the United States Department of Agriculture - National Institute of Food and Agriculture (Hatch project 1014919, Award #s 2018-70005-28792, 2019-67013-29171, and 2020-67021-32460), the Washington Grain Commission, the United States (Endowment and Award #s 126593 and 134574), the Program of Chinese National Beef Cattle and Yak Industrial Technology System, China (Award # CARS-37), Fundamental Research Funds for the Central Universities, China (Southwest Minzu University, Award # 2020NQN26), and Sichuan Science and Technology Program, China (Award #s 2021YJ0269 and 2021YJ0266).
Author Contributions
Jiabo Wang: software, data curation, writing—original draft preparation, visualization, investigation.
Alexander E. Lipka: revision of the manuscript.
Jianming Yu: revision of the manuscript.
Zhiwu Zhang: conceptualization and revision of the manuscript.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Wang, J., Yu, J., Lipka, A.E., Zhang, Z. (2022). Interpretation of Manhattan Plots and Other Outputs of Genome-Wide Association Studies. In: Torkamaneh, D., Belzile, F. (eds) Genome-Wide Association Studies. Methods in Molecular Biology, vol 2481. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2237-7_5
DOI: https://doi.org/10.1007/978-1-0716-2237-7_5
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2236-0
Online ISBN: 978-1-0716-2237-7
eBook Packages: Springer Protocols