Theoretical and Applied Genetics

Volume 130, Issue 11, pp 2327–2343

An innovative procedure of genome-wide association analysis fits studies on germplasm population and plant breeding

  • Jianbo He
  • Shan Meng
  • Tuanjie Zhao
  • Guangnan Xing
  • Shouping Yang
  • Yan Li
  • Rongzhan Guan
  • Jiangjie Lu
  • Yufeng Wang
  • Qiuju Xia
  • Bing Yang
  • Junyi Gai
Original Article

Abstract

Key message

The innovative RTM-GWAS procedure provides relatively thorough detection of QTL and their multiple alleles, supporting germplasm population characterization, gene network identification, and new genomic selection strategies in plant breeding.

Abstract

Previous genome-wide association studies (GWAS) have concentrated on finding a handful of major quantitative trait loci (QTL), but plant breeders are interested in revealing the whole-genome QTL-allele constitution of breeding materials and germplasm (in which tremendous historical allelic variation has accumulated) for genome-wide improvement. To meet this requirement, two innovations were proposed for GWAS: first, grouping tightly linked sequential SNPs into linkage disequilibrium blocks (SNPLDBs) to form markers with multi-allelic haplotypes; and second, using a two-stage association analysis for QTL identification, in which markers are preselected with a single-locus model and then analyzed by stepwise regression under a multi-locus multi-allele model. The proposed GWAS procedure is a novel restricted two-stage multi-locus multi-allele GWAS (RTM-GWAS, https://github.com/njau-sri/rtm-gwas). The Chinese soybean germplasm population (CSGP), composed of 1024 accessions with 36,952 SNPLDBs (generated from 145,558 SNPs, with a reduced linkage disequilibrium decay distance), was used to demonstrate the power and efficiency of RTM-GWAS. Simulation studies based on the CSGP marker information showed that RTM-GWAS achieved the highest QTL detection power and efficiency among the procedures compared, especially with large sample sizes and high trait heritability. Applied to 100-seed weight in the CSGP, RTM-GWAS detected QTL and their multiple alleles more thoroughly than the linear mixed model method. A QTL-allele matrix (402 alleles of 139 QTL × 1024 accessions) was established as a compact representation of the population's genetic constitution. The 100-seed weight QTL-allele matrix was then used for genetic characterization, candidate gene prediction, and genomic selection of optimal crosses in the germplasm population.
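The two-stage idea described in the abstract can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the RTM-GWAS implementation distributed at the GitHub URL above. Blocks are formed by a simple adjacent-SNP r² cutoff rather than the SNPLDB criterion; each distinct haplotype within a block is treated as one allele; stage 1 screens blocks one at a time; and stage 2 runs forward-only stepwise regression conditional on blocks already selected. Fixed F-statistic thresholds (the hypothetical parameters `f_screen` and `f_enter`) stand in for significance levels, and no population-structure correction is included. All function names are hypothetical.

```python
import numpy as np


def haplotype_blocks(snps, r2_min=0.8):
    """Group consecutive SNP columns into LD blocks.

    Simplified stand-in for SNPLDB construction (an assumption, not the
    RTM-GWAS block rule): a new block starts whenever r^2 between two
    adjacent SNPs drops below r2_min. snps is an (n_lines, n_snps) array
    of 0/1 codes for homozygous inbred lines.
    """
    blocks, current = [], [0]
    for j in range(1, snps.shape[1]):
        with np.errstate(invalid="ignore", divide="ignore"):
            r = np.corrcoef(snps[:, j - 1], snps[:, j])[0, 1]
        if np.nan_to_num(r) ** 2 >= r2_min:  # monomorphic SNP -> r treated as 0
            current.append(j)
        else:
            blocks.append(current)
            current = [j]
    blocks.append(current)
    return blocks


def block_design(snps, block):
    """Dummy-code each distinct haplotype in a block as one allele.

    The first haplotype in sorted order is the reference, so a block with
    k haplotypes contributes k - 1 indicator columns.
    """
    haps = [tuple(row) for row in snps[:, block]]
    alleles = sorted(set(haps))
    X = np.array([[h == a for a in alleles[1:]] for h in haps], dtype=float)
    return X.reshape(len(haps), len(alleles) - 1)


def _rss(y, X):
    """Residual sum of squares and rank of the least-squares fit on [1, X]."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return float(resid @ resid), int(np.linalg.matrix_rank(X1))


def _partial_f(y, base, add):
    """F statistic for adding the columns of `add` to a model already holding `base`."""
    rss0, r0 = _rss(y, base)
    rss1, r1 = _rss(y, np.column_stack([base, add]))
    df1, df2 = r1 - r0, len(y) - r1
    if df1 <= 0 or df2 <= 0:
        return 0.0
    return ((rss0 - rss1) / df1) / max(rss1 / df2, 1e-12)


def two_stage_gwas(y, snps, blocks, f_screen=4.0, f_enter=8.0):
    """Stage 1: single-locus screen. Stage 2: forward multi-locus selection.

    Fixed F thresholds replace significance levels; both defaults are
    illustrative values, not RTM-GWAS settings.
    """
    designs = [block_design(snps, b) for b in blocks]
    empty = np.empty((len(y), 0))
    # Stage 1: test each block on its own and keep those reaching f_screen.
    candidates = [k for k, X in enumerate(designs)
                  if _partial_f(y, empty, X) >= f_screen]
    # Stage 2: add the candidate with the largest partial F given the blocks
    # already selected; stop when no remaining candidate reaches f_enter.
    selected = []
    while True:
        base = np.column_stack([empty] + [designs[k] for k in selected])
        scores = {k: _partial_f(y, base, designs[k])
                  for k in candidates if k not in selected}
        if not scores or max(scores.values()) < f_enter:
            return candidates, selected
        selected.append(max(scores, key=scores.get))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    codes = np.array([[0, 0], [1, 0], [1, 1]])      # 3 haplotypes per 2-SNP block
    alleles = rng.integers(0, 3, size=(300, 8))      # 300 lines x 8 blocks
    snps = codes[alleles].reshape(300, 16)
    effects = np.array([0.0, 2.0, -2.0])             # simulated allele effects
    y = effects[alleles[:, 2]] + effects[alleles[:, 5]] + rng.normal(size=300)
    blocks = [[2 * k, 2 * k + 1] for k in range(8)]  # true blocks, known here
    print(two_stage_gwas(y, snps, blocks))
```

The demo passes the known simulated blocks directly so that the selection step can be inspected on its own; with real data, `haplotype_blocks` (or a proper SNPLDB routine) would supply them. Conditioning each new block on those already in the model is what allows the multi-locus stage to avoid re-reporting signals that a block already in the model explains.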

Notes

Acknowledgements

This work was supported by the China National Key R&D Program for Crop Breeding (2016YFD0100304), the China National Key Basic Research Program (2011CB1093), the China National High-tech R&D Program (2012AA101106), the Natural Science Foundation of China (31571695), the MOE 111 Project (B08025), the Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT13073), the MOA Public Profit Program (201203026-4), the MOA CARS-04 program, the Jiangsu Higher Education PAPD Program, and the Jiangsu JCIC-MCP Program. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Compliance with ethical standards

Conflict of interest

The authors declare no conflict of interest.

Supplementary material

Supplementary material 1: 122_2017_2962_MOESM1_ESM.xlsx (XLSX, 52 kb)
Supplementary material 2: 122_2017_2962_MOESM2_ESM.docx (DOCX, 1657 kb)

Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  • Jianbo He (1)
  • Shan Meng (1)
  • Tuanjie Zhao (2, 4)
  • Guangnan Xing (2, 4)
  • Shouping Yang (3, 4)
  • Yan Li (3, 4)
  • Rongzhan Guan (4, 5)
  • Jiangjie Lu (1)
  • Yufeng Wang (1)
  • Qiuju Xia (6)
  • Bing Yang (6)
  • Junyi Gai (1, 2, 3, 4, 5)

  1. Soybean Research Institute, Nanjing Agricultural University, Nanjing, China
  2. National Center for Soybean Improvement, Ministry of Agriculture, Nanjing, China
  3. Key Laboratory of Biology and Genetic Improvement of Soybean (General), Ministry of Agriculture, Nanjing, China
  4. State Key Laboratory for Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
  5. Jiangsu Collaborative Innovation Center for Modern Crop Production, Nanjing Agricultural University, Nanjing, China
  6. State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen, China
