Molecular Breeding, 36:87

Genome-wide identification of SNPs and copy number variation in common bean (Phaseolus vulgaris L.) using genotyping-by-sequencing (GBS)

  • Andrea Ariani
  • Jorge Carlos Berny Mier y Teran
  • Paul Gepts


Next-generation sequencing technologies have markedly increased the throughput of genetic studies, allowing the identification of several thousand SNPs within a single experiment. Even though sequencing costs are decreasing rapidly, whole-genome re-sequencing of a large number of individuals is still expensive, especially in plants with large and highly redundant genomes. In recent years, several reduced representation library approaches have been developed to reduce the sequencing cost per individual. Among them, genotyping-by-sequencing (GBS) represents a simple, cost-effective, and highly multiplexed alternative for species with or without an available reference genome. However, this technology requires specific optimization for each species, especially in the choice of restriction enzyme (RE). Here we report on the application of GBS in a test experiment with 18 genotypes of wild and domesticated Phaseolus vulgaris. After an in silico digestion of the P. vulgaris reference genome sequence with different REs, we selected CviAII as the most suitable RE for GBS in common bean based on the high frequency and even distribution of its restriction sites. A total of 44,875 SNPs, 1940 deletions, and 1693 insertions were identified, with 50% of the variants located in genic sequences and tagging 11,027 genes. SNP and InDel distributions were positively correlated with gene density across the genome. In addition, we identified putative copy number variations of genomic segments between different genotypes. In conclusion, GBS with the CviAII enzyme yields thousands of evenly spaced markers and provides a reliable, high-throughput, and cost-effective approach for genotyping both wild and domesticated common beans.
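The in silico digestion step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy sequence, and the cut offset are assumptions made for illustration. It takes CviAII to recognize the palindromic motif CATG and uses EcoRI (GAATTC) only as a contrasting six-cutter. For each enzyme it reports the number of sites, the site frequency, and the spread of fragment lengths, as one possible proxy for how evenly the sites are distributed.

```python
import re
from statistics import mean, pstdev

# Toy sequence (hypothetical); a real analysis would iterate over each
# chromosome of the reference genome instead.
TOY_SEQ = "AACATGTTTTCATGGGGAATTCCATG"


def find_sites(seq, motif):
    """Return 0-based start positions of every motif occurrence.

    A lookahead is used so overlapping occurrences are also reported.
    Only the forward strand is scanned, which is sufficient for
    palindromic motifs such as CATG or GAATTC.
    """
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [m.start() for m in pattern.finditer(seq.upper())]


def fragment_lengths(seq_len, sites, cut_offset):
    """Return fragment lengths after cutting at site start + cut_offset."""
    cuts = [s + cut_offset for s in sites]
    bounds = [0] + cuts + [seq_len]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def summarize(seq, motif, cut_offset=0):
    """Summarize site frequency and fragment-length spread for one motif."""
    sites = find_sites(seq, motif)
    frags = fragment_lengths(len(seq), sites, cut_offset)
    return {
        "n_sites": len(sites),
        "sites_per_kb": 1000 * len(sites) / len(seq),
        "mean_fragment": mean(frags),
        # Coefficient of variation: lower values suggest more even spacing.
        # This metric is an illustrative choice, not taken from the paper.
        "cv_fragment": pstdev(frags) / mean(frags),
    }


if __name__ == "__main__":
    # cut_offset=1 is an assumed cut position inside each motif.
    for name, motif in [("CviAII", "CATG"), ("EcoRI", "GAATTC")]:
        print(name, summarize(TOY_SEQ, motif, cut_offset=1))
```

On a real genome, the same summary could be computed per chromosome or per fixed-size bin to compare candidate enzymes by both site frequency and evenness.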


Common bean; Copy number variation (CNV); Genome-wide SNPs calling; Genotyping-by-sequencing (GBS); Next-generation sequencing



This work used the Vincent J. Coates Genomics Sequencing Laboratory at UC Berkeley, supported by NIH S10 Instrumentation Grants S10RR029668 and S10RR027303. This project was supported by Agriculture and Food Research Initiative (AFRI) Competitive Grant No. 2013-67013-21224 from the USDA National Institute of Food and Agriculture.

Supplementary material

11032_2016_512_MOESM1_ESM.pdf (32 kb)
Supplementary File S1 Bean genotypes analyzed in this study with the barcodes used for multiplexed sequencing (PDF 32 kb)
11032_2016_512_MOESM2_ESM.pdf (138 kb)
Supplementary File S2 Correlation of SNP distribution (total SNPs) and SNP density in 1 Mb non-overlapping bins (SNPs/Mb) with chromosome length. Regression lines and Pearson correlation coefficients (r) are shown (PDF 138 kb)
11032_2016_512_MOESM3_ESM.pdf (12.4 mb)
Supplementary File S3 Distribution of variants and genes with the relative density in 1 Mb non-overlapping bins in the 11 P. vulgaris chromosomes (PDF 12663 kb)
11032_2016_512_MOESM4_ESM.pdf (107 kb)
Supplementary File S4 Read coverage in 1 Mb non-overlapping bins across the 11 chromosomes for the G19833 reference genotype (PDF 107 kb)
11032_2016_512_MOESM5_ESM.pdf (75 kb)
Supplementary File S5 RRC in the analyzed genotypes (PDF 75 kb)
11032_2016_512_MOESM6_ESM.pdf (35 kb)
Supplementary File S6 Regions harboring putative CNVs in the different genotypes. The coordinates of the genomic bins in the different chromosomes are reported in BED format (PDF 35 kb)
11032_2016_512_MOESM7_ESM.pdf (5.6 mb)
Supplementary File S7 Significant GO terms (FDR < 0.05) enriched in the genes located in putative CNVs. Test Set is the set of the up-regulated genes, Reference Set is the background of the P. vulgaris GO terms mapping (PDF 5762 kb)
11032_2016_512_MOESM8_ESM.pdf (63 kb)
Supplementary File S8 Annotation, together with the best Arabidopsis hit, of the genes located in putative CNVs. When available the best Arabidopsis hit common name was used (PDF 62 kb)



Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  • Andrea Ariani (1)
  • Jorge Carlos Berny Mier y Teran (1)
  • Paul Gepts (1)
  1. Department of Plant Sciences/MS1, University of California, Davis, USA
