Molecular Breeding, Volume 33, Issue 4, pp 769–778

Genome-wide single nucleotide polymorphism and Insertion-Deletion discovery through next-generation sequencing of reduced representation libraries in common bean

  • Xiaolu Zou
  • Chun Shi
  • Ryan S. Austin
  • Daniele Merico
  • Seth Munholland
  • Frédéric Marsolais
  • Alireza Navabi
  • William L. Crosby
  • K. Peter Pauls
  • Kangfu Yu
  • Yuhai Cui

Abstract

Single nucleotide polymorphisms (SNPs) and insertions-deletions (InDels) are valuable molecular markers for genomics and genetics studies and for molecular breeding. The advent of next-generation sequencing techniques has enabled high-throughput, cost-effective SNP and InDel discovery on a genomic scale. In this report, 36 common bean genotypes grown in Canada were used to construct reduced representation libraries for next-generation sequencing. Using 76 million sequence reads generated on the Illumina HiSeq 2000 Sequencing System, we identified a total of 43,698 putative SNPs and 1,267 putative InDels. Of the SNPs, 43,504 were bi-allelic and 194 were tri-allelic; the InDels comprised 574 insertions and 693 deletions. The putative bi-allelic SNPs were distributed across all 11 chromosomes, with the highest number observed on chromosome 2 (4,788) and the lowest on chromosome 10 (2,941). Using the recently released first chromosome-scale genome assembly of Phaseolus vulgaris, we located 24,907 bi-allelic SNPs, 79 tri-allelic SNPs, 315 insertions, and 377 deletions in 8,758, 77, 273, and 364 genes, respectively. Among these 24,907 bi-allelic SNPs, 7,168 were nonsynonymous, and these were located in 4,303 genes. A total of 113 putative SNPs were randomly chosen for validation using high-resolution melt analysis; 105 of them (92.9 %) contained the predicted SNPs.

Keywords

Single nucleotide polymorphism; Insertion-Deletion; Next-generation sequencing; Phaseolus vulgaris L.

Introduction

Common bean (Phaseolus vulgaris L.) is one of the most important legume crops for human consumption worldwide. It is a diploid annual legume (2n = 2x = 22) with a relatively small genome (580 Mb) (Bennett and Leitch 1995). Compared with other plants, the common bean genome has relatively low levels of duplication and repetitive sequence, and molecular and genetic mapping experiments have revealed that most loci are single copy (Freyre et al. 1998; McClean et al. 2002; Vallejos et al. 1992). Moreover, common bean gene families tend to be small, and traditionally large families such as plant resistance genes and protein kinases are of moderate size (Gepts et al. 2008; Rivkin et al. 1999; Vallad et al. 2001). Common bean originated in the Americas and then diverged into two major gene pools, the Mesoamerican and the Andean (Gepts 1998; Gepts and Debouck 1991). The two gene pools were domesticated independently, and each evolved into three races: Durango, Jalisco, and Mesoamerica in the Mesoamerican gene pool, and Chile, Nueva Granada, and Peru in the Andean gene pool. The Mesoamerican gene pool is represented by the small- and medium-seeded white, pinto, pink, black, and some snap beans, and the Andean gene pool by the large-seeded kidney, cranberry, and many snap beans (Kwak and Gepts 2009; Mamidi et al. 2011; McClean et al. 2004; Singh et al. 1991).

Single nucleotide polymorphisms (SNPs) and insertions-deletions (InDels) are the most abundant DNA polymorphisms in eukaryotic genomes (Galeano et al. 2009; Hillier et al. 2008; Hyten et al. 2010a, b; Lai et al. 2010; Salathia et al. 2007; Subbaiyan et al. 2012; Wang et al. 1998). SNP and InDel markers have become powerful tools for many research applications, such as genetic mapping, association studies, diversity analysis, marker-assisted selection, and map-based gene cloning (Blair et al. 2013; Galeano et al. 2012; Hayashi et al. 2006; Mammadov et al. 2012; Salathia et al. 2007; Shen et al. 2004; Shi et al. 2011; Varshney et al. 2009). Meanwhile, next-generation sequencing technologies provide rapid and cost-effective approaches for discovering DNA polymorphisms on a genomic scale (Davey et al. 2011; Depristo et al. 2011; Kumar et al. 2012; Shendure and Ji 2008; Varshney et al. 2009). With the ever-increasing throughput of next-generation sequencing and the continued development of bioinformatics tools, DNA sequence variants can readily be discovered by comparing the whole-genome sequences of individuals with a reference genome sequence (Arai-Kichise et al. 2011; Hyten et al. 2010a; Ossowski et al. 2008). When no reference genome sequence is available, a common practice for SNP and InDel discovery is to assemble the sequence reads into contigs and then align all reads against these contigs to call variants (Fu and Peterson 2012; Gaur et al. 2012; Hyten et al. 2010b; Lai et al. 2012; Oliver et al. 2011; Ratan et al. 2010; You et al. 2011). Plant genomes, however, are usually large and complex, with an abundance of repetitive sequences, and these features pose challenges for SNP and InDel discovery (Adams and Wendel 2005; Bennetzen et al. 2005; Davey et al. 2011; Deschamps and Campbell 2010).

Several genome complexity reduction techniques have been applied for high-throughput genetic marker discovery, including reduced representation libraries (RRLs) (Altshuler et al. 2000), complexity reduction of polymorphic sequences (CRoPS) (van Orsouw et al. 2007), restriction-site-associated DNA sequencing (RAD-Seq) (Baird et al. 2008), and genotyping by sequencing (GBS) (Davey et al. 2011; Elshire et al. 2011). RRLs were first used for the human genome, and the approach has since been adapted for genome-wide SNP and InDel discovery in many animal and plant species (Altshuler et al. 2000; Davey et al. 2011; Depristo et al. 2011; Van Tassell et al. 2008). In general, an RRL is built by digesting a pooled DNA sample from a population of interest with a restriction enzyme and then size-selecting the fragments, so that only a reproducible subset of the genome is sequenced; the resulting reads are then aligned to a reference sequence (at least a draft genome) to identify DNA variants.

To date, considerable effort has been made toward DNA polymorphism discovery in common bean. Several thousand SNPs and hundreds of InDels have been discovered through expressed sequence tag data mining or partial re-sequencing of certain genotypes (Gaitan-Solis et al. 2008; Galeano et al. 2009; McConnell et al. 2010; Ramírez et al. 2005; Souza et al. 2012). In addition, Hyten et al. (2010b) discovered 3,487 SNPs between the Mesoamerican genotype BAT93 and the Andean genotype JaloEEP558 through next-generation sequencing of a multi-tier RRL, before the genome sequence was available.

The first chromosome-scale assembly of the P. vulgaris genome has recently been released to the public and provides an invaluable resource for common bean genome mapping and marker development (Phaseolus vulgaris v1.0, DOE-JGI and USDA-NIFA, http://www.phytozome.net/commonbean). The objective of this study was to discover SNPs and InDels in common bean through next-generation sequencing of RRLs, using the latest release of the common bean genome assembly as the reference.

Materials and methods

Library construction and sequencing

Thirty-six common bean genotypes grown in Canada were used for SNP and InDel discovery (Supplementary Table S1). Six DNA pools containing the 36 common bean DNA samples were prepared using the DNeasy Plant Mini Kit (Qiagen Inc., Toronto, Canada). A 5.5-μg DNA sample from each pool was digested with HaeIII (New England Biolabs Ltd, Whitby, ON, Canada) according to the manufacturer’s instructions. DNA fragments between 250 and 350 bp were selected, ligated with pool-specific adapters, and amplified using the TruSeq™ DNA Sample Preparation Kit (Illumina, Inc., San Diego, CA, USA) following the manufacturer’s instructions. The libraries were then sent to The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, for further preparation and Illumina sequencing. There, the libraries were first size-selected to 400–600 bp using the E-Gel® SizeSelect™ 2 % Agarose system (Life Technologies Inc., Burlington, ON, Canada) and purified using the Qiagen MinElute PCR Purification Kit (Qiagen) following the recommended protocol. The libraries were then sequenced on a HiSeq 2000 instrument following Illumina’s recommended guidelines.

Discovery of SNPs and InDels

Reads that passed Illumina’s quality filter and were at least 25 bp long were retained. These quality-filtered reads were aligned to the P. vulgaris reference genome (Phaseolus vulgaris v1.0, DOE-JGI and USDA-NIFA, http://www.phytozome.net/commonbean) using BWA 0.6.2 (Li and Durbin 2009). Base quality score recalibration and local realignment were then performed using GATK (GenomeAnalysisTK-1.6-6-g4bc04e2) (McKenna et al. 2010). After alignment refinement, variants (SNVs and InDels) were called using FreeBayes 0.9.5 (arXiv:1207.3907 [q-bio.GN]), which is capable of calling tri-allelic sites. Variants with a FreeBayes QUAL score below 10 or a read depth below 3 were assumed to be false positives and filtered out. Finally, variants were annotated using ANNOVAR (Wang et al. 2010). The raw sequencing reads have been deposited in the NCBI Sequence Read Archive [Genbank: SPR022760]. The SNP and InDel data are available at http://bioinfo.uwindsor.ca/cgi-bin/gb2/gbrowse/US_PVulgaris.
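The QUAL and read-depth thresholds can be sketched as a small filter over VCF text. This is a minimal illustration, not the authors' pipeline: it assumes tab-separated VCF data lines with the total read depth stored under the INFO key `DP` (an assumption about the FreeBayes output used here), and the chromosome names in the example (`Chr01`, `Chr02`) are hypothetical. A variant is kept only when it meets both thresholds.

```python
# Sketch of a QUAL >= 10 and depth >= 3 filter on VCF lines.
# Assumption: read depth is the INFO field "DP"; names are illustrative.
from typing import Dict, Iterable, Iterator


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column such as 'DP=12;TYPE=snp' into a dict."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
        elif item:
            fields[item] = ""  # flag-type INFO entries carry no value
    return fields


def passes_filters(record: str, min_qual: float = 10.0, min_depth: int = 3) -> bool:
    """Return True only if QUAL >= min_qual AND depth >= min_depth."""
    cols = record.rstrip("\n").split("\t")
    qual, info = cols[5], cols[7]
    depth = parse_info(info).get("DP")
    if qual == "." or depth is None:
        return False  # missing values are treated as failing the filter
    return float(qual) >= min_qual and int(depth) >= min_depth


def filter_vcf(lines: Iterable[str], min_qual: float = 10.0,
               min_depth: int = 3) -> Iterator[str]:
    """Yield header lines unchanged and only the data lines that pass."""
    for line in lines:
        if line.startswith("#") or passes_filters(line, min_qual, min_depth):
            yield line


# Hypothetical records: the first passes, the second fails QUAL, the third fails depth.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "Chr01\t1000\t.\tC\tT\t45.2\t.\tDP=8;TYPE=snp",
    "Chr01\t2000\t.\tG\tA\t6.1\t.\tDP=15;TYPE=snp",
    "Chr02\t3000\t.\tA\tAT\t30.0\t.\tDP=2;TYPE=ins",
]
kept = list(filter_vcf(example))
```

Keeping header lines untouched means the filtered output is still a valid VCF that downstream annotation tools can read.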

SNP validation

A total of 113 putative Class I (C/T and G/A) and Class II (C/A and G/T) SNPs were randomly chosen for validation using high-resolution melt (HRM) analysis (Herrmann et al. 2006). HRM primers were designed using Beacon Designer 7.91 (Supplementary Table S2). HRM PCR was conducted on a Bio-Rad CFX96 real-time PCR system using the PCR conditions described by Wang et al. (2012). HRM data were analyzed with Precision Melt Analysis software (Bio-Rad) according to its user manual.
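Restricting the random choice to the two HRM classes named above can be illustrated as follows. In this sketch the SNP tuple layout `(chromosome, position, ref, alt)`, the function names, and the fixed random seed are assumptions for illustration; only the Class I and Class II base pairs come from the text, and every other pair (C/G, A/T) is labeled `other`.

```python
import random
from typing import List, Tuple

# Hypothetical SNP record: (chromosome, position, ref, alt).
Snp = Tuple[str, int, str, str]

# Class I and Class II base pairs as named in the text. Pairs are unordered,
# so C->T and T->C fall into the same class.
HRM_CLASS = {
    frozenset("CT"): "Class I",
    frozenset("GA"): "Class I",
    frozenset("CA"): "Class II",
    frozenset("GT"): "Class II",
}


def hrm_class(ref: str, alt: str) -> str:
    """Return 'Class I', 'Class II', or 'other' for a bi-allelic SNP."""
    return HRM_CLASS.get(frozenset((ref.upper(), alt.upper())), "other")


def pick_for_validation(snps: List[Snp], k: int, seed: int = 0) -> List[Snp]:
    """Randomly sample up to k Class I or Class II SNPs (seeded for reproducibility)."""
    eligible = [s for s in snps if hrm_class(s[2], s[3]) != "other"]
    rng = random.Random(seed)
    return rng.sample(eligible, min(k, len(eligible)))
```

Seeding the generator makes the "random" selection reproducible, which helps when primer design has to be repeated.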

Results and discussion

DNA deep sequencing and alignment

Thirty-six common bean genotypes grown in Canada were used to construct reduced representation libraries for next-generation sequencing (Supplementary Table S1). All of them belong to race Mesoamerica of the Mesoamerican (Middle American) gene pool, and cluster analysis separated them into two main clusters (Supplementary Fig. S1). The 36 genotypes were randomly divided into six pools. RRLs were constructed from the six pools using the restriction enzyme HaeIII, which recognizes the sequence GGCC and cuts between the central G and C (GG^CC), generating blunt-ended fragments that start with CC. DNA fragments 250–350 bp long were selected and sequenced on the HiSeq 2000. Reads from the six pools were aligned to the Phaseolus vulgaris v1.0 genome assembly, which is approximately 521.1 Mb in size. In silico digestion of Phaseolus vulgaris v1.0 with HaeIII, followed by selection of 250–350 bp fragments, gave an expected coverage of about 3 % of the reference genome, that is, 15,829,160 bp that align uniquely to the common bean genome.
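The in silico digestion and size selection can be sketched as below, assuming plain chromosome sequences held in memory. Because the HaeIII site GGCC is palindromic, scanning one strand finds every site, and the blunt cut falls two bases into the site. The function names are hypothetical and the size bounds are treated as inclusive. Unlike the reported 15,829,160 bp figure, the sketch does not check whether a fragment aligns uniquely; for scale, 15,829,160 bp / 521.1 Mb is about 0.030, the 3 % quoted above.

```python
import re
from typing import Dict, List, Tuple

HAEIII_SITE = "GGCC"
HAEIII_CUT_OFFSET = 2  # blunt cut GG^CC, two bases into the site

Fragment = Tuple[int, int]  # 0-based, half-open [start, end)


def digest(seq: str, site: str = HAEIII_SITE,
           cut_offset: int = HAEIII_CUT_OFFSET) -> List[Fragment]:
    """Cut a sequence at every site occurrence and return fragment coordinates."""
    seq = seq.upper()  # soft-masked (lowercase) bases still carry sites
    # GGCC cannot overlap itself, so non-overlapping finditer finds all sites.
    cuts = [m.start() + cut_offset for m in re.finditer(site, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments: List[Fragment], min_len: int = 250,
                max_len: int = 350) -> List[Fragment]:
    """Keep fragments whose length lies within [min_len, max_len]."""
    return [(a, b) for a, b in fragments if min_len <= b - a <= max_len]


def expected_coverage(chromosomes: Dict[str, str], min_len: int = 250,
                      max_len: int = 350) -> Tuple[int, float]:
    """Return (selected bases, fraction of the genome) over all chromosomes."""
    genome_len = 0
    selected_bp = 0
    for seq in chromosomes.values():
        genome_len += len(seq)
        for a, b in size_select(digest(seq), min_len, max_len):
            selected_bp += b - a
    return selected_bp, selected_bp / genome_len
```

Running `expected_coverage` on a dictionary of chromosome sequences (loaded by any FASTA reader) would give a raw, non-uniqueness-filtered counterpart of the in silico estimate.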

A total of 76,175,885 sequence reads were obtained from the six pools by Illumina sequencing, yielding 7.5 Gb of raw sequence data (Table 1). After preprocessing, the quality-filtered reads were aligned to the P. vulgaris reference genome. A total of 13,759,003 reads, corresponding to 1.25 Gb of sequence, mapped to the common bean genome. However, aligned reads can contain errors arising from sequencing, base calling, and alignment (Depristo et al. 2011; Nielsen et al. 2011). To further reduce false-positive calls, base quality score recalibration and local realignment were carried out using GATK (McKenna et al. 2010). The minimum mapping quality for high-quality (HQ) aligned reads was set to Q20, which corresponds to at most a 1 % chance that an alignment is wrong. A total of 6,943,305 HQ aligned reads remained after GATK refinement; together they comprised 646,470,773 nucleotides, with a mean read length of 99 bases (Table 1).
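Both quality cutoffs in this pipeline are on the Phred scale, where a score Q corresponds to an error probability of 10^(-Q/10): the Q20 mapping-quality cutoff gives 0.01, and the FreeBayes QUAL cutoff of 10 gives 0.1. The sketch below shows the conversion and a mapping-quality filter over a hypothetical list of `(read_id, mapq)` pairs; it illustrates the arithmetic only and is not how GATK or BWA apply the cutoff internally.

```python
import math
from typing import Iterable, List, Tuple


def phred_to_error_prob(q: float) -> float:
    """Phred scale: Q = -10 * log10(P_error), so P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Inverse conversion from an error probability to a Phred score."""
    return -10 * math.log10(p)


def keep_hq_alignments(alignments: Iterable[Tuple[str, int]],
                       min_mapq: int = 20) -> List[Tuple[str, int]]:
    """Keep (read_id, mapq) pairs whose mapping quality reaches min_mapq."""
    return [(rid, mq) for rid, mq in alignments if mq >= min_mapq]
```

For example, `phred_to_error_prob(20)` gives 0.01 and `phred_to_error_prob(10)` gives 0.1.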
Table 1

Summary of sequence read alignment

| Adaptor | Total reads | Reads aligned | Reads aligned (%) | Reads aligned bases | HQ aligned reads^a | HQ aligned bases | Mean read length (bases) |
|---|---|---|---|---|---|---|---|
| AD002 | 11,950,780 | 3,207,921 | 26.8428 | 289,198,694 | 1,637,383 | 151,567,251 | 98 |
| AD004 | 8,033,960 | 741,004 | 9.2234 | 61,892,574 | 353,160 | 30,677,334 | 99 |
| AD005 | 8,818,234 | 2,106,158 | 23.8841 | 196,211,262 | 998,080 | 94,349,333 | 99 |
| AD006 | 15,840,014 | 3,445,448 | 21.7515 | 313,795,690 | 1,781,815 | 165,876,676 | 98 |
| AD007 | 15,311,985 | 2,544,529 | 16.6179 | 229,324,952 | 1,294,337 | 119,366,169 | 99 |
| AD012 | 16,220,912 | 1,713,943 | 10.5663 | 162,985,963 | 878,530 | 84,634,010 | 100 |
| Total/average^b | 76,175,885 | 13,759,003 | 18.0622^b | 1,253,409,135 | 6,943,305 | 646,470,773 | 99^b |

^a HQ, high quality

^b Average values
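The "Total/average" alignment rate can be checked against the per-pool counts. Recomputed from the counts in Table 1, the 18.0622 % entry equals total aligned reads divided by total reads (a read-weighted rate), whereas the unweighted mean of the six per-pool percentages is about 18.15 %. The sketch below shows both calculations; the dictionary and function names are illustrative choices, and the counts are copied from the table.

```python
from typing import Dict, Tuple

# (total reads, reads aligned) per adaptor/pool, copied from Table 1.
POOLS: Dict[str, Tuple[int, int]] = {
    "AD002": (11_950_780, 3_207_921),
    "AD004": (8_033_960, 741_004),
    "AD005": (8_818_234, 2_106_158),
    "AD006": (15_840_014, 3_445_448),
    "AD007": (15_311_985, 2_544_529),
    "AD012": (16_220_912, 1_713_943),
}


def pooled_rate(pools: Dict[str, Tuple[int, int]]) -> float:
    """Percentage of all reads that aligned (each read weighted equally)."""
    total = sum(t for t, _ in pools.values())
    aligned = sum(a for _, a in pools.values())
    return 100 * aligned / total


def mean_of_rates(pools: Dict[str, Tuple[int, int]]) -> float:
    """Unweighted mean of the per-pool alignment percentages."""
    rates = [100 * a / t for t, a in pools.values()]
    return sum(rates) / len(rates)
```

The two numbers differ because the larger pools (AD006, AD007, AD012) have below-average alignment rates and pull the read-weighted figure down.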

SNP and InDel discovery and SNP validation

A total of 192,172 putative SNPs (191,370 bi-allelic and 802 tri-allelic) and 3,722 putative InDels (1,545 insertions and 2,177 deletions) were identified using FreeBayes with a QUAL score ≥10 (corresponding to an estimated probability of at least 90 % that a called variant is real) and a minimum read depth of 3 (Table 2). Most plant genomes contain a large proportion of repetitive sequence. Repetitive elements occupy more than 35 % of each of the Arabidopsis and rice genomes (Arabidopsis Genome Initiative 2000; Sasaki 2005). It has been shown that 57 % of the soybean genome is heterochromatic (Schmutz et al. 2010), and approximately 48 % of the common bean genome has been estimated to be heterochromatic (Fonsêca et al. 2010). Variant discovery relies on aligning short sequence reads to the reference genome, and reads from low-information-content repetitive regions do not align uniquely (Treangen and Salzberg 2012), which increases false-positive rates and variant miscalls. SNPs and InDels within repetitive regions are therefore usually removed from the final list of discovered polymorphisms (Subbaiyan et al. 2012; Wu et al. 2010). After SNPs and InDels in repetitive regions were filtered out, 43,698 SNPs (43,504 bi-allelic and 194 tri-allelic) and 1,267 InDels (574 insertions and 693 deletions) remained.
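The paper does not state how repetitive regions were delimited. Assuming they are available as intervals per chromosome (for example from a repeat-masked reference or a BED file, both hypothetical inputs here), the repeat-filtering step could look like the sketch below. Overlapping repeat intervals are merged first so that a binary search over interval starts is enough to test membership; the class and function names are illustrative.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def merge_intervals(intervals: Sequence[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


class RepeatIndex:
    """Per-chromosome repeat lookup using binary search on interval starts."""

    def __init__(self, repeats: Dict[str, Sequence[Interval]]):
        self._starts: Dict[str, List[int]] = {}
        self._ends: Dict[str, List[int]] = {}
        for chrom, ivs in repeats.items():
            merged = merge_intervals(ivs)
            self._starts[chrom] = [s for s, _ in merged]
            self._ends[chrom] = [e for _, e in merged]

    def contains(self, chrom: str, pos0: int) -> bool:
        """True if the 0-based position lies inside a repeat interval."""
        starts = self._starts.get(chrom)
        if not starts:
            return False
        i = bisect_right(starts, pos0) - 1
        return i >= 0 and pos0 < self._ends[chrom][i]


def split_by_repeats(variants: Sequence[Tuple[str, int]], index: RepeatIndex):
    """Split (chrom, 1-based VCF POS) pairs into (kept, removed) lists."""
    kept, removed = [], []
    for chrom, pos in variants:
        target = removed if index.contains(chrom, pos - 1) else kept
        target.append((chrom, pos))
    return kept, removed
```

The VCF position is shifted by one before the lookup because VCF coordinates are 1-based while the intervals here are 0-based.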
Table 2

Summary of SNP and InDel discovery

| Category | Bi-allelic SNPs | Tri-allelic SNPs | Insertions | Deletions |
|---|---|---|---|---|
| Non-repetitive regions | 43,504 (22.7 %) | 194 (24.2 %) | 574 (37.2 %) | 693 (31.8 %) |
| Repetitive regions | 147,866 (77.3 %) | 608 (75.8 %) | 971 (62.8 %) | 1,484 (68.2 %) |
| Total | 191,370 | 802 | 1,545 | 2,177 |

To evaluate the quality of the identified SNPs, 113 putative SNPs were randomly chosen for validation using HRM analysis. Of these 113 candidates, 105 (92.9 %) contained the predicted SNPs (Supplementary Table S2). This validation rate was higher than the 86 % reported in common bean and the 78 % reported in wheat, and similar to the 92.5 % obtained in soybean and the 91 % obtained in cattle by sequencing RRLs and predicting SNPs from a read depth greater than two (Hyten et al. 2010a, b; Lai et al. 2012; Van Tassell et al. 2008). The experimental validation rate was also in line with the roughly 90 % expected from the FreeBayes QUAL ≥10 threshold.
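As an illustration that goes beyond what the paper reports, the sketch below computes the observed validation rate and an approximate 95 % Wilson score interval for it. With only 113 SNPs tested, the interval runs from roughly 0.87 to 0.96, which contains the 0.90 implied by the QUAL ≥10 threshold. The choice of the Wilson interval is an assumption of this sketch, not a method used by the authors.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95 %)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# 105 of the 113 HRM-tested SNPs were confirmed (numbers from the text).
rate = 105 / 113
low, high = wilson_interval(105, 113)
```

The Wilson interval is used here rather than the simple normal approximation because it behaves better when the proportion is close to 1 and the sample is small.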

Analysis of SNPs and InDels

The putative bi-allelic SNPs were distributed across all 11 chromosomes (Supplementary Fig. S2), with the highest number on chromosome 2 (4,788) and the lowest on chromosome 10 (2,941). Within each chromosome, the distribution of SNPs appeared non-random (Fig. 1). Several regions showed particularly high SNP density, such as 59.3–59.4 Mb on chromosome 8 (75 SNPs), 14.5–14.6 Mb on chromosome 9 (64 SNPs), 17.3–17.4 Mb on chromosome 10 (73 SNPs), and 7.0–7.1 Mb on chromosome 11 (62 SNPs). Conversely, every chromosome had some regions without any SNPs; together, these SNP-free regions account for 12.3 % of the total chromosome sequence. The average SNP density was much lower than that reported in common bean and other plant species, although the number of putative SNPs discovered in this work far exceeded the numbers published for common bean (Gaitan-Solis et al. 2008; Galeano et al. 2009; Hyten et al. 2010b; Zhu et al. 2003). The low SNP density may reflect the limited diversity of the sampled genotypes, all of which belong to race Mesoamerica, the low coverage of the genome by the RRLs, or the increased stringency of the analysis pipeline used in this study (Altshuler et al. 2000; Bitocchi et al. 2012; Depristo et al. 2011; Doebley et al. 2006; Kwak and Gepts 2009; Lai et al. 2012; Shi et al. 2011; You et al. 2011). The 43,504 bi-allelic SNPs discovered among the 36 local common bean genotypes were classified as transitions (Ts) or transversions (Tv) based on their nucleotide substitutions (Supplementary Table S3). Among the transitions, C/T transitions were almost as numerous as G/A transitions, while A/C and G/T transversions were more numerous than C/G and T/A transversions. The Ts/Tv ratio was 1.2, similar to the ratios observed in other species (Choi et al. 2007; Keller et al. 2007; Maughan et al. 2010; Nelson et al. 2011).
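The transition/transversion classification can be sketched as follows: a transition exchanges a purine for a purine (A/G) or a pyrimidine for a pyrimidine (C/T), and every other single-base change is a transversion. The function names and the `(ref, alt)` input format are assumptions of this sketch, and the ratio is a plain count ratio.

```python
from collections import Counter
from typing import Iterable, Tuple

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def substitution_type(ref: str, alt: str) -> str:
    """Classify a bi-allelic SNP as 'transition' or 'transversion'."""
    pair = frozenset((ref.upper(), alt.upper()))
    if len(pair) != 2 or not pair <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a bi-allelic SNP: {ref}/{alt}")
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def ts_tv_ratio(snps: Iterable[Tuple[str, str]]) -> float:
    """Number of transitions divided by number of transversions."""
    counts = Counter(substitution_type(ref, alt) for ref, alt in snps)
    if counts["transversion"] == 0:
        return float("inf")
    return counts["transition"] / counts["transversion"]


def substitution_table(snps: Iterable[Tuple[str, str]]) -> Counter:
    """Tally unordered base pairs such as 'C/T', 'A/G', or 'A/C'."""
    return Counter("/".join(sorted((r.upper(), a.upper()))) for r, a in snps)
```

For example, four transitions and two transversions give `ts_tv_ratio(...) == 2.0`.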
Among the 574 insertions and 693 deletions detected, InDel length ranged from 1 to 5 bp, and most InDels were single-nucleotide insertions or deletions (Fig. 2).
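InDel type and length, as tallied in Fig. 2, can be read off the VCF REF and ALT alleles. This sketch assumes the simple, left-anchored VCF representation in which REF and ALT share their first base (for example `A` to `AT` for a 1-bp insertion); complex or multi-allelic records would need normalization first, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def indel_type_and_length(ref: str, alt: str) -> Tuple[str, int]:
    """Return ('insertion' | 'deletion', length in bp) for a simple InDel."""
    if len(ref) == len(alt):
        raise ValueError("REF and ALT have equal length; not an InDel")
    kind = "insertion" if len(alt) > len(ref) else "deletion"
    return kind, abs(len(alt) - len(ref))


def length_histogram(indels: Iterable[Tuple[str, str]]) -> Counter:
    """Count InDels by (type, length), the quantities plotted in a length histogram."""
    return Counter(indel_type_and_length(ref, alt) for ref, alt in indels)
```

For example, `length_histogram([("A", "AT"), ("C", "CG"), ("GAA", "G")])` counts two 1-bp insertions and one 2-bp deletion.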
Fig. 1

Distribution of bi-allelic SNPs on the 11 common bean chromosomes (Chr.). The x-axis represents physical position along each chromosome (Mb), and the total size of each chromosome is shown in square brackets. The y-axis indicates the number of SNPs, and the total number of SNPs on each chromosome is shown in parentheses.
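Per-window SNP counts of the kind plotted in Fig. 1, together with the share of sequence lying in windows without any SNP, can be sketched as below. The paper does not state the bin size; 100-kb windows are assumed here because the dense regions are reported in 0.1-Mb intervals (for example 59.3–59.4 Mb on chromosome 8), so an empty fraction computed this way need not match the reported 12.3 %. Function names are illustrative.

```python
from typing import List, Sequence, Tuple

WINDOW = 100_000  # assumed bin size in bp


def window_counts(positions: Sequence[int], chrom_length: int,
                  window: int = WINDOW) -> List[int]:
    """Count SNPs per fixed-size window; positions are 1-based (VCF POS)."""
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in positions:
        counts[(pos - 1) // window] += 1
    return counts


def densest_window(counts: List[int], window: int = WINDOW) -> Tuple[int, int, int]:
    """Return (start, end, count) of the window with the most SNPs (0-based start)."""
    i = max(range(len(counts)), key=counts.__getitem__)
    return i * window, (i + 1) * window, counts[i]


def empty_fraction(counts: List[int], chrom_length: int,
                   window: int = WINDOW) -> float:
    """Fraction of the chromosome lying in windows that contain no SNP."""
    empty_bp = sum(min((i + 1) * window, chrom_length) - i * window
                   for i, c in enumerate(counts) if c == 0)
    return empty_bp / chrom_length
```

The last window is clipped to the chromosome end, so a short terminal window does not inflate the empty fraction.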

Fig. 2

Distribution of insertions and deletions by length. The x-axis shows InDel length in nucleotides for deletions (gray) and insertions (black). The y-axis shows the number of InDels of each length.

Annotation of SNPs and InDels

The recent release of the first draft genome of P. vulgaris has greatly simplified the annotation of SNPs and InDels. In total, 24,907 bi-allelic SNPs, 79 tri-allelic SNPs, 315 insertions, and 377 deletions were located in 8,758, 77, 273, and 364 genes, respectively (Table 3). The proportions of genic bi-allelic SNPs in coding sequence, introns, and untranslated regions (UTRs) (62.0, 30.8, and 7.2 %, respectively) were similar to those in Arabidopsis (64.1, 26.8, and 9.1 %, respectively) but differed from those in rice (43.5, 41.6, and 15.7 %, respectively) (Clark et al. 2007; McNally et al. 2009). For insertions, the proportions in coding sequence, introns, and UTRs were 20.6, 60.0, and 19.4 %, respectively; for deletions, they were 33.4, 52.0, and 14.6 %. Among the 36 common bean lines, 7,168 nonsynonymous bi-allelic SNPs were identified, located in 4,303 genes (data not shown). The ratio of nonsynonymous to synonymous substitutions was 0.89. This ratio is similar to those seen in Arabidopsis (0.83), maize (0.79), and sorghum (0.8) but lower than those in soybean (1.40) and rice (1.2) (Clark et al. 2007; Hufford et al. 2012; Lam et al. 2010; McNally et al. 2009; Nelson et al. 2011). This suggests that these local genotypes are closely related and have adapted to similar environments through artificial selection (Clark et al. 2007; Doebley et al. 2006; Hufford et al. 2012; Hyten et al. 2006; Lam et al. 2010; McNally et al. 2009; Nelson et al. 2011; Shi et al. 2011). In addition, 11 bi-allelic SNPs were predicted to introduce premature stop codons, and 181 bi-allelic SNPs were predicted to abolish annotated stop codons.
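A minimal sketch of how a coding SNP can be classified as synonymous, nonsynonymous, stop-gain, or stop-loss using the standard genetic code. This is not ANNOVAR's implementation: it assumes the coding sequence is given 5' to 3' in transcript orientation starting at the first codon (so SNPs in minus-strand genes must be complemented first), and the function names are hypothetical. The ratio at the end is a plain count ratio, as in the text, not a site-normalized dN/dS.

```python
from typing import Iterable

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_coding_snp(cds: str, pos: int, alt: str) -> str:
    """Classify a SNP at 0-based position `pos` of a coding sequence."""
    cds = cds.upper()
    offset = pos % 3
    start = pos - offset
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position falls in an incomplete codon")
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"  # includes stop-to-stop changes such as TAA -> TAG
    if alt_aa == "*":
        return "stop_gain"
    if ref_aa == "*":
        return "stop_loss"
    return "nonsynonymous"


def nonsyn_syn_ratio(classes: Iterable[str]) -> float:
    """Count ratio of nonsynonymous to synonymous SNPs (stop changes excluded)."""
    labels = list(classes)
    n = sum(c == "nonsynonymous" for c in labels)
    s = sum(c == "synonymous" for c in labels)
    return n / s
```

For example, in the toy sequence `ATGGCTTAA`, changing the third base of GCT to C keeps alanine (synonymous), while changing the stop codon TAA to TCA gives serine (stop loss).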
Table 3

Summary of annotated SNPs and InDels

| Category | Bi-allelic SNPs | Tri-allelic SNPs | Insertions | Deletions |
|---|---|---|---|---|
| CDS^a | 15,448 (62.0 %) | 41 (51.9 %) | 65 (20.6 %) | 126 (33.4 %) |
| Introns | 7,666 (30.8 %) | 32 (40.5 %) | 189 (60.0 %) | 196 (52.0 %) |
| UTRs^b | 1,793 (7.2 %) | 6 (7.6 %) | 61 (19.4 %) | 55 (14.6 %) |
| Total | 24,907 | 79 | 315 | 377 |

^a CDS, coding DNA sequence

^b UTR, untranslated region

Conclusions

Using reduced representation libraries coupled with next-generation sequencing and the latest release of the common bean genome assembly, we identified 43,698 putative SNPs, of which 24,986 were annotated in genic regions. We also identified 1,267 putative InDels, 692 of which were located in genic regions. Together with the SNP and InDel resources already available, the variants discovered in this study will help to anchor and orient scaffolds arising from future whole-genome sequencing efforts in common bean. The variants identified will also be useful for genetic diversity analyses, QTL mapping, genome-wide association studies, and marker-assisted breeding in common bean.

Notes

Acknowledgments

We thank Dr. Sergio Pereira and the bioinformatics team at The Centre for Applied Genomics, The Hospital for Sick Children, for the next-generation sequencing and data analysis. Phaseolus vulgaris v1.0 data were produced by the US Department of Energy Joint Genome Institute http://www.jgi.doe.gov/ in collaboration with the user community. This work was supported by the Ontario Research Fund (ORF), the Ontario White Bean Producers’ Marketing Board (OWBPMB), the Ontario Colored Bean Growers’ Association (OCBGA), and Agriculture and Agri-Food Canada (AAFC).

Supplementary material

Supplementary material 1: 11032_2013_9997_MOESM1_ESM.docx (DOCX, 102 kb)

References

  1. Adams KL, Wendel JF (2005) Polyploidy and genome evolution in plants. Curr Opin Plant Biol 8:135–141PubMedCrossRefGoogle Scholar
  2. Altshuler D, Pollara VJ, Cowles CR, Van Etten WJ, Baldwin J, Linton L, Lander ES (2000) An SNP map of the human genome generated by reduced representation shotgun sequencing. Nature 407:513–516PubMedCrossRefGoogle Scholar
  3. Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796–815CrossRefGoogle Scholar
  4. Arai-Kichise Y, Shiwa Y, Nagasaki H, Ebana K, Yoshikawa H, Yano M, Wakasa K (2011) Discovery of genome-wide DNA polymorphisms in a landrace cultivar of Japonica rice by whole-genome sequencing. Plant Cell Physiol 52:274–282PubMedCentralPubMedCrossRefGoogle Scholar
  5. Baird NA, Etter PD, Atwood TS, Currey MC, Shiver AL, Lewis ZA, Selker EU, Cresko WA, Johnson EA (2008) Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS ONE 3:e3376PubMedCentralPubMedCrossRefGoogle Scholar
  6. Bennett MD, Leitch IJ (1995) Nuclear DNA amounts in angiosperms. Ann Bot 76:113–176CrossRefGoogle Scholar
  7. Bennetzen JL, Ma J, Devos KM (2005) Mechanisms of recent genome size variation in flowering plants. Ann Bot 95:127–132PubMedCrossRefGoogle Scholar
  8. Bitocchi E, Nanni L, Bellucci E, Rossi M, Giardini A, Zeuli PS, Logozzo G, Stougaard J, McClean P, Attene G, Papa R (2012) Mesoamerican origin of the common bean (Phaseolus vulgaris L.) is revealed by sequence data. Proc Natl Acad Sci USA 109:E788–E796PubMedCentralPubMedCrossRefGoogle Scholar
  9. Blair MW, Cortés AJ, Penmetsa RV, Farmer A, Carrasquilla-Garcia N, Cook DR (2013) A high-throughput SNP marker system for parental polymorphism screening, and diversity analysis in common bean (Phaseolus vulgaris L.). Theor Appl Genet 126:535–548PubMedCrossRefGoogle Scholar
  10. Choi IY, Hyten DL, Matukumalli LK, Song Q, Chaky JM, Quigley CV, Chase K, Lark KG, Reiter RS, Yoon MS, Hwang EY, Yi SI, Young ND, Shoemaker RC, Van Tassell CP, Specht JE, Cregan PB (2007) A soybean transcript map: gene distribution, haplotype and single-nucleotide polymorphism analysis. Genetics 176:685–696PubMedCentralPubMedCrossRefGoogle Scholar
  11. Clark RM, Schweikert G, Toomajian C, Ossowski S, Zeller G, Shinn P, Warthmann N, Hu TT, Fu G, Hinds DA, Chen H, Frazer KA, Huson DH, Schölkopf B, Nordborg M, Rätsch G, Ecker JR, Weigel D (2007) Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana. Science 317:338–342PubMedCrossRefGoogle Scholar
  12. Davey JW, Hohenlohe PA, Etter PD, Boone JQ, Catchen JM, Blaxter ML (2011) Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat Rev Genet 12:499–510PubMedCrossRefGoogle Scholar
  13. Depristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, Del Angel G, Rivas MA, Hanna M, McKenna A, Fennell TJ, Kernytsky AM, Sivachenko AY, Cibulskis K, Gabriel SB, Altshuler D, Daly MJ (2011) A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 43:491–501PubMedCentralPubMedCrossRefGoogle Scholar
  14. Deschamps S, Campbell MA (2010) Utilization of next-generation sequencing platforms in plant genomics and genetic variant discovery. Mol Breed 25:553–570CrossRefGoogle Scholar
  15. Doebley JF, Gaut BS, Smith BD (2006) The molecular genetics of crop domestication. Cell 127:1309–1321PubMedCrossRefGoogle Scholar
  16. Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, Mitchell SE (2011) A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE 6:e19379PubMedCentralPubMedCrossRefGoogle Scholar
  17. Fonsêca A, Ferreira J, Dos Santos TRB, Mosiolek M, Bellucci E, Kami J, Gepts P, Geffroy V, Schweizer D, Dos Santos KGB, Pedrosa-Harand A (2010) Cytogenetic map of common bean (Phaseolus vulgaris L.). Chromosome Res 18:487–502PubMedCentralPubMedCrossRefGoogle Scholar
  18. Freyre R, Skroch PW, Geffroy V, Adam-Blondon AF, Shirmohamadali A, Johnson WC, Llaca V, Nodari RO, Pereira PA, Tsai SM, Tohme J, Dron M, Nienhuis J, Vallejos CE, Gepts P (1998) Towards an integrated linkage map of common bean. 4. Development of a core linkage map and alignment of RFLP maps. Theor Appl Genet 97:847–856CrossRefGoogle Scholar
  19. Fu YB, Peterson GW (2012) Developing genomic resources in two Linum species via 454 pyrosequencing and genomic reduction. Mol Ecol Resour 12:492–500PubMedCrossRefGoogle Scholar
  20. Gaitan-Solis E, Choi IY, Quigley C, Cregan P, Tohme J (2008) Single nucleotide polymorphisms in common bean: their discovery and genotyping using a multiplex detection system. Plant Genome 1:125–134CrossRefGoogle Scholar
  21. Galeano CH, Fernández AC, Gómez M, Blair MW (2009) Single strand conformation polymorphism based SNP and Indel markers for genetic mapping and synteny analysis of common bean (Phaseolus vulgaris L.). BMC Genomics 10:629PubMedCentralPubMedCrossRefGoogle Scholar
  22. Galeano CH, Cortés AJ, Fernández AC, Soler T, Franco-Herrera N, Makunde G, Vanderleyden J, Blair MW (2012) Gene-based single nucleotide polymorphism markers for genetic and association mapping in common bean. BMC Genet 13:48PubMedCentralPubMedCrossRefGoogle Scholar
  23. Gaur R, Azam S, Jeena G, Khan AW, Choudhary S, Jain M, Yadav G, Tyagi AK, Chattopadhyay D, Bhatia S (2012) High-throughput SNP discovery and genotyping for constructing a saturated linkage map of chickpea (Cicer arietinum L.). DNA Res 19:357–373PubMedCentralPubMedCrossRefGoogle Scholar
  24. Gepts P (1998) Origin and evolution of common bean: past events and recent trends. HortScience 33:1124–1130Google Scholar
  25. Gepts P, Debouck D (1991) Origin, domestication, and evolution of the common bean (Phaseolus vulgaris L.). In: van Schoonhaven A, Voysest O (eds) Common beans: research for crop improvement. C.A.B. International, Oxon, pp 7–53Google Scholar
  26. Gepts P, Aragão FL, Barros E, Blair M, Brondani R, Broughton W, Galasso I, Hernández G, Kami J, Lariguet P, McClean P, Melotto M, Miklas P, Pauls P, Pedrosa-Harand A, Porch T, Sánchez F, Sparvoli F, Yu K (2008) Genomics of Phaseolus beans, a major source of dietary protein and micronutrients in the tropics. In: Moore P, Ming R (eds) Genomics of Tropical Crop Plants. Springer, New York, pp 113–143CrossRefGoogle Scholar
  27. Hayashi K, Yoshida H, Ashikawa I (2006) Development of PCR-based allele-specific and InDel marker sets for nine rice blast resistance genes. Theor Appl Genet 113:251–260PubMedCrossRefGoogle Scholar
  28. Herrmann MG, Durtschi JD, Bromley LK, Wittwer CT, Voelkerding KV (2006) Amplicon DNA melting analysis for mutation scanning and genotyping: cross-platform comparison of instruments and dyes. Clin Chem 52:494–503PubMedCrossRefGoogle Scholar
  29. Hillier LW, Marth GT, Quinlan AR, Dooling D, Fewell G, Barnett D, Fox P, Glasscock JI, Hickenbotham M, Huang W, Magrini VJ, Richt RJ, Sander SN, Stewart DA, Stromberg M, Tsung EF, Wylie T, Schedl T, Wilson RK, Mardis ER (2008) Whole-genome sequencing and variant discovery in C. elegans. Nat Methods 5:183–188PubMedCrossRefGoogle Scholar
  30. Hufford MB, Xu X, Van Heerwaarden J, Pyhäjärvi T, Chia JM, Cartwright RA, Elshire RJ, Glaubitz JC, Guill KE, Kaeppler SM, Lai J, Morrell PL, Shannon LM, Song C, Springer NM, Swanson-Wagner RA, Tiffin P, Wang J, Zhang G, Doebley J, McMullen MD, Ware D, Buckler ES, Yang S, Ross-Ibarra J (2012) Comparative population genomics of maize domestication and improvement. Nat Genet 44:808–811PubMedCrossRefGoogle Scholar
  31. Hyten DL, Song Q, Zhu Y, Choi IY, Nelson RL, Costa JM, Specht JE, Shoemaker RC, Cregan PB (2006) Impact of genetic bottlenecks on soybean genome diversity. Proc Natl Acad Sci USA 103:16666–16671PubMedCentralPubMedCrossRefGoogle Scholar
  32. Hyten DL, Cannon SB, Song Q, Weeks N, Fickus EW, Shoemaker RC, Specht JE, Farmer AD, May GD, Cregan PB (2010a) High-throughput SNP discovery through deep resequencing of a reduced representation library to anchor and orient scaffolds in the soybean whole genome sequence. BMC Genomics 11:38PubMedCentralPubMedCrossRefGoogle Scholar
  33. Hyten DL, Song Q, Fickus EW, Quigley CV, Lim JS, Choi IY, Hwang EY, Pastor-Corrales M, Cregan PB (2010b) High-throughput SNP discovery and assay development in common bean. BMC Genomics 11:475PubMedCentralPubMedCrossRefGoogle Scholar
  34. Keller I, Bensasson D, Nichols RA (2007) Transition-transversion bias is not universal: a counter example from grasshopper pseudogenes. PLoS Genet 3:0185–0191CrossRefGoogle Scholar
  35. Kumar S, Banks TW, Cloutier S (2012) SNP discovery through next-generation sequencing and its applications. Int J Plant Genomics. Article ID 831460. doi:10.1155/2012/831460
  36. Kwak M, Gepts P (2009) Structure of genetic diversity in the two major gene pools of common bean (Phaseolus vulgaris L., Fabaceae). Theor Appl Genet 118:979–992PubMedCrossRefGoogle Scholar
  37. Lai J, Li R, Xu X, Jin W, Xu M, Zhao H, Xiang Z, Song W, Ying K, Zhang M, Jiao Y, Ni P, Zhang J, Li D, Guo X, Ye K, Jian M, Wang B, Zheng H, Liang H, Zhang X, Wang S, Chen S, Li J, Fu Y, Springer NM, Yang H, Wang J, Dai J, Schnable PS, Wang J (2010) Genome-wide patterns of genetic variation among elite maize inbred lines. Nat Genet 42:1027–1030PubMedCrossRefGoogle Scholar
  38. Lai K, Duran C, Berkman PJ, Lorenc MT, Stiller J, Manoli S, Hayden MJ, Forrest KL, Fleury D, Baumann U, Zander M, Mason AS, Batley J, Edwards D (2012) Single nucleotide polymorphism discovery from wheat next-generation sequence data. Plant Biotechnol J 10:743–749PubMedCrossRefGoogle Scholar
  39. Lam HM, Xu X, Liu X, Chen W, Yang G, Wong FL, Li MW, He W, Qin N, Wang B, Li J, Jian M, Wang J, Shao G, Sun SSM, Zhang G (2010) Resequencing of 31 wild and cultivated soybean genomes identifies patterns of genetic diversity and selection. Nat Genet 42:1053–1059PubMedCrossRefGoogle Scholar
  40. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760PubMedCentralPubMedCrossRefGoogle Scholar
  41. Mamidi S, Rossi M, Annam D, Moghaddam S, Lee R, Papa R, McClean P (2011) Investigation of the domestication of common bean (Phaseolus vulgaris) using multilocus sequence data. Funct Plant Biol 38:953–967CrossRefGoogle Scholar
  42. Mammadov J, Aggarwal R, Buyyarapu R, Kumpatla S (2012) SNP markers and their impact on plant breeding. Int J Plant Genomics. Article ID 728398. doi:10.1155/2012/728398
  43. Maughan PJ, Yourstone SM, Byers RL, Smith SM, Udall JA (2010) Single-nucleotide polymorphism genotyping in mapping populations via genomic reduction and next-generation sequencing: proof of concept. Plant Genome 3:166–178
  44. McClean PE, Lee RK, Otto C, Gepts P, Bassett MJ (2002) Molecular and phenotypic mapping of genes controlling seed coat pattern and color in common bean (Phaseolus vulgaris L.). J Hered 93:148–152
  45. McClean P, Kami J, Gepts P (2004) Genomics and genetic diversity in common bean. In: Wilson RF, Stalker HT, Brummer EC (eds) Legume crop genomics. AOCS Press, Champaign
  46. McConnell M, Mamidi S, Lee R, Chikara S, Rossi M, Papa R, McClean P (2010) Syntenic relationships among legumes revealed using a gene-based genetic linkage map of common bean (Phaseolus vulgaris L.). Theor Appl Genet 121:1103–1116
  47. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA (2010) The genome analysis toolkit: a mapreduce framework for analyzing next-generation DNA sequencing data. Genome Res 20:1297–1303
  48. McNally KL, Childs KL, Bohnert R, Davidson RM, Zhao K, Ulat VJ, Zeller G, Clark RM, Hoen DR, Bureau TE, Stokowski R, Ballinger DG, Frazer KA, Cox DR, Padhukasahasram B, Bustamante CD, Weigel D, Mackill DJ, Bruskiewich RM, Rätsch G, Buell CR, Leung H, Leach JE (2009) Genomewide SNP variation reveals relationships among landraces and modern varieties of rice. Proc Natl Acad Sci USA 106:12273–12278
  49. Nelson JC, Wang S, Wu Y, Li X, Antony G, White FF, Yu J (2011) Single-nucleotide polymorphism discovery by high-throughput sequencing in sorghum. BMC Genomics 12:352
  50. Nielsen R, Paul JS, Albrechtsen A, Song YS (2011) Genotype and SNP calling from next-generation sequencing data. Nat Rev Genet 12:443–451
  51. Oliver RE, Lazo GR, Lutz JD, Rubenfield MJ, Tinker NA, Anderson JM, Wisniewski Morehead NH, Adhikary D, Jellen EN, Maughan PJ, Brown Guedira GL, Chao S, Beattie AD, Carson ML, Rines HW, Obert DE, Bonman JM, Jackson EW (2011) Model SNP development for complex genomes based on hexaploid oat using high-throughput 454 sequencing technology. BMC Genomics 12:77
  52. Ossowski S, Schneeberger K, Clark RM, Lanz C, Warthmann N, Weigel D (2008) Sequencing of natural strains of Arabidopsis thaliana with short reads. Genome Res 18:2024–2033
  53. Ramírez M, Graham MA, Blanco-López L, Silvente S, Medrano-Soto A, Blair MW, Hernández G, Vance CP, Lara M (2005) Sequencing and analysis of common bean ESTs. Building a foundation for functional genomics. Plant Physiol 137:1211–1227
  54. Ratan A, Zhang Y, Hayes VM, Schuster SC, Miller W (2010) Calling SNPs without a reference sequence. BMC Bioinformatics 11:130
  55. Rivkin MI, Vallejos CE, McClean PE (1999) Disease-resistance related sequences in common bean. Genome 42:41–47
  56. Salathia N, Lee HN, Sangster TA, Morneau K, Landry CR, Schellenberg K, Behere AS, Gunderson KL, Cavalieri D, Jander G, Queitsch C (2007) Indel arrays: an affordable alternative for genotyping. Plant J 51:727–737
  57. Sasaki T (2005) The map-based sequence of the rice genome. Nature 436:793–800
  58. Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL, Song Q, Thelen JJ, Cheng J, Xu D, Hellsten U, May GD, Yu Y, Sakurai T, Umezawa T, Bhattacharyya MK, Sandhu D, Valliyodan B, Lindquist E, Peto M, Grant D, Shu S, Goodstein D, Barry K, Futrell-Griggs M, Abernathy B, Du J, Tian Z, Zhu L, Gill N, Joshi T, Libault M, Sethuraman A, Zhang XC, Shinozaki K, Nguyen HT, Wing RA, Cregan P, Specht J, Grimwood J, Rokhsar D, Stacey G, Shoemaker RC, Jackson SA (2010) Genome sequence of the palaeopolyploid soybean. Nature 463:178–183
  59. Shen YJ, Jiang H, Jin JP, Zhang ZB, Xi B, He YY, Wang G, Wang C, Qian L, Li X, Yu QB, Liu HJ, Chen DH, Gao JH, Huang H, Shi TL, Yang ZN (2004) Development of genome-wide DNA polymorphism database for map-based cloning of rice genes. Plant Physiol 135:1198–1205
  60. Shendure J, Ji H (2008) Next-generation DNA sequencing. Nat Biotechnol 26:1135–1145
  61. Shi C, Navabi A, Yu K (2011) Association mapping of common bacterial blight resistance QTL in Ontario bean breeding populations. BMC Plant Biol 11:52
  62. Singh SP, Gepts P, Debouck DG (1991) Races of common bean (Phaseolus vulgaris, Fabaceae). Econ Bot 45:379–396
  63. Souza TLPO, de Barros EG, Bellato CM, Hwang EY, Cregan PB, Pastor-Corrales MA (2012) Single nucleotide polymorphism discovery in common bean. Mol Breed 30:419–428
  64. Subbaiyan GK, Waters DLE, Katiyar SK, Sadananda AR, Vaddadi S, Henry RJ (2012) Genome-wide DNA polymorphisms in elite indica rice inbreds discovered by whole-genome sequencing. Plant Biotechnol J 10:623–634
  65. Treangen TJ, Salzberg SL (2012) Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nat Rev Genet 13:36–46
  66. Vallad G, Rivkin M, Vallejos C, McClean P (2001) Cloning and homology modelling of a Pto-like protein kinase family of common bean (Phaseolus vulgaris L.). Theor Appl Genet 103:1046–1058
  67. Vallejos CE, Sakiyama NS, Chase CD (1992) A molecular marker-based linkage map of Phaseolus vulgaris L. Genetics 131:733–740
  68. van Orsouw NJ, Hogers RCJ, Janssen A, Yalcin F, Snoeijers S, Verstege E, Schneiders H, van der Poel H, van Oeveren J, Verstegen H, van Eijk MJT (2007) Complexity reduction of polymorphic sequences (CRoPS™): a novel approach for large-scale polymorphism discovery in complex genomes. PLoS ONE 2:e1172
  69. Van Tassell CP, Smith TPL, Matukumalli LK, Taylor JF, Schnabel RD, Lawley CT, Haudenschild CD, Moore SS, Warren WC, Sonstegard TS (2008) SNP discovery and allele frequency estimation by deep sequencing of reduced representation libraries. Nat Methods 5:247–252
  70. Varshney RK, Nayak SN, May GD, Jackson SA (2009) Next-generation sequencing technologies and their implications for crop genetics and breeding. Trends Biotechnol 27:522–530
  71. Wang DG, Fan JB, Siao CJ, Berno A, Young P, Sapolsky R, Ghandour G, Perkins N, Winchester E, Spencer J, Kruglyak L, Stein L, Hsie L, Topaloglou T, Hubbell E, Robinson E, Mittmann M, Morris MS, Shen N, Kilburn D, Rioux J, Nusbaum C, Rozen S, Hudson TJ, Lipshutz R, Chee M, Lander ES (1998) Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science 280:1077–1082
  72. Wang K, Li M, Hakonarson H (2010) ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38:e164
  73. Wang Y, Yu KF, Poysa V, Shi C, Zhou YH (2012) A single point mutation in GmHMA3 affects Cadimum (Cd) translocation and accumulation in soybean seeds. Mol Plant 5:1154–1156
  74. Wu X, Ren C, Joshi T, Vuong T, Xu D, Nguyen HT (2010) SNP discovery by high-throughput sequencing in soybean. BMC Genomics 11:469
  75. You FM, Huo N, Deal KR, Gu YQ, Luo MC, McGuire PE, Dvorak J, Anderson OD (2011) Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence. BMC Genomics 12:59
  76. Zhu YL, Song QJ, Hyten DL, Van Tassell CP, Matukumalli LK, Grimm DR, Hyatt SM, Fickus EW, Young ND, Cregan PB (2003) Single-nucleotide polymorphisms in soybean. Genetics 163:1123–1134

Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  • Xiaolu Zou (1)
  • Chun Shi (2)
  • Ryan S. Austin (1)
  • Daniele Merico (3)
  • Seth Munholland (5)
  • Frédéric Marsolais (1)
  • Alireza Navabi (2, 4)
  • William L. Crosby (5)
  • K. Peter Pauls (4)
  • Kangfu Yu (2)
  • Yuhai Cui (1)

  1. Southern Crop Protection and Food Research Centre, Agriculture and Agri-Food Canada, London, Canada
  2. Greenhouse and Processing Crops Research Centre, Agriculture and Agri-Food Canada, Harrow, Canada
  3. The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Canada
  4. Department of Plant Agriculture, University of Guelph, Guelph, Canada
  5. Department of Biological Sciences, University of Windsor, Windsor, Canada
