Introduction

Microsatellites, or simple sequence repeats (SSRs), are tandem repeats of short (1–6 bp) DNA motifs. They usually behave as single-locus markers and are hypervariable, abundant and reproducible. Variation in the number of repeat units is readily detected as length differences among PCR products amplified with primers flanking the SSR motif. SSRs have been widely used for bacterial screening (Lin et al. 2005), plant genotyping (Chen et al. 2006), linkage mapping (Zhang et al. 2002), gene tagging (Roy et al. 2002), and map-based gene cloning (Tekeoglu et al. 2002).

The availability of ESTs greatly accelerates the systematic identification of SSRs and corresponding marker development based on computer analytical approaches (Varshney et al. 2002; Gao et al. 2003; Thiel et al. 2003; Chen et al. 2006). EST-derived SSRs have been well documented in some plant species including Arabidopsis (Depeiges et al. 1995), sugarcane (Cordeiro et al. 2001), cereal species (Kantety et al. 2002), cacao (Lima et al. 2008), and rubber tree (Feng et al. 2009). Homology searches can assign putative functions to SSR-containing ESTs, thereby providing a new resource to aid genetic and evolutionary studies (Cho et al. 2000; De Keyser et al. 2009).

EST-SSR and genomic SSR markers should be regarded as complementary tools for plant genome mapping: EST-SSRs are less polymorphic but are concentrated in gene-rich regions (Varshney et al. 2006). With hundreds of thousands of ESTs available in the public domain, EST-SSR marker development has been greatly accelerated by optimized computational pipelines and high-throughput genotyping techniques.

SSR markers have been widely used in grape genotyping. The high polymorphism of Vitis-derived microsatellite loci has been reported extensively in the literature and used for fingerprinting (Thomas and Scott 1993; Bowers et al. 1996, 1999a, b; Sefc et al. 1998; Arroyo García et al. 2002; Di Gaspero et al. 2005, 2007; Merdinoglu et al. 2005; Lamoureux et al. 2006; Costantini et al. 2007; De Mattia et al. 2007; Cipriani et al. 2008; Bocharova et al. 2009; Riaz et al. 2009). Several publications have also demonstrated transferability of SSR markers across the Vitis genus (Lin and Walker 1998; Tessier et al. 1999; Di Gaspero et al. 2000; Fernández et al. 2008).

SSR markers have been used for construction of grape genetic maps (Dalbò et al. 2000; Doligez et al. 2002; Grando et al. 2003; Adam-Blondon et al. 2004; Doucleff et al. 2004; Riaz et al. 2004, 2006; Lowe and Walker 2006; Di Gaspero et al. 2007; Vezzulli et al. 2008). While most loci on grape linkage maps are microsatellite markers developed from genomic DNA libraries, EST-SSRs provide new genetic markers that can be added to these maps (Decroocq et al. 2003; Akkak et al. 2006; Salmaso et al. 2008). EST-SSRs have been reported to be less polymorphic but more transferable than genomic SSRs in grape and other plants, because DNA sequences are more conserved in transcribed regions (Scott et al. 2000; Cho et al. 2000; Chabane et al. 2005).

Traditionally, SSR PCR products are separated on polyacrylamide gels (Thiel et al. 2003) or MetaPhor agarose gels (Chani et al. 2002). Gel electrophoresis is low-throughput, and sizing can be imprecise. Automated capillary sequencing with fluorescently labeled primers (Eujayl et al. 2002) gives more accurate, higher-throughput genotyping, but dye-labeling every forward primer is expensive. Using a universal labeled M13 primer with automated capillary sequencing reduces this cost while still providing fast and precise genotyping (Oetting et al. 1995; Chen et al. 2006).

Here we report the identification and characterization of 1,701 unique grape EST-SSRs derived from a total of 215,609 grape ESTs. A set of SSR markers was developed from this analysis and validated by using M13 universal primers and an automatic capillary sequencing system.

Materials and methods

Plant materials

For PCR amplification, genotyping and polymorphism analysis, we selected four genotypes which are parents of two mapping populations: V. vinifera cvs. Cabernet Sauvignon and Riesling, and V. rotundifolia cvs. Noble and Summit. Genomic DNA was extracted from young leaves/shoot tips of these grape cultivars using a modified CTAB protocol (Qu et al. 1996).

Grape EST and genomic sequences retrieval from NCBI

All grape EST sequences available in the NCBI database on 10 February 2006 were retrieved. Of the 215,609 ESTs, 194,200 were from V. vinifera, 10,704 from V. shuttleworthii, 2,177 from V. aestivalis, 1,995 from V. riparia, and 6,533 from a Vitis hybrid (V. rupestris A. de Serres × V. spp. b42-26).

A total of 31,910 genomic sequences were also retrieved from NCBI on 12 June 2006. Of these, 30,832 were from a BAC library of V. vinifera cv. Pinot Noir and 1,078 from V. vinifera cvs. Syrah and Maxxa.

Computer programs for mining SSRs from ESTs

The Perl script MISA (MIcroSAtellite identification tool) developed by Thiel et al. (2003, http://pgrc.ipk-gatersleben.de/misa) was used to identify EST-SSRs with repeat units of 2 to 6 nucleotides. The minimum SSR length was 2 × 9 = 18 bp for dinucleotides, 3 × 6 = 18 bp for trinucleotides, 4 × 5 = 20 bp for tetranucleotides, 5 × 4 = 20 bp for pentanucleotides, and 6 × 4 = 24 bp for hexanucleotides. ESTs containing SSRs were assembled in Sequencher® version 4.2 (Gene Codes, Ann Arbor, Michigan, USA) with a minimum overlap of 40% and a minimum match of 90%. A flow chart for mining and developing the grape EST-SSR markers is provided in Fig. 1.
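The repeat thresholds above can be illustrated with a short script. The sketch below is not MISA itself: it is a simplified, hypothetical Python re-implementation that scans a sequence from left to right and reports perfect repeats meeting the same minimum copy numbers. It ignores compound and imperfect SSRs, and the names `find_ssrs` and `is_primitive` are illustrative only.

```python
# Minimum number of tandem copies per repeat-unit length, taken from the
# thresholds above (2 x 9, 3 x 6, 4 x 5, 5 x 4, 6 x 4).
MIN_REPEATS = {2: 9, 3: 6, 4: 5, 5: 4, 6: 4}


def is_primitive(unit: str) -> bool:
    """True if `unit` is not itself a repeat of a shorter unit (e.g. 'ATAT', 'AA')."""
    k = len(unit)
    return not any(k % d == 0 and unit[:d] * (k // d) == unit for d in range(1, k))


def find_ssrs(seq: str) -> list:
    """Return (0-based start, motif, copy number) for each perfect SSR in `seq`."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        hit = None
        for k in sorted(MIN_REPEATS):  # try the shortest unit length first
            unit = seq[i:i + k]
            if len(unit) < k or not set(unit) <= set("ACGT") or not is_primitive(unit):
                continue
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= MIN_REPEATS[k]:
                hit = (i, unit, copies)
                break
        if hit:
            hits.append(hit)
            i += hit[2] * len(hit[1])  # resume scanning after the repeat tract
        else:
            i += 1
    return hits


example = "CC" + "AG" * 10 + "GC" + "AAT" * 6 + "CC"
print(find_ssrs(example))  # [(2, 'AG', 10), (24, 'AAT', 6)]
```

Units that are themselves repeats of a shorter unit (e.g. ATAT or AA) are skipped, so a dinucleotide run is reported once as a dinucleotide rather than again as a tetra- or hexanucleotide, and mononucleotide runs are ignored, matching the 2–6 nt search range.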

Fig. 1

Flow chart of Vitis EST-SSR identification and validation

Functional annotation of EST-SSRs

HTGOFAT, a data-mining and annotation toolkit developed in Microsoft .NET 2003 (Dowd and Zaragoza 2005), was used to functionally annotate the assembled EST-SSR sequences. The putative gene functions were classified using the Munich Information Centre for Protein Sequences (MIPS) Arabidopsis thaliana functional catalogue (MATDB, http://mips.gsf.de).

PCR and fragment analysis

EST sequences flanking the microsatellite motifs were used to design PCR primers using the program Primer3®. A total of 150 primer pairs (Table 3) were screened for assessment of polymorphisms among the four parents using a CEQ Genetic Analyzer (Beckman Coulter, California, USA).

To reduce cost, the 20-bp universal M13 forward primer sequence GTTGTAAAACGACGGCCAGT (Oetting et al. 1995) was added as a common tail to the 5′ end of all 180 SSR forward primers. All SSR primers, including regular and M13-tailed forward primers, were synthesized by Operon Technologies (Huntsville, Alabama, USA). The universal M13 primer was fluorescently labeled by Sigma-Genosys (USA) and used for fragment analysis on the CEQ Genetic Analyzer.
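Because the labeled M13 primer anneals to the tail and is incorporated into the product, each labeled amplicon is longer than the SSR-specific amplicon by the length of the tail. A minimal sketch, assuming the 20-nt tail given above; the helper names and the example primer fragment are hypothetical, and the text does not state whether the sizes reported in Table 3 include the tail.

```python
# 20-nt M13 tail as given in the text, written 5'->3'
M13_TAIL = "GTTGTAAAACGACGGCCAGT"


def m13_tailed(forward_primer: str) -> str:
    """Return the forward primer with the M13 tail added to its 5' end."""
    return M13_TAIL + forward_primer.upper().replace(" ", "")


def labeled_product_size(untailed_amplicon_bp: int) -> int:
    """Expected size of the labeled product: the tail becomes part of the amplicon."""
    return untailed_amplicon_bp + len(M13_TAIL)


primer = "ACG TTA"  # hypothetical SSR-specific primer fragment, for illustration only
print(m13_tailed(primer))  # GTTGTAAAACGACGGCCAGTACGTTA
print(labeled_product_size(150))  # 170
```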

PCR reactions were performed in a 20-μl reaction mix containing 30 ng genomic DNA, 10 × PCR buffer (Promega), 2 μl of 2 mM dNTPs (Promega), 1.0 U Taq DNA polymerase, 2.8 μl of 25 mM MgCl2 (Promega), and 0.3 μM primers. Reactions were run in a PTC-200 thermal cycler (MJ Research) with the following profile: 3 min at 94°C; 30 cycles of 1 min denaturation at 94°C, 1 min annealing at 48 to 58°C (depending on the Tm of each primer pair), and 2 min extension at 72°C; and a final 6-min extension at 72°C. The same conditions were also used for labeling the primers.
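The final reagent concentrations implied by the volumes above follow from C1V1 = C2V2. A small worked sketch, assuming the stated volumes are made up to the 20-μl final volume and that "2 mM dNTP" refers to the stock mix as supplied (the text does not say whether this is per nucleotide or total):

```python
# Stock concentration (mM) and volume added (ul), as listed in the text
REACTION_VOLUME_UL = 20.0
STOCKS = {"dNTP": (2.0, 2.0), "MgCl2": (25.0, 2.8)}


def final_mM(stock_mM: float, added_ul: float, total_ul: float = REACTION_VOLUME_UL) -> float:
    """Solve C1 * V1 = C2 * V2 for the final concentration C2."""
    return stock_mM * added_ul / total_ul


for name, (stock, volume) in STOCKS.items():
    print(f"{name}: {final_mM(stock, volume):.2f} mM")
# dNTP: 0.20 mM
# MgCl2: 3.50 mM
```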

For fragment analysis on the CEQ Genetic Analyzer, 0.25 μl of each M13-labeled PCR product was mixed with 40 μl Sample Loading Solution (Beckman Coulter 608087) and 0.2 μl of the 400-bp DNA size standard (Beckman Coulter 608098), overlaid with one drop of light mineral oil, and loaded into 96-well sample microtiter plates (Beckman Coulter 609801). CEQ Sequencing Separation Buffer (Beckman Coulter 608012) was loaded into the 96-well separation plate (Beckman Coulter 609844). Dye-labeled amplicons were separated with the “Frag-3” method, sized automatically with the GenomeLab software (Beckman Coulter), and then examined visually.
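The analyzer estimates amplicon sizes from the co-loaded 400-bp size standard. As a hedged illustration of the idea only, the sketch below sizes a sample peak by linear interpolation between the two bracketing size-standard peaks; the GenomeLab software may use a different calibration model, and the migration times in the example are hypothetical.

```python
from bisect import bisect_left


def size_by_interpolation(peak_time, std_times, std_sizes):
    """Estimate fragment size (bp) by linear interpolation between the two
    size-standard peaks that bracket the sample peak's migration time."""
    if not std_times[0] <= peak_time <= std_times[-1]:
        raise ValueError("peak lies outside the size-standard range")
    j = bisect_left(std_times, peak_time)
    if std_times[j] == peak_time:
        return float(std_sizes[j])
    t0, t1 = std_times[j - 1], std_times[j]
    s0, s1 = std_sizes[j - 1], std_sizes[j]
    return s0 + (s1 - s0) * (peak_time - t0) / (t1 - t0)


# Hypothetical migration times (min) for four size-standard fragments
print(size_by_interpolation(23.75, [20.0, 22.5, 25.0, 27.4], [60, 80, 100, 120]))  # 90.0
```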

Results and discussion

Identification and characterization of grape EST-SSRs

A total of 6,447 of the 215,609 (3%) grape ESTs retrieved from NCBI on 1 February 2006 contained SSRs (Table 1). Because some ESTs contained more than one SSR, 6,815 SSR motifs were identified in these 6,447 sequences. The proportion of ESTs containing SSRs varied slightly among Vitis species, ranging from 2.98% for V. vinifera (5,782 of 194,200), 3.50% for V. aestivalis (74 of 2,116) and 3.55% for V. shuttleworthii (389 of 10,933) to 5.43% for V. riparia (59 of 1,087). EST-SSRs accounted for 2.71% of the ESTs from the Vitis hybrid V. rupestris A. de Serres × V. spp. b42-26 (177 of 6,533; Electronic Supplementary Material 1).

Table 1 Characterization of grape redundant and non-redundant EST and genomic SSRs

Among the redundant EST-derived SSRs, trinucleotide repeats were the most abundant (50.2% of all SSRs), followed by di- (28.5%), hexa- (11.4%), penta- (9.6%), and tetranucleotide repeats (6.1%; Table 1). These findings agree with previous observations of repeat-unit abundance in barley, maize, rice, sorghum, and wheat (Kantety et al. 2002). The dominance of trinucleotide SSRs has been attributed to selection against frameshift mutations: gaining or losing one trinucleotide unit adds or removes a single amino acid without shifting the reading frame (Metzgar et al. 2000; Toth et al. 2000; Wren et al. 2000; Cordeiro et al. 2001). For the same reason, hexanucleotide SSRs were more frequent than tetra- and pentanucleotide repeats. In both redundant and non-redundant EST-SSRs (Table 1), di- and trinucleotide repeats together accounted for about 80% of the total (redundant: di 28.5%, tri 50.2%; non-redundant: di 39.0%, tri 41.7%). Interestingly, after redundancy was removed by contig assembly, the proportion of trinucleotide repeats dropped from 50.2% to 41.7%, while that of dinucleotide repeats rose from 28.5 to 39.0% (Table 1). One interpretation is that trinucleotide SSRs occur mainly in coding regions (Yu et al. 2004), and that many redundant trinucleotide EST-SSRs, which encode putative amino acid runs (Li et al. 2004), came from highly expressed genes represented by many ESTs and were therefore collapsed during assembly. Another explanation is gene duplication and paralogy. Depending on the clustering parameters, the untranslated regions of paralogous genes, which are more divergent and contain all types of SSR, may have remained separate, while ESTs covering the more conserved, trinucleotide-rich exons of paralogous genes may have collapsed more frequently into a single contig.

Comparison between genomic and EST derived SSRs

Whereas trinucleotide repeats dominated the EST-SSRs, genomic SSRs were dominated by dinucleotide repeats, which accounted for 51.9% of the total, followed by tri- (25.3%), tetra- (25.2%), penta- (7.4%), and hexanucleotide repeats (3.9%; Table 1). A higher proportion of trinucleotide repeats in EST-SSRs than in genomic SSRs has also been reported in the literature (Cardle et al. 2000). Among the 20 most abundant motifs, the most common dinucleotide repeat in non-redundant ESTs was AG/CT (17.9% of all EST-SSRs), followed by AT/AT (8.4%; Table 2), whereas the most common dinucleotide repeat in genomic sequences was AT/AT (33.0%), followed by AG/CT (15.5%) and AC/GT (3.5%). The most common EST-derived trinucleotide repeat was AAG/CTT (14.0%), while AAT/ATT (18.6%) was the most abundant trinucleotide repeat in genomic sequences. In grape genomic sequences, about 67.1% of the SSRs belonged to just three motif classes: AT/AT (33.0%), AAT/ATT (18.6%), and AG/CT (15.5%; Tables 2 and 3).
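Motif classes such as AG/CT or AAG/CTT group a motif with its cyclic rotations and reverse complements, so that, for example, GA and TC repeats are counted together with AG. A minimal sketch of this normalization, assuming the common convention of naming a class by its alphabetically smallest rotation followed by the smallest rotation of its reverse complement; the text does not describe how classes were assigned, and the motif list in the tally is hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def rotations(motif: str) -> list:
    """All cyclic rotations of a repeat unit (AAG -> AAG, AGA, GAA)."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def motif_class(motif: str) -> str:
    """Name the class of `motif` as '<smallest rotation>/<smallest rotation of its revcomp>'."""
    motif = motif.upper()
    canon = min(rotations(motif) + rotations(revcomp(motif)))
    partner = min(rotations(revcomp(canon)))
    return f"{canon}/{partner}"


for m in ["GA", "TA", "TTC", "TAT", "GT"]:
    print(m, "->", motif_class(m))
# GA -> AG/CT, TA -> AT/AT, TTC -> AAG/CTT, TAT -> AAT/ATT, GT -> AC/GT

# Tallying classes over a (hypothetical) list of detected motifs:
print(Counter(motif_class(m) for m in ["AG", "CT", "GA", "AT", "TTC"]))
```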

Table 2 Top 20 SSR motifs in grape ESTs and genomic sequences
Table 3 Genotyping and allelic details of the 145 EST-SSRs in four grape cultivars

Functional analysis of EST-SSR sequences

The 1,701 assembled non-redundant EST-SSRs were functionally annotated with HTGOFAT (Dowd and Zaragoza 2005). Fifty-eight percent (994 of 1,701) of the EST-SSRs were annotated and grouped by biological process using the MIPS MATDB Arabidopsis classification scheme. The most abundant categories were protein binding (22%) and subcellular localization (18%; Fig. 2), a pattern similar to that reported for wheat, rice, maize and barley (Tang et al. 2006). The 150 validated markers were further annotated (Electronic Supplementary Material 2), and their genomic/chromosomal locations, estimated by comparison with the grapevine genome assemblies (Jaillon et al. 2007; Velasco et al. 2007), are given in Electronic Supplementary Material 3.

Fig. 2

Functional prediction of 994 grape SSR-containing ESTs based on the MIPS MATDB classification scheme

SSR marker development and validation

The Beckman CEQ8800 Genetic Analyzer, which can resolve fragment-length differences of a single base pair, was used for SSR validation and analysis. A set of 150 primer pairs was initially screened for marker development and validation, using the parents of two mapping populations: V. vinifera Riesling × Cabernet Sauvignon (Riaz et al. 2004) and V. rotundifolia Summit × Noble (Ren et al. 2000). Of the 150 primer pairs, 145 amplified well among the four cultivars (Table 3). Some fragments were larger than expected, possibly because of introns within the flanking regions, or because the repeat was shorter than in the source species and less prone to polymerase slippage. Polymorphisms were found with 66 primer pairs between Riesling and Cabernet Sauvignon and with 40 between Summit and Noble. Only 16 of the polymorphic primer pairs shared the same polymorphic fragment lengths in both parent pairs, consistent with V. vinifera and V. rotundifolia carrying distinct alleles (Riaz et al. 2008).

Homozygosity and heterozygosity at these 145 loci were scored in the four cultivars. In Riesling, 92 of 144 loci were homozygous and 52 heterozygous; in Cabernet Sauvignon, 86 of 136 were homozygous and 49 heterozygous. Of the 92 Riesling and 86 Cabernet Sauvignon homozygous loci, 68 were homozygous in both parents (Table 4). Among the muscadine grapes, Noble had 97 homozygous and 39 heterozygous loci, and Summit 108 and 34, respectively (Table 4). Across the 145 loci, the muscadine grapes therefore showed higher homozygosity than the V. vinifera grapes. This may partly reflect marker selection: some loci were chosen because they contained long repeat stretches in V. vinifera, and may thus be more polymorphic in V. vinifera than in muscadines.

Table 4 Level of heterozygosity in two Vitis vinifera and two Muscadinia rotundifolia genotypes for a set of 145 SSR markers

Pairwise comparison of Riesling and Cabernet Sauvignon showed that 72 loci were monomorphic, with either one allele (60) or two (12). Seventy loci were heterozygous in at least one cultivar, with two (53), three (10), or four alleles (7; Table 5). Noble and Summit had 50 loci heterozygous in at least one parent, with two (31), three (17), or four alleles (2), and 90 loci homozygous in both cultivars, with one (84) or two alleles (6; Table 5). Based on these 145 EST-SSR loci, about 49% were polymorphic between Riesling and Cabernet Sauvignon and about 29% between Summit and Noble. However, polymorphic SSRs that are homozygous in both parents (e.g. aa × bb) cannot be mapped in F1 populations, although they are useful in F2 or backcross populations (Chen et al. 2006). Conversely, monomorphic SSRs that are heterozygous in both parents (e.g. ab × ab) can be mapped in F1 populations (Table 5). Accordingly, about 46% of the SSRs are estimated to be mappable in the Riesling × Cabernet Sauvignon F1 population; applied to the 1,037 SSRs for which primers were successfully designed, this corresponds to about 477 putative EST-SSR markers. For the Summit × Noble F1, the corresponding estimate is about 33% (342 EST-SSR loci).
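The mapping logic above depends only on the parental genotypes: a locus segregates in an F1 only if at least one parent is heterozygous. The sketch below classifies a locus from the two parents' allele sizes, assuming the segregation-type codes used by JoinMap for cross-pollinator (CP) populations; the function name and allele sizes are hypothetical, and Table 5 may group the types differently. The last line reproduces the extrapolation to 1,037 SSRs.

```python
from typing import Optional, Tuple

Genotype = Tuple[int, int]  # two allele sizes (bp); equal sizes mean homozygous


def segregation_type(p1: Genotype, p2: Genotype) -> Optional[str]:
    """Return a JoinMap-style CP code, or None if nothing segregates in the F1."""
    het1, het2 = p1[0] != p1[1], p2[0] != p2[1]
    if not (het1 or het2):
        return None  # aa x aa or aa x bb: every F1 plant has the same genotype
    if het1 and het2:
        n_alleles = len(set(p1) | set(p2))
        return {4: "<abxcd>", 3: "<efxeg>", 2: "<hkxhk>"}[n_alleles]
    # Only one parent segregates; ab x cc is folded into <lmxll> here (assumption).
    return "<lmxll>" if het1 else "<nnxnp>"


print(segregation_type((180, 180), (184, 184)))  # None: aa x bb, F2/backcross only
print(segregation_type((180, 184), (180, 184)))  # <hkxhk>: ab x ab
print(segregation_type((180, 184), (180, 180)))  # <lmxll>

# Extrapolation used in the text: mappable fraction x 1,037 SSRs with primers
print(round(0.46 * 1037), round(0.33 * 1037))  # 477 342
```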

Table 5 Distribution of the segregation types expected for the two mapping populations

Transferability of the EST-SSR markers across species was high: all but two of the 145 EST-SSR markers that amplified in V. vinifera also amplified in the muscadine cultivars. This indicates that developing EST-SSR markers is a cost-effective way to obtain additional markers for grape genotyping and gene mapping.

EST-SSRs provide an additional source of markers. Compared with genomic SSRs, EST-SSRs are more transferable and target gene-rich regions of the genome. Beyond genotyping, these markers can be used to evaluate marker transferability across taxa, for comparative mapping, and for analysis of functional gene diversity. Functional EST-SSR markers should be especially useful for developing linkage maps and tagging viticulturally important traits. In addition, polymorphic EST-SSR markers are much needed for genotyping, cultivar identification and linkage mapping in muscadine grapes, which are genetically much less diverse than Vitis species.