Tree Genetics & Genomes, Volume 10, Issue 5, pp 1271–1279

Genome-wide characterization and selection of expressed sequence tag simple sequence repeat primers for optimized marker distribution and reliability in peach

  • Chunxian Chen
  • Clive H. Bock
  • William R. Okie
  • Fred G. Gmitter Jr.
  • Sook Jung
  • Dorrie Main
  • Tom G. Beckman
  • Bruce W. Wood
Original Paper

Abstract

Simple sequence repeats (SSRs) in Prunus expressed sequence tags (ESTs) were mined, and flanking primers were designed and used for genome-wide characterization and selection of primers to optimize marker distribution and reliability in peach. A total of 4,770 and 9,029 SSRs were identified from 12,618 contigs and 34,238 singlets, from which 3,695 and 6,849 primers were designed, respectively. Alignment of the 10,544 forward and reverse primer sequences (21,088 queries) against the peach reference genome at an e value of 9e-03 resulted in 23,553 hits (96,621 alignments) with 16,885 queries, and “no hits found” (NHF) for the remaining 4,203 queries. A majority of aligned primers had only one hit/alignment on the peach scaffolds, and the distribution of the 5,500 singly aligned primers (pairs) on each 500-kb genome interval was determined. The average number of EST-SSR primers per 500-kb interval was 10.8. The primers were categorized into eight subgroups based on the difference between the genome amplicon size and expressed amplicon size of each primer, and 288 primers of optimized distribution and reliability were selected for genotype evaluation. Only 2 of the 288 primers failed in all 4 peach cultivars screened, with an overall successful primer/sample rate of 97.2 %. The average number of alleles detected in the four cultivars was 3.84. The polymorphism information content (PIC) values suggested that a majority of the 288 primers had a high rate of allele polymorphism among the four peach cultivars. The advantages of genome-wide analysis of EST-SSR primers and options to improve the polymorphism rate are discussed.

Keywords

Microsatellite; Short tandem repeat (STR); Marker-assisted selection (MAS); Variety authentication; Reference genome

Introduction

DNA markers and methodologies have changed in the last 20 years, including the range of marker types, attributes, popularity, development approaches, detection technologies, and throughputs. DNA markers have allowed many molecular studies to make important advances in genetics, taxonomy, ecology, and evolution (Agarwal et al. 2008). In the pregenome era, widely used DNA markers included restriction fragment length polymorphisms (RFLPs), random amplified polymorphic DNAs (RAPDs), cleaved amplified polymorphic sequences (CAPS), and amplified fragment length polymorphisms (AFLPs); in the postgenome era, sequence-based codominant markers such as simple sequence repeats (SSRs, also called microsatellites or short tandem repeats, STRs) and single-nucleotide polymorphisms (SNPs) have prevailed, partly due to the concurrent development of massive genomic sequences and computational capabilities along with the initiation and accomplishment of many genome programs (McCarthy 1993; Thiel et al. 2003; Chen et al. 2006; Tang et al. 2006; Horner et al. 2010; Chen and Gmitter 2013; International Peach Genome et al. 2013). Most SSRs are developed from publicly available gene-derived expressed sequence tag (EST) sequences (Thiel et al. 2003; Chen et al. 2006; Kayesh et al. 2013; Miah et al. 2013). An assessment of plant genomes suggested a significantly higher SSR frequency in the low-copy transcribed regions compared with other regions of the genome (Morgante et al. 2002). Generally, SNP genotyping is performed at high throughput and requires expensive proprietary instruments and allele analysis/calling programs (Chen and Sullivan 2003), which is not economically practical for marker applications on a routine and budget-constrained basis. SSR genotyping is more affordable and suitable for many studies because of its throughput and detection flexibility (Chen et al. 2008; Miah et al. 2013).
However, the distribution and performance (detectability) of randomly selected SSRs are generally unknown, which can result not only in unevenly distributed primers, but also many primer failures in linkage analysis or other SSR marker applications (Chen et al. 2008; Kayesh et al. 2013; Miah et al. 2013). Furthermore, the allelic heterozygosity and the polymorphism rate of randomly selected EST-SSR primers tend to be very low (Chen et al. 2008), which negatively impacts the power of gene mapping. The heterozygosity (H) and polymorphism information content (PIC) values are the most widely used indices and measures to evaluate and predict whether a genetic marker will be informative among cultivars or strains (Terwilliger et al. 1992; Pettersson et al. 1995; Ott and Rabinowitz 1997; Liu and Muse 2005). The failure of these particular SSR primers and an unpredicted, relatively high rate of homozygosity and nonpolymorphism in successfully detected primers are major factors precluding SSR markers from being efficiently used in genetic mapping or other studies relying on allelic heterozygosity and polymorphisms. With many reference genomes now available, genome-wide characterization of EST-SSR primers likely offers a solution to the issue or at least enables optimal selection of primers with predicted distribution, a lower risk of primer failure, and a higher polymorphism rate, compared to a random selection of SSRs (Chen et al. 2008; Kayesh et al. 2013).

Little attention has been given to failed and/or poorly performing EST-SSR primers due to the difficulty (time and cost) in addressing the unknown causes on a primer-by-primer basis and the lack of interest (i.e., lack of value) in reporting them. The success of an EST-SSR primer depends on the amplification process and instrument-dependent detectability of the products. In other words, the failure of a primer can be caused by a failed amplification or, if amplification is successful, by a failed detection. A recent sequence analysis of 340 failed and successful EST-SSR primers on a reference genome revealed several genomic factors affecting primer performance and polymorphism in the studied genomes. The main causes of primer failure were, first, forward and reverse primers positioned too far apart in the target genome to form the expected amplicons, owing to sequencing/assembly errors in EST contigs or in the reference genome; second, introns too long to allow the genomic amplicon to be detectable (and/or reliably amplified); third, multiple full and partial primer alignments likely caused by paralogs; and fourth, failed full alignment of contig-derived primer sequences containing discrepant nucleotides (Chen et al. 2014). Therefore, based on their alignment on a reference genome, primers can be effectively distinguished and assigned to different reliability categories; at the same time, their distribution in the genome is determined. This information provides clear guidance for selecting well-distributed primers from only the highly reliable categories, whether the interest is genome-wide or localized.

Different types of DNA markers have been used to study various aspects of peach (Prunus persica L. Stokes) and other Prunus species. RAPDs were used for genetic linkage mapping (Warburton et al. 1996), AFLPs were used for the diagnosis and mapping of peach tree short life (PTSL) syndrome (Blenda et al. 2006, 2007), SSRs were used for genome and trait mapping (Bliss et al. 2002; Howad et al. 2005; Ogundiwin et al. 2009; Lambert and Pascal 2011), and a 9k SNP array was developed that has potential for many marker applications (Verde et al. 2012). Since the first set of peach SSRs was developed (Cipriani et al. 1999), the number of SSRs on peach genetic maps has ranged from 4 (Bliss et al. 2002), to 21 (Lambert et al. 2004), to 264 collected from various sources (including EST-SSRs) and mapped by bin mapping with an almond × peach F2 population (Howad et al. 2005). The increasing number of Prunus ESTs available in the Genome Database for Rosaceae (GDR) allows identification of substantially more EST-SSRs (Jung et al. 2004, 2008, 2014). However, without further genome-wide characterization of these EST-SSR primers, it would be impossible to select core sets of primers with optimal distribution, reliability, and polymorphism. In this study, Prunus EST-SSR primers were mined and aligned onto the peach reference genome (International Peach Genome et al. 2013) to determine their genome distribution, to predict genomic amplicon sizes, and to characterize the genomic features in these amplicons. Following these results, a core set of primers with optimal distribution, reliability, and polymorphism was selected as candidates for potential use in various marker applications in peach and other Prunus species.

Materials and methods

Prunus ESTs and varieties

A total of 118,965 unique Prunus ESTs were retrieved from the GDR. Peach had the most ESTs (81,200), with the remainder from apricot (Prunus armeniaca L.), sweet cherry (Prunus avium L.), and eight other species or biotypes of Prunus (Electronic Supplementary Material [ESM] Table 1). The FASTA header of each sequence was simplified to its accession ID, to which a unique two-letter prefix derived from the species name was added to track the genotype source. Four peach cultivars of different origin and characteristics (Okie 1998), “Chinese Cling,” “Blazeprince,” “Helen Borchers,” and “Heath Cling,” were used to screen the selected microsatellites. Genomic DNA was isolated from 5 g of tender, young leaves using a CTAB protocol slightly modified from a previously described method (Doyle and Doyle 1987).

Bioinformatics of EST-SSR mining and primer modeling

Bioinformatics programs were installed on CentOS Linux. All the sequences were combined into a single file and assembled at 95 % similarity using CAP3 (Huang and Madan 1999). All contigs and singlets were used for microsatellite motif identification using misa (Thiel et al. 2003), with primer design achieved using Primer3 (Rozen and Skaletsky 2000). The paired numbers representing microsatellite motif length and minimum repeat number in the misa configuration file were modified to 2-6 3-4 4-3 5-3 6-3 (mononucleotide repeats were excluded), and the maximum interval between any adjacent microsatellites remained 100 bp (Thiel et al. 2003). The primers were designed with an optimal length of 24 bp and with expected PCR products of 100–300 bp (Rozen and Skaletsky 2000). The primer results were saved in a tab-delimited text file (ESM Table 2) and imported into MS Excel for subsequent analysis and summary.
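The motif-length and minimum-repeat pairs quoted above can be illustrated with a short sketch. This is not the misa program itself: it is a minimal regular-expression scan, written under the assumption that only perfect repeats are of interest, and it does not merge neighboring repeats within 100 bp into compound SSRs. The function names are hypothetical.

```python
import re

# (motif length -> minimum repeat count), from the pairs 2-6 3-4 4-3 5-3 6-3
MIN_REPEATS = {2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def is_redundant(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. ACAC or AAA)."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, end, motif, repeats) tuples; coordinates are 0-based, end-exclusive."""
    seq = seq.upper()
    found = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One unit followed by at least (min_rep - 1) further identical copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_redundant(motif):
                continue  # skips mono runs and di repeats re-detected as tetra/hexa
            repeats = (match.end() - match.start()) // unit_len
            found.append((match.start(), match.end(), motif, repeats))
    return sorted(found)
```

For example, a sequence carrying six copies of AC is reported once as a dinucleotide SSR, while five copies fall below the threshold and are ignored.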

Primer sequence alignment with the peach reference genome

The peach reference genome sequence version 1.0 (International Peach Genome et al. 2013) was retrieved from the GDR (Jung et al. 2004, 2008, 2014) and formatted into a database by formatdb in BLAST (Altschul et al. 1997). The microsatellite primer sequences were converted to FASTA format for BLASTN against the reference genome sequence. The BLAST cutoff e value was set at 9e-03 (0.009), which allowed any alignment of 19+ contiguous nucleotides, or of 22+ nucleotides with only 1 discrepancy, to be saved in the BLAST output file. Primers with “no hits found” (NHF) at the cutoff e value were rerun through BLAST without an e value restraint to determine any possible alignments on the reference genome and thus gain an additional assessment of these primers. The SSR primer sequences mapped in Prunus and available in the GDR were also formatted and included in the BLAST run. The aligned position information was used to avoid duplicated selection of primers flanking the same SSR motifs and loci, i.e., to ensure that only new EST-SSR primers at unmapped loci were selected.
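The bookkeeping behind the hit, alignment, and NHF counts can be sketched as follows. The sketch assumes that a "hit" is a distinct scaffold matched by a query and an "alignment" is a single aligned segment, and that the BLAST results have already been reduced to (query, scaffold, e value) rows; the row layout, function name, and primer IDs are illustrative assumptions, not the authors' pipeline.

```python
from collections import defaultdict

E_CUTOFF = 9e-03  # cutoff e value quoted in the text


def summarize_hits(query_ids, rows, e_cutoff=E_CUTOFF):
    """Count hits (distinct scaffolds) and alignments per query.

    rows: iterable of (query_id, scaffold_id, evalue) tuples (assumed layout).
    Returns (summary, nhf), where summary maps query -> (hits, alignments)
    and nhf lists the queries with no alignment at the cutoff.
    """
    scaffolds = defaultdict(set)
    alignments = defaultdict(int)
    for query, scaffold, evalue in rows:
        if evalue <= e_cutoff:
            scaffolds[query].add(scaffold)
            alignments[query] += 1
    summary = {q: (len(scaffolds[q]), alignments[q]) for q in query_ids}
    nhf = [q for q in query_ids if alignments[q] == 0]
    return summary, nhf
```

Queries returned in the NHF list are the ones that would be rerun without an e value restraint.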

The genomic amplicon size (GAS) of each primer (pair) was calculated as the difference between the maximum and minimum of the four start and end alignment positions of the forward and reverse primers on the reference genome. The difference between each GAS and the corresponding predicted EST amplicon size (EAS) was then calculated. A difference of 0 simply suggested that the genomic and expressed amplicons had the same sequence length; a nonzero difference tentatively represented the size of any intron(s) in the genomic sequence or allelic deletions/insertions at the locus, depending on the presumed cutoff minimum intron length (Wendel et al. 2002). No GAS could be calculated when either the F or R primer was NHF, since none or only one of the two primers was aligned. If a GAS was not in an acceptable amplicon size range (e.g., over several hundred kilobases) and there were multiple alignments of either or both primers on different scaffolds, a search was performed for paired forward and reverse alignments on the same scaffold, and the primers were reassigned to the first such location for the calculation. Forward and reverse primers aligned on two different scaffolds, or with excessively long GASs (e.g., >=10,000 kb) on the same scaffold, were tentatively categorized into an “error” subgroup. The distribution of these microsatellite loci on the reference genome was determined based on the start alignment positions of all F primers. A total of 288 new primers (ESM Table 3) evenly distributed on all 8 scaffolds, about 1–2 primers in every 0.5–1 Mb genome interval, were selected for subsequent genotyping validation.
Two additional selection criteria were also used: (1) the GAS of these primers had to be 80 to 480 bp (preferentially 100 to 300 bp) so as to fit the size range of ensured detectability in widely used fluorescence/capillary or polyacrylamide gel (PAG)-based platforms; (2) where available, primers whose GAS contained presumed allelic deletions/insertions or introns were preferred, so as to potentially maximize the polymorphism rate. Primers with no difference between EAS and GAS were selected only if there was no other primer choice. Other primers, including those with NHF, error, or oversized intron/GAS, were excluded. The selected primers will have useful applications in genetic studies of Prunus in the future.
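The GAS calculation and the two selection criteria can be written out as a small sketch. The coordinates are assumed to be alignment start and end positions on one scaffold, and the ranking order (a nonzero GAS − EAS difference first, the preferred 100–300 bp range second) is an illustrative assumption; the text states the preferences but not their relative priority. Function names are hypothetical.

```python
def genomic_amplicon_size(f_start, f_end, r_start, r_end):
    """GAS as described in the text: maximum minus minimum of the four alignment positions."""
    positions = (f_start, f_end, r_start, r_end)
    return max(positions) - min(positions)


def selection_rank(gas, eas):
    """Return None if the primer is excluded, else a tuple where lower sorts first.

    gas is None when no GAS could be calculated (NHF for either primer).
    """
    if gas is None or not 80 <= gas <= 480:
        return None  # no value, or outside the detectable size range
    has_difference = (gas - eas) != 0     # presumed indel or intron
    preferred_size = 100 <= gas <= 300    # preferred size range
    return (0 if has_difference else 1, 0 if preferred_size else 1)
```

Sorting the candidate primers of one genome interval by this rank and taking the first would mimic the "select only if there was no other primer choice" rule for primers with identical EAS and GAS.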

Microsatellite genotyping and polymorphism validation

Microsatellite genotyping was performed as previously described (Chen et al. 2006). The M13 forward primer sequence (GTT GTA AAA CGA CGG CCA GT) was added as a common tail to the 5′ end of all microsatellite forward primers (Oetting et al. 1995). The tagged forward primers and their nontagged reverse primers were synthesized by Eurofins MWG Operon Technologies (Huntsville, AL). All 288 forward and reverse primers were stored in six 96-well plates for high-throughput screening and genotyping. For easy identification and use of individual primers from the plates in future applications, the 288 primers were named by their well positions prefixed with “CX” and the plate number (1, 2, and 3); for example, CX1A01 is the primer at the A01 position of Plates 1F and 1R, CX3H12 is the primer at H12 of Plates 3F and 3R, and so on (ESM Table 3). Four fluorescently labeled M13 tags with 6FAM, VIC, NED, and PET labels were synthesized by Life Technologies (Carlsbad, CA). PCR was performed in a C1000 Touch Thermal Cycler with a CFX384 block module (Bio-Rad, Hercules, CA) in a 5-μl volume consisting of 1× PCR buffer, 0.2 mM dNTPs, 2 mM MgCl2, 0.3 μM of the forward and reverse primers, 0.05 μM dye-labeled M13 tagged forward primer, 0.5 U Taq DNA polymerase (BioExpress, Kaysville, UT), and ~10 ng DNA template. A touchdown PCR program was run with an initial step of 94 °C for 3 min, followed by 10 cycles of denaturation at 94 °C for 30 s, annealing at 61 °C for 30 s with a 0.5 °C decrement each cycle, and extension at 72 °C for 45 s, followed by 30 more cycles with a constant annealing temperature of 56 °C (other parameters were the same), plus a final extension at 72 °C for 15 min. The dye-labeled PCR products were genotyped on a 3100xl Genetic Analyzer (Life Technologies, Carlsbad, CA). GeneMarker 2.4 (SoftGenetics, State College, PA) was used to analyze the chromatographic trace files and generate the microsatellite allele table.
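The annealing temperatures implied by the touchdown program can be listed cycle by cycle. The sketch below assumes that the first touchdown cycle anneals at 61 °C and that each later touchdown cycle is 0.5 °C lower, so the tenth cycle anneals at 56.5 °C before the 30 constant cycles at 56 °C; the function name is hypothetical.

```python
def touchdown_annealing_temps(start=61.0, step=0.5, touchdown_cycles=10,
                              constant=56.0, constant_cycles=30):
    """Return the annealing temperature (deg C) for every PCR cycle, in order."""
    temps = [start - step * cycle for cycle in range(touchdown_cycles)]
    temps += [constant] * constant_cycles
    return temps


temps = touchdown_annealing_temps()
print(len(temps), temps[0], temps[9], temps[10])
```

Under these assumptions the program runs 40 cycles in total.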

Genotyping data analysis

The allele table was converted to the format required by PowerMarker (Liu and Muse 2005) and imported to the program to calculate the number of alleles detected, the H value, PIC value, and the gene diversity value of each marker among the four peach cultivars, which were used to evaluate and predict the informativeness and usability of the primers.
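To make the indices concrete, the following sketch computes them for one marker from diploid genotypes. It uses common textbook definitions: observed heterozygosity as the fraction of heterozygous cultivars, gene diversity as 1 − Σp², and PIC as 1 − Σp² − ΣΣ2p_i²p_j². Any small-sample correction that PowerMarker may apply is not reproduced, so the values are illustrative rather than a reimplementation of that program. The allele sizes in the usage note are invented.

```python
from collections import Counter
from itertools import combinations


def marker_stats(genotypes):
    """genotypes: list of (allele_a, allele_b) tuples, one per diploid cultivar."""
    alleles = [allele for pair in genotypes for allele in pair]
    counts = Counter(alleles)
    freqs = [count / len(alleles) for count in counts.values()]
    sum_sq = sum(p * p for p in freqs)
    heterozygosity = sum(a != b for a, b in genotypes) / len(genotypes)
    gene_diversity = 1.0 - sum_sq
    # PIC subtracts a term for every unordered pair of distinct alleles
    pic = gene_diversity - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return {
        "alleles": len(counts),
        "H": heterozygosity,
        "gene_diversity": gene_diversity,
        "PIC": pic,
    }
```

With four cultivars that are each homozygous for a different allele, for instance `[(150, 150), (152, 152), (154, 154), (156, 156)]`, H is 0 while PIC is about 0.70, which shows how a marker can lack heterozygosity yet still be polymorphic among cultivars.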

Results

In silico identification of SSRs in Prunus ESTs

A total of 12,618 contigs were assembled from 84,727 ESTs, with 34,238 ESTs left as singlets (Table 1). About 98.25 % of the contigs contained 2 to 38 ESTs, and the average number of ESTs per contig was 6.7. A total of 4,770 SSRs were identified in the 12,618 contigs and 9,029 SSRs in the 34,238 singlets, with 3,695 and 6,849 primers designed, respectively. A majority of SSRs were tri and bi types, accounting for 37.4 and 28.0 % of all the SSRs, respectively.
Table 1

EST SSR mining and primer modeling summary

| Mined information | Contigs | Singlets | Subtotal |
|---|---|---|---|
| Total number of sequences examined | 12,618 | 34,238 | 46,856 |
| Total size of examined sequences (bp) | 11,623,354 | 21,101,163 | 32,724,517 |
| Total number of identified SSRs | 4,770 | 9,029 | 13,799 |
| bi type | 1,281 | 2,578 | 3,859 |
| tri type | 1,939 | 3,222 | 5,161 |
| tetra type | 884 | 1,871 | 2,755 |
| penta type | 334 | 781 | 1,115 |
| hexa type | 332 | 577 | 909 |
| Number of sequences containing SSR | 3,404 | 6,772 | 10,176 |
| Number of sequences containing more than 1 SSR | 931 | 1,578 | 2,509 |
| Number of SSRs present in compound formation | 643 | 1,225 | 1,868 |
| Number of records created by Primer3 | 4,127 | 7,804 | 11,931 |
| Number of sequences of primer modeling successful | 3,695 | 6,849 | 10,544 |
| Number of sequences of primer modeling failed | 432 | 955 | 1,387 |

In each column, the total number of identified SSRs is the sum of the five SSR-type rows immediately below it, and the number of records created by Primer3 is the sum of the successful and failed primer modeling rows immediately below it (distinguished by bold and italic font in the original table)

Genomic distribution based on primer alignment on the peach reference genome

A total of 21,088 queries from 10,544 forward and 10,544 reverse primer sequences (ESM Table 2) were aligned by pairwise comparison using the BLAST algorithm to the peach reference genome (International Peach Genome et al. 2013; Jung et al. 2014). There were no hits with 4,203 of the primer queries at the preset e value of 9e-03 (0.009), and 23,553 hits and 96,621 alignments found on the remaining 16,885 primer queries, suggesting that multiple hits and alignments on the peach scaffolds occurred with some of the primer sequences. Of the 96,621 alignments, 50,900 were 100 % full alignments of the primer sequences and 23,650 were partial alignments. A majority of primers had only one hit/alignment, and most of them were in pairs (Table 2). The most hits and alignments in the peach genome were with primer peDN676961_R (22 hits and 731 alignments); however, its forward primer had only one hit/alignment.
Table 2

Primer count based on hits/alignments and aligned peach genome scaffolds

| Number of hits | Number of alignments | Total count | F | R | F R pairs | F alone | R alone |
|---|---|---|---|---|---|---|---|
| 0 | 0 (no hits found) | 4,203 | 2,003 | 2,200 | 763 | 1,240 | 1,437 |
| 1 | 1 | 14,248 | 7,192 | 7,056 | 5,500 | 1,692 | 1,556 |
| 1 | 2 | 547 | 276 | 271 | 86 | 190 | 185 |
| 1 | >=3 | 195 | 104 | 91 | 13 | 91 | 78 |
| 2 | 2 | 658 | 354 | 304 | 52 | 302 | 252 |
| 2 | >=3 | 286 | 145 | 141 | 3 | 142 | 138 |
| >=3 | >=3 | 951 | 470 | 481 | 11 | 459 | 470 |
| Subtotal | | 21,088 | 10,544 | 10,544 | 6,428 | 4,116 | 4,116 |
| Scaffolds | Scaffold length (bp) | | | | | | |
| scaffold_1 | 46877626 | 3,625 | 1,816 | 1,809 | 1,490 | 326 | 319 |
| scaffold_2 | 26807724 | 1,562 | 796 | 766 | 614 | 182 | 152 |
| scaffold_3 | 22025550 | 1,558 | 773 | 785 | 602 | 171 | 183 |
| scaffold_4 | 30528727 | 2,107 | 1,069 | 1,038 | 801 | 268 | 237 |
| scaffold_5 | 18502877 | 1,610 | 807 | 803 | 622 | 185 | 181 |
| scaffold_6 | 28902582 | 2,143 | 1,076 | 1,067 | 863 | 213 | 204 |
| scaffold_7 | 22790193 | 1,794 | 917 | 877 | 702 | 215 | 175 |
| scaffold_8 | 21829753 | 2,057 | 1,058 | 999 | 786 | 272 | 213 |
| minor scaffolds | 7834429 | 429 | 229 | 200 | 95 | 134 | 105 |
| Subtotal | 226099461 | 16,885 | 8,541 | 8,344 | 6,575 | 1,966 | 1,769 |

F forward primers, R reverse primers

Among the 16,885 aligned primer queries, the numbers on the 8 main peach scaffolds (in order of 1 to 8) were 3,625, 1,562, 1,558, 2,107, 1,610, 2,143, 1,794, and 2,057, respectively. Only 429 of the aligned primer queries were on the remaining 54 minor scaffolds (Table 2). The distribution of singly aligned EST-SSRs appeared uneven in the reference genome and varied among the 500-kb intervals on each main scaffold (Fig. 1a), which could be determined by, and was consistent with, the distribution of genes in the genome. The distribution of all the aligned primers was very similar in pattern but had proportionally more primer counts at each interval (data not shown). The average number of EST-SSR primers for all 500-kb intervals in the eight main scaffolds was 10.8. Scaffolds 1 to 7 had 2, 3, 3, 4, 1, 3, and 5 500-kb regions, respectively, containing none of the 5,500 singly aligned primers (scaffold_8 had primers in every 500-kb interval); scaffolds 1 to 8 had 8, 15, 2, 8, 3, 5, 2, and 3 500-kb regions, respectively, containing 1 to 3 EST-SSR primers. At least one region in each scaffold, made up of several adjacent 500-kb intervals with zero or low primer counts, looked very distinct from all the others and was likely associated with the centromere of the chromosome.
Fig. 1

Distribution of 5,500 singly aligned (a) and 288 optimally selected (b) EST-SSR primers on the 8 main peach scaffolds. Each serial value (colored bar) represents the count of primers aligned within each 500-kb interval on the 8 main scaffolds. All the primer counts were based on the start alignment position of all forward primers. The data for minor scaffolds is omitted
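The interval counts plotted in Fig. 1 amount to binning forward-primer start positions into 500-kb windows per scaffold. A minimal sketch is given below, assuming 1-based start positions and a final, shorter interval at the end of each scaffold; the function name and the example positions are hypothetical, while the scaffold length is taken from Table 2.

```python
INTERVAL = 500_000  # 500-kb window


def count_per_interval(forward_starts, scaffold_lengths, interval=INTERVAL):
    """Count primers per window.

    forward_starts: iterable of (scaffold, start) with 1-based start positions.
    scaffold_lengths: dict mapping scaffold name -> length in bp.
    Returns a dict mapping scaffold name -> list of counts, one per window.
    """
    counts = {
        name: [0] * -(-length // interval)  # ceiling division gives the window count
        for name, length in scaffold_lengths.items()
    }
    for scaffold, start in forward_starts:
        counts[scaffold][(start - 1) // interval] += 1
    return counts


lengths = {"scaffold_1": 46877626}
starts = [("scaffold_1", 1), ("scaffold_1", 500000),
          ("scaffold_1", 500001), ("scaffold_1", 46877626)]
bins = count_per_interval(starts, lengths)["scaffold_1"]
print(len(bins), bins[0], bins[1], bins[-1], bins.count(0))
```

Counting the windows whose value is zero, or between 1 and 3, gives the kind of per-scaffold tallies of sparse regions reported in the text.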

Genomic features identified by primer alignment on the peach reference genome

According to the alignment positions of each forward and reverse primer, the GAS of each primer and the difference between the GAS and EAS were calculated and used to categorize all genomic amplicons into different subgroups (Fig. 2). Negative differences (the “<=−1” subgroup) implied possible deletions in the allelic EST. Zeroes (“0”) indicated no difference between the GAS and EAS. Positive differences might represent possible allelic insertions (the “1–20” subgroup) or introns (the “21–99,” “100–466,” and “467–999” subgroups, most in the “1,000–9,999” subgroup, and a few in the “>=10,000” subgroup), depending on the presumed intron cutoff value. Alternatively, they might result from paralogs of high sequence identity in gene families spread throughout the genome, or from sequencing/assembly errors in the ESTs or the reference genome, particularly if the GAS was unreasonably large (some in the “1,000–9,999” subgroup and most in the “>=10,000” subgroup). Therefore, the GAS of all aligned primers could be predicted and categorized from the alignment-derived information, which greatly facilitates purposeful selection of primers with known distribution and reliable amplification/detection properties. For example, primers with undetectably large intron-containing GASs or those classified in the error group could easily be excluded. For evaluation, 288 primers were selected in this study; they were evenly distributed, with a majority in the “<=−1,” “1–20,” “21–99,” and “100–466” subgroups, which potentially maximizes the polymorphism rate (Fig. 2b). On average, there was slightly more than 1 primer per 1 Mb of genome.
Fig. 2

Primer count based on differences between genome amplicon sizes (GAS) and expressed amplicon sizes (EAS). “No value” represents primers for which “no hits found” for either or both of the forward and reverse primers precludes an estimate of the GAS or of the difference between GAS and EAS. “<=−1” represents a calculated GAS < EAS, implying that allelic deletions might occur in these particular ESTs compared with the alleles in the peach reference genome. “0” indicates no difference between GAS and EAS. The positive differences between GAS and EAS are tentatively categorized into subgroups to show possible allelic insertions (the “1–20” subgroup) and introns of various sizes (the “21–99,” “100–466,” and “467–999” subgroups), depending on the cutoff intron size. For the last two subgroups, it was very likely that most differences in the “1,000–9,999” subgroup were caused by long introns, but most of those in the “>=10,000” subgroup were caused by sequencing and/or assembly errors in the ESTs or the reference genome, because most of these forward and reverse primers were aligned on different scaffolds. A difference of 466 bp was used as a cutoff value because the GAS of that primer was 500 bp, which is also the usual upper bound of the readily detectable size range in some fluorescence-based capillary instruments
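The subgroup boundaries in Fig. 2 can be expressed as a simple lookup on the GAS − EAS difference. The sketch below assumes that sizes are in base pairs and that a missing GAS is passed as None; it does not try to detect primers aligned on different scaffolds, which the text assigns to an error subgroup by other means. The function name is hypothetical.

```python
# Upper bounds (inclusive) and labels for positive GAS - EAS differences
POSITIVE_BINS = [
    (20, "1-20"),            # possible allelic insertions
    (99, "21-99"),           # possible introns of increasing size
    (466, "100-466"),
    (999, "467-999"),
    (9999, "1,000-9,999"),
]


def gas_eas_subgroup(gas, eas):
    """Return the Fig. 2 subgroup label for one primer pair."""
    if gas is None:
        return "No value"    # NHF for either or both primers
    diff = gas - eas
    if diff <= -1:
        return "<=-1"        # possible allelic deletion
    if diff == 0:
        return "0"
    for upper, label in POSITIVE_BINS:
        if diff <= upper:
            return label
    return ">=10,000"
```

A primer with a GAS of 500 bp and a difference of 466 bp therefore lands in the "100-466" subgroup, the largest one whose amplicons stay within the detectable size range.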

Genotyping evaluation of optimally selected EST-SSR markers

Among the 288 selected primers (ESM Table 3), only 2 (CX2H03 and CX3E08) failed in all 4 peach cultivars tested, and an additional 4, 2, 6, and 4 failed in Chinese Cling, Blazeprince, Helen Borchers, and Heath Cling, respectively. The overall failure rate was <2.8 %. Thus, there was a 97.2 % success rate in the primer/sample amplification and detection based on the primer sequence analysis results. Between 1 and 8 different alleles were scored among the 4 peach genotypes by the 288 different primers (Fig. 3a); the average number of alleles scored was 3.84. The heterozygosity value of 110 primers (38.2 % of the 288 primers) was 0 (Fig. 3b), suggesting that these primers lacked heterozygosity among the four peach cultivars and were less usable in segregation-based studies such as linkage analysis. However, according to the distribution of their PIC values (Fig. 3c) and gene diversity values (Fig. 3d), most primers showed a high rate of allele polymorphism among the four peach cultivars, suggesting that they could be very informative and useful for variety authentication, pedigree determination, phylogenetic analysis, and other marker applications that require polymorphism but not heterozygosity.
Fig. 3

Primer count (distribution) of the 286 primers based on the number of alleles scored (a), the heterozygosity value (b), the PIC value (c), and the gene diversity value (d) calculated by PowerMarker (Liu and Muse 2005) among 4 peach cultivars. Each x-axis value in c and d represents the upper bound of a range; for example, “0.2” is “<=0.2,” “0.3” is >0.2 and <=0.3, and so on. The two failed primers were not included

Discussion

Advantages of EST-SSR primer alignment on a reference genome

EST-SSR primers have been widely used in many applications (Chen et al. 2008; Miah et al. 2013). Recently, especially when a relatively large number of primers are needed, the primers have most often been randomly selected from the mass of EST-SSR primers developed through sequence mining (Thiel et al. 2003; Chen et al. 2006). Random selection of EST-SSRs results in primers and markers of unknown status, including the distribution in the genome, the amplification and detection performance, and the polymorphism rate. As a consequence, a high rate of failure in amplification and/or detection, and frequently a low polymorphism rate and/or a skewed genome distribution, are observed (Chen et al. 2008). A recent sequence analysis of 340 EST-SSR primers on a reference genome revealed that the main causes of failed primers included forward and reverse primers positioned too far apart to form the expected amplicons, due to sequencing/assembly errors in EST contigs or the reference genome; introns too big to allow the genomic amplicon to be detectable (and/or reliably amplified); multiple full and partial primer alignments, most likely caused by paralogs; and failed full alignment of contig-derived primer sequences containing discrepant nucleotides (Chen et al. unpublished data). By alignment of all the Prunus EST-SSR primers on the peach reference genome, the genomic features of these primers can be readily obtained and used to categorize the primers into different subgroups, which helps guide the establishment of suitable criteria for choosing certain primers and avoiding those with undesirable traits. For example, knowing the positions of all primers in the genome allows informed selection of primers either in particular regions or across the entire genome.
Primers that fall within the error subgroup, or that have an oversized, unreliable, or undetectable GAS, can be readily excluded; under random selection, such primers could be chosen and would almost certainly fail. The 97.2 % overall amplification/detection success rate in this study demonstrates these advantages of selecting EST-SSR primers based on their alignment results on a reference genome. It is worth noting that most of the forward and reverse primers in the error subgroup were aligned on different scaffolds, prompting speculation that paralogous genes with high sequence identity in the genome, rather than true sequencing/assembly errors, might be responsible for a large proportion of the error subgroup. Further investigation is needed to test this speculation.

Options to improve the EST-SSR heterozygosity rate

Allelic heterozygosity and polymorphism among genotypes of interest are essential for linkage mapping and other segregation-based studies and are also more informative for other marker applications including marker-assisted selection (MAS), cultivar authentication, pedigree determination, phylogenetic analysis, and association genetics (Terwilliger et al. 1992; Ott and Rabinowitz 1997; Liu and Muse 2005). Although the primer failure rate was minimized by the approach used in this study, the homozygosity rate remained relatively high based on the H values obtained. The next goal must be to minimize the markers lacking heterozygosity and/or polymorphism among genotypes of interest. Some efforts have been taken to mine polymorphic EST-SSRs among highly redundant ESTs (Kong et al. 2012; Kayesh et al. 2013; Mohanty et al. 2013), which at least minimized selection of nonpolymorphic primers and likely increased the heterozygosity rate. One genotyping comparison indicated that the tri-, penta-, and compound types of EST-SSRs appeared to have much higher heterozygosity rates compared with the other EST-SSRs with even-numbered repeat units (Chen et al. 2008). Further comparison of heterozygosity and polymorphism rates among these EST-SSR markers with presumed deletions/insertions, introns, and other detectable subgroups might provide a new means to increase these rates. Specifically, genomic variation in introns that have little impact on gene structure and function may be particularly valuable for detection of more polymorphisms in intron-containing amplicons and for subsequent genetic analysis as well (Muthamilarasan et al. 2013).

Notes

Acknowledgments

The authors thank Bryan Blackburn, Luke Quick, and Minling Zhang for their technical assistance. The research is partially supported by the USDA National Program of Plant Genetic Resources, Genomics and Genetic Improvement (Project number 6606-21000-004-006) and a USDA National Institute of Food and Agriculture Specialty Crop Research Initiative project (2009-51181-06036).

Data archiving statement

All Prunus EST sequences and accession numbers are available at the National Center for Biotechnology Information EST database (http://www.ncbi.nlm.nih.gov/nucest/?term=Prunus). The peach (Prunus persica) reference genome assembly (version 1.0) is available at the Genome Database for Rosaceae (http://www.rosaceae.org/species/Prunus_persica/genome_v1.0), as is the mined Prunus EST-SSR primer information (http://www.rosaceae.org/node/336118). The 10,544 EST-SSR forward and reverse primers and the selected 288 primers are attached as ESM Tables 2 and 3, respectively.

Supplementary material

ESM 1: 11295_2014_759_MOESM1_ESM.docx (DOCX, 14 kb)
ESM 2: 11295_2014_759_MOESM2_ESM.xlsx (XLSX, 1316 kb)
ESM 3: 11295_2014_759_MOESM3_ESM.docx (DOCX, 33 kb)

References

  1. Agarwal M, Shrivastava N, Padh H (2008) Advances in molecular marker techniques and their applications in plant sciences. Plant Cell Rep 27:617–631
  2. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402
  3. Blenda AV, Wechter WP, Reighard GL, Baird WV, Abbott AG (2006) Development and characterisation of diagnostic AFLP markers in Prunus persica for its response to peach tree short life syndrome. J Hortic Sci Biotechnol 81:281–288
  4. Blenda AV, Verde I, Georgi LL, Reighard GL, Forrest SD, Munoz-Torres M, Baird WV, Abbott AG (2007) Construction of a genetic linkage map and identification of molecular markers in peach rootstocks for response to peach tree short life syndrome. Tree Genet Genomes 3:341–350
  5. Bliss FA, Arulsekar S, Foolad MR, Becerra V, Gillen AM, Warburton ML, Dandekar AM, Kocsisne GM, Mydin KK (2002) An expanded genetic linkage map of Prunus based on an interspecific cross between almond and peach. Genome 45:520–529
  6. Chen X, Sullivan PF (2003) Single nucleotide polymorphism genotyping: biochemistry, protocol, cost and throughput. Pharmacogenomics J 3:77–96
  7. Chen C, Gmitter FG Jr (2013) Mining of haplotype-based expressed sequence tag single nucleotide polymorphisms in citrus. BMC Genomics 14:746
  8. Chen C, Zhou P, Choi YA, Huang S, Gmitter FG (2006) Mining and characterizing microsatellites from citrus ESTs. Theor Appl Genet 112:1248–1257
  9. Chen C, Bowman KD, Choi YA, Dang PM, Rao MN, Huang S, Soneji JR, McCollum TG, Gmitter FG (2008) EST-SSR genetic maps for Citrus sinensis and Poncirus trifoliata. Tree Genet Genomes 4:1–10
  10. Chen C, Bock CH, Beckman TG (2014) Sequence analysis reveals genomic factors affecting EST-SSR primer performance and polymorphism. Mol Genet Genomics. doi:10.1007/s00438-014-0875-8
  11. Cipriani G, Lot G, Huang WG, Marrazzo MT, Peterlunger E, Testolin R (1999) AC/GT and AG/CT microsatellite repeats in peach [Prunus persica (L) Batsch]: isolation, characterisation and cross-species amplification in Prunus. Theor Appl Genet 99:65–72
  12. Doyle JJ, Doyle JL (1987) A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem Bull 19:11–15
  13. Horner DS, Pavesi G, Castrignano T, De Meo PD, Liuni S, Sammeth M, Picardi E, Pesole G (2010) Bioinformatics approaches for genomics and post genomics applications of next-generation sequencing. Brief Bioinform 11:181–197
  14. Howad W, Yamamoto T, Dirlewanger E, Testolin R, Cosson P, Cipriani G, Monforte AJ, Georgi L, Abbott AG, Arus P (2005) Mapping with a few plants: using selective mapping for microsatellite saturation of the Prunus reference map. Genetics 171:1305–1309
  15. Huang X, Madan A (1999) CAP3: A DNA sequence assembly program. Genome Res 9:868–877
  16. International Peach Genome Initiative, Verde I, Abbott AG, Scalabrin S, Jung S, Shu S, Marroni F, Zhebentyayeva T, Dettori MT, Grimwood J, Cattonaro F, Zuccolo A, Rossini L, Jenkins J, Vendramin E, Meisel LA, Decroocq V, Sosinski B, Prochnik S, Mitros T, Policriti A, Cipriani G, Dondini L, Ficklin S, Goodstein DM, Xuan P, Del Fabbro C, Aramini V, Copetti D, Gonzalez S, Horner DS, Falchi R, Lucas S, Mica E, Maldonado J, Lazzari B, Bielenberg D, Pirona R, Miculan M, Barakat A, Testolin R, Stella A, Tartarini S, Tonutti P, Arus P, Orellana A, Wells C, Main D, Vizzotto G, Silva H, Salamini F, Schmutz J, Morgante M, Rokhsar DS (2013) The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nat Genet 45:487–494
  17. Jung S, Jesudurai C, Staton M, Du Z, Ficklin S, Cho I, Abbott A, Tomkins J, Main D (2004) GDR (Genome Database for Rosaceae): integrated web resources for Rosaceae genomics and genetics research. BMC Bioinformatics 5:130
  18. Jung S, Staton M, Lee T, Blenda A, Svancara R, Abbott A, Main D (2008) GDR (Genome Database for Rosaceae): integrated web-database for Rosaceae genomics and genetics data. Nucleic Acids Res 36:D1034–D1040
  19. Jung S, Ficklin SP, Lee T, Cheng CH, Blenda A, Zheng P, Yu J, Bombarely A, Cho I, Ru S, Evans K, Peace C, Abbott AG, Mueller LA, Olmstead MA, Main D (2014) The Genome Database for Rosaceae (GDR): year 10 update. Nucleic Acids Res 42:D1237–1244
  20. Kayesh E, Zhang YY, Liu GS, Bilkish N, Sun X, Leng XP, Fang JG (2013) Development of highly polymorphic EST-SSR markers and segregation in F(1) hybrid population of Vitis vinifera L. Genet Mol Res 12:3871–3878
  21. Kong Q, Zhang G, Chen W, Zhang Z, Zou X (2012) Identification and development of polymorphic EST-SSR markers by sequence alignment in pepper, Capsicum annuum (Solanaceae). Am J Bot 99:e59–61
  22. Lambert P, Hagen LS, Arus P, Audergon JM (2004) Genetic linkage maps of two apricot cultivars (Prunus armeniaca L.) compared with the almond Texas x peach Earlygold reference map for Prunus. Theor Appl Genet 108:1120–1130
  23. Lambert P, Pascal T (2011) Mapping Rm2 gene conferring resistance to the green peach aphid (Myzus persicae Sulzer) in the peach cultivar "Rubira (R)". Tree Genet Genomes 7:1057–1068
  24. Liu K, Muse SV (2005) PowerMarker: an integrated analysis environment for genetic marker analysis. Bioinformatics 21:2128–2129
  25. McCarthy S (1993) USDA's Plant Genome Research Program. Bull Med Libr Assoc 81:278–281
  26. Miah G, Rafii MY, Ismail MR, Puteh AB, Rahim HA, Islam Kh N, Latif MA (2013) A review of microsatellite markers and their applications in rice breeding programs to improve blast disease resistance. Int J Mol Sci 14:22499–22528
  27. Mohanty P, Sahoo L, Parida K, Das P (2013) Development of polymorphic EST-SSR markers in Macrobrachium rosenbergii by data mining. Conserv Genet Resour 5:133–136
  28. Morgante M, Hanafey M, Powell W (2002) Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nat Genet 30:194–200
  29. Muthamilarasan M, Venkata Suresh B, Pandey G, Kumari K, Parida SK, Prasad M (2013) Development of 5123 Intron-length polymorphic markers for large-scale genotyping applications in foxtail millet. DNA Res 21:41–52
  30. Oetting WS, Lee HK, Flanders DJ, Wiesner GL, Sellers TA, King RA (1995) Linkage analysis with multiplexed short tandem repeat polymorphisms using infrared fluorescence and M13 tailed primers. Genomics 30:450–458
  31. Ogundiwin EA, Peace CP, Gradziel TM, Parfitt DE, Bliss FA, Crisosto CH (2009) A fruit quality gene map of Prunus. BMC Genomics 10:587
  32. Okie WR (1998) Handbook of peach and nectarine varieties: performance in the Southeastern United States and Index of Names. The National Technical Information Service, Springfield, VA
  33. Ott J, Rabinowitz D (1997) The effect of marker heterozygosity on the power to detect linkage disequilibrium. Genetics 147:927–930
  34. Pettersson A, Winer ES, Wekslerzangen S, Lernmark A, Jacob HJ (1995) Predictability of heterozygosity scores and polymorphism information-content values for rat genetic-markers. Mamm Genome 6:512–520
  35. Rozen S, Skaletsky H (2000) Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol (Clifton, NJ) 132:365–386
  36. Tang J, Vosman B, Voorrips RE, van der Linden CG, Leunissen JA (2006) QualitySNP: a pipeline for detecting single nucleotide polymorphisms and insertions/deletions in EST data from diploid and polyploid species. BMC Bioinformatics 7:438
  37. Terwilliger JD, Ding YL, Ott J (1992) On the relative importance of marker heterozygosity and intermarker distance in gene-mapping. Genomics 13:951–956
  38. Thiel T, Michalek W, Varshney RK, Graner A (2003) Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor Appl Genet 106:411–422
  39. Verde I, Bassil N, Scalabrin S, Gilmore B, Lawley CT, Gasic K, Micheletti D, Rosyara UR, Cattonaro F, Vendramin E, Main D, Aramini V, Blas AL, Mockler TC, Bryant DW, Wilhelm L, Troggio M, Sosinski B, Aranzana MJ, Arus P, Iezzoni A, Morgante M, Peace C (2012) Development and evaluation of a 9K SNP array for peach by internationally coordinated SNP detection and validation in breeding germplasm. PLoS One 7:e35668
  40. Warburton ML, Becerra-Velasquez VL, Goffreda JC, Bliss FA (1996) Utility of RAPD markers in identifying genetic linkages to genes of economic interest in peach. Theor Appl Genet 93:920–925
  41. Wendel JF, Cronn RC, Alvarez I, Liu B, Small RL, Senchina DS (2002) Intron size and genome size in plants. Mol Biol Evol 19:2346–2352

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Chunxian Chen (1)
  • Clive H. Bock (1)
  • William R. Okie (1)
  • Fred G. Gmitter Jr. (2)
  • Sook Jung (3)
  • Dorrie Main (3)
  • Tom G. Beckman (1)
  • Bruce W. Wood (1)

  1. USDA, ARS, SEFTNRL, Byron, USA
  2. Citrus Research and Education Center, University of Florida, Lake Alfred, USA
  3. Department of Horticulture, Washington State University, Pullman, USA
