Introduction

Simple sequence repeats (SSRs, or microsatellites) have been used to great advantage in potato for studies of diversity, genetic structure, and classification (Spooner et al. 2007); tracing germplasm migrations (Spooner et al. 2005a; Rios et al. 2007); fingerprinting (Moisan-Thiery et al. 2005; Provan et al. 1996; Schneider and Douches 1997); genetic linkage mapping (Ghislain et al. 2001; Feingold et al. 2005); establishment of core collections (Ghislain et al. 2006); and investigations of duplicate collections across genebanks (Del Rio et al. 2006). Although not yet used in potato, they have potential applications in studies of linkage disequilibrium (Remington et al. 2001; Stich et al. 2005) and gene flow (Devaux et al. 2005; Fenart et al. 2007). They require considerable development costs and often have maximum utility only within the narrow range of germplasm from which they were developed. Once developed, however, they have tremendous advantages over many other marker classes, including low operational costs, codominance, hypervariability, high-quality and highly reproducible bands, amenability to automation, ease of multiplexing, and use with low-quality DNA (Spooner et al. 2005b).

Over 200 potato SSRs have been identified through enriched genomic libraries and database searches of expressed sequence tags (ESTs) (Milbourne et al. 1998; Ashkenazi et al. 2001; Ghislain et al. 2004; Feingold et al. 2005). However, many more are becoming available as ESTs are being identified. The latest SSR summary statistics from the former The Institute for Genomic Research (TIGR) document more than 5,800 sequences with potentially useful SSR markers (repeats of 2–6 nucleotides) for potato. These SSRs differ greatly, however, in quality (clarity and repeatability of bands), map location, and polymorphism. Some of them have been tested on potato landraces and advanced varieties, and mapped on various potato genetic maps. However, an extensive analysis of a large collection of potato SSRs was lacking. Ghislain et al. (2004) provided the first such analysis of 156 SSRs for quality and polymorphism, chose 22 of them by a combination of the above criteria in cultivated potato, showed how some of these could be multiplexed, and mapped them.

The purpose of the present study is to screen additional potato SSRs from all taxonomic groups of potato to refine a selection of microsatellites for maximum utility in a cultivated potato background. Such a large data set is available from a previous study aimed at classifying cultivated potato (Spooner et al. 2007). Its wide genetic diversity makes this data set particularly valuable for our purpose.

Materials and methods

Plant materials and DNA extraction

Seven hundred and forty-two native (landrace) potatoes belonging to a composite genotyping set of potato at the International Potato Center (CIP) were used for this study. These landraces were selected to represent all four species and taxonomic groups of potato as described above; they are the same accessions used in the taxonomic study of Spooner et al. (2007) and are described in the supporting dataset 1. Genomic DNA was obtained using a standard protocol derived from Doyle and Doyle (1990). DNA concentration was estimated using a TBS-380 Fluorometer (Turner BioSystems, USA) with PicoGreen® reagent and 500 ng/ml salmon DNA as reference.

Microsatellite markers and PCR conditions

Eighty-eight SSR markers were obtained from four sources: (1) 22 from the previously identified potato genetic identity (PGI) kit (Ghislain et al. 2004), (2) 13 from ESTs developed at the Scottish Crop Research Institute (SCRI) (Milbourne et al. 1998), (3) 30 identified at CIP using the potato EST database of the former The Institute for Genomic Research (http://www.tigr.org/), and (4) 23 from the University of Idaho (Feingold et al. 2005).

PCR reactions were performed in a 10 μl volume containing 100 mM Tris–HCl (Sigma), 20 mM (NH4)2SO4 (Merck), 2.5 mM MgCl2 (Merck), 0.2 mM of each dNTP (Amersham), 25 pM of 700 or 800 IRDye-labeled M13 forward primer (LI-COR), 22 pM M13-tailed forward SSR primer (Invitrogen), 15 pM reverse SSR primer (Invitrogen), 1 unit of Taq polymerase, and 15 ng of genomic DNA. PCR was carried out in a PTC-100 or PTC-200 thermocycler (MJ Research Inc.) using the following cycling profile: 4 min at 94°C; 33 cycles of 1 min at 94°C, 1 min at the annealing temperature (Ta) determined experimentally for each SSR primer combination, and 1 min at 72°C; and a final extension step of 4 min at 72°C. Blue Stop solution (#830-05630, LI-COR, USA) was added to the PCR reaction in a 1:1 ratio before loading. PCR products were separated by electrophoresis on a 4300 LI-COR DNA Analyzer system. We sized alleles with the IRDye 50–350 bp fragment size ladder (LI-COR, USA). SSR alleles were detected and scored using the SAGA Generation 2 software (LI-COR, USA).

Mapping new SSR markers

Previously mapped and new SSR markers were mapped on at least one of three segregating diploid populations from which genetic maps were developed: the PD population (Ghislain et al. 2001), the BCT population (Bonierbale et al. 1988), or the PCC1 population (Villamón et al. 2005). A total of 148 SSR markers were used in the present mapping effort and are provided in the supporting dataset 2. The segregation data of 27 SSR marker alleles located on the BCT genetic map were provided to us by the research group of Feingold et al. (2005). Marker alleles were expected to segregate in a 1:1 ratio; markers with skewed segregation were rejected using the threshold value established for each genetic map (goodness-of-fit χ2 test). Null alleles were not considered. Linkage analysis of marker alleles segregating from the respective source parent was performed using JoinMap 3.0 (Stam 1993) with a LOD score of 3.
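The goodness-of-fit test against a 1:1 segregation ratio can be sketched as follows. This is only an illustration: the function names and the 0.05 significance level are assumptions, whereas the study used the threshold value established for each genetic map.

```python
import math
from typing import Tuple


def chi_square_1to1(present: int, absent: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test of band presence/absence against 1:1.

    Returns (chi2, p_value) with one degree of freedom.
    """
    n = present + absent
    if n == 0:
        raise ValueError("no progeny scored")
    expected = n / 2.0
    chi2 = ((present - expected) ** 2 + (absent - expected) ** 2) / expected
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def is_skewed(present: int, absent: int, alpha: float = 0.05) -> bool:
    """Flag a marker allele as skewed; alpha is an assumed threshold."""
    _, p_value = chi_square_1to1(present, absent)
    return p_value < alpha
```

Under these assumptions, an allele present in 60 and absent in 40 of 100 progeny gives χ2 = 4.0 and would be flagged as skewed, whereas a 55:45 split (χ2 = 1.0) would not.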

Polymorphic information content and matrix comparison

SSR marker alleles were scored for presence or absence of the band for all 742 genotypes and treated as dominant markers. The polymorphic information content (PIC) was calculated as $\mathrm{PIC} = 1 - \sum_i p_i^2$, where $p_i$ is the frequency of the $i$th allele detected in all individuals of the population (Nei 1973). In addition, the ability of a refined set of SSRs chosen here to discriminate a large dataset was compared to a neighbor-joining analysis of Spooner et al. (2007), who analyzed 742 accessions with 50 SSRs by neighbor joining in DARwin software 4.0 (http://darwin.cirad.fr/darwin/Home.php), to which we added one SSR (STM0019). For this analysis, similarity matrices were calculated using Jaccard’s coefficient, and the similarity matrices were compared using the Mantel matrix-correspondence test in the MXCOMP option of the NTSYS 2.02h software (Sokal and Rohlf 1995). Correlations were computed for the three main branches of this tree (the “bitter potato” (S. ajanhuiri, S. curtilobum, and S. juzepczukii) cluster, the diploid cluster, and the polyploid cluster), using correlation statistics in Microsoft Excel: $\rho_{X,Y} = \operatorname{cov}(X,Y) / (\sigma_X \sigma_Y)$, where $\rho_{X,Y}$ is the correlation coefficient (r), $\operatorname{cov}(X,Y)$ is the covariance of X and Y, $\sigma_X$ is the standard deviation of X, and $\sigma_Y$ is the standard deviation of Y.
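The quantities defined above can be illustrated with a minimal sketch. It assumes that allele frequencies are taken as each allele's share of all bands scored for a marker (one plausible reading for dominant scoring, not confirmed by the text), and it computes only the correlation between the off-diagonal entries of two similarity matrices, not the permutation-based significance of the Mantel test. All function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence


def pic(bands: Sequence[Sequence[int]]) -> float:
    """PIC = 1 - sum(p_i^2) for one marker.

    bands: one row per genotype, one 0/1 column per allele.
    Assumption: p_i is allele i's share of all bands scored.
    """
    counts = [sum(column) for column in zip(*bands)]
    total = sum(counts)
    if total == 0:
        raise ValueError("no bands scored for this marker")
    return 1.0 - sum((c / total) ** 2 for c in counts)


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Jaccard similarity: shared bands / bands present in either genotype."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    # Convention assumed here: two empty profiles have similarity 0.
    return both / either if either else 0.0


def pairwise_jaccard(profiles: Sequence[Sequence[int]]) -> List[float]:
    """Upper triangle of the similarity matrix, in a fixed pair order."""
    return [jaccard(profiles[i], profiles[j])
            for i, j in combinations(range(len(profiles)), 2)]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """r = cov(X, Y) / (sigma_X * sigma_Y); undefined for constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

To compare two marker sets, one would call `pairwise_jaccard` on the band profiles built from each set (same genotype order) and pass the two resulting lists to `pearson`.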

Construction of new potato SSR fragment size ladders

We initially used a pUC18 sequencing reaction or an IRDye 50–350 size standard in our LI-COR DNA Analyzer System as a fragment size ladder. To make our new kit easily applicable to the cultivated potato germplasm base across all platforms, we constructed new size ladders for each of the 24 primer pairs. We examined allele sizes from our database and selected genotypes displaying a range of sizes based on the following three criteria: (1) good separation among the alleles (>3 bp), (2) choice of allele/genotype combinations highlighting the high-frequency alleles encountered in our screening studies, and (3) the presence of the minimum and maximum size of the range of alleles, when possible. Genomic DNA was obtained from the DNA bank of CIP. Amplification products were obtained using the standard CIP protocol for SSR markers (Ghislain et al. 2004). PCR conditions were optimized through temperature-gradient PCR experiments to determine the optimal annealing temperature and the appropriate number of amplification cycles for good gel resolution of the bands. Amplified products were separated by electrophoresis on denaturing 6% polyacrylamide gels, and the bands were revealed with a silver stain protocol (Ghislain et al. 2004).
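The three selection criteria can be expressed as a simple greedy procedure. The sketch below is a hypothetical illustration operating on allele sizes and frequencies only; the actual selection was made on genotypes, each contributing several alleles at once, and the function name and input format are assumptions.

```python
from typing import Dict, List


def select_ladder_alleles(freq_by_size: Dict[int, float],
                          min_gap: int = 3) -> List[int]:
    """Pick ladder alleles for one marker.

    freq_by_size maps allele size (bp) to its frequency in the screen.
    Criterion 3: keep the smallest and largest allele when possible.
    Criterion 2: then prefer high-frequency alleles.
    Criterion 1: every chosen pair must differ by more than min_gap bp.
    """
    sizes = sorted(freq_by_size)
    if not sizes:
        return []
    chosen = [sizes[0]]
    if sizes[-1] - sizes[0] > min_gap:
        chosen.append(sizes[-1])
    # Visit interior alleles from most to least frequent.
    for size in sorted(sizes[1:-1], key=lambda s: -freq_by_size[s]):
        if all(abs(size - c) > min_gap for c in chosen):
            chosen.append(size)
    return sorted(chosen)
```

Because the range endpoints are fixed first, a frequent allele lying within 3 bp of the minimum or maximum is skipped; a different priority order would give a different ladder.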

Results

Genetic mapping

The 30 new candidate SSR markers from the TIGR database were surveyed for polymorphism in the PD, BCT and PCC1 mapping populations (details are included in the supporting dataset 2). Two markers were monomorphic in all populations and three markers displayed a skewed segregation from the expected 1:1 ratio. The remaining 25 SSR markers could be mapped in one or two of the three populations using a LOD score of 3. The use of three segregating populations allowed us to identify 33 new map locations of 29 SSR markers not previously mapped. An integrated map was built from the three maps using a map integration function based on mean recombination frequencies and combined LOD scores of the selected sets of loci from each chromosome (Fig. 1). Four markers of the PGI kit (STI0012, STM0019, STM0037, STM1053) were monomorphic in all three segregating populations tested and hence were included graphically based on published maps. Only six out of 157 map locations (STG0023, STG0027a, STM0038, STM2022, STM3009, STM51145) conflicted on the integrated map and hence were not included. This map represents the most complete SSR potato map developed to date, with 138 mapped potato SSR markers at 147 map locations (excluding the four placed graphically and the six conflicting).

Fig. 1 Potato SSR genetic map including the 24 SSR markers of the new PGI kit (bold) on an integrated potato genetic map developed using framework RFLP and SSR genetic maps

Polymorphic information content and matrix comparisons

We analyzed 742 potato landraces of all four cultivated potato species (S. tuberosum Group Andigenum and Group Chilotanum, S. ajanhuiri, S. curtilobum, and S. juzepczukii) with 56 SSR markers: 22 from the prior PGI kit (Ghislain et al. 2004) and the 34 most useful of the remaining 66 SSR markers based on marker quality as observed visually on gels. Of these, five SSR markers appeared to be multilocus because the number of alleles exceeded that expected from the ploidy of the plant sample; these were not considered further.

Data obtained with the remaining 51 SSR markers on the 742 potato landraces were analyzed for polymorphic information content (PIC). Considering the cultivated potato a single gene pool (Spooner et al. 2007), markers were scored across all cultivar groups with different ploidies. PIC values per SSR marker ranged from 0.250 to 0.884 while the number of alleles per locus ranged from 2 to 21 (Table 1).

Table 1 Descriptions of the 51 SSR markers and the selected 24 of the new PGI kit by their respective name, source, GenBank accession number, repeat motifs, forward and reverse primer sequences, annealing temperature, map location, allele size and number, and polymorphic information content (PIC) in 742 landraces (Spooner et al. 2007)

The discriminatory capacity of the SSR markers was analyzed by comparing similarity matrices generated for the 742 genotypes with the 8, 16, 24, 32, 42 and 51 SSR markers ranked highest by PIC value (Fig. 2). The results indicated that the 24 SSR markers with the highest PIC values provided a similarity matrix nearly identical to the one generated with the 32 markers with the highest PIC values (r = 0.97). In total, 93.5% of the genotypes can be discriminated using the selected 24 SSR markers (Table 1). However, the placements of accessions within the three main branches of the 742-accession neighbor-joining tree of Spooner et al. (2007) were correlated at r = 0.99, suggesting that the 6.5% of the accessions not absolutely discriminated using the 24 markers are so similar to one another that they have little effect on the major groups discovered in phenetic or phylogenetic analyses. These results lead us to propose 24 (Table 1) as an appropriate number of SSR markers for a new PGI kit.
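The ranking and discrimination steps can be sketched as follows. The text does not define how the percentage of discrimination was computed; this sketch assumes a genotype counts as discriminated when no other genotype shares its combined band profile. The function names and marker labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def top_markers(pic_by_marker: Dict[str, float], k: int) -> List[str]:
    """Return the k marker names with the highest PIC (ties broken by name)."""
    ranked = sorted(pic_by_marker, key=lambda name: (-pic_by_marker[name], name))
    return ranked[:k]


def percent_discriminated(profiles: Sequence[Sequence[int]]) -> float:
    """Percentage of genotypes whose band profile is unique in the set.

    profiles: one 0/1 row per genotype, concatenated over the chosen markers.
    """
    if not profiles:
        raise ValueError("no genotypes supplied")
    keys = [tuple(row) for row in profiles]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return 100.0 * unique / len(keys)
```

For example, with four genotypes of which two share an identical profile, `percent_discriminated` returns 50.0, since only the other two are uniquely identified.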

Fig. 2 Validation of the 24 SSR markers selected by discrimination analysis using genotyping data of 742 landraces: a Comparison of similarity matrices generated with 8, 16, 24, 32, 42 and 51 SSR markers by r-values (▲), and by percentage of discrimination (■). b Representation of the comparison of similarity matrices generated by the 24 SSR markers of the new PGI kit [Y label] and the 51 SSR markers [X label] and the corresponding correlation coefficient r

Selection of the new PGI kit

The 24 most informative SSR markers for genotyping potato landraces were selected based on quality criteria, genome coverage, and locus-specific information content. We selected two SSR markers per chromosome with a linkage distance of at least 10 cM, except for chromosome VII, where markers STM0031 and STI0033 were separated by only 3 cM because no alternative markers with a high PIC value were available (Fig. 1).
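One way to formalize the per-chromosome choice is sketched below. This is an assumed procedure, not the method used in the study, which also weighed band quality: it takes the pair with the highest combined PIC among pairs at least 10 cM apart, and falls back to the best pair regardless of distance when no such pair exists (as happened for chromosome VII). The function name and input format are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Marker = Tuple[str, float, float]  # (name, map position in cM, PIC)


def pick_pair(markers: List[Marker], min_cm: float = 10.0) -> Tuple[str, str]:
    """Choose two markers on one chromosome.

    Prefers pairs separated by at least min_cm; if none exists,
    the spacing requirement is dropped for that chromosome.
    """
    pairs = list(combinations(markers, 2))
    if not pairs:
        raise ValueError("need at least two markers on the chromosome")
    spaced = [p for p in pairs if abs(p[0][1] - p[1][1]) >= min_cm]
    pool = spaced or pairs
    best = max(pool, key=lambda p: p[0][2] + p[1][2])
    first, second = sorted(m[0] for m in best)
    return first, second
```

Maximizing the summed PIC is only one possible objective; weighting by band quality or by map coverage would be equally defensible.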

New potato SSR fragment size ladders

To construct a fragment size ladder for each SSR marker, we chose alleles that (1) occurred at high frequency, (2) covered the range of allele sizes, (3) were well separated, avoiding those giving overlapping bands due to stuttering, and (4) were displayed by a minimum number of landraces. These provide effective size ladders for easy extrapolation of alleles not part of the size kit. Annealing temperatures had to be verified and, in a few cases, adjusted. The final DNA concentrations of the selected genotypes, which are mixed to produce the ladder, ranged between 3 and 12 ng/μl (Table 2). We succeeded in obtaining good coverage of allele sizes with a maximum of four genotypes per ladder (e.g., STM0019). The number of alleles for each of the 24 SSR fragment size standards ranged from three (STM5121) to nine (STI0012) with an average number of 5.5 (Table 2). Overall, the 24 SSR fragment size standards produce 137 reference alleles, representing 44.7% of the total of 306 alleles found in 742 accessions by the 24 SSR markers of the new PGI kit. Allele sizes included in the size standards ranged from 83 to 322 bp, providing an easy and convenient tool for identification of allele sizes (Fig. 3).

Table 2 SSR fragment size standard for each SSR marker of the PGI kit

Fig. 3 SSR fragment size standard for the SSR marker STM5127 displaying eight well-defined and spaced alleles using a mix of only two genotypes

Discussion

The new PGI kit is composed of 24 SSR markers from over 200 we screened. It provides high-quality, highly polymorphic alleles with two markers from each of the 12 linkage groups of potato separated by at least 10 cM, except for chromosome VII with two SSRs separated by only 3 cM. It discriminates representative germplasm samples from all potato cultivar groups with high accuracy. Nine SSRs are from the previous PGI kit (Ghislain et al. 2004), three from ESTs developed at SCRI, four from TIGR, and eight from the University of Idaho. A composite reference DNA sample can conveniently be used to provide accurate sizing of all alleles for these SSR markers across laboratories and platforms.

The PGI kit can be used for potato germplasm characterization for a variety of purposes, from identity verification (fingerprinting) to studies of genetic diversity, anchoring of genetic linkage maps, establishment of core collections, and gene flow. The 24 composite reference samples of DNA for allelic size determinations will stimulate and foster collaborations worldwide on the use of SSRs for these applications. For example, we used the new PGI kit to identify potential duplicate landraces between the CIP and PROINPA Bolivian potato germplasm collections (data not shown). Some landraces belonging to the same morphologically selected cluster were not grouped into the same molecular cluster, especially landraces of very diverse germplasm sets such as the S. tuberosum Andigenum Group. There was a total correspondence within the less diverse S. tuberosum Chilotanum Group. In another application, the new PGI kit has been used to genotype breeding lines and advanced cultivars of potato, and most of the breeding material groups into a well-defined cluster with landraces of the Chilotanum Group (data not shown). Such grouping is expected because the germplasm of the Chilotanum Group has been used extensively in potato breeding worldwide.

In summary, these highly characterized new SSR markers have tremendous utility for a variety of applications and can stimulate standardization and international collaborations within the cultivated potato gene pool. The PGI kit, including primers and fragment size standards, is available upon request. An SSR database of the cultivated potato is available online from the bioinformatics portal of the Generation Challenge Program web site (www.generationcp.org) and of CIP (http://research.cip.cgiar.org/confluence/display/IPD/SSR+Marker). The latter provides a full description of each SSR marker, amplification and detection conditions, and the genotyping data of all potato landraces available to date. It is expected that with increased use the SSR database will be integrated with the germplasm database of the CGIAR centers.