Background

Peanut (Arachis hypogaea L.) is an oil crop of great importance in the tropics: in Africa, its production is comparable to that of all other grain legumes put together, and in Asia it provides about the same number of calories as soya (FAO, 2009). It has a narrow genetic base because of its recent origin through a tetraploidization event [1, 2], and this has hindered the application of molecular breeding in this crop.

Microsatellites, or simple sequence repeats (SSRs), are useful molecular markers: they are abundant, highly dispersed through the genomes of eukaryotes, and locus specific. In addition, they are well suited for genotyping allotetraploid species such as peanut, since they are usually co-dominant and multi-allelic. They are considered suitable tools for genetic diversity studies, genetic linkage mapping, and plant breeding programs [3].

Over the past years, several research groups have put considerable effort into developing SSR markers for the genus Arachis in general and cultivated peanut in particular. About 5,000 SSR markers have now been published [4-21]. These markers have mainly been used for diversity studies of germplasm and for genetic mapping [10, 11, 22-28]. However, in spite of the number of markers available, the very low polymorphism observed within cultivated germplasm means that large-scale marker screening is required to identify sufficient polymorphic markers, even for low density genetic maps in populations derived from cultivated × cultivated crosses. For example, in spite of extensive marker screening, the published SSR-based maps of cultivated peanut have only 131, 135 and 175 SSR markers [22, 23, 28]. In a previous study, we observed that AG/TC microsatellites were more polymorphic than AC/TG ones and that, for cultivated germplasm, the highest polymorphism was observed for microsatellites with 21-25 motif repetitions [10]. In this context, we isolated and characterized long repeat AG/TC SSRs in an effort to develop markers with high polymorphism levels for cultivated peanut [10].

Findings

Sequencing

Sequences were obtained for 1401 cloned genomic fragments. Most fragments were sequenced in both forward and reverse orientations. Of these 1401 cloned fragments, 65 harbored sequences very similar to already published markers (≥ 50% of the sequence with BLAST-detected similarity at an E-value ≤ 1e-40) and so were excluded from further analysis. Of the remaining sequences, 193 harbored microsatellite repeats. As expected, most were TC/AG repeats. The 143 unique SSR sequences were deposited in GenBank (accession numbers JN887491 to JN887636).

Design of flanking primer pairs

Of the 193 selected sequences, 135 were appropriate for primer design. Some sequences contained multiple microsatellite repeats that could not be flanked by a single primer pair. Therefore, in total 146 primer pairs were designed. The microsatellites amplified were generally long, the average number of motif repeats being 23.

Polymorphism levels

All 146 primer pairs amplified PCR products of the expected size. Of these, 85 were polymorphic within the tetraploid samples (including cultivated peanut, a synthetic allotetraploid and an accession of the tetraploid wild species, A. monticola; Table 1), and 78 were polymorphic within cultivated germplasm (Table 2).

Table 1 Arachis genotypes included in this study.
Table 2 List of the 78 polymorphic markers.

The average number of alleles amplified per locus was 5.5. Gene diversity (GD) values ranged from 0.080 to 0.885, with an average of 0.614. Sixty-six markers were highly polymorphic, with a GD of 0.5 or more.
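As an illustration of how a per-locus GD value can be obtained, the following minimal sketch computes the uncorrected form, GD = 1 - Σ p_i², where p_i is the frequency of the i-th allele at the locus. This is an assumption made for illustration: the estimator implemented in the software used in this study may apply additional corrections (for example for sample size) that are not reproduced here. The function name and the allele sizes are hypothetical and are not data from this study.

```python
from collections import Counter


def gene_diversity(alleles):
    """Return the uncorrected gene diversity, 1 - sum(p_i ** 2), for one locus.

    `alleles` is a flat list of allele calls (e.g. fragment sizes in bp)
    pooled over all genotypes scored at that locus.
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no allele calls supplied")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical allele sizes (bp) at a single locus; illustration only.
example_locus = [210, 210, 214, 214, 218, 222, 222, 222]
print(round(gene_diversity(example_locus), 3))
```

A monomorphic locus gives a value of 0, and the value approaches 1 as the number of equally frequent alleles increases, which is why a threshold such as GD ≥ 0.5 can be used to flag highly informative markers.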

Within cultivated peanut, markers with 21-25 motif repetitions were the most polymorphic (69%), followed by markers that amplified more than 30 motif repetitions (60%, most of the markers being composite or imperfect) (Figure 1). The lowest polymorphism was observed for short microsatellites, with 6-10 motif repetitions.

Figure 1 Frequency of markers detected per repeat size class. Frequency of markers developed in this study, polymorphic (dark grey) and monomorphic (light grey), per motif repeat number class. The percentage of polymorphic and monomorphic markers in each class is indicated on each bar of the graph.

Genic content

Thirty-six of the 135 marker sequences encoded putative proteins that had significant BLAST similarities to known predicted proteins of Arabidopsis and/or legumes (E-value < 1e-07; Additional File 1: Table S1). Of the highly polymorphic markers (GD ≥ 0.5), 23% showed a significant BLAST similarity. This compares to 35% of the markers with GD < 0.5 that showed a significant BLAST similarity.

Genetic relationships

Genetic similarities were estimated by the band-sharing coefficient [29] in pairwise comparisons of the 24 genotypes (Table 1), using 78 microsatellite loci. Among the 22 A. hypogaea genotypes, genetic similarity values ranged from 0.42 to 0.77. Since no pair of genotypes had identical profiles, all the genotypes were differentiated. A dendrogram based on UPGMA was constructed for the 24 genotypes (Figure 2). Cluster analysis showed two main groups, corresponding to the two subspecies. Within these groups, genotypes of the same botanical variety tended to group together.
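The band-sharing coefficient between two genotypes x and y is S = 2·n_xy / (n_x + n_y), where n_xy is the number of bands shared and n_x and n_y are the numbers of bands scored in each genotype. The sketch below shows this calculation for a small set of profiles. The function names, genotype labels and band data are hypothetical and serve only as an illustration; each band is represented as a (locus, allele size) pair so that alleles of equal size at different loci are not confused.

```python
def band_sharing(bands_x, bands_y):
    """Band-sharing similarity: 2 * shared bands / (bands in x + bands in y)."""
    x, y = set(bands_x), set(bands_y)
    if not x and not y:
        raise ValueError("both profiles are empty")
    return 2 * len(x & y) / (len(x) + len(y))


def similarity_matrix(profiles):
    """Return (names, matrix) of pairwise band-sharing similarities."""
    names = sorted(profiles)
    matrix = [[band_sharing(profiles[a], profiles[b]) for b in names]
              for a in names]
    return names, matrix


# Hypothetical profiles: sets of (locus, allele size in bp) pairs.
profiles = {
    "G1": {("L1", 210), ("L1", 214), ("L2", 180)},
    "G2": {("L1", 210), ("L2", 180), ("L2", 186)},
    "G3": {("L1", 218), ("L2", 186)},
}
names, matrix = similarity_matrix(profiles)
for name, row in zip(names, matrix):
    print(name, [round(v, 2) for v in row])
```

A value of 1 is obtained only when two genotypes share every scored band, so a maximum off-diagonal similarity below 1 implies that every pair of genotypes can be told apart.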

Figure 2 Dendrogram based on the band-sharing distances of 24 Arachis genotypes, generated by UPGMA. The letters after each A. hypogaea accession number refer to the subspecies, varieties, and type: FF-fastigiata/fastigiata; FV-fastigiata/vulgaris; FA-fastigiata/aequatoriana; FP-fastigiata/peruviana; HH-hypogaea/hypogaea; HHi-hypogaea/hirsuta; and HX-hypogaea Xingu type. Also included are an accession of A. monticola and a synthetic allotetraploid plant (K30076 × V14167)4x. Note that all accessions were distinguished; the highest value represented on the similarity coefficient scale is 0.77.

Discussion

In spite of the considerable effort made by several research groups to develop molecular markers for cultivated peanut, the number of polymorphic markers available for this important crop is still limiting. One of the main challenges in the construction of linkage maps using populations derived from cultivated × cultivated crosses is the need to screen thousands of markers to obtain sufficient markers even for the construction of low density maps.

In this study we focused on the class of microsatellites that was shown in a previous study to be the most highly polymorphic for cultivated peanut: long TC repeats [10]. For this, sequences were obtained from an enriched genomic library. The sequences were processed with the Staden software together with a module for the detection of microsatellites. Starting from a relatively large dataset of unassembled sequences, it was possible to quickly eliminate sequences that were similar to previously described markers and to assemble a compact database of microsatellite-containing sequences. Using a naming convention for plasmid clones, it was possible to correctly assemble microsatellite-containing reads even when the only overlap between forward and reverse sequences was the microsatellite repeats. This was particularly important for obtaining complete sequences when the repeats were long. For the design of primer pairs, the program used took into account the quality values of consensus bases. This was reflected in the 100% amplification success rate of the primer pairs.

Markers with 21-25 motif repetitions were the most polymorphic, while markers with shorter repeats tended to be less polymorphic. This general tendency agrees with previous studies and reinforces the view that long (21-25 motif repetitions) or composite TC microsatellites are probably the most polymorphic marker class for cultivated peanut. A slightly higher proportion of the markers that were not polymorphic or were less informative (GD < 0.5) showed significant similarities to protein-encoding regions, probably reflecting a tendency for non-coding regions to be more polymorphic than coding regions. Overall, 78 of the markers were polymorphic for the cultivated accessions, and 66 of these had a GD value of 0.5 or above.

Cluster analysis showed two main groups separating the two subspecies of A. hypogaea. Some tendency for genotypes to group according to their botanical varieties was also evident. The main exceptions were three accessions, Mf2517, Mf2352, and Mf2534, whose placement had no apparent explanation. The upper group contained the five hypogaea/hypogaea genotypes and two of the three hypogaea/hirsuta genotypes. Arachis monticola and the two genotypes collected in the Xingu Indigenous Park also clustered in this group. The Xingu material has some morphological traits, especially in the pods, that exceed the variation previously described in cultivated peanut [30], but it seems to be closely related to the hypogaea/hypogaea and hypogaea/hirsuta varieties. Our results also showed the great genetic similarity of the varieties fastigiata and vulgaris, which formed a subgroup, and of peruviana and aequatoriana, which formed a separate subgroup. Some studies have shown that genotypes of the varieties peruviana and aequatoriana were more closely related to genotypes of the subspecies hypogaea than to the other two varieties (fastigiata and vulgaris) of subspecies fastigiata [8, 17, 31, 32]. Our results, in contrast, corroborated the current taxonomical classification, despite the small number of genotypes included.

Conclusion

In this study, 146 new microsatellite markers were developed for Arachis. All of these markers are useful tools for genetics and genomics in Arachis, but in particular the set of 66 markers that are highly polymorphic for cultivated peanut is a significant step towards routine molecular breeding in this important crop.

Methods

Plant material and DNA extraction

For construction of an SSR-enriched genomic DNA library, the peanut genotype A. hypogaea subsp. fastigiata var. fastigiata cv. IAC-Tatu was used. For marker validation and genetic relationship analysis, the following panel was used: a set of 22 A. hypogaea genotypes representing all six botanical varieties, a synthetic allotetraploid (derived from a cross between A. ipaënsis and A. duranensis) and an accession of the tetraploid wild species, A. monticola (Table 1). Marker polymorphism was also assessed in parents of four mapping populations: A. duranensis K7988 × A. stenosperma V10309 [10, 25], A. ipaënsis KG30076 × A. magna KG30097 [11], A. hypogaea subsp. hypogaea var. hypogaea cv. Runner IAC 886, and A. hypogaea subsp. fastigiata var. vulgaris cv. Fleur 11 × a synthetic amphidiploid [24] (Additional file 1).

Total genomic DNA was isolated from young leaves using the CTAB-based protocol described by Grattapaglia and Sederoff [33], modified by the inclusion of an additional precipitation step with 1.2 M NaCl. DNA quality and concentration were estimated by agarose gel electrophoresis and by spectrophotometry (Genesys 4 - Spectronic, Unitech, USA).

Construction of SSR-enriched library

A genomic DNA library enriched for the dinucleotide repeats TC/AG was constructed as described by Moretzsohn [10]. About nine micrograms of DNA were digested with Sau 3AI (Amersham Biosciences, UK) and electrophoresed in 0.8% low melting agarose gels to select fragments ranging from 200-600 bp. The selected fragments were purified from the agarose gels using phenol/chloroform, and ligated into Sau 3AI specific adaptors (5'-cagcctagagccgaattcacc-3' and 5'-gatcggtgaaatcggctcaggctg-3'). The ligated fragments were hybridized to biotinylated (AG)15 oligonucleotides and isolated using streptavidin-coated magnetic beads (Dynabeads Streptavidin, Dynal Biotech, Norway). The eluted fragments were amplified using one adaptor-specific primer, cloned into the pGEM-T Easy vector (Promega, WI, USA) and transformed into XL1-Blue E. coli cells with blue/white selection (Invitrogen, CA, USA). Plasmid DNAs of the positive clones were isolated by the alkaline lysis method. Sequencing reactions were performed with T7 and SP6 primers and the Big-Dye Terminator Cycle Sequencing Kit, version 3.1 (Applied Biosystems, CA, USA) using the ABI Prism 377 automated DNA sequencer.

SSR marker development and validation

Sequences were processed and assembled by using the Staden package [34] with the repeat sequence finding module TROLL [35] and Primer3 for primer design [36], using a module developed by Martins et al. [37]. Sequences with more than ten motif repeats were chosen for primer design. Some sequences with BLASTX hits to genes of interest were also included in spite of having fewer than ten motif repeats. The parameters for primer design were: (1) primer size ranging from 18 bp to 25 bp with an optimal length of 20 bp; (2) primer Tm (melting temperature) ranging from 57°C to 63°C with an optimal temperature of 60°C; and (3) GC content ranging from 40%-60%. Default values were used for the other parameters.
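The selection criteria above can be illustrated with a minimal sketch. It is not the TROLL or Primer3 implementation: it uses a simple regular expression to count perfect TC or AG repeat units, and it checks only the primer length and GC content ranges. The melting temperature criterion is deliberately left out, because reproducing the Tm calculation used by Primer3 is beyond this example. All function names and sequences are hypothetical.

```python
import re


def longest_repeat_count(seq, motifs=("TC", "AG")):
    """Return the largest number of consecutive perfect repeats of any motif."""
    seq = seq.upper()
    best = 0
    for motif in motifs:
        for match in re.finditer("(?:%s)+" % motif, seq):
            best = max(best, len(match.group(0)) // len(motif))
    return best


def eligible_for_design(seq, min_repeats=10):
    """Sequences with more than `min_repeats` motif repeats are selected."""
    return longest_repeat_count(seq) > min_repeats


def gc_percent(primer):
    """Percentage of G and C bases in a primer."""
    primer = primer.upper()
    return 100.0 * sum(primer.count(base) for base in "GC") / len(primer)


def passes_basic_filters(primer):
    """Check length (18-25 bp) and GC content (40-60%); Tm is NOT checked."""
    return 18 <= len(primer) <= 25 and 40.0 <= gc_percent(primer) <= 60.0


# Hypothetical clone sequence with 12 TC repeats, and a hypothetical primer.
clone = "AAAT" + "TC" * 12 + "GGA"
print(longest_repeat_count(clone), eligible_for_design(clone))
print(passes_basic_filters("ATGCATGCATGCATGCATGC"))
```

Because the pattern matches a fixed reading frame, a repeat written as CT or GA on the same strand may be counted one unit short; a dedicated repeat finder handles such cases, as well as imperfect and composite repeats, more carefully.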

PCR reactions contained 10 ng of genomic DNA, 1 U of Taq DNA polymerase (Amersham Biosciences, UK), 1× PCR buffer (200 mM Tris pH 8.4, 500 mM KCl), 1.5-2.0 mM MgCl2, 200 μM of each dNTP, and 0.4 μM of each primer, in a final reaction volume of 10 μl. Amplifications were carried out in a PTC 100 thermocycler (MJ Research Inc., MA, USA). PCR conditions were: 96°C for 5 min, followed by 30 cycles of 94°C for 1 min, 48-62°C for 1 min (annealing temperature depending on primer pair, see Additional file 1) and 72°C for 1 min, with a final extension for 10 min at 72°C. PCR products were separated by electrophoresis on denaturing polyacrylamide gels (6% acrylamide:bisacrylamide 29:1, 5 M urea in TBE pH 8.3) and stained with silver nitrate [38].

Data analyses

The number of alleles per locus, the range of fragment lengths and the gene diversity (GD) were estimated for the polymorphic primers using the program "Power Marker 3.25" [39]. Pairwise genetic similarities were estimated from the allelic data using the band-sharing coefficient of Lynch [29]. The resulting similarity matrix was then submitted to cluster analysis using UPGMA (unweighted pair-group method with arithmetic mean). To verify the consistency of the resulting dendrogram, the cophenetic correlation, r [40], was calculated. These clustering analyses were performed using the software NTSYS 2.21 [41].
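The clustering step can be sketched as follows. This is a minimal pure-Python illustration of UPGMA and of the cophenetic correlation, not the NTSYS implementation. It assumes that similarities have already been converted to distances (for example as 1 - S, which is an assumption of this sketch), and the function names and the four-taxon distance matrix are hypothetical.

```python
from itertools import combinations
from math import sqrt


def upgma(names, dist):
    """Run UPGMA; return (merges, cophenetic distances keyed by index pair)."""
    clusters = {i: [i] for i in range(len(names))}  # cluster id -> members
    d = {frozenset(p): dist[p[0]][p[1]]
         for p in combinations(range(len(names)), 2)}
    merges, coph, next_id = [], {}, len(names)
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of active clusters
        a, b = sorted(pair)
        height = d.pop(pair)
        for i in clusters[a]:             # record height at which items join
            for j in clusters[b]:
                coph[frozenset((i, j))] = height
        na, nb = len(clusters[a]), len(clusters[b])
        for c in clusters:
            if c in (a, b):
                continue
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            # Size-weighted mean equals the average over all member pairs.
            d[frozenset((next_id, c))] = (na * dac + nb * dbc) / (na + nb)
        merges.append(([names[i] for i in clusters[a]],
                       [names[i] for i in clusters[b]], height))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges, coph


def cophenetic_correlation(dist, coph):
    """Pearson correlation between original and cophenetic distances."""
    pairs = [tuple(sorted(p)) for p in coph]
    x = [dist[i][j] for i, j in pairs]
    y = [coph[frozenset((i, j))] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y))
    return cov / sqrt(sum((u - mx) ** 2 for u in x) *
                      sum((v - my) ** 2 for v in y))


# Hypothetical distance matrix for four genotypes; illustration only.
names = ["A", "B", "C", "D"]
dist = [[0.0, 0.2, 0.6, 0.7],
        [0.2, 0.0, 0.5, 0.8],
        [0.6, 0.5, 0.0, 0.3],
        [0.7, 0.8, 0.3, 0.0]]
merges, coph = upgma(names, dist)
for left, right, height in merges:
    print(left, right, round(height, 3))
print(round(cophenetic_correlation(dist, coph), 3))
```

A cophenetic correlation close to 1 indicates that the dendrogram reproduces the pairwise distances in the original matrix with little distortion.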