Background

Cultivated peanut (Arachis hypogaea L.) is a major crop in most tropical and subtropical areas of the world and provides a significant source of oil and protein to large segments of the population in Asia, Africa and the Americas. In the U.S., peanut is a high-value cash crop of regional importance, with major production areas concentrated in the Southeast. Plant breeding efforts to pyramid genes for disease and insect resistance, quality, and yield are hampered by the polyploid genetics of the crop species, the multigenic nature of many traits (e.g., yield), and the difficulty of selecting for many traits in the field (e.g., soil-borne diseases). Thus, secondary selection methods that are environmentally neutral would greatly facilitate crop improvement efforts. Molecular markers fit this criterion, but only recently have markers been developed that reveal sufficient polymorphism in A. hypogaea and related species to have widespread application in peanut breeding. Preliminary steps for utilizing molecular markers for crop improvement are developing collections of polymorphic markers and using them to construct dense, high-resolution genetic maps.

Constructing a high-quality genetic map depends largely upon finding one or more marker systems that can detect high levels of polymorphism between two individual parents. Unfortunately, low levels of molecular polymorphism were observed within tetraploid (2n = 4x = 40) A. hypogaea throughout the 1990s and early 2000s with the marker systems available at that time [1, 2]. In contrast to the limited numbers of polymorphic markers detected in the tetraploid, the same marker systems uncovered high levels of molecular polymorphism within and between the diploid (2n = 2x = 20) peanut species. This polymorphism led researchers to create molecular maps for Arachis. The first molecular map in peanut was constructed between the diploids A. stenosperma Krapov. and W.C. Gregory and A. cardenasii Krapov. and W.C. Gregory by Halward et al. [3], who used Restriction Fragment Length Polymorphisms (RFLPs) to associate 117 markers into 11 linkage groups. Additional maps were subsequently published using Randomly Amplified Polymorphic DNA (RAPD) [4] and Simple Sequence Repeats (SSRs) [5, 6]. Burow et al. [7] published the first tetraploid map in peanut, based on 370 RFLP loci across 23 linkage groups, by utilizing the complex interspecific cross Florunner × 4x [A. batizocoi Krapov. and W.C. Gregory (A. cardenasii × A. diogoi Hoehne)]. Another interspecific tetraploid linkage map of 298 loci and 21 linkage groups was derived from a backcross population between A. hypogaea and a synthetic amphidiploid [8]. Only recently have linkage maps been developed from crosses between A. hypogaea genotypes, most with fewer than 200 loci and with more than the expected 20 linkage groups [9–13]. An exception is the recently published map containing 1,114 loci across 21 linkage groups that was constructed in part with highly polymorphic markers derived from sequences harboring miniature inverted repeat transposable elements [14]. Therefore, there is a continuing need to generate dense linkage maps for the cultivated tetraploid peanut that will not only cluster the markers into the expected 20 linkage groups to cover the haploid chromosome set, but also facilitate marker-trait association and eventually assist in its genetic improvement.

The domesticated peanut is thought to have arisen from a single hybridization event between two diploid wild species followed by whole genome duplication approximately 3,500 years ago [15]. This short evolutionary history, along with hybridization barriers between the diploids and the tetraploid, has resulted in a narrow genetic base for the cultivated tetraploid peanut. In contrast, diploid Arachis species are genetically diverse, have simpler inheritance patterns, and, most importantly, contain a rich source of agronomically important traits for peanut improvement. Due to these attributes, diploid Arachis species have been proposed as model systems to map the peanut genome. Because the genomes of the progenitor diploid species [i.e., A. duranensis (A-genome donor) and A. ipaensis (B-genome donor)] are closely allied to the cultivated peanut [16], mapping the genome of one or both of these species should be useful for predicting the positions of loci in the cultivated peanut. This approach has been employed in wheat [17, 18], alfalfa [19, 20], oat [21], and other crop species.

One accession of A. ipaensis and 67 accessions of A. duranensis have been collected in South America. The largest concentration of A. duranensis is in southern Bolivia and northern Argentina, with a few populations being reported in Paraguay and one in central Brazil [22, 23]. The species is morphologically diverse and the Bolivia and Argentina types can be separated cytogenetically and morphologically [24]. Due to the availability of diverse accessions to produce intraspecific crosses in the greenhouse, a dense linkage map in the diploid species A. duranensis was produced using large numbers of molecular markers derived from transcribed sequences.

Results and discussion

Species relationships

A preliminary study of SSR marker variation among 37 A. duranensis accessions using 556 markers indicated that the species is highly polymorphic at the molecular level and individual accessions could be separated based on a cluster analysis (Figure 1). Interestingly, we found that A. ipaensis, the proposed B-genome (BB) progenitor species, clustered with the A-genome (AA) species A. stenosperma and not with the B-genome species A. batizocoi. Recent molecular cytogenetic analysis of A- and non-A- (i.e., B-) genome species suggests that karyotype diversity among non-A-genome species is extensive enough to support separation into additional genome classes where A. ipaensis remains in B sensu stricto while A. batizocoi is placed into a separate group [25]. Therefore, A. batizocoi is less typical of B-genome species.

Figure 1

Genetic relationships among A- and B-genome Arachis species. Clustering of A- (A. duranensis and A. stenosperma) and B- (A. ipaensis and A. batizocoi) genome species according to analysis of data from SSR markers. The two parents used for mapping are indicated by arrows.

The number of polymorphic SSR markers between paired A. duranensis accessions ranged from 160 to 375 out of 556, which is 29 to 67% of the total number of SSR markers screened. This substantial variation indicates high genetic diversity within the species. Based on cluster analysis, success of crosses, and fertility of F1s, accessions PI 475887 and Grif 15036 were selected for subsequent mapping studies using 94 F2 progenies. Screening of the parental accessions with 2,138 SSR markers derived from A. hypogaea EST sequences resulted in 1,768 (82.7%) that were scorable (detected by ABI3730XL genotyping systems) and 896 (41.9%) that were polymorphic (Guo Y et al: Comparative mapping in intraspecific populations uncovers a high degree of macrosynteny between A- and B-genome diploid species of peanut, submitted). The same markers were used to create a map between two A. batizocoi accessions and to determine syntenic relationships between the A- and B-genome species (Guo Y et al: Comparative mapping in intraspecific populations uncovers a high degree of macrosynteny between A- and B-genome diploid species of peanut, submitted).

Arachis duranensis genetic map

The total number of published SSR markers has now risen beyond the 2,847 cataloged in a related paper by Guo et al. (Guo Y et al: Comparative mapping in intraspecific populations uncovers a high degree of macrosynteny between A- and B-genome diploid species of peanut, submitted) to around 6,000 [26]. Those most recently reported include: 14 by Gimenes et al. [27]; 51 by Mace et al. [28]; 188 by Proite et al. [29]; 104 by Cuc et al. [30]; 138 by Yuan et al. [31]; 33 by Song et al. [32]; 123 by Wang et al. [33]; 290 by Liang et al. [34]; and 1,571 by Koilkonda et al. [35]. Five hundred and ninety-eight of these markers are included in the A. duranensis map (Figure 2). Of the 34 genomic SSR markers mapped in the current study (Table 1), 24 were mapped previously in an interspecific population between A. duranensis and A. stenosperma [6, 36]. These markers served to anchor and align the current and previously published peanut maps (Figure 2). Linkage group assignments of all markers were consistent between the current map and that of Bertioli et al. [36], except for the marker GM117 (AC3C02 on the map in reference 36, derived from GenBank accession DQ099133). This marker was localized on chromosome 2A in their interspecific map but mapped to chromosome 10A in the A. duranensis intraspecific map (the ‘A’ following a chromosome number is used in this study to denote chromosomes in the A genome of peanut). Although detailed information for parental alleles in the study by Bertioli et al. [36] was not presented, GM117 amplified only one locus from each parent in both their population and ours. It is, therefore, unlikely that the discrepancy in marker location was due to mapping of multiple loci; it could instead reflect a small chromosomal rearrangement. Chromosomal rearrangements are not unexpected based on previous cytological observations in the genus [24, 37].

Figure 2

High-density linkage map of Arachis duranensis including 1,724 markers. SNP and SSR markers are prefixed by ‘SNP’ and ‘GM’, respectively; resistance gene candidate markers are prefixed by ‘RGC’ and ‘GS’. Twenty-four previously published markers (underlined) were selected from an interspecific map between A. duranensis and A. stenosperma [36] to establish synteny between the current and former linkage groups. The original linkage group assignments are given in the marker names, separated by the pound (#) sign. Loci with significant segregation distortion (p = 0.05) are labeled with an asterisk. Graphs to the right of the linkage groups represent recombination frequencies. Each data point represents the genetic distance between adjacent markers averaged over a window of 20 markers.

Table 1 Previously published genomic SSR markers mapped in Arachis duranensis

EST libraries of A. duranensis were developed to produce Single Nucleotide Polymorphism (SNP) markers for mapping (Table 2). Of the 1,536 SNP markers developed (Additional file 1), 1,054 were included in the A. duranensis map (Figure 2). The remaining 482 SNP markers were either of low quality (GC quality score <0.25) or they showed segregation patterns (extremely distorted) that could not be mapped. Of the 1,054 mapped SNP markers, 815 were derived from the cDNA sequencing project while the other 239 were genomic legume orthologs.

Table 2 cDNA sequence reads generated for single nucleotide polymorphism (SNP) discovery in Arachis duranensis*

The A. duranensis map produced in this study contained 1,724 markers combined into 10 linkage groups with a total genetic distance of 1,081.3 cM. MSTMap, a software program that accommodates large numbers of markers and utilizes a “minimum spanning tree” algorithm, was used to construct an initial genetic map using only the codominant markers. The 1,673 codominant markers were distributed into 810 co-segregating groups (bins). Although this program has been reported to be accurate for large-scale mapping projects [38], few independent studies are available establishing consistency between MSTMap and other commonly used mapping software [39]. To confirm the linkage group assignments, marker orders, and genetic distances with alternative software, both codominant and dominant markers were mapped with Joinmap 3.0. Marker orders and genetic distances were highly consistent between MSTMap and Joinmap 3.0 (Additional file 2).

Significant segregation distortion (p = 0.05) was observed for 513 (29.8%) markers (Figure 2, Additional file 3). Chromosomes 4A and 9A carried particularly long segments of distorted segregation, suggesting large-scale chromosomal selection in these regions. Guo et al. (Guo Y et al: Comparative mapping in intraspecific populations uncovers a high degree of macrosynteny between A- and B-genome diploid species of peanut, submitted) found that a single linkage group (4/9B) in A. batizocoi was syntenic with chromosomes 4A and 9A of A. duranensis, implicating inversion and reciprocal translocation events as the underlying chromosomal rearrangements in this B-genome species. Recombination frequencies were generally low in the central, presumably centromeric chromosomal regions of A. duranensis and increased toward the telomeres, a pattern typical of many plant species [40, 41]. Recombination was more evenly distributed along chromosome 3A, where only slightly suppressed recombination was observed around the presumed location of the centromere (Figure 2).

Across the A. duranensis linkage map, each linkage group spanned on average 108.1 cM (77.3–145.6 cM) and included 172.4 markers (119–266) (Table 3). This is considerably denser than the previously published AA, BB, and AABB maps consisting of only a few hundred markers. For example, the A. ipaensis × A. magna B-genome map published by Moretzsohn et al. [5] had 149 SSR markers grouped into 10 linkage groups, whereas the B-genome SSR-based map in our related paper consists of 449 loci in 16 linkage groups (Guo Y et al: Comparative mapping in intraspecific populations uncovers a high degree of macrosynteny between A- and B-genome diploid species of peanut, submitted). The A-genome map produced using the interspecific hybrid A. duranensis × A. stenosperma had 339 SSRs that were mapped into 11 linkage groups [6, 42]. For A. hypogaea, there are now several maps, with the densest consensus map containing 324 loci on 21 linkage groups [11].

Table 3 Genetic distances and distribution of markers on the ten linkage groups of A. duranensis

The map produced in the current study is the first high-density map available in peanut, and because it was generated from a progenitor species of A. hypogaea, we anticipate that it will have significant applications for analyzing the cultivated genome. For example, the data generated for this map were used by Nagy et al. [43] to more precisely map the Rma gene for nematode resistance that originated from an introgression line between A. hypogaea and A. cardenasii. The A-genome SNP array has also been useful at the tetraploid level for genotyping a recombinant inbred line population derived from a cross between cultivated peanut and a synthetic A. ipaensis × A. duranensis tetraploid (Ozias-Akins, unpublished).

Gene annotation and comparative mapping

A homology search of the 1,724 mapped loci resulted in significant hits for 1,463 loci in at least one of three databases (Medicago, UniProt and GenBank NR), and 580 of the mapped loci gave significant similarities in at least one of the two gene ontology databases, Medicago Gene Atlas and TAIR (Additional file 4). Altogether, 1,366 gene ontology terms were assigned to the 580 genes. These were distributed among the three major gene ontology categories as follows: 521 molecular functions, 534 biological processes, and 311 cellular components (Additional file 4).

The sequences used to create the A. duranensis map also were compared to the genomes of two legumes where 995 loci on the A. duranensis map could be mapped to M. truncatula, and 2,711 matches could be found in G. max (with potentially two hits per mapped locus). While a majority of the dots in the synteny plots appear to be random (Figure 3), there are definite clusters of markers, and striking examples of colinearity (red arrows), especially for the comparisons to Glycine. Presumably there has been extensive single gene movement since the last common ancestors in one or both species, but many genes remain in the ancestral locations and can be detected. Overall, the synteny patterns for G. max showed the recent whole genome duplication within Glycine, with each location in peanut showing corresponding synteny at two locations in Glycine. Colinearity between Medicago and Arachis is much less conserved than between Arachis and Glycine. This could be due to extensive inversions in either genome, or more likely, due to preliminary ordering of sequences within the Medicago unfinished genome assembly. In general, the patterns showed strong synteny on the chromosomal ends in both genetic and physical distance, while the central regions of chromosomes tended to show less synteny. Presumably this could be attributed to pericentromeric heterochromatin which is known to define less recombinogenic regions where genomic rearrangements are more likely to persist [44]. Chromosome arms tend to be maintained as syntenic between Glycine and Arachis, but there is evidence that chromosome arms have been translocated in some cases so that synteny exists at the chromosome arm level, but less so at the whole chromosome level.

Figure 3

Synteny between diploid A-genome peanut (A. duranensis, 2n = 20) and Glycine max (2n = 40). Arrows indicate clusters of genes in common between the two genomes. For plotting the data on the Y axis, the peanut genome for each chromosome is proportional in size to the total map size in centimorgans. For the X axis, the unit of measure is scaled to bp within the chromosomal assemblies of the respective genomes. The plot was obtained with a Visual Basic program that plotted the x-y coordinates of each hit. The total number of matches for each pairwise comparison is listed at the upper left corner.

Conclusions

This investigation provided a large number of de novo EST sequences that were deposited into GenBank. The markers developed here are valuable resources for peanut and, more broadly, for the legume research community. This research presents the first high-density molecular map in peanut, with 1,724 markers grouped into the 10 expected linkage groups for an A-genome species. Because the map was produced with the progenitor species A. duranensis, which contributed the A genome of A. hypogaea, it will serve as the reference map for both wild and cultivated species. Lastly, synteny was found between Arachis and the Glycine and Medicago genomes, which indicates that markers developed for other legume species may be of value for crop improvement in peanut. The A-genome map will have utility for fine mapping in other peanut species and has already been applied to mapping a nematode resistance gene that was introgressed into A. hypogaea from A. cardenasii.

Methods

Plant materials

Thirty-seven accessions of A. duranensis, 14 accessions of A. stenosperma (A genome), one accession of A. ipaensis, and eight accessions of A. batizocoi (B genome) were obtained from the USDA or NCSU germplasm collections. Plants were then grown in greenhouses at the University of Georgia at Athens. The accessions evaluated are shown in Figure 1. Hybrids were made between three pairs of A. duranensis accessions, including PI 468200 × PI 468198, PI 468319 × PI 475885, and PI 475887 × Grif 15036. The hybrid combination PI 475887 × Grif 15036 was one of the most polymorphic as revealed by using a panel of SSR markers as described below and thus was selected for subsequent mapping. PI 475887 was originally collected by Krapovickas, Schinini, and Simpson near Salta, Argentina during 1980, and Grif 15036 was originally collected by Williams, Simpson, and Vargas near Boqueron, Paraguay during 2002 [22]. Crosses were made by manually emasculating flowers of the female parent PI 475887 during the late afternoon and pollinating stigmas between 8 and 10 am the following morning with pollen from the male parent Grif 15036. An F2 population was developed by self-pollinating multiple F1 individuals. The intraspecific F2 population (n = 94) from a hybrid between two A. duranensis accessions was then used for mapping studies.

Molecular diversity between and within A- and B-genome diploid species

DNA was isolated from leaf samples of A. duranensis, A. ipaensis, A. stenosperma, and A. batizocoi accessions using a modified CTAB method [45, 46]. The 60 DNA samples were amplified using 709 different SSR primer pairs (GM1-GM709) that had been generated from sequences reported in the literature [6, 29, 47–53] and screened for polymorphisms. SSR markers were genotyped on an ABI3730XL Capillary DNA Sequencer (Applied Biosystems, Foster City, CA) as described in a related paper by Guo et al. (Guo Y et al: Comparative mapping in intraspecific populations uncovers a high degree of macrosynteny between A- and B-genome diploid species of peanut, submitted) using forward primers labelled with FAM, HEX, or TAMRA fluorophores. Microsat [54] was used to construct a distance matrix based on the proportion of shared bands (D = 1 - ps) from the 556 primer pairs that amplified polymorphic fragments. The matrix was imported into Phylip v3.67 [55] for the construction of the neighbor-joining tree.
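The distance D = 1 - ps used for the cluster analysis can be illustrated with a minimal Python sketch. It is not the Microsat implementation. It assumes that ps is the Dice-style proportion of shared bands, 2 × (bands in common) / (bands in A + bands in B), and the accession names and band calls are hypothetical.

```python
from itertools import combinations


def shared_band_distance(bands_a, bands_b):
    """Return D = 1 - ps for two accessions given as sets of scored bands.

    Assumption: ps = 2 * |A & B| / (|A| + |B|) (Dice-style proportion).
    """
    total = len(bands_a) + len(bands_b)
    if total == 0:
        raise ValueError("both accessions have no scored bands")
    ps = 2 * len(bands_a & bands_b) / total
    return 1.0 - ps


def distance_matrix(profiles):
    """Build a symmetric pairwise distance matrix as nested dictionaries."""
    names = sorted(profiles)
    matrix = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = shared_band_distance(profiles[a], profiles[b])
        matrix[a][b] = d
        matrix[b][a] = d
    return names, matrix


# Hypothetical band calls: (marker name, fragment size in bp)
profiles = {
    "acc1": {("GM1", 210), ("GM2", 180), ("GM3", 305)},
    "acc2": {("GM1", 210), ("GM2", 184), ("GM3", 305)},
    "acc3": {("GM1", 214), ("GM2", 184), ("GM3", 299)},
}
names, matrix = distance_matrix(profiles)
for a in names:
    print(a, [round(matrix[a][b], 3) for b in names])
```

A matrix of this form is the kind of input a neighbor-joining program expects, although the exact file format required by Phylip is not reproduced here.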

Marker development

Simple sequence repeat (SSR) markers

A total of 101,132 unigenes (37,916 contigs (GenBank Acc. No. EZ720985-EZ758900) and 63,216 singletons) from tetraploid peanut ESTs (GenBank Acc. No. CD037499-CD038843, ES702769-ES768453, GO256999-GO269325, GO322902-GO343529 and short-read Sequence Read Archive accessions SRX020012, SRX019979, SRX019972, SRX019971) representing ca. 37 Mb of the A. hypogaea genome were mined for 2,138 EST-SSR markers (GM710-GM2847) (Guo Y et al: Comparative mapping in intraspecific populations uncovers a high degree of macrosynteny between A- and B-genome diploid species of peanut, submitted). Unigenes in the transcript assembly were screened for perfect repeat motifs using SSR-IT (http://www.gramene.org/db/markers/ssrtool) and for imperfect motifs using FastPCR (http://primerdigital.com/fastpcr.html). The repeat count (n) threshold for each motif type was set at n ≥ 5. SSR markers were genotyped on an ABI3730XL Capillary DNA Sequencer (Applied Biosystems, Foster City, CA) using forward primers labelled with FAM, HEX, or TAMRA fluorophores. PCR was performed in a 12 μL reaction mixture containing 1.0 × PCR buffer, 2.5 mM Mg2+, 0.2 mM of each dNTP, 5.0 pmol of each primer, 0.5 unit of Taq polymerase, and 10 ng of genomic DNA. Touchdown PCR was used to reduce spurious amplification. The SSR markers were screened for length polymorphisms using GeneMapper 3.0 software (Applied Biosystems, Foster City, CA). Of the 2,138 EST-SSR primer pairs tested, markers derived from 598 could be mapped. A set of 34 SSR markers from genomic sequences of Arachis previously screened for polymorphism between parents of the A. duranensis mapping population (Guo Y et al: Comparative mapping in intraspecific populations uncovers a high degree of macrosynteny between A- and B-genome diploid species of peanut, submitted) were also mapped (Table 1).
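The screen for perfect repeat motifs with n ≥ 5 can be sketched in Python as follows. This is only an illustration of the idea, not the SSR-IT algorithm. The motif length range (di- to hexanucleotide) is an assumption, because the text does not state which motif types were searched, and the function name and test sequence are hypothetical.

```python
import re

MIN_REPEATS = 5  # threshold from the text: n >= 5 for each motif type


def find_perfect_ssrs(sequence, min_motif=2, max_motif=6, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats.

    Assumption: motifs of 2 to 6 nt are searched; the source does not say.
    """
    sequence = sequence.upper()
    hits = []
    for size in range(min_motif, max_motif + 1):
        # A motif of `size` bases followed by at least (min_repeats - 1) exact copies
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (size, min_repeats - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(2)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT")
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(1)) // size))
    return sorted(hits)


# Hypothetical unigene fragment with an (AG)6 and a (CTT)5 repeat
example = "GGC" + "AG" * 6 + "TCA" + "CTT" * 5 + "G"
print(find_perfect_ssrs(example))
```

Imperfect motifs, which the study detected with FastPCR, are outside the scope of this sketch.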

Single-stranded conformational polymorphism (SSCP) markers

SSCP markers were developed from genomic DNA templates for previously described NBS sequences isolated by targeting conserved sequence motifs in NBS-LRR encoding genes [56, 57] and from Arachis unigenes showing similarity to R-gene homologs identified by mining a peanut transcript assembly [43]. SSCP fragments were amplified using touchdown PCR and detected by silver staining as previously described [58–60]. A total of 380 SSCP markers were evaluated for polymorphism between the parents PI 475887 and Grif 15036. The resistance gene analog markers are prefixed by either ‘GS’ or ‘RGC’ in the map. cDNA sequences for unigenes targeted for SSCP marker development in the present study were deposited in GenBank (Acc. No. GF100476-GF100638). One additional marker, the SCAR marker S197 linked to a root-knot nematode resistance gene in Arachis hypogaea [43, 61], was also mapped.

Development of single nucleotide polymorphism (SNP) markers

Total RNA was isolated from roots of young seedlings (up to four trifoliate) and from developing seeds (up to developmental stage R6) of the two parental genotypes, PI 475887 and Grif 15036 (alias DUR25 and DUR2, respectively). cDNA libraries were developed using the Mint cDNA synthesis kit (Evrogen) and normalized using the Trimmer cDNA normalization kit (Evrogen). cDNA sequences were generated by Sanger and 454 GS-FLX sequencing methods and assembled using the tool Mira [62]. Altogether, more than one million cDNA sequence reads were generated from A. duranensis PI 475887 and Grif 15036. These were assembled into 81,116 unique transcripts (unigenes) (GenBank Acc. No. HP000001-HP081116). Assemblies were searched for single nucleotide polymorphisms (SNPs) that fulfilled the following two criteria: (a) the SNP position is covered by at least two reads from each genotype, and (b) at least 80% of the reads call the SNP in the particular genotype. Using these criteria, we identified 8,478 SNPs in 3,922 unigenes. To facilitate the selection of candidate SNPs for designing and building Illumina GoldenGate SNP genotyping arrays, putative intron positions were predicted by aligning Arachis contigs with Arabidopsis and Medicago genomic DNA sequences identified by BLAST analyses. SNPs within 60 bp of a putative intron were eliminated, thereby reducing the collection of candidate SNPs to 6,789 in 3,264 unigenes, from which 1,236 high-quality SNPs, each representing a separate unigene, were selected for genotyping. SNPs were also detected by allele re-sequencing in a subset of 768 conserved legume orthologs identified by coauthors (R.V. Penmetsa, N. Carrasquilla-Garcia, A.D. Farmer and D.R. Cook), and 300 of these SNPs were added to the GoldenGate array. SNP genotyping on the GoldenGate array was conducted at the Emory Biomarker Service Center, Emory University. The BeadStudio (Illumina) genotyping module was used for calling genotypes. Markers with GC quality scores lower than 0.25 were excluded from subsequent analysis.
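The two SNP selection criteria can be expressed as a short Python sketch. It rests on one reading of criterion (b), namely that at least 80% of the reads within each genotype must support that genotype's allele; this interpretation, the function name, and the example read calls are assumptions made for illustration and do not reproduce the authors' pipeline.

```python
def passes_snp_criteria(calls_by_genotype, min_reads=2, min_fraction=0.8):
    """Check one candidate SNP position against criteria (a) and (b).

    calls_by_genotype maps a genotype name to the list of base calls from
    the reads of that genotype covering the position.
    """
    major_alleles = []
    for reads in calls_by_genotype.values():
        # Criterion (a): at least two reads from each genotype
        if len(reads) < min_reads:
            return False
        # Criterion (b), as interpreted here: >= 80% of reads agree on one allele
        major = max(set(reads), key=reads.count)
        if reads.count(major) / len(reads) < min_fraction:
            return False
        major_alleles.append(major)
    # A SNP between the parents requires different major alleles
    return len(set(major_alleles)) == len(major_alleles)


# Hypothetical read calls at one position in the two parents
candidate = {"PI 475887": ["A", "A", "A"], "Grif 15036": ["G", "G"]}
print(passes_snp_criteria(candidate))
```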

Map construction

The program MSTMap [39] was used to build a core genetic map including all codominant markers, using a cut-off p-value of 1e-12 for clustering markers into linkage groups. The recombinant inbred line 2 (RIL2) algorithm and the Kosambi function were used to calculate genetic distances. The program Joinmap 3.0 [63] was used to localize the dominant markers and to confirm the marker order; a range of LOD scores of 5–16 was used to create groups. The Kosambi mapping function was used for map length estimations. Markers were tested for segregation distortion by the chi-square test. The graphic presentation of the map was drawn using Mapchart 2.0 software [64].
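Two of the calculations named above, the Kosambi mapping function and the chi-square test for segregation distortion, can be sketched in Python. The sketch assumes the standard Mendelian expectations for an F2 population (1:2:1 for codominant and 3:1 for dominant markers), which the text does not state explicitly. It uses tabulated critical values at the 0.05 level rather than computing exact p-values, and the genotype counts are hypothetical.

```python
import math

# Chi-square critical values at the 0.05 level, keyed by degrees of freedom
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991}


def chi_square(observed, ratios):
    """Chi-square statistic for observed class counts against expected ratios."""
    total = sum(observed)
    ratio_sum = sum(ratios)
    stat = 0.0
    for obs, ratio in zip(observed, ratios):
        expected = total * ratio / ratio_sum
        stat += (obs - expected) ** 2 / expected
    return stat


def is_distorted(observed, ratios):
    """True if segregation departs from the expected ratio at the 0.05 level."""
    df = len(ratios) - 1
    return chi_square(observed, ratios) > CHI2_CRITICAL_05[df]


def kosambi_cm(r):
    """Kosambi map distance in cM for a recombination fraction 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


# Hypothetical codominant marker counts (AA, AB, BB) in 94 F2 plants
print(is_distorted((24, 46, 24), (1, 2, 1)))
print(is_distorted((40, 44, 10), (1, 2, 1)))
# Hypothetical dominant marker counts (present, absent)
print(is_distorted((70, 24), (3, 1)))
print(round(kosambi_cm(0.10), 2))
```

Mapping programs such as MSTMap and Joinmap carry out these steps internally; the sketch only shows the arithmetic behind them.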

Gene annotation

The cDNA sequences included in the genetic map were used to search for homologous genes in the Medicago (http://www.medicago.org), UniProt (http://www.uniprot.org) and GenBank NR (http://www.ncbi.nlm.nih.gov/genbank) databases using various BLAST algorithms. Gene ontology annotations were also added by searching the Medicago Gene Atlas (http://mtgea.noble.org) and The Arabidopsis Information Resource (TAIR, http://www.arabidopsis.org) databases. A significance threshold of E = 1e-5 was applied in all inquiries.

Synteny between Arachis, Medicago, and Glycine

The EST sequences used for marker development were compared to the whole genome sequences of Glycine max and Medicago truncatula to establish synteny. Sequences for the genomes G. max V5 and M. truncatula MT3.0 were obtained through http://www.phytozome.net. The sequences associated with each locus on the A. duranensis peanut map (Additional file 1 and Additional file 5) were searched against the respective whole genome sequences using blastn with a threshold of E ≤ 1e-6. For comparison to Medicago, only the best match was retained because diploid peanut and M. truncatula are at the same relative ploidy level. However, for Glycine, the two best matches for each peanut sequence were retained because of the recent polyploidy within soybean and the high level of retention of duplicated genes in the species. BLAST hits to scaffolds or Bacterial Artificial Chromosomes (BACs) not anchored to the chromosomal assembly in the target genomes were discarded. Plotting the data and processing of BLAST results were performed with Visual Basic programs written for this study.
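The hit-filtering rules described above (E-value threshold, removal of unanchored targets, one best match for Medicago and two for Glycine) can be sketched in Python. The original processing was done with Visual Basic programs that are not reproduced here. Ranking hits by bit score, the tuple layout, and all locus and chromosome names below are assumptions made for illustration.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-6  # threshold from the text: E <= 1e-6


def select_hits(hits, max_hits_per_locus, anchored_targets):
    """Keep the best hits per mapped locus on anchored chromosomes only.

    hits: iterable of (locus, target, target_start, evalue, bitscore).
    Assumption: "best" means highest bit score.
    """
    by_locus = defaultdict(list)
    for locus, target, start, evalue, bitscore in hits:
        if evalue > E_VALUE_CUTOFF:
            continue  # not significant
        if target not in anchored_targets:
            continue  # scaffold or BAC not anchored to the chromosomal assembly
        by_locus[locus].append((bitscore, target, start))
    selected = {}
    for locus, rows in by_locus.items():
        rows.sort(key=lambda row: row[0], reverse=True)
        selected[locus] = [(target, start)
                           for _, target, start in rows[:max_hits_per_locus]]
    return selected


# Hypothetical blastn results for two mapped loci against a soybean assembly
hits = [
    ("SNP001", "Gm01", 1200000, 1e-40, 180.0),
    ("SNP001", "Gm11", 3400000, 1e-35, 160.0),
    ("SNP001", "Gm05", 900000, 1e-10, 60.0),
    ("SNP001", "scaffold_77", 500, 1e-50, 200.0),
    ("GM0710", "Gm02", 42000, 1e-3, 30.0),
]
anchored = {"Gm01", "Gm02", "Gm05", "Gm11"}
two_best = select_hits(hits, 2, anchored)  # Glycine-style: two best matches
one_best = select_hits(hits, 1, anchored)  # Medicago-style: single best match
print(two_best)
print(one_best)
```

Each retained (target, position) pair, combined with the locus position in cM on the A. duranensis map, gives one x-y point of the kind plotted in a synteny dot plot.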