Background

Legumes are an important source of protein for humans and livestock. Cultivated peanut (Arachis hypogaea) is one of the most widely grown grain legumes in the world. It is cultivated mainly in Asia, Africa and the Americas, and is valued for its high protein and unsaturated oil contents [1, 2]. Worldwide, the major constraints to peanut production are drought and fungal diseases, including Early (ELS) and Late Leaf Spots (LLS), the latter caused by Cercosporidium personatum [3–5].

Wild Arachis species, which are exclusively South American in origin, have high genetic diversity, have been shaped by selection across a range of environments and biotic stresses, and constitute a rich source of allele diversity [6–8]. The species A. stenosperma harbors resistance to a number of pests and diseases, including the root-knot nematode Meloidogyne spp. [9] and fungal diseases [10, 11], whilst A. duranensis originates from regions with relatively low rainfall [12]. These wild relatives are a rich source of new alleles for peanut improvement, as they have sufficient polymorphism for their genetic characterization and for tracking the genome segments that confer these resistances during introgression into cultivated peanut [13–15]. Both species have been exploited as a resource for gene discovery, interpretation of genomic sequences and marker development [11, 15–19], and are also parents of a recently developed RIL (Recombinant Inbred Line) diploid mapping population.

In recent years, a relatively large number of EST sequences have been made available in the National Center for Biotechnology Information (NCBI) public database for A. hypogaea (151,352). However, fewer resources exist for A. duranensis (35,292) and A. stenosperma (6,264). In addition, a whole genome sequencing project for peanut (tetraploid) remains a challenge, in part due to the size of the genomes compared to model plants, even for the diploid species (A. duranensis 1,260 Mbp vs. 115 Mbp in Arabidopsis thaliana), but even more because of the high repetitive DNA content [20, 21].

Functional genomics studies using microarrays and suppression subtractive hybridization (SSH) libraries have identified genes potentially associated with responses to C. personatum and drought stress in Arachis spp. [3, 4, 22, 23]. However, to our knowledge, no large-scale transcriptome analysis of stressed wild Arachis is available.

Increased transcriptome sequence resources should facilitate basic and applied research on genetics, contribute to the development of molecular markers, facilitate comparative genomics and aid in the study of adaptive variation across the genus. In addition, transcriptome data can assist in the elucidation of genes involved in biological processes such as defense responses to biotic and abiotic stress, with which Transcription Factors (TFs) are notably associated [24, 25] and which have hardly been studied in wild Arachis.

Although it is still a challenge to assemble a new, complex whole genome using Next-Generation Sequencing (NGS) technologies (454/Illumina), the smaller size and reduced repetitive content of the transcriptome, together with increased coverage, facilitate de novo transcriptome assembly using these technologies [26]. NGS technologies have enabled cost-effective, large scale generation of ESTs, and allowed whole transcriptome analysis of a number of smaller scale legume crops such as chickpea, lentil, mungbean and pigeonpea [27–30]. Moreover, deep sequencing has enabled the identification of new transcripts not present in previous EST collections of model plants, such as Arabidopsis [31] and rice [32], and the massive identification of molecular markers such as SNPs (Single Nucleotide Polymorphisms) and SSRs (Simple Sequence Repeats) [33, 34]. In addition, EST-SSRs, being genic in origin, frequently display a high degree of transferability between related species.

In the present study, 743,232 sequence reads were produced using Roche/454 GS FLX Titanium, generating a total of 17,912 unigenes for A. stenosperma and 21,714 for A. duranensis, subjected to infection with C. personatum and gradual water limited stress, respectively. Contigs derived from these reads were annotated into functional categories separately for each species, and the expression of five Arachis RGAs (Resistance Gene Analogs) and four sequences related to the only fully-characterized Arachis retrotransposon (FIDEL) [20] was further analyzed by qRT-PCR (quantitative reverse transcription-PCR). This database was also used to design a set of 214 EST-SSR primer pairs which showed polymorphism between the two species studied via electronic PCR (ePCR).

The genomic resources developed in this study can, in association with other tools already developed for wild Arachis, help to accelerate genetics and breeding of peanut, and contribute particularly to the elucidation of genes involved in responses to important stresses such as fungal diseases and water limitation in peanut and other legumes.

Results

EST sequencing and assembly

A total of 7.4 × 10^5 raw sequence reads covering 211 Mbp were generated in a single 454 GS FLX Titanium run (Table 1) from the four libraries constructed from two Arachis species subjected to biotic (A. stenosperma/C. personatum) and abiotic (A. duranensis/water limitation) stress and their respective controls. After eliminating adapter sequences and low quality chromatograms, and “masking” unwanted sequences (rDNA, mitochondrial, repetitive), a total of 3.1 × 10^5 processed high quality reads were obtained for A. stenosperma and 2.7 × 10^5 for A. duranensis (Table 1). The average length of high quality sequence reads was 278 bp for A. stenosperma and 282 bp for A. duranensis, enabling coverage of 85 and 78 Mbp of the genomes, respectively (Figure 1, Table 1).

Table 1 Total number and length of 454 GS FLX Titanium reads for each library
Figure 1 Read length distribution (bp) of 454 GS FLX Titanium reads for each library. AsI, A. stenosperma inoculated with C. personatum; AsC, A. stenosperma control; AdS, A. duranensis under water limited stress; AdC, A. duranensis control.

The high quality reads from the 454 GS FLX Titanium platform were then used for clustering and de novo assembly according to the genotype of origin, resulting in 17,912 unigenes (singletons and contigs) from A. stenosperma and 21,714 from A. duranensis (Table 2), with an average of 83.8% of reads accepted across the four libraries. The difference in the number of unigenes generated for the two species can be attributed to the fact that A. duranensis reads represent leaf and root transcriptomes, whilst A. stenosperma reads derive solely from leaf tissue. The number of assembled unigenes found in this study for both species combined (39,626) was comparable to those reported for other legumes, such as pigeonpea [19, 30], mungbean [29], lentil [27] and chickpea [28], also obtained using the 454 GS FLX Titanium platform.

Table 2 Total number of unigenes and genome coverage for each species

After removing the singletons, we produced 7,723 high confidence consensus sequences (contigs) for A. stenosperma and 12,792 for A. duranensis, with each contig being built, on average, from a relatively high number of reads (33 for A. stenosperma and 19 for A. duranensis) (Table 2). The number of reads per contig and the contig length distribution are shown in Figure 2 (A and B). The majority of the contigs were assembled from 2 to 5 reads, with 90% of them containing fewer than 30 reads for both species (Figure 2). The average length of the contigs was 457 bp for A. stenosperma and 494 bp for A. duranensis, with 27% and 36% of them longer than 500 bp, respectively (Figure 2).

Figure 2 Distribution of contigs by number of reads (A) and length (B).

The genome coverage of contigs was 8.18 Mbp for A. stenosperma and 10.72 Mbp for A. duranensis (Table 2), each corresponding to just under 1% of the estimated 1,260 Mbp size of a typical diploid Arachis species [35]. Sequence data from this study can be found for each species in the Sequence Read Archive (SRA) at the NCBI (A. duranensis - SRA047273.1; A. stenosperma - SRA047258.1). The derived contigs for each species and their most significant matches against the nr database of GenBank (E-value < 1e-7) are available in additional files (Additional files 1, 2) and at NCBI in the Transcriptome Shotgun Assembly (TSA) database (A. duranensis - JR332677 - JR344253 and A. stenosperma - JR326556 - JR332676).
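The coverage statement above is simple arithmetic, and it can be checked with a minimal Python sketch. The function name `percent_of_genome` is illustrative only and is not part of the study's pipeline; the input values are those quoted in the text.

```python
def percent_of_genome(assembled_mbp: float, genome_mbp: float) -> float:
    """Return the assembled sequence length as a percentage of genome size."""
    if genome_mbp <= 0:
        raise ValueError("genome size must be positive")
    return 100.0 * assembled_mbp / genome_mbp


# Estimated size of a typical diploid Arachis genome, as quoted in the text.
GENOME_MBP = 1260.0

# Total contig length per species (Mbp), as quoted in the text.
CONTIG_MBP = {"A. stenosperma": 8.18, "A. duranensis": 10.72}

for species, mbp in CONTIG_MBP.items():
    print(f"{species}: {percent_of_genome(mbp, GENOME_MBP):.2f}% of the genome")
```

Both values fall below 1%, in line with the figure given in the text.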

Sequence annotation and gene ontology

Only the high confidence consensus sequences (contigs) of A. stenosperma and A. duranensis were compared against the NCBI non-redundant protein sequence database (nr) for each species using BLASTX [36] in order to annotate known proteins/genes (Additional files 1, 2). A relatively high proportion of contigs, 52.3% from A. stenosperma and 58.5% from A. duranensis, could be assigned to putative orthologs of genes involved in various pathways and cellular processes, when compared to other legumes without a completely annotated reference genome sequence [27, 29]. Over 27% of the overall transcripts in both A. stenosperma and A. duranensis showed homology in BLASTX to 14 legume species (Additional files 1 and 2). Of these, over 60% of A. stenosperma and A. duranensis transcripts showed homology to Glycine max, followed by Medicago spp. (18%). Only 2.9% of A. stenosperma transcripts showed homology to A. hypogaea, whilst 6.5% showed homology to A. duranensis. These data reflect the greater number of ESTs available for these two legumes in comparison with Arachis spp., and also the closeness of A. duranensis to the cultivated tetraploid A. hypogaea [37].

For functional annotation, Blast2GO [38] was applied to classify contigs at superfamily, family and subfamily levels, to predict the occurrence of functional domains, repeats and important sites, and to add GO (Gene Ontology) terms to the protein signatures. Of the 7,723 contigs in A. stenosperma, 96% (7,391) could be assigned to one or more GO annotation categories, with 2,925 (39%) attributed to a biological process, 2,144 (29%) to a cellular component and 3,338 (45%) to a molecular function (Figure 3). Likewise, in A. duranensis, a similarly high proportion (12,024 contigs; 94%) of the 12,792 contigs could be assigned to GO annotation categories, with 4,752 (39%) identified as belonging to a biological process, 3,135 (26%) to a cellular component and 5,937 (49%) to a molecular function (Figure 3).

Figure 3 GO annotation analysis for contigs from A. stenosperma and A. duranensis. Gene Ontology (GO) classification of the predicted A. stenosperma (blue) and A. duranensis (red) ORFs according to cellular location, molecular function and biological process using Blast2GO with an E-value cutoff of 1e-10.

The assignments made to the molecular function ontology were very similar for both species (Figure 3), with a large proportion of the sequences in catalytic (12–15%) and binding activities (14–17%), whilst under the biological process ontology a large proportion fell into metabolic process (15–17%) and cellular process (13%). Additionally, in A. stenosperma, which included transcripts from fungus-inoculated leaves, 68 sequences were identified in the GO subcategory response to stimulus, including peroxidases, catalases, chitinases, glycosinases and serine/threonine kinases. In A. duranensis, which included transcripts from leaves and roots subjected to water limited stress, 126 sequences were in this category, notably those related to osmotic stress and water deprivation (Figure 3).

Transcription factors

Transcription factors (TFs) constituted up to 1% of the total high confidence consensus sequences in both species studied, and were classified into TF families by sequence comparison to known transcription factor gene families in a public Plant TF database [39] (Figure 4). In this study, all A. duranensis TF transcripts were classified into 25 families that play important roles in eliciting stress responses, with bZIP (13%), MYB (13%), NAC (7%), bHLH and AP2-EREBP (8%) and WRKY (6%) being the most highly represented (Figure 4A). In A. stenosperma, a slightly different distribution of the TFs in 20 families was observed, with bZIP (18%), MYB (14%), AP2-EREBP (10%), bHLH (6%) and WRKY (4%) also being the most represented (Figure 4B).

Figure 4 Distribution of contigs of A. duranensis (A) and A. stenosperma (B) by transcription factor (TF) families. TFs were identified by conserved domain annotation using BLASTX with an E-value cutoff of 1e-7.

Expression profile of RGAs and FIDEL

The largest class of known plant disease resistance (R) proteins includes those that contain a nucleotide binding site and leucine-rich repeat domains (NBS-LRR proteins). NBS-LRR proteins may recognize the presence of a pathogen directly or indirectly [40]. A total of 48 homologs of Arabidopsis NBS encoding genes were identified in A. stenosperma following a previously described methodology [41], of which five representatives were selected for further expression analysis (Additional file 3). These five genes were analyzed by qRT-PCR, using cDNA from C. personatum inoculated plants and the respective controls as template, the RGA primers described in Table 3, and 60S as the reference gene, according to [42]. Relative quantification of transcripts showed that all five RGAs were upregulated in fungus-challenged plants in comparison to the control, with RGAs 256, 122 and 11 showing the biggest differences in expression levels (Figure 5A).

Table 3 NBS-LRR and FIDEL sequences used for expression analysis using qRT-PCR
Figure 5 Relative mRNA levels produced by five NBS-LRR sequences in A. stenosperma leaves inoculated with C. personatum (A) and by four FIDEL sequences in water limited stressed A. duranensis roots (B). Normalization of expression was performed using as references the 60S gene for A. stenosperma and the actin gene for A. duranensis samples. Bars represent the standard error of the mean of two biological replicates for each sample.

Retroelements constitute the major part of the repetitive DNA of a number of animal and plant genomes [43]. The long terminal repeat (LTR) retrotransposon FIDEL constitutes a significant part of Arachis tetraploid and diploid genomes [20]. FIDEL-related sequences were found to be expressed in both species studied with a surprisingly high frequency. For A. duranensis, 0.23% of the high quality sequence reads, and 37 of the 12,792 (0.29%) contigs were FIDEL or FIDEL-related. For A. stenosperma, 1.3% of the high quality sequence reads, and 87 of the 7,465 (1.16%) contigs were FIDEL or FIDEL-related. In silico analysis indicated that most of these contigs were upregulated in response to the biotic/abiotic stresses. Four FIDEL-related contigs (Table 3) were chosen for analysis by qRT-PCR using cDNA from A. stenosperma leaves and A. duranensis roots as template, and actin or 60S as reference genes.

We found that, with the exception of FIDEL274 in A. stenosperma/C. personatum samples, all representatives of this retroelement showed increased expression in both species under biotic and abiotic stress (Table 3; Figure 5B). It is also interesting to note that the levels of induction of the four FIDEL sequences were slightly higher in A. duranensis subjected to water limited stress than in A. stenosperma under fungal inoculation (Figure 5). The sequence FIDEL 412 showed the highest difference in expression levels between stressed and non-stressed plants, with a 1.74-fold expression ratio (Figure 5B).

SSR identification

The discovery of EST-SSRs in the transcriptomes of both species, A. duranensis and A. stenosperma, was based on analysis of the assembled contig templates. A total of 2,884 distinct SSR loci were identified, and 1,463 primer pairs were designed for A. duranensis and 862 for A. stenosperma, corresponding to 11 and 10% of total contigs, respectively. Table 4 shows the information regarding the primer design and the frequency of different repeat types. Overall, the most abundant SSR type was tri-nucleotides (57%), followed by tetra/penta- (20%) and di-nucleotides (12%). More details about the primers are provided in Additional file 4. Thirty-one of the EST-SSRs had been identified in a previous study [17] and were already mapped in the A-genome mapping population (A. stenosperma X A. duranensis) [11, 19] (Additional file 4). The 2,325 primer pairs designed were submitted to ePCR [44], and 584 amplified in both species. Of these, 214 were shown to be polymorphic between A. duranensis and A. stenosperma, which are the parents of a RIL mapping population. Some of these newly developed markers will be included in the saturated linkage map that is being constructed for the A-genome of Arachis using this RIL population.

Table 4 Frequency and repeat type of SSRs in A. duranensis and A. stenosperma transcripts

Discussion

Transcriptome sequences are a valuable resource, especially for species without a completely sequenced genome, such as peanut. They accelerate gene discovery, provide an asset for molecular marker development, and allow expression analysis and studies of evolutionary genome dynamics. In the present study, Next Generation Sequencing (NGS) enabled the generation of large numbers of sequence reads in a rapid and cost-effective manner, and the development of genomic resources for the exploitation of the stress resistances harbored by two wild diploid relatives of peanut.

Some recent studies have indicated that short reads from 454 GS 20 and GS FLX can effectively be used to characterize gene regions in a number of less studied species, including some tropical legumes [26, 28–30, 45, 46]. In the present study, the average read length for both species was 280 bp, which allowed an estimated genome coverage of up to 163 Mbp of high quality reads for both diploid Arachis genomes studied in a single sequencing run. In comparison with other studies in legumes, a relatively small number of singletons was produced (8,922 for A. duranensis and 10,189 for A. stenosperma). Furthermore, the average length and number of reads per assembled contig were comparatively high (475.5 bp and 26 reads/contig) (Table 2) [27, 29, 47]. This may in part be due to the very stringent quality and assembly parameters used, which may also partly explain why only 5% of the contigs produced in this study (1,012) failed to show significant functional annotation.

The lack of a completely sequenced and annotated reference genome makes it very difficult to estimate the genome coverage obtained in this study for the two species analyzed. However, if we take for comparison other diploid legume genomes which have already been completely sequenced, such as Medicago truncatula (38,835 genes) and Lotus japonicus (42,395 genes), and assume the same number of genes, we could suggest that up to 54% of the A. duranensis (21,714 unigenes) and 44% of the A. stenosperma (17,912 unigenes) genes were covered in our work. However, it is also important to be aware that more than one contig or singleton can originate from a single gene due to either non-overlapping sequence reads or high levels of sequence error in a single read [27].
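The gene-space estimate above can be reproduced approximately with the short Python sketch below. The text does not state how the two reference gene counts were combined; averaging the two per-reference ratios is an assumption made here, chosen because it gives values close to the quoted 54% and 44%. The names `REFERENCE_GENE_COUNTS` and `gene_space_fraction` are illustrative.

```python
# Gene counts of the two reference legume genomes, as quoted in the text.
REFERENCE_GENE_COUNTS = {
    "Medicago truncatula": 38835,
    "Lotus japonicus": 42395,
}

# Unigene counts obtained in this study, as quoted in the text.
UNIGENES = {"A. duranensis": 21714, "A. stenosperma": 17912}


def gene_space_fraction(unigenes: int, reference_counts: dict) -> float:
    """Mean of unigenes/reference ratios (the averaging step is an assumption)."""
    ratios = [unigenes / count for count in reference_counts.values()]
    return sum(ratios) / len(ratios)


for species, n in UNIGENES.items():
    fraction = gene_space_fraction(n, REFERENCE_GENE_COUNTS)
    print(f"{species}: about {100 * fraction:.0f}% of the assumed gene number")
```

Because several unigenes may derive from one gene, these figures are upper bounds rather than direct estimates of gene coverage.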

Transcription factors (TFs) are of special interest due to their role in controlling plant developmental processes and responses to environmental conditions, including functions of key importance to agronomic performance [24]. They have an essential role in the signal transduction networks that lead from the perception of stress signals to the expression of stress-responsive genes and, as opposed to most structural genes, tend to control multiple pathway steps within a transcriptional cascade [25]. Therefore, TFs are expected to be excellent candidates for modifying complex traits in crop plants, with TF-based technologies likely to be a prominent part of the next generation of successful biotechnology crops [48, 49]. In the present study, 1% of the transcripts were identified as TFs. Their overall distribution among the various known TF protein families was compatible with previous studies in other legumes such as soybean, chickpea, pigeonpea and cultivated peanut [4, 28, 30, 50, 51], with bZIP, MYB, NAC, bHLH, AP2-EREBP and WRKY highly represented in both A. duranensis and A. stenosperma transcripts.

The most expressed TF family was the basic leucine zipper (bZIP) family, which comprises regulators of many central developmental and physiological processes and of abiotic and biotic stress responses [52]. Among other reports, this TF family has been associated with the water deficit response in the relatively drought resistant tepary bean (Phaseolus acutifolius) [53] and with abscisic acid (ABA)-regulated gene expression required for the dehydration response in Arabidopsis [54]. Likewise, this TF family was the most expressed in A. duranensis plants subjected to gradual water limited stress (18%), suggesting a role for this family in this relatively drought tolerant species. The bZIP family was also the most expressed TF family in A. stenosperma leaves subjected to C. personatum (18%). It has already been described as involved in the defense response in other host-fungus interactions, such as to stripe rust via the ethylene/methyl jasmonate-dependent signal transduction pathways in wheat [55], and as regulating the expression of some stress-responsive genes such as PR-1 and Glutathione S-Transferase in Arabidopsis [56].

The second most highly expressed TF family in drought imposed A. duranensis plants (12%) and fungus-infected A. stenosperma leaves (14%) was the MYB family, which has been described as acting through the ABA signaling cascade to regulate stomatal movement, and therefore water loss, and disease resistance in Arabidopsis and rice [57, 58]. Likewise, the plant specific NAC transcription factor family was shown to be highly expressed in A. duranensis (10%) and to a lesser extent in A. stenosperma (2%). NAC protein function has been previously described in potato and Brassica napus under fungal infection [59, 60], and NAC proteins have been reported to significantly increase drought tolerance in soybean and chickpea [61, 62].

Dehydration-responsive element binding (DREB) proteins, a subgroup of the AP2/EREBP family, have an important role in plant response and adaptation to abiotic stresses [63]. In this study, they constituted 7% of the TFs in A. duranensis plants subjected to water limited stress. A previous study with transgenic peanut plants overexpressing DREB1A showed that the changes in the antioxidative machinery in these transgenic plants under water-limiting conditions played no causative role in improved transpiration efficiency [5, 64, 65]. Nonetheless, different DREB homologues have been shown to play different roles in increasing tolerance to cold, salt and drought in different plant species, and have been extensively studied in Arabidopsis, rice and soybean, being correlated with increased dehydration tolerance in these species [66–69]. An additional consideration is that recent studies indicate that the function of central regulators such as NAC, WRKY and zinc finger proteins may be modulated by mechanisms such as small RNA (miRNA)-mediated posttranscriptional silencing, reactive oxygen species signaling and epigenetic processes such as DNA methylation and posttranslational modifications of histones [70]. This suggests that a more comprehensive elucidation of the role and dynamics of drought and defense responsive TFs in plants may be required.

Retroelements, particularly the long terminal repeat (LTR) retrotransposons, constitute the major part of the repetitive DNA of plant genomes. Some of these elements seem to be constitutively expressed, whereas others are silent and can be activated by certain stress signals such as tissue culture, ionizing irradiation, wounding or polyploidization. Indeed, data from the whole genome sequencing of several eukaryotes strongly suggest that, far from being circumstantial, the activity of transposable elements plays an extremely important role in the plasticity and regulation of host gene functions [71]. The mechanisms by which stress induces the activity of an element are not completely understood. However, it has been shown that most expression features of Tnt1, a Solanaceae retrotransposon, can be deduced from the structure of its regulatory regions. These are located in the LTR, which contains several cis-acting elements similar to well characterized motifs involved in the activation of defense genes, whilst the Tnt1A G-box-like sequence is related to the typical ABA-responsive (ABRE) sequences and is identical to the MYC recognition sequence present in many drought-inducible genes [71, 72].

In the present study, many transcripts from both species were identified as having similarity to retroelements. Therefore, we studied in more detail FIDEL, the only fully characterized Ty3-gypsy retrotransposon described in allotetraploid peanut (A. hypogaea) and its putative diploid ancestors A. duranensis (A genome) and A. ipaënsis (B genome) [20]. Using qRT-PCR analysis, we observed that FIDEL showed an increased expression ratio both in A. duranensis roots subjected to gradual water limited stress and in A. stenosperma leaves inoculated with fungus, when compared to non-challenged plants. In tobacco and other Solanaceae, drought stress and fungal infection have been described as triggering independent mechanisms of plant defense response and activation of transcription factors and retroelements [71, 73]. In our study, we observed that both biotic and abiotic stresses induced FIDEL or FIDEL-related sequences. However, whether the induction of FIDEL represents an activation of some specific FIDEL sequences, of FIDEL harboring regions, or some more specific response is not known.

Plants, in response to pathogen effectors, have co-evolved specific cytoplasmic resistance (R) protein receptors which recognize individual pathogen effector molecular signatures and activate a second line of defense known as effector-triggered immunity (ETI) [74], previously known as gene-for-gene or race-specific resistance. In contrast to the non-specific response (PAMP-triggered immunity, PTI), which occurs in all members of a particular plant species, ETI operates at the intra-specific level, with resistant genotypes possessing the necessary R gene allele [75]. Conservation of motifs within R genes, such as those present within nucleotide-binding site leucine rich repeat domains, has facilitated their characterization in diverse plant taxa. Putative R genes or Resistance Gene Analogs (RGAs) are commonly clustered, as a result of duplication events occurring under diversifying selection. In Arachis, a previous investigation of RGA content in a number of wild species [41] showed that, of the 78 NBS sequences identified, most fall within legume-specific clades, some of which appear to have undergone extensive copy number expansions. In the present study, all five RGA sequences showed an increase in expression under C. personatum inoculation, when compared to the basal expression in the control samples. This was hardly unexpected, as proteins encoded by disease resistance (R) genes are mostly constitutively expressed in resistant genotypes, mediating specific molecular recognition of pathogenic microorganisms and triggering signaling cascades that activate defense reactions [76, 77]. A broader characterization of the transcriptional response of a suite of defense genes following stimulation of these R genes (e.g. kinases, peroxidases, transcription factors, NPR1) [78], and of the defense pathways that they trigger, is being conducted via Illumina deep sequencing. This will allow a better understanding of their contribution to the overall resistance response of A. stenosperma to C. personatum.

The transcriptome databank produced in this study enabled the development of 2,325 SSR primer pairs, of which 214 were shown to be polymorphic between the two species. These new markers will enrich the current reference AA diploid Arachis map [19] and other Arachis tetraploid maps under construction. In addition, these EST-SSR markers exhibit potential advantages when compared to SSRs located in non-transcribed regions, due to generally more consistent amplification efficiency and enhanced cross-species transferability [27].

The development of new SSRs is of special interest in Arachis because these are still the markers of choice in this genus, due to the difficulties in applying SNP markers to the cultivated tetraploid species. Therefore, these new markers will help to enrich existing genetic maps, generate more informative genetic and genomic tools, and enable the identification of orthologous genes through genome synteny analysis [15].

Conclusions

The use of NGS for transcriptome sequencing of species without a complete reference genome is an effective approach for gene discovery and identification of transcripts involved in specific biological processes. The present work constitutes the largest unigene dataset for A. stenosperma and the second largest for A. duranensis, providing an insight into the genomic architecture of these species and creating a scaffold of transcribed sequences which will help to elucidate genes involved in biological processes such as responses to fungal infection and drought.

Methods

Plant material and library construction

Seeds of A. stenosperma (V10309) and A. duranensis (K7988) were obtained from the Brazilian Arachis Germplasm Collection, maintained at Embrapa Genetic Resources and Biotechnology (Brasilia–DF, Brazil). For fungal bioassays, two-month-old plants of A. stenosperma were inoculated with a 50,000 spores⁄ml suspension in 0.5% Tween 20. Plant leaves were collected at 24, 48 and 72 hours after inoculation (HAI) and from non-inoculated controls, as described in our previous work [42], and immediately frozen in liquid nitrogen for RNA extraction.

For gradual water limited stress experiments, three-month-old A. duranensis plants were divided equally into two groups of 33 individuals each: one group was subjected to a gradual water limited stress (STR), whilst the control group (CTR) was kept at approximately 70% of field capacity. The daily individual transpiration rate of STR and CTR plants was estimated gravimetrically, and no more than 10 g of water loss per day was allowed in STR plants. The Normalized Transpiration Ratio (NTR) was calculated as the ratio of the individual transpiration of each STR plant to the mean transpiration of CTR plants, essentially as described by [79]. Leaves and roots were collected at distinct stages of the progressive water deficit (decreasing NTRs: 0.76; 0.73; 0.57; 0.43 and 0.40) and immediately frozen in liquid nitrogen for RNA extraction.
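The NTR calculation described above can be illustrated with a minimal Python sketch. It covers only the ratio stated in the text (individual STR transpiration divided by the mean CTR transpiration); the full protocol in [79] may include further normalization steps that are not reproduced here. The function name and the transpiration values are hypothetical.

```python
from statistics import mean


def normalized_transpiration_ratio(str_transpiration_g, ctr_transpirations_g):
    """Daily transpiration of one stressed (STR) plant divided by the mean
    daily transpiration of the control (CTR) plants."""
    ctr_mean = mean(ctr_transpirations_g)
    if ctr_mean <= 0:
        raise ValueError("mean control transpiration must be positive")
    return str_transpiration_g / ctr_mean


# Hypothetical daily water loss (g) measured gravimetrically.
control_plants = [50.0, 48.0, 52.0]   # CTR plants, mean = 50 g
stressed_plant = 28.5                 # one STR plant

ntr = normalized_transpiration_ratio(stressed_plant, control_plants)
print(f"NTR = {ntr:.2f}")
```

With these illustrative values the stressed plant transpires 57% as much as the average control plant, which corresponds to one of the sampling points listed in the text.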

Total RNA was extracted from 250 mg of plant material as previously described [42]. RNA integrity was checked by gel electrophoresis, and RNA was quantified using a Nanodrop ND-1000 (Thermo Scientific, Waltham, USA). To construct four bulked libraries, equal amounts of total RNA for inoculated A. stenosperma (leaves collected at 24, 48 and 72 HAI) and stressed A. duranensis (leaves and roots from all NTR points) were pooled separately from their respective non-treated controls and used for mRNA isolation. cDNA library construction and sequencing were performed by CD-Genomics (http://www.cd-genomics.com) using the Creator SMART cDNA library construction kit (Clontech Laboratories, California, USA) and the Roche 454 GS-FLX System with Titanium chemistry.

Sequence processing and assembly

Raw 454 data were pre-processed using est2assembly [80] for contaminant removal (non-coding RNA and plastidial sequences), quality trimming, adaptor trimming and poly-A removal. Transcript clustering was carried out using MIRA [81].

Similarity search and functional annotation

Functional annotation of the cluster consensus sequences was performed by sequence similarity searches using the BLASTX program [36] against NCBI’s non-redundant sequence database. InterProScan [38] was employed to perform protein domain and motif searches. Gene ontology (GO) terms were assigned by Blast2GO [82].

For the identification of NBS encoding genes in A. stenosperma, predicted Arabidopsis NBS containing proteins identified as described in [41] were used as a BLAST database, against which all A. stenosperma contigs were used as query sequences in a BLASTX search. BLAST detected similarities were considered significant at E-values of 1e-7 or less (Additional file 3). Similarly, predicted FIDEL sequences identified in previous studies [20] were used as a BLAST database, against which all A. stenosperma and A. duranensis contigs were used as query sequences in a BLASTX search (E-value < 1e-7).
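The E-value filtering step above can be sketched as follows. The text does not state which BLAST output format was used, so this sketch assumes tab-separated output with the default column order of BLAST+ `-outfmt 6` (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score). The contig and subject identifiers are hypothetical.

```python
import csv
import io

EVALUE_CUTOFF = 1e-7  # significance threshold stated in the text


def best_significant_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return {query: (subject, evalue)} keeping, for each query, the hit
    with the lowest E-value among those at or below the cutoff."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > cutoff:
            continue  # not significant
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


# Hypothetical BLASTX hits in the assumed 12-column tabular layout.
rows = [
    ["contig_1", "NBS_A", "62.5", "120", "45", "0", "1", "360", "10", "129", "1e-30", "118"],
    ["contig_1", "NBS_B", "40.0", "100", "60", "0", "1", "300", "5", "104", "3e-05", "45.1"],
    ["contig_2", "NBS_A", "35.0", "80", "52", "0", "1", "240", "20", "99", "0.002", "38.5"],
    ["contig_3", "NBS_C", "55.0", "150", "67", "1", "4", "450", "1", "150", "1e-07", "60.2"],
]
SAMPLE = "\n".join("\t".join(r) for r in rows)

hits = best_significant_hits(SAMPLE)
for query, (subject, evalue) in sorted(hits.items()):
    print(query, subject, evalue)
```

In this example contig_2 is discarded because its only hit exceeds the cutoff, while contig_3 is kept because a hit at exactly 1e-7 satisfies the "1e-7 or less" criterion.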

For the identification of TF families represented in this study, all putative TF genes from both Arachis species were selected from the functional BLASTX annotation (E-value < 1e-7) and classified according to their respective TF family using the Plant TF database [39].

SSRs identification

The program Mreps [83] was employed for the identification of simple sequence repeats (SSRs) in the sequences. The parameters were set to identify perfect di- to hexa-nucleotide repeats with a minimum length of 12 bases. A series of custom-made Perl scripts was created to process the potential SSR loci and to design flanking primers, based on Primer3 [84]. Electronic PCR was carried out using the PrimerMatch package [85] to identify primers amplifying in both species and, from these, the set polymorphic between A. duranensis and A. stenosperma.
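The search criteria above (perfect di- to hexa-nucleotide repeats spanning at least 12 bases) can be illustrated with a simplified regular-expression sketch in Python. This is not the Mreps algorithm and will not reproduce its output in every case; for example, repeats that overlap one another within a single motif-length pass may be missed. The function name and the test sequence are hypothetical.

```python
import re

MIN_LENGTH = 12  # minimum SSR length in bases, as stated in the text


def find_perfect_ssrs(seq, min_length=MIN_LENGTH):
    """Return sorted (start, end, motif, repeat_count) tuples for perfect
    di- to hexa-nucleotide repeats of at least min_length bases."""
    seq = seq.upper()
    hits = []
    for unit_len in range(2, 7):
        # Smallest number of motif copies that reaches min_length bases.
        min_repeats = -(-min_length // unit_len)
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are copies of a shorter motif (e.g. AGAG, AAAA),
            # since those are reported under the shorter motif length.
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            copies = (match.end() - match.start()) // unit_len
            hits.append((match.start(), match.end(), motif, copies))
    return sorted(hits)


# Hypothetical contig fragment with one di- and one tri-nucleotide SSR.
example = "GGC" + "AG" * 6 + "TTC" + "AAG" * 4 + "CC"
for start, end, motif, copies in find_perfect_ssrs(example):
    print(f"({motif}){copies} at {start}-{end}")
```

Coordinates are 0-based with an exclusive end, following Python slicing, so `example[start:end]` returns the repeat itself.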

Expression analysis by qRT-PCR

Plant materials were obtained in new independent experiments carried out as described above.

After isolation and purification, total RNA of four samples (A. stenosperma leaves inoculated with C. personatum and A. duranensis roots stressed by water limitation, and their respective non-treated controls) was digested with DNase (TURBO DNA-free™, Ambion, USA) and reverse-transcribed using SuperScript™ II RT and Anchored Oligo(dT)20 primer (Invitrogen, Carlsbad, CA, USA), as previously described [42]. For qRT-PCR, the Platinum® SYBR® Green qPCR Super Mix-UDG w/ROX kit (Invitrogen, Carlsbad, CA, USA) was used according to the manufacturer's recommendations on an ABI 7300 Real-Time PCR System (Applied Biosystems, Foster City, CA, USA). Two biological replicates for each of the four samples were used for real-time PCR analysis, with each replicate representing a pool of five plants. Reactions were carried out using three technical replicates for each sample. Specific primer pairs were designed for five RGAs and four FIDEL-related sequences (Table 3) with Primer3Plus software [84], and qRT-PCR was carried out with a final dissociation curve step, using previously described cycling parameters [42]. Normalization of expression was performed using as references the 60S gene for A. stenosperma and the actin gene for A. duranensis samples [42]. All calculations for relative quantification, such as amplification efficiencies, correlation coefficients (R2 values) and relative expression profiles (comparative Ct method), were performed using 7500 v.2.0.4 software (Applied Biosystems, Foster City, CA, USA).
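The comparative Ct calculation mentioned above was performed by the instrument software. As an illustration only, the widely used 2^-ΔΔCt form of the method can be sketched in Python as below. The sketch assumes 100% amplification efficiency for both target and reference genes, which may differ from the efficiency handling in the software, and all Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene in treated versus control samples by the
    2^-ddCt method, assuming 100% amplification efficiency (an assumption)."""
    delta_treated = ct_target_treated - ct_ref_treated    # dCt, treated
    delta_control = ct_target_control - ct_ref_control    # dCt, control
    delta_delta = delta_treated - delta_control           # ddCt
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: three technical replicates averaged per sample.
target_treated = mean([24.1, 23.9, 24.0])   # target gene, stressed sample
ref_treated = mean([18.0, 18.1, 17.9])      # reference gene, stressed sample
target_control = mean([25.0, 25.1, 24.9])   # target gene, control sample
ref_control = mean([18.0, 17.9, 18.1])      # reference gene, control sample

fold_change = relative_expression(target_treated, ref_treated,
                                  target_control, ref_control)
print(f"Relative expression (treated/control): {fold_change:.2f}")
```

A value above 1 indicates higher normalized expression in the treated sample than in the control, which is how the induction ratios for the RGA and FIDEL sequences are read.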