Soybean (Glycine max) is one of the most widely grown crop species in the world. Current evidence indicates that the cultivated soybean was domesticated from its annual wild relative, wild soybean (Glycine soja Sieb. and Zucc.), over 5,000 years ago in China [1]. Compared to cultivated soybean, wild soybean possesses much higher adaptability to natural environmental stresses such as drought, alkaline and salt stress, which demonstrates the potential usefulness of the wild soybean to improve cultivated soybean [25]. Soybean breeding and improvement is hindered by a narrow domesticated germplasm compared to other crop species [6]. Studies have revealed that G. soja shows greater genetic diversity and higher allelic diversity than G. max[68]. Wild soybean can readily cross with cultivated soybean giving rise to fertile hybrids, thus making G. soja a promising source of novel genes and alleles for soybean breeding and improvement.

Aluminum (Al) toxicity is a major limitation to crop production on the acid soils that make up about 30–40% of the world’s arable lands. In acid soil, aluminum causes the rapid inhibition of root growth and subsequently inhibits water and nutrient uptake in plants, which increases the susceptibility of plants to environmental stresses and results in reductions in crop production [9, 10]. Soybean is planted widely on acid soil and its productivity is significantly hampered by Al stress. It is known that plant species and genotypes within species differ markedly in their tolerance to excess Al; however, the mechanisms responsible for Al tolerance are not so clearly understood. The exclusion of Al from the roots and the detoxification of Al ions in the plant are two of the Al tolerance mechanisms that have been proposed in plants [11]. To date, many of the genes responsible for Al tolerance that have been identified in plants other than soybean are involved in root Al-induced organic acid exudation, the redistribution or sequestration of Al, and in coding transcription factors [1218]. In soybean, Al tolerance has been described as a quantitative trait that involves several genes and pathways [19, 20]. Ermolayev and coworkers [21] and Ragland and Soliman [22] have identified some genes that were related to Al tolerance in soybean. These genes include phosphoenolpyruvate carboxylase (PEPC), homolog of translationally controlled tumor proteins (TCTPs), inosine 5-monophosphate dehydrogenases (IMPDHs) [21], aluminum-induced 3–2 (Sali3-2), and aluminum-induced 5-4a (Sali 4-5a) [22]. Duressa and coworkers [23] used DNA microarray technology to identify putative genes in the Al-tolerant soybean line PI 416937 and reported that many genes involved in transcription activation, stress response, cell metabolism and signaling were differentially expressed in Al-tolerant soybean. They concluded that Cys2His2 and ADR6 transcription activators, cell wall modifying enzymes, and phytosulfokine growth factors might play roles in soybean Al tolerance [23].

MicroRNAs (miRNA), one of the major types of endogenous non-coding RNAs in higher plants, modulate gene expression at the post-transcription and translational levels [24, 25]. An increasing amount of evidence demonstrates that miRNAs play critical roles in regulating development, stress response, hormone response and many other biological processes in plants [2628]. Although a number of miRNAs have been identified in plants, the genome-wide discovery of new miRNAs is essential for the functional characterization of miRNAs. Recently, using next-generation sequencing technology, hundreds of small RNAs, especially miRNAs with low abundance, have been isolated by small RNA sequencing in higher plants [2932]. Further, this technology has been used successfully to systematically identify stress-associated miRNAs [3337]. Currently miRBase (Release 18: November 2011) lists 362 miRNAs that have been identified in G. max from different tissues, including root, seed, flower, nodule and shoot apical meristem [3843]. Recently, miRNAs responsive to abiotic and biotic stresses such as water deficit, rust and Phytophthora root rot have also been reported in soybean [44, 45]. However, in the same release of miRBase 18.0, only 13 miRNAs from wild soybean are listed [46].

To functionally characterize the biological roles of each miRNAs, target validation is required. Modified 5’ RACE (rapid amplification of cDNA ends) has been widely applied in target confirmation and cleavage site mapping [26]. However, this method is used only for small-scale target confirmation because it is costly, and labor and time consuming. Recently, degradome sequencing, a high-throughput method known as PARE (parallel analysis of RNA ends), has been adopted to identify the target transcripts of miRNAs [4749]. This technology has been successfully used to identify hundreds of targets cleaved by conserved and unconserved miRNAs in plant species [36, 47, 4951].

Most of the soil in South China where wild soybean is widely distributed is typical acidic soil. Therefore, the wild soybean that grows there may have evolved special mechanisms to survive under Al stress conditions. However, little information about the response of wild soybean to Al stress is available. To obtain highly Al tolerant plant materials, the core germplasm of more than 500 wild soybean plants from South China were collected and screened. From among the 500 plants, one wild genotype (BW69) showed the highest Al tolerance (unpublished). Subsequently, this genotype was treated with Al to detect new miRNAs and their targets responsive to Al stress in wild soybean. The microRNA sequencing and degradome sequencing developed by Solexa (Illumina Inc.) were applied to investigate the expression of miRNAs and their targets responsive to Al stress. In total, we identified 97 known miRNAs, 31 novel miRNAs, and a further 49 p3 or p5 strands of known miRNAs. In addition, 91 genes sliced by miRNAs were detected through degradome sequencing. Among the cleavage targets, 52 genes were transcription regulators.


Deep-sequencing results of wild soybean

To identify miRNAs responsive to Al stress, two small RNA libraries constructed from the roots of Al-treated and Al-free (the control) wild soybean seedlings were sequenced on the Illumina Genome Analyzer IIx. A total of 8,616,284 and 8,712,410 raw sequences were generated from Al-treated and Al-free libraries, respectively (Table 1) (high-throughput sequencing data were deposited into the NCBI-GEO with accession no. GSE38065). After removal of low-quality and corrupted adapter sequences (such as 3’ adapter not found), sequences with less than 15 bases after cutting 3’ adapter, and junk reads, 7,287,655 and 7,035,914 clean reads from the Al-treated and Al-free libraries, respectively, remained (Table 1). The length distribution of the unique sequences showed that the most abundant sequences were 24 nt long; however, when redundant sequences were included, the most abundant sequences were 23 nt long (Figure 1). This atypical situation was also reported in cucumber in which a high number of redundant 22-nt sequences were obtained by Solexa (Illumina) high-throughput sequencing [52, 53]. When the clean reads were searched against the Rfam [54] and repeat databases [55], 2.68% and 2.37% of the small RNAs in the Al-treated and Al-free libraries, respectively, were removed from the libraries (Table 1). The remaining sequences were mapped to the G.max genome sequence [56], and sequences that aligned to mRNAs were removed. Finally, 782,582 and 754,685 sequences in the Al-treated and Al-free libraries, respectively, were obtained and used for miRNA prediction (Table 1).

Table 1 Distribution of the G. soja sequences in the Al-treated and Al-free libraries
Figure 1
figure 1

Length distribution of the wild soybean samll RNAs obtained by high-throughput sequencing in two libraries. A) Size distribution of redundant sequences. B) Size distribution of unique sequences.

The total number of small RNAs for miRNA prediction in the Al-treated library (782,582) was 3.69% higher than in the Al-free library (754,685), and the total number of known and predicted miRNA sequences in the Al-treated library (184,734) was 1.49-fold higher than in the Al-free library (124,343) (Table 1). A total of 177 miRNAs were identified in this study. Among them, 92.10% of the miRNAs detected in the Al-treated library were also found in the Al-free library, whereas, 3.95% of the miRNAs were found only in the Al-treated library, and 3.95% miRNAs were found only in the Al-free library (Additional file 1, Table 2).

Table 2 Identification of novel microRNAs in wild soybean

Identification of known miRNAs in wild soybean

To identify known miRNAs in the two libraries, the clean reads were compared with known miRNA precursor or mature miRNA sequences in miRBase 18.0 allowing no more than two mismatches. A total of 97 known miRNAs were identified in the two samll RNA libraries (Additional file 1). Based on their similarity to the known miRNAs, the G. soja miRNAs were classified into two groups. Group I comprised 57 unconserved miRNAs that were either present in wild soybean or other legumes (Additional file 1). Thirteen wild soybean specific miRNAs were described in miRbase 18.0 [46] and all but one of them (gso-miR3522b) were identified in our libraries. Most of these wild soybean specific miRNAs were relatively highly expressed in the two root libraries; among them, gso-miR2218 was the most abundant in the two libraries (7,282 and 19,061 reads in the Al-free and Al-treated libraries, respectively), followed by gso-miRNA1509a. However, with the exception of PN-miR1509b (PN indicates a potentially novel miRNA), most of the legume-specific miRNAs had relatively lower levels of expression. The largest miRNA family in the two libraries was PN-miR1520 with 4 members. Group II contained 40 highly conserved miRNAs (Additional file 1). Of these, the most abundant was miR159a with 2,363 and 2,121 reads in the Al-free and Al-treated libraries respectively. Conserved miRNAs are known to have important functions in plant development and stress response [26]. In the present study, 19 highly conserved miRNA families were identified. The largest conserved miRNA family was miR156 with 9 members.

The p3/p5 strands of known miRNAs have been used as strong evidence to identify miRNAs [57] and 49 p3/p5 strands of known miRNAs were detected in the present study (Additional file 1). Generally, p3/p5 strands of miRNAs are thought to be more unstable than other miRNAs in cells, and the numbers of them have been assumed to be ten times less than those of other mature miRNAs [26, 58]. In the present study, the p3/p5 strands of most of the known miRNAs had relatively low expression. However, the PN-mir4415-p3, gso-mir2109-p3 and gso-mir1510b-p5 sequences occurred more frequently than the corresponding mature miRNAs, suggesting that these candidates might be true mature miRNAs.

Identification of novel miRNAs from wild soybean

The formation of stable hairpin structures has been suggested as a prerequisite for the annotation of new miRNAs [57, 59]. To identify novel miRNAs, we used the M-fold web server to predict the secondary structures of the candidate miRNA precursors [60] and found that 29 of the potential pre-miRNAs met this requirement. Two of the 29 pre-miRNA were predicted to generated two miRNAs, one from the 3’ arm (3p) and one from the 5’ arm (5p) (Table 2). The 31 novel miRNAs were 16 to 25 nt long; 51.61% of them were 24 nt long (Table 2). Most of novel miRNAs had relatively low expressions, and only 12 of the novel miRNA candidates had more than 20 reads in the two libraries (Table 2). The novel miRNAs were all generated from one locus.

When the strict criteria defined by Meyers and coworkers [57] was used to filter all the results, totally six pairs of miRNA/miRNA* strictly met all three of those characteristics. Those miRNAs were PN-miR1507c/PN-miR1507c*, PN-miR862a/PN-miR862a-5p, PN-miR1509b/PN-mir1509b-p3, PN-miR169c/PN-mir169c-p3, PN-miR390b/PN-mir390b-p3 and PN-miR4415/PN-mir4415-p3 (Figure 2). However, because of the unstable of miRNA* in cells [26], when we filtered the results, we did not strictly regard to parameter of miRNA and miRNA* which were derived from opposite stem-arms and formed a duplex with two nucleotide, 3’ overhangs.

Figure 2
figure 2

Precursors structures of six pairs of miRNA/miRNA*s strictly meet characteristics mentioned by Meyers et al. (A-F) The precursor sequences of PN-miR1507c/PN-miR1507c*, PN-miR862a/PN-miR862a-5p, PN-miR169c/PN-mir169c-p3, PN-miR390b/PN-mir390b-p3, PN-miR1509b/PN-mir1509b-p3 and PN-miR4415/PN-mir4415-p3, respectively.

Identification of Al-responsive wild soybean miRNAs

To identify miRNAs responsive to Al stress, the differential expression of miRNAs in the two libraries was compared using the read counts obtained from the high-throughput sequencing. Because some of the miRNAs in this study had extremely low abundances, which might lead to false results, known miRNAs with less than 10 raw reads and novel miRNAs with less than 20 raw reads in the Al-free and Al-treated libraries were removed from the expression analysis. In the two libraries, miRNAs with log2 fold changes higher than 1 were designated as ‘up-regulated’. Similarly, miRNAs with the log2 fold changes less than −1 were designated as ‘down-regulated’. Thirty miRNAs belonging to 26 families were differentially expressed between the two libraries (Table 3); 12 miRNAs were down-regulated and 18 were up-regulated by Al exposure. Of the 12 wild soybean specific miRNAs, gso-miR2218, the miRNA that had the highest expression abundant in the two libraries, was up-regulated by Al stress. Seven legume-specific miRNAs were response to Al stress. Among them, PN-miR1509b had the highest up-regulated fold change. The p3/p5 strands of five unconserved miRNAs (gso-mir1509a-p3, gso-mir1510a-p5, gso-mir2109-p3, PN-miR4387e-p5 and PN-mir4415-p3) were differentially expressed between the two libraries. Among the highly conserved miRNAs, five (PN-miR169c, PN-miR390b, PN-miR396a and PN-miR403a, b) were up-regulated and five belonging to four conserved families (PN-miR156b,c,d, PN-miR164 and PN-miR2111) were down-regulated in response to Al stress. Seven novel miRNAs showed differential expression between the two libraries; three were up-regulated and four were down-regulated.

Table 3 Profiles of the differentially expressed miRNAs responsive to Al stress

Based on the high-throughput sequencing results, ten miRNAs were selected for qRT-PCR to validate their expression patterns. As shown in Figure 3, the expression patterns of three of the selected down-regulated miRNAs and five of the selected up-regulated miRNAs obtained by qRT-PCR was similar to results from high-throughput sequencing; however, in the qRT-PCR analysis, the expression of PN-miR862a did not change under Al stress, and the qRT-PCR expression pattern of PN-miR1514a was not consistent with that from the high-throughput sequencing. The fold changes obtained from the qRT-PCR were much lower than that those obtained from the high-throughput sequencing, possibly because of differences in the sensitivity and specificity between qRT-PCR and high-throughput sequencing technology.

Figure 3
figure 3

Differential expressions of ten miRNAs that were responsive to Al stress.

Identification of targets of miRNAs in wild soybean

Target validation is important to thoroughly elucidate the biological roles of miRNAs. To date, only two targets for wild soybean miRNAs have been identified by 5’RACE [46]. To identify the targets cleaved by the candidate miRNAs identified in the present study, the recently developed high-throughput degradome sequencing technology was adopted to perform a genome-wide analysis of the mRNAs potentially cleaved by the miRNAs [47]. In total, we obtained 16,979,070 and 16,064,291 raw reads from the Al-free and Al-treated libraries, respectively (Additional file 2). After removing the reads without the CAGCAG adaptor, 16,508,834 and 15,539,593 distinct reads in Al-free and Al-treated libraries, respectively, were obtained. The distinct sequences were aligned to the G. max genome database, and 8,407,136 and 7,960,260 reads from the Al-free and Al-treated libraries, respectively, were mapped to the genome. The mapped reads from the Al-free and Al-treated libraries represented 50,619 and 51,077 annotated G. max genes, respectively (Additional file 2). The CleaveLane pipeline reported previously was adopted to identify the sliced targets for the known miRNAs and novel miRNA candidates [61]. The sliced target transcripts were categorized into four groups according to the relative abundance of the tags at the target mRNA sites (Figure 4).

Figure 4
figure 4

T-plots of miRNA targets in the four different categories. The T-plots show the distribution of the degradome tags along the full-length of the target mRNA sequence (bottom). The red line represents the sliced target transcripts and is shown by an arrow. The alignments show the miRNA with a portion of its target sequence (top). Two dots indicate matched RNA base pairs; one dot indicates a GU mismatch. The arrow shows the cleavage site. (A) Example of the category I target Glyma18g03980 for gso-miR1509a. (B) Example of the category II target Glyma18g07890 for PN-miR169c. (C) Example of the category III target Glyma12g35720 for PN-miR319a. (D) Example of the category IV target Glyma13g04540 for gso-mir167a-p3. The categories were based on the relative abundance of the tags at the target sites.

In total, 91 targets that could potentially be cleaved by the miRNAs were identified in the two libraries (Additional file 3, Table 4). The AgriGO toolkit was used to do gene ontology (GO) analysis [62]. Of the 91 targets, 73 were found to have at least one GO annotation (Figure 5). More than 80% of the target genes were annotated as being involved in regulation of biological processes and this term was more enriched in the miRNA targets than in the soybean genes as a whole. The enrichment of genes involved in the regulation of biological processes may be consistent with the observation that miRNA targets are mainly involved in regulating development [26].

Table 4 Identified targets of the novel miRNAs in wild soybean
Figure 5
figure 5

GO analyses of the targets of the known and new miRNAs in Glycine soja . Blue bars indicate the enrichment of the GO terms in the miRNA targets in GO. Green bars indicate the percentage of the total annotated soybean genes that were mapped to the GO terms.

In our degradome dataset, 86 target transcripts for 17 known miRNA families were identified in the two libraries (Additional file 3). Of the 86 target transcripts, 34 (39.53%) were found in both libraries, 45 (52.33%) were found only in the Al-treated library, and seven (8.14%) were found only in the Al-free library, showing that miRNA-mediated target cleavage was stimulated by Al stress. Furthermore, among the 86 targets, 16 for five unconserved miRNA families, 66 for 12 conserved miRNA families and three for the two p3/p5 strands of known miRNA were identified (Additional file 3, Table 4). When the transcript abundance and distribution patterns of the targets were analyzed in the two libraries, 23 (29.11%), 50 (63.29%) and six (7.60%) targets in the Al-treated library were found to be distributed into categories I, III and IV, respectively, and eight (19.51%), one (2.44%), 26 (63.41%) and six (14.63%) targets in the Al-free library were grouped into categories I, II, III and IV, respectively (Additional file 3).

In most cases, the identified miRNAs were predicted to cleave two or more different targets. For example, nine members of CCAAT-binding transcription factor family genes were predicted to be cleaved by PN-miR169 (identified in the Al-treated library), and PN-miR319 was predicted to slice eight genes belonging to the Myb and TCP families of transcription factors (T-plots of six of the targets are presented in Figure 6). In contrast, only one target each was identified for gso-miR1509, gso-miR2109 and PN-miR399 (Additional file 3). Interestingly, some transcripts were found to be regulated by pairs of miRNAs. For instance, PN-miR159 and PN-miR319 targeted two Myb family transcription factors (Glyma13g04030 and Glyma20g11040), and PN-miR156 and PN-miR157 sliced five members of the SBP domain protein family, suggesting that pairs of miRNAs might have a combinatorial function in gene regulation networks (Additional file 3).

Figure 6
figure 6

T-plots of the targets cleaved by the miR319 family. The T-plots show the distribution of the degradome tags along the full-length of the target mRNA sequence (bottom). The red line represents the sliced target transcripts and is shown by an arrow. The alignment shows the miRNA with a portion of its target sequence (top). The two dots indicate matched RNA base pairs; one dot indicates a GU mismatch. The arrow shows the cleavage site. (A, B, C, and D) Examples of different TCP transcription factors as targets for PN-miR319a. (E, F) Examples of different Myb transcription factors as targets for PN-miR319.

The GO annotations of the target transcripts in our study revealed that 52 of the targets that were cleaved by the eight conserved miRNA families played roles in transcription regulation. These miRNA families and their targets are highly conserved in plants, suggesting that the conserved miRNAs might act as master modulators in gene expression networks [26, 36]. Of the eight conserved miRNAs, PN-miR169 targeted nine CCAAT-binding transcription factors of which eight were found in both libraries; the other was found only in the Al-treated library (Additional file 3). In contrast, the auxin response factor and Myb transcription factor genes cleaved by miR160, miR159 and miR319, were found only in the Al-treated libraries. However, only three no apical meristem (NAM) protein genes out of the 16 targets cleaved by the unconserved miRNAs, were found to act as transcription factors. All other targets were annotated as involved in different biological functions, suggesting that the unconserved miRNAs might play special roles in gene expression networks.

However, in this study, we found only five unique transcripts belonging to three signal conduct gene families (cation transporting ATPase, protein tyrosine kinase, and TPR repeat-containing protein) cleaved by PC-46-5p. Two TPR repeat-containing proteins were identified in both libraries, two cation transporting ATPases were found only in the Al-treated library and a protein tyrosine kinase was found only in the Al-free library. When the transcript abundance and distribution pattern of the five transcripts were analyzed, all of them fell into category III.


Identification of wild soybean miRNAs by high-throughput sequencing

Recently, high-throughput sequencing has been used to systemically identify plant miRNAs responsive to abiotic stress in several plant species, and this has greatly advanced our knowledge of the miRNAs functions in stress tolerance [36, 37, 50]. To study the roles of miRNAs in gene regulation under Al stress, we constructed two libraries from the roots of wild soybean seedlings treated either with Al or without Al. The number of miRNAs identified in our study far exceeds the previous report in which only 15 conserved miRNAs and nine novel miRNAs were identified [46]. Of the miRNAs obtained by high-throughput sequencing, almost half (44.52%) of the known miRNAs had relatively low expression abundance (less than 10 raw reads) (Additional file 1), indicating that high-throughput sequencing is a most powerful strategy for the identification miRNAs, especially those with low expression levels, in plants. When we compared our miRNA dataset to the G. soja miRNAs reported previously, we found that most of the known miRNA families had been recovered; however, miR171 and gso-miR3522b were not found in our study (Additional file 1) [46]. This might be because different wild soybean seedling tissues were used in the two studies.

We found that only 10 conserved miRNAs belonging to seven conserved miRNA families were responsive to Al stress. However, 13 unconserved miRNAs and seven novel miRNAs were responsive to Al stress (Table 3). These results indicated that the unconserved miRNAs might play more important roles in the regulation of the plant’s tolerance to Al stress. A previous study of miRNAs in M. truncatula using a bioinformatic approach combined with validation by qRT-PCR, found that some conserved miRNAs, such as miR166 and miR398, were down-regulated, and some, miR171, miR319, miR393, and miR519, were up-regulated in response to Al stress [63]. Subsequently, in a high-throughput sequencing study in M. truncatula, miR160, miR319, miR396, miR1507 miR1510a and miR390 were found to be down-regulated after treatment with Al, while miR166 and miR171 were not responsive to Al [50]. In this study, miR396 and miR390 were up-regulated in response to Al which are different from the results of Chen and coworkers [50] (Table 3). Furthermore, we found that miR1510a was not responsive to Al stress, while mir1510a-p5 was up-regulated under Al stress. The different results might be caused by differences in the genome and tolerance mechanisms between M. truncatula and G. soja.

Cross adaptation is a common phenomenon in plants exposed to different combinations of stresses [64]. Researches have revealed that some conserved miRNAs that are responsive to biotic and abiotic stresses might play roles in cross adaptation [50]. In this study, miR156 which were reported earlier to be down-regulated in response to cadmium treatment [65] were found to be down-regulated under Al stress. In Arabidopsis, miR164 was reported to be induced by drought stress which cleaved the NAC1 transcription factor gene leading to the down-regulation of auxin signals and resulting in reduction in lateral root emergence [66]. We found that miR164 was down-regulated under Al stress (Table 3, Figure 3). In plants, miR169 was found to be responsive to abiotic stresses such as cold, drought, and salinity [6771]. Recently, miR169 were also found to be responsive to some biotic stresses such as a virulent form of the bacterium, P. syringae pv. tomato DC3000 in Arabidopsis [72] and Fusarium virguliforme, the causal agent of sudden death syndrome, in soybean [73]. Our high-throughput sequencing results showed that miR169 was down-regulated under Al stress (Table 3). These findings suggest that the conserved miRNAs might take part in cross adaptation to regulate plant tolerance to biotic and abiotic stresses.

The targets of wild soybean miRNAs identified by high-throughput degradome sequencing

In plants, degradome sequencing has been shown to be an efficient strategy to identify targets of miRNAs [47, 49, 51]. In wild soybean, many miRNA targets have been predicted, but only two of them had been identified by 5-RACE [46]. In this study, 86 target transcripts for 17 known miRNA families and 5 targets for a novel miRNA family were identified through degradome sequencing technology (Additional file 3, Table 4). Previous researches have revealed that miRNAs have a strong preference for genes associated with development, particularly for genes encoding transcription factors and F-box proteins [26]. We found that 52 of the targets for the conserved miRNAs were involved in transcription regulation (Additional file 4); they included the Myb, ARF, WRC, SBP, TCP, SPT6, AP2 and CCAAT type transcription factor gene families which are conserved in other plant species [7477]. This result suggested that the targets of conserved miRNAs might participate in various aspects of plant development and stress responses and may act as the main nodes in gene expression networks in plants.

In our degradome sequencing experiment, we found 12 conserved miRNA families that had detectable cleavage targets. However, only six unconserved miRNAs and one novel miRNAs had detectable cleavage targets (Additional file 3, Table 4). A similar result was reported in M. truncatula[36] and G. max[38]. There were two possible explanations for these results. First, in plants, the miRNA regulation mechanism is not only by mRNA cleavage but also by translational repression [24] and second, some targets could not be identified because of differences in the spatial or temporal expression of a miRNA and its target which might cause insufficient degradation of the target [78]. Here, while 16,508,834 and 15,539,593 distinct reads were obtained in the Al-free and Al-treated degradome libraries respectively, only 0.0039% and 0.0065% were identified as targets of miRNAs (Additional file 3, Table 4). The cleavage fragments that were not identified as targets of miRNAs might be caused by random cleavage during gene transcription or by other small interference RNAs.

Moreover, we found that the targets cleaved by miRNAs in the Al-treated library were different from those in the Al-free library. For example, the cleavage fragments of gso-miR2109, PN-miR1514, PN-miR159, PN-miR160, and PN-miR394 were found only in the Al-treated library. Furthermore, cleavage fragment of the eight genes targeted by gso-miR1510 were found in the Al-treated library, but only two of them were identified in the Al-free library (Additional file 3). In our study, the number of genes cleaved by PN-miR1514, PN-miR396 and PN-miR169 which were up-regulated under Al stress was more in the Al-treated library than in Al-free library. In addition, more tags of target transcripts sliced by PN-miR1509, PN-miR393 and PN-miR403 which were up-regulated under Al stress were detected in the Al-free library (Additional file 3, Table 4). This indicated that Al exposure strongly affected the cleavage of target transcripts mediated by miRNAs.

The target transcripts for which cleavage fragments were found only in the Al-treated library were involved in stress responses and included the mRNAs for NB-ARC domain proteins, LRR domain protein, NAM protein, Myb family transcription factors, auxin response factor, and cation transporting ATPase. This finding might be the result of the temporal expression of miRNAs and their target genes. The NB-ARC, LRR and TIR domains have been identified in plant resistance proteins involved in pathogen recognition and subsequent activation of innate immune responses [7981]. The cleavage of TIR-NBS-LRR type transcripts were reported in M. truncatula under mercury stress [36]. We found that NB-ARC domain, LRR domain and TIR domain transcripts (encoding the three domain TIR-NBS-LRR protein) were cleaved by the gso-miR1510 family of miRNAs. Interestingly, the cleavage fragments of NB-ARC domain, LRR protein were found only in the Al-treated library (Additional file 3), suggesting that the TIR-NBS-LRR protein might take part in a regulating pathway involved in the recognition of biotic and abiotic stresses. The cleavage of the TIR-NBS-LRR protein under Al stress might result in the disruption of the immune system which might increase the susceptibility of the plant to other stresses.

The perception and transmission of stress signals mediated by hormones could play important roles in Al tolerance in plants. Recent studies revealed that the inhibition of root elongation, a typical symptom of Al toxicity, might be was caused by disruption of auxin distribution in roots [82, 83]. It was reported that cleavage of the auxin response factor (ARF) by miR160 regulated the development of the root cap. In Arabidopsis, when miR160 was over-expressed, three ARF genes were barely detectable and the root length of the transgenic seedlings was reduced [84]. In the present study, seven genes belonging to the ARF family were found to be cleaved by miR160 only in the Al-treated library (Additional file 3). Previously, NAC1 was found to mediate auxin signaling to promote lateral root development [85, 86]. Transgenic plants expressing antisense NAC1 cDNA show a reduction of lateral roots. The no apical meristem protein is a member of the NAC domain superfamily. In the Al-treated library, no apical meristem was detected to be cleaved by PN-miR1514 (Additional file 3) and the expression of PN-miR1514 was found to be up-regulated under Al stress. These results suggest that the cleavage of ARF transcripts by miR160 and NAM transcripts by miR1514 might be involved in the plant’s response to auxin which may regulate the inhibition of root development under Al toxicity. In G. max, ABA accumulation was induced by Al treatment, and the tendency of ABA to be distributed in the Al-exposed root was shown by a split-root experiment [87]. Research revealed that the Myb transcription factors might take part in the response to ABA accumulation, and the cleavage of Myb transcription factors mediated by miR159 has been reported in Arabidopsis under drought stress [88]. We also found that the cleavage of Myb transcription factors was mediated by miR159 in G. soja in this study (Additional file 3). Together these results indicated that the cleavage of hormone-related genes might affect the response of wild soybean to hormones potentially affecting the plant’s tolerance to Al stress and causing metabolic dysfunction.

Peroxide stress often occurs concurrently with Al stress. It has been shown that activated antioxidative enzymes and other antioxidant metabolites are beneficial for plant growth under Al stress, because they may contribute to the removal of excess reactive oxygen species and inhibit lipid peroxidation [89]. Cation transporting ATPase has been reported to plays significant roles in adaptive responses to oxidative stress by removing excessive Ca2+ from the cytosol [90, 91]. The cleavage of the transcripts of the cation transporting ATPase by PC-46-5p might play a role in antioxidative systems induced by Al toxicity (Additional file 3).


In our study, two samll RNA libraries and two degradome libraries were constructed from the roots of wild soybean seedlings for deep sequencing. We obtained a total of 8,616,284 and 8,712,410 raw sequences from the Al-treated and Al-free libraries, respectively, and predicted 31 new miRNAs in wild soybean by bioinformatic analysis. We discovered 30 miRNAs that were responsive to Al3+. These findings provided valuable information for the identification of miRNAs in wild soybean and could be used for the functional characterization of miRNAs in the response of legume plants to Al3+ phytotoxicity. Through degradome sequencing, we detected 91 targets cleavage by conserved, unconserved and novel miRNAs in wild soybean. Some of miRNAs and their targets were related to biotic and abiotic stresses. The expressions of the miRNAs and targets identified in our study were shown to be regulated by Al stress. This finding suggests that Al can trigger protective mechanisms involving miRNAs that can improve the plant’s tolerance to Al toxicity. The identification of the new candidate miRNAs and their target genes should contribute to our understanding of gene regulatory frameworks in plants, and may provide insights into the role of miRNAs and their targets in regulating plant tolerance to Al stress.


Plant culture and treatment

Wild soybean seeds (BW69) were grown in a chamber with the following settings: 70% relative humidity, 28°C / 25°C and a light regime of 14 h light / 10 h dark. Before sowing, episperm of seeds were cut carefully using a knife. The seed surface was sterilized with 70% ethanol for 30 s and subsequently washed in deionized water. Then, seeds were sown in quartz sand and left for 4 d to germinate. Germinated seedlings were then transferred into growth boxes. Sixty seedlings were cultured in a 6 L vessel containing a nutrient solution made up of 0.75 mM KNO3, 0.25 mM Ca(NO3)2.4 H2O, 0.325 mM MgSO4.7 H2O, 20 μM Fe(III)-EDTA, 8 μM H3BO3, 10 μM KH2PO4, 0.2 μM (NH4)6MO7O2.4 H2O, 0.2 μM CuSO4.5 H2O, 0.2 μM ZnSO4.7 H2O and 0.2 μM MnCl2.4 H2O. Two days after transplanting, the seedlings were transferred into nutrient solution with either 0 (Al-free) or 50 (Al-treated) μM AlCl3 (pH 4.5). Roots 10 cm from the root apex were harvested at 1, 3, 6, 12, 24, 48 and 96 h after the initiation of Al stress. The samples were immediately frozen in liquid nitrogen, and stored at −80°C. To minimize biological variance, 20 roots from 4 repeats were pooled.

Small RNA library construction, sequencing and sequencing data analysis

Two sets of sample were prepared; one set was derived from the Al-treated roots harvested at 1, 3, 6, 12, 24, 48 and 96 h time points and the other set was from the Al-free roots harvested at the same time points. Total RNA was isolated with the Total RNA Purification Kit (Norgen Biotek Corporation, Thorold, Canada) according to the manufacturer’s instructions. Small RNA libraries were generated from the two samples using the Illumina Truseq Small RNA Preparation kit (Illumina, San Diego, USA) according to Illumina’s TruSeq Small RNA Sample Preparation Guide. The purified cDNA library was used for cluster generation on Illumina’s Cluster Station and then sequenced on an Illumina GAIIx (Illumina) following the vendor’s instructions for running the instrument. Raw sequencing reads were obtained using Illumina’s Sequencing Control Studio software version 2.8 (SCS v2.8) (Illumina) following real-time sequencing image analysis and base-calling by Illumina's Real-Time Analysis version 1.8.70 (RTA v1.8.70) (Illumina). A proprietary pipeline script, ACGT101-miR v4.2 (LC Sciences, Houston, TX, USA), was used for sequencing data analysis (Li et al. 2010; Wei et al. 2011). Because of the limited sequence information for wild soybean, the G. max database was employed for identification of the miRNAs and for the prediction of secondary structure [56]. Then we blasted all our predicted precursors to 18,511 ESTs and 180153 GSSs of G. soja registered in ncbi [92], and we gave an instruction of all the precursor sequences whether they could be mapped to G. soja EST or GSS.

Degradome library construction and bioinformatics analysis

The wild soybean root degradome library was constructed following the methods described previously by German and coworkers [48]. The G. max database and transcript sequences were used as the reference sequences [56]. The publicly available CleaveLand 3.0 software package and the ACGT301-DGEv1.0 program (LC Sciences, Houston, TX, USA) were used to detect potentially sliced targets of the known and novel miRNA identified by high-throughput sequencing and degradome analysis [47, 61]. The G. soja miRNA to G. max sequence alignments were then scored as follow: mismatched pairs or single nucleotide bulges were each scored as 1 and GU pairs were scored as 0.5. The mismatched and GU pair scores within the core segment (at positions 2–13) were doubled. All targets were classified into four categories based on the abundance of the resulting mRNA tag relative to the overall profile of degradome reads that matched the target [47, 61]. In category I, the abundance of cleavage sequences was equal to the most abundant degradome sequences on the transcript, and there was only one maximum on the transcript; in category II, the abundance of the degradome sequences at the cleavage site was equal to the maximum abundance on the transcript, and there was more than one maximum on the transcript; in category III, the abundance of cleavage sequences was less than the maximum but higher than the median for the transcript; in category IV, the abundance at cleavage site was equal to or less than the median for the transcripts. The optimized score thresholds were set to 4.5 for category I, 4 for category II, 3.5 for category III, and 3 for category IV. These thresholds were used to select the resulting output. The gene ontology (GO) analysis of the candidate target transcripts of the known and new miRNAs identified in this study was performed using the AgriGO toolkit [62].

Quantitative real-time PCR

Total RNA was extracted from the G. soja roots from Al-free and Al-treated samples with TRIZOL reagent following the manufacturer’s instructions (Invitrogen). These samples were collected at the same time as those for miRNAs and degradome sequencing. The total RNA (1 μg) was treated with DNase I and reverse-transcribed using miRNA specific stemloop primers (Additional file 5), reverse-transcriptase and deoxynucleotide triphosphates. qRT-PCR analysis was carried out using the SsoFast EvaGreen Supermix kit (BIO-RAD) in a CXF96 (BIO-RAD) qRT-PCR System with 166 ng cDNA and 6 pmol of each gene-specific primer (Additional file 5). The analysis was performed using two independent cDNA preparations and triplicate PCR reactions. The relative expression ratio was calculated using the 2-ΔΔCt method with ACT3 as the reference gene.