Dicer-like (DCL) proteins in plants
- Cite this article as:
- Liu, Q., Feng, Y. & Zhu, Z. Funct Integr Genomics (2009) 9: 277. doi:10.1007/s10142-009-0111-5
Dicer and Dicer-like (DCL) proteins are key components in small RNA biogenesis. DCLs form a small protein family in plants whose diversification dates to the emergence of mosses (Physcomitrella patens). DCLs are expressed ubiquitously but unevenly across tissues, at different developmental stages, and in response to environmental stresses. In Arabidopsis, AtDCL1, AtDCL2, and AtDCL4 exhibit similar expression patterns during leaf and stem development, which distinguishes them from AtDCL3. However, all DCLs show distinct expression profiles during the development of the reproductive organs, flower and seed. The grape VvDCL1 and VvDCL3 may act sequentially in response to fungal challenge. Overall, the responses of DCLs to drought, cold, and salt are quite different, indicating that plants might have specialized regulatory mechanisms for different abiotic stresses. Further analysis of the promoter regions reveals a number of cis-elements that are hormone-responsive, stress-responsive, and development-related. However, gain and loss of cis-elements are frequent during evolution, and not only paralogous but also orthologous DCLs have dissimilar cis-element organization. In addition to cis-elements, AtDCL1 is probably regulated by both ath-miR162 and ath-miR414. Posterior probability analysis has identified some critical amino acid sites that are responsible for functional divergence between DCL family members. These findings provide new insights into DCL protein functions.
Genetic and biochemical evidence has demonstrated that small RNAs such as microRNAs (miRNAs) and small interfering RNAs (siRNAs) in eukaryotic organisms play important roles in developmental regulation (Kidner and Martienssen 2005), epigenetic modifications (Vaucheret 2006), tumorigenesis (Murakami et al. 2006), and biotic and abiotic stress responses (Llave 2004). The two kinds of non-coding RNAs, miRNAs and siRNAs, are produced from different types of precursors (Millar and Waterhouse 2005; Großhans and Filipowicz 2008). Dicer or Dicer-like (DCL) proteins are key components of the miRNA and siRNA biogenesis pathways, processing long double-stranded RNAs into mature small RNAs (Millar and Waterhouse 2005; Chapman and Carrington 2007; Großhans and Filipowicz 2008). In higher plants, insects, protozoa, and some fungi such as Neurospora crassa and Magnaporthe oryzae, Dicer or DCLs form a small gene family composed of two, four, or five members, whereas only one Dicer protein is found in vertebrates, nematodes, Schizosaccharomyces pombe, and the green alga Chlamydomonas reinhardtii.
Dicer and DCL proteins are large multi-domain ribonucleases. Vertebrate, insect, and plant Dicer and DCL proteins generally contain six types of domains including DEAD box, helicase-C, DUF283, PAZ, RNase III, and dsRBD (Margis et al. 2006). In lower eukaryotes, one or more of these domains may be absent. The PAZ, RNase III, and dsRBD domains are considered to function in dsRNA binding and cleavage. The PAZ domain of Dicer is directly connected to the RNase IIIa domain by a long α helix and can specifically bind the end of dsRNA containing a 3′ two-base overhang (MacRae et al. 2006). In addition, the PAZ domain also plays a role in binding single-stranded RNAs (Kini and Walton 2007). Zhang et al. (2004) suggested that Dicer functions through intramolecular dimerization of its two RNase III domains. Structural and biochemical analysis of mouse Dicer (Du et al. 2008) revealed four RNA binding motifs (RBMs 1–4) with RBMs 1 and 2 in dsRBD and RBMs 3 and 4 in RNase IIIb; importantly, a highly conserved lysine residue in Dicer RNase IIIa and IIIb has been suggested to be critical for dsRNA cleavage. In addition to dsRNA binding, the RNase III domain is found to directly bind to the PIWI box of Argonaute proteins, which is dependent on the activity of Hsp90 (Tahbaz et al. 2004). It is worth noting that Dicer itself could act as a molecular ruler, as the distance between the PAZ and RNase III domains (65 Å) matches the length spanned by 25 bp of RNA (MacRae et al. 2006). The dsRBD domain has been suggested to play a role in mediating the processes of discriminating different RNA substrates and the subsequent incorporation of effector complexes (Margis et al. 2006). In higher eukaryotes, the DUF283 domain is proposed to be involved in siRNA/miRNA strand selection by recognizing the asymmetry of RNA duplexes directly or by recruiting another dsRBD protein (Dlakić 2006).
In view of the importance of miRNA/siRNA biogenesis, Dicer and DCL proteins are essential for eukaryote development and viral defense. Mutagenesis studies have indicated that Dicer is indispensable for normal germline development in Caenorhabditis elegans (Knight and Bass 2001) and for maintaining two types of stem cells, germline stem cells (GSCs) and somatic stem cells (SSCs), in the Drosophila ovary (Jin and Xie 2007). Knockout of Dicer in mouse oocytes results in an inability to progress through the first meiotic division owing to disorganized spindles and chromosome congression defects (Murchison et al. 2007). Moreover, Dicer plays pivotal roles in embryogenesis (Yang et al. 2005), lung epithelium morphogenesis (Harris et al. 2006), limb development (Harfe et al. 2005), and apoptosis (Matskevich and Moelling 2008). In Drosophila, the two Dicers have distinct but related roles (Lee et al. 2004): Dicer-1 processes miRNA precursors, whereas Dicer-2 is necessary for processing siRNA precursors. Both Dicer-1 and Dicer-2 are required for siRNA-directed mRNA cleavage, and a role for Dicer in protecting the host against virus infection has been established (Millar and Waterhouse 2005). In mammals, the absence of Dicer leads to a modest increase in virus production and accelerated apoptosis of influenza A virus-infected cells (Matskevich and Moelling 2007). Flies with a loss-of-function mutation in Dicer-2 are more susceptible to infection by flock house virus than the wild type, demonstrating the importance of Dicer-2 in virus defense (Galiana-Arnoux et al. 2006).
Relative to animals and fungi, the notable expansion of DCL family members in monocot and dicot plants may reflect the deployment of RNA silencing in antiviral defense (Deleris et al. 2006; Margis et al. 2006). In Arabidopsis thaliana, four Dicer-like proteins (DCL1–DCL4) with different roles have been identified (Xie et al. 2004; Dunoyer et al. 2005; Moissiard et al. 2007; Mlotshwa et al. 2008). DCL1 is not only associated with miRNA production but also has a role in the production of small RNAs from endogenous inverted repeats. The other three DCLs are siRNA-generating enzymes. DCL2 generates siRNAs from natural cis-acting antisense transcripts and functions in viral resistance. DCL3 generates siRNAs that guide chromatin modification, while DCL4 is associated with tasiRNA metabolism and acts during posttranscriptional silencing (Liu et al. 2007). The functions of DCL1 and DCL3 overlap in promoting Arabidopsis flowering (Schmitz et al. 2007). Overlaps in function are also found for DCL2 and DCL4 with respect to antiviral defense (Deleris et al. 2006) and for DCL2, DCL3, and DCL4 in siRNA and tasiRNA production and in the establishment and maintenance of DNA methylation (Henderson et al. 2006).
Dicer expression studies in humans have shown that 5′-UTR variants generally repress translational efficiency and that their diversity determines tissue- and development-specific expression patterns (Singh et al. 2005). In Arabidopsis, loss of function of all four DCLs was found to cause ABA supersensitivity during seed germination, possibly because one or more specific microRNAs whose biogenesis depends on DCLs function in ABA signaling (Zhang et al. 2008). Several ABA-responsive cis-acting elements are found in the promoter regions of DCL genes, as discussed later.
To gain more insight into the DCL protein family, a comprehensive survey was conducted using various data sources in the public domain.
Databases and methodologies used in the survey
A. thaliana DCL sequences downloaded from the GenBank database were used as queries in BLAST searches against the Oryza sativa, Vitis vinifera, Populus trichocarpa, and Sorghum bicolor genomes. To search exhaustively for homologues, the ENSEMBL and GenBank databases were also searched using the programs BLASTN and BLASTP, respectively. In addition, human Dicer was used as a query to collect orthologous Dicers from fungi and other animals. The program InterProScan (Quevillon et al. 2005) was employed to detect conserved domains within Dicer and DCL protein candidates.
Gene expression microarray datasets (GSE7951 and GSE6901 for rice; GSE5621, GSE5623, GSE5624, GSE5630, GSE5632, GSE5633, GSE5634, and GSE607 for Arabidopsis; and GPL1320 for grape) were downloaded from the GEO database at NCBI. The rice microarray data cover gene expression profiles in nine tissues (Li et al. 2007) and in 7-day-old seedlings under drought, salt, and cold stress treatments (Jain et al. 2007). In Arabidopsis, the expression patterns of DCL genes in roots and shoots under different stresses and treatment durations, and in various tissues at different developmental stages, were investigated and compared (Bergmann et al. 2004; Schmid et al. 2005). The DCL gene expression profiles in the grape cultivars Cabernet Sauvignon and Norton infected with the powdery mildew Erysiphe necator were analyzed to investigate their possible roles in disease resistance.
The program GEPS (Wang et al. 2006) was used to analyze the expression patterns of DCL genes quantitatively. The similarity measure (SM) was used to quantify the similarity between gene expression profiles. An SM value close to 1 indicates high similarity between two gene expression patterns irrespective of their absolute expression levels. Thus, a high SM value means that the corresponding genes may have related biological roles (Wang et al. 2006). In addition, the specificity measure (SPM) was used to define the tissue-specific expression pattern of a gene, which may be useful for further understanding its physiological behavior (Wang et al. 2006). In this survey, the expression levels of DCL genes were divided into three classes based on the SPM value: comparatively high expression (SPM ≥ 0.7), above average (0.7 > SPM ≥ 0.5), and below average (SPM < 0.5).
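The two measures can be illustrated with a minimal sketch. It assumes cosine-based definitions (SM as the cosine of the angle between two expression vectors, SPM as each tissue's value divided by the vector length), which fits the scale-independence described above but is an assumption about GEPS rather than a transcription of its code. The function names and expression values are hypothetical; only the SPM thresholds come from the text.

```python
import math


def similarity_measure(profile_a, profile_b):
    """Cosine of the angle between two expression profiles (assumed SM definition)."""
    dot = sum(a * b for a, b in zip(profile_a, profile_b))
    norm_a = math.sqrt(sum(a * a for a in profile_a))
    norm_b = math.sqrt(sum(b * b for b in profile_b))
    return dot / (norm_a * norm_b)


def specificity_measures(profile):
    """One SPM per tissue: expression value divided by the profile's vector length (assumed)."""
    norm = math.sqrt(sum(v * v for v in profile))
    return [v / norm for v in profile]


def expression_class(spm):
    """Three classes using the SPM thresholds stated in the survey."""
    if spm >= 0.7:
        return "comparatively high"
    if spm >= 0.5:
        return "above average"
    return "below average"


# Hypothetical expression values for two genes across four tissues.
gene_a = [10.0, 20.0, 30.0, 40.0]
gene_b = [1.0, 2.0, 3.0, 4.0]  # same shape as gene_a at a tenth of the level

sm = similarity_measure(gene_a, gene_b)
spm_values = specificity_measures(gene_a)
classes = [expression_class(v) for v in spm_values]

print(f"SM = {sm:.3f}")
for value, label in zip(spm_values, classes):
    print(f"SPM = {value:.3f} -> {label}")
```

Under these assumptions, two profiles with the same shape but different magnitudes give an SM of 1, which is the property the text relies on when comparing DCL genes with different absolute expression levels.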
The 1,000-bp nucleotide sequence upstream of the translation initiation codon of each DCL gene in three species (rice, Arabidopsis, and grape) was extracted using a custom Perl script and used for the analysis of transcription factor binding sites (TFBSs). At present, no full-length cDNA sequences for grape exist in JGI and/or the GenBank database. To facilitate comparison between species, the sequences upstream of the translation initiation codon rather than the transcription start site were used to screen for possible cis-acting regulatory elements. The software PlantCARE (Lescot et al. 2002) was used to identify putative plant-specific TFBSs in a given DNA sequence. To reduce false positives, only TFBSs with a matrix score of at least 5 were considered further.
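The extraction step can be sketched as follows. This is an illustrative Python version, not the authors' Perl script; the function names, the coordinate convention, and the toy sequences are assumptions made for the example. The main point is that for a gene on the reverse strand the upstream region lies at higher forward-strand coordinates and must be reverse-complemented.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def upstream_region(chrom_seq, start_codon_pos, strand, length=1000):
    """Return up to `length` bases immediately upstream of the start codon.

    Assumed convention: `start_codon_pos` is the 0-based forward-strand
    index of the base that pairs with the A of ATG. For '+' genes this is
    the A itself; for '-' genes it is the highest forward coordinate
    covered by the start codon.
    """
    if strand == "+":
        begin = max(0, start_codon_pos - length)
        return chrom_seq[begin:start_codon_pos].upper()
    if strand == "-":
        end = min(len(chrom_seq), start_codon_pos + 1 + length)
        return reverse_complement(chrom_seq[start_codon_pos + 1:end])
    raise ValueError("strand must be '+' or '-'")


# Toy sequences: a forward-strand gene with ATG at index 6, and a
# reverse-strand gene whose start codon appears as CAT at indices 3-5.
plus_chrom = "AAACCCATGGGG"
minus_chrom = "TTTCATGGGAAA"

print(upstream_region(plus_chrom, 6, "+", length=3))
print(upstream_region(minus_chrom, 5, "-", length=3))
print(upstream_region(plus_chrom, 6, "+"))  # truncated at the sequence start
```

When fewer than `length` bases are available, the sketch returns the shorter sequence rather than padding it; a real pipeline might instead flag such genes.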
DIVERGE, a program developed by Gu and Vander Velden (2002), was used to detect functional divergence between members of a protein family. The coefficient of type I functional divergence (θ) and the likelihood ratio statistic (LRT) were calculated for each pair of DCL clusters. A value of θ significantly greater than 0 indicates altered selective constraints at amino acid sites after gene duplication (Gu and Vander Velden 2002).
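As a rough plausibility check on whether an estimated θ differs from 0, θ can be divided by its standard error and the result treated as approximately standard normal. This Wald-style calculation is an assumption made for illustration; it is not the likelihood ratio test that DIVERGE reports. The six θ ± SE pairs are the values listed in Table 1, used here without cluster-pair labels.

```python
import math

# theta and standard error pairs as listed in Table 1 (cluster pairs not labeled here).
theta_se = [
    (0.528, 0.073),
    (0.378, 0.072),
    (0.476, 0.071),
    (0.322, 0.052),
    (0.234, 0.048),
    (0.362, 0.053),
]


def wald_z(theta, se):
    """Wald-style z statistic for H0: theta = 0 (an approximation, not DIVERGE's LRT)."""
    return theta / se


def one_sided_p(z):
    """Upper-tail probability of a standard normal variate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


z_scores = [wald_z(theta, se) for theta, se in theta_se]
p_values = [one_sided_p(z) for z in z_scores]

for (theta, se), z, p in zip(theta_se, z_scores, p_values):
    print(f"theta = {theta:.3f} +/- {se:.3f}  z = {z:.2f}  one-sided p = {p:.2e}")
```

Every estimate lies more than four standard errors above 0, which is consistent with the statement that the DCL clusters are significantly divergent from one another.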
Dicer and DCL protein sequences were aligned using the E-INS-i program implemented in MAFFT v6.6 (Katoh et al. 2005). Phylogenetic trees were reconstructed with MEGA v3.1 (Kumar et al. 2004) using both the neighbor-joining (NJ) and the minimum evolution (ME) methods. For both methods, the p-distance model and pairwise deletion of gaps/missing data were selected. The bootstrap test of phylogeny was performed with 1,000 replications. The phylogenetic trees were displayed using MEGA v3.1 (Kumar et al. 2004).
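The distance setting named above can be made concrete with a short sketch. The p-distance is the proportion of compared sites at which two aligned sequences differ, and under pairwise deletion a site is skipped only for the pair in which at least one sequence has a gap or missing character there. The code is an independent illustration, not MEGA's implementation, and the three aligned peptide strings are invented for the example.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, skip_chars="-?"):
    """Proportion of differing sites, skipping sites gapped or missing in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue  # pairwise deletion: ignore this site for this pair only
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differences / compared


# Hypothetical aligned peptide fragments (not real DCL sequences).
alignment = {
    "seq1": "MKT-LLA",
    "seq2": "MRTALLA",
    "seq3": "MKTAL-G",
}

distances = {
    (name_a, name_b): p_distance(alignment[name_a], alignment[name_b])
    for name_a, name_b in combinations(alignment, 2)
}

for pair, dist in distances.items():
    print(pair, round(dist, 3))
```

Because the set of compared sites differs from pair to pair, pairwise deletion retains more data than complete deletion when gaps are scattered, at the cost of distances that are not all computed over the same columns.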
Expression pattern of DCL genes in different tissues
Developmental regulation of DCL gene expression
The similarity measure results show that during leaf and stem development, AtDCLs 1, 2, and 4 exhibit similar expression patterns, which differ from that of AtDCL3 (Fig. 2). Figure 2a shows that the expression level of AtDCL3 is low throughout leaf development. The expression of AtDCLs 1, 2, and 4 fluctuates more extensively, reaching a peak at the senescing leaf stage (Fig. 2a). The AtDCLs can also be classified into two groups of expression patterns during stem development: AtDCL3 represents the first group, while the other three AtDCLs form the second (Fig. 2b). The specificity measure analysis reveals that AtDCL3 is significantly expressed in the shoot apex at the inflorescence stage (after bolting), whereas AtDCL2 shows higher expression in the shoot apex at the transition stage (before bolting). In addition, AtDCL1 has higher expression in the second internode of the stem than the other AtDCLs, suggesting a particular role at the corresponding developmental stage.
AtDCLs show a wide diversity of expression profiles during flower and seed development (Fig. 2). The expression of AtDCLs 1, 3, and 4 tends to decrease from flower stage 9 to stage-12-equivalent and then rebounds at flower stage 15. In contrast, the expression of AtDCL2 is significantly higher at flower stage-12-equivalent (p < 0.001) and then decreases at stage 15 (Fig. 2c). Compared with other stages, AtDCL3 is relatively specifically expressed at flower stages 10–11 (SPM value, 0.619). However, all AtDCLs are weakly expressed in mature pollen. During seed development, the expression of AtDCL2 remains nearly constant, whereas AtDCLs 1 and 3 peak at the seventh and sixth seed stages, respectively. Notably, AtDCL4 expression increases significantly from seed stage 3 to stage 6 and then decreases dramatically up to stage 10 (Fig. 2d).
Expression profiles of DCL genes in response to stress
After cold treatment, the expression of AtDCL1 in roots continued to increase and reached its highest level at 24.0 h, whereas that of AtDCL4 decreased from 6.0 to 24.0 h. Similar patterns were observed in shoots, where the expression of AtDCL1 increased markedly after prolonged cold treatment, whereas the other AtDCLs showed the opposite tendency (Fig. 4b).
More complicated expression patterns of AtDCLs were revealed after salt treatment (Fig. 4c). AtDCL1 showed a decreasing expression pattern, while AtDCL4 was significantly expressed at 12.0 h and then declined rapidly at 24.0 h. In contrast, the changes in expression of AtDCLs 2 and 3 were insignificant in shoots. In roots, the expression of AtDCL1 decreased promptly after 3 and 6 h of salt treatment and subsequently recovered at 12.0 h. AtDCL4 expression decreased with the duration of salt treatment. Interestingly, AtDCL2 and AtDCL3 exhibited distinct expression patterns: the former had its lowest expression level at 6.0 h, whereas AtDCL3 was significantly activated at the same time point.
Overall, it is evident that DCL action can be compartmentalized in different tissues (Xie et al. 2005) under different environmental conditions. These results imply that plants may have evolved specialized regulatory mechanisms in response to different abiotic stresses (Xie et al. 2004).
Regulatory elements for plant DCL genes
Transcription factors bind to their corresponding TFBSs upstream of genes of interest, and the profiles of cis-acting elements may thus provide information for understanding the regulatory mechanisms of gene expression. The computational tool PlantCARE (Lescot et al. 2002) was used to identify putative TFBSs in the 1,000-bp DNA sequence upstream of the translation initiation codon of DCL genes in rice, Arabidopsis, and grape.
Light-responsive elements such as Sp1 and the GT1 box are present in multiple copies in the promoters of plant DCL genes (Electronic supplementary material Table S2). Sp1 is the most abundant cis-element found in rice DCLs. All rice DCLs except OsDCL3, in which one GT1 box is found, have at least three copies of the Sp1 element. In Arabidopsis and grape, in contrast, a single Sp1 element was found in AtDCL4 and VvDCL4, respectively. In addition, two and four GT1 boxes have been identified in VvDCL3 and AtDCL4, respectively. The second class of cis-elements enriched in the promoter regions comprises plant hormone response elements, such as ABRE, GARE, P-box, the TCA element, and the CGTCA and TGACG elements, suggesting that plant DCLs may play a role in the corresponding ABA, gibberellin, salicylic acid, and MeJA signaling pathways. The Skn-1 motif, which is required for endosperm expression, is also found frequently: with one exception (VvDCL1), all DCLs possess this regulatory element (Electronic supplementary material Table S2). The presence of anaerobic and stress response elements such as the GC motif, MBS, HSE, LTR, and TC-rich repeats in the upstream regions of DCLs further supports the idea that plant DCLs function in a wide diversity of ways.
In addition, cis-elements specific to a species and/or a DCL member have also been observed. In six out of eight Arabidopsis and grape DCLs, a cis-element termed circadian, which is involved in circadian control, was found, whereas none of the rice OsDCLs possesses this element. AC-II, a cis-element required for xylem-specific expression, is present only in OsDCL2 and AtDCL2. Furthermore, orthologous DCLs also differ in cis-element organization (Electronic supplementary material Table S2). Consistent with this observation, Liu et al. (2007) showed that loss-of-function mutations of OsDCL4 cause severe developmental defects in rice, whereas mutations of its counterpart do not in Arabidopsis; they therefore suggested that OsDCL4 may have evolved a much broader role in development than its Arabidopsis counterpart.
Interestingly, DCL1 participates in generating mature miRNAs and, in a feedback loop, miR162 negatively regulates DCL1 expression (Xie et al. 2003). Moreover, searches with both FASTA and the plant microRNA potential target finder miRU (Zhang 2005) predict ath-miR414 to be another potential regulator of AtDCL1. However, osa-miR414 is not predicted to target OsDCL1.
Divergence between plant DCL family members
Table 1 Functional divergence between subgroups of the plant DCL family
Columns: θ ± SE; sites with Qk > 0.8; sites with Qk > 0.9
θ ± SE: 0.528 ± 0.073; 0.378 ± 0.072; 0.476 ± 0.071; 0.322 ± 0.052; 0.234 ± 0.048; 0.362 ± 0.053
To identify critical amino acid sites that may be responsible for functional divergence between DCL subgroups, the posterior probability (Qk) of divergence was determined for each site. By definition, a large Qk indicates a high probability that the functional constraint (or the evolutionary rate) of a site differs between two clusters (Gu 2003). The results showed that the functional divergence between DCL members can be partially attributed to variation at several to tens of amino acid sites whose Qk value is greater than 0.8 (Table 1 and Electronic supplementary material Table S3). Strong functional diversification appears to have occurred between the DCL1/DCL2, DCL1/DCL3, and DCL1/DCL4 pairs, as there are two, two, and four amino acid sites with Qk > 0.9, respectively (Table 1 and Electronic supplementary material Table S3). However, the diversification between DCL1 and DCL3 was not as strong as that between DCL1/DCL2 and DCL1/DCL4 because only two amino acid sites are found to be possible contributors. For the DCL2/DCL3 and DCL2/DCL4 pairs, there are only two and zero amino acid sites with Qk > 0.8, respectively, suggesting that their diversification is much weaker.
Based on the method of Gu (1999), the functions of plant DCLs were found to have diverged significantly from one another. In agreement with previous reports, DCL1 is strongly divergent from the other DCL family members, whereas the divergence between DCL2, DCL3, and DCL4 is relatively weak because no amino acid site with Qk > 0.9 was found for the corresponding gene pairs (Table 1). Most of the critical amino acid sites fall in the PAZ domain and in the loops between Helicase-C and PAZ and between RNase IIIa and RNase IIIb (Electronic supplementary material Table S3). Electronic supplementary material Figure S1 shows the amino acid sites with Qk > 0.9, which are predicted to be strongly related to functional divergence (Gu 2003). In DCL2, sites 1495 (Qk = 0.967) and 1846 (Qk = 0.978) are invariant for cysteine and arginine, respectively, whereas the same positions in DCL1 carry several amino acids with different chemical properties, such as the non-polar amino acid valine and the uncharged polar amino acids cysteine and asparagine. Similar cases were also observed between DCL1/DCL3 and DCL1/DCL4 (Electronic supplementary material Fig. S1).
Evolutionary analysis of Dicer and DCL proteins
With regard to genomic location, all four Arabidopsis AtDCLs are located outside the putative duplicated segments, while in rice, OsDCLs 3, 4, and 5 are in regions that have undergone whole-genome duplication events. It is evident, however, that in rice the duplicated copies have been lost during evolution because only single copies of the genes remain. There are five DCLs in the poplar genome (Margis et al. 2006). Based on sequence similarity, it can be inferred that PtDCL5 resulted from a recent duplication of its paralogue PtDCL2, probably after the speciation of poplar and grape (Fig. 6). The emergence of the fifth DCL in monocot species, however, probably occurred after the monocot–dicot split ∼200 million years ago (mya) but before the divergence of cereals approximately 70 mya (Margis et al. 2006). As suggested by Deleris et al. (2006), the extensive proliferation of the DCL family in plants might be partially attributed to the requirement for multiple antiviral DCL activities. It is therefore reasonable to predict that the expansion of DCL proteins in plants may be an ongoing process.
The diversification of plant DCLs can be placed before the emergence of the moss P. patens, and their rapid proliferation is argued to be partially attributable to the need for plants to acquire resistance to viruses, bacteria, and fungi. The survey of upstream elements revealed three major classes of cis-elements in the promoter regions of DCLs, and their distinct organization patterns are interpreted as reflecting their varying participation in the regulation of gene expression. Importantly, the amino acid level analysis suggested that functional divergence has occurred between plant DCL proteins and identified the critical amino acid sites involved in this divergence for further investigation.
We thank Prof. Rudi Appels and Prof. Wujun Ma for their valuable and constructive suggestions and for careful editing of the manuscript. This work was supported by an intramural fund from Zhejiang Forestry University (to Qingpo Liu) and by grants from the National Basic Research Program of China (973 program; no. 2007CB109305), the National Natural Science Foundation of China (no. 30740011), the Zijin Program from Zhejiang University (to Y. Feng), and the Special Fund for Grade B Innovative Research Team from Zhejiang Forestry University (to Z. Zhu).