Genome-wide Identification and Characterization of a Dehydrin Gene Family in Poplar (Populus trichocarpa)
Liu, C., Li, C., Liu, B. et al. Plant Mol Biol Rep (2012) 30: 848. doi:10.1007/s11105-011-0395-1
Dehydrins (DHNs) are a complex group of stress-inducible proteins characterized by one or more lysine-rich motifs. DHNs are present in multiple copies in plant genomes. The composition and chromosomal distribution of DHN genes have been analyzed genome-wide in herbaceous species, but they remain unexplored in woody plants. Here, we report the identification of ten genes encoding eleven putative DHN polypeptides in Populus. DHN genes occur in duplicated blocks distributed over seven of the 19 poplar chromosomes, likely as a result of segmental and tandem duplication events. Based on conserved motifs, poplar DHNs were assigned to four subgroups, with the Kn subgroup being the most frequent. One putative DHN polypeptide (PtrDHN-10) has an SKS arrangement and may have originated from recombination between SKn and KnS genes. In silico analysis of microarray data showed that, in unstressed poplar, DHN genes are expressed in all vegetative tissues except mature leaves. This exhaustive survey of DHN genes in poplar provides important information that will assist future studies of their functional roles in poplar.
Keywords: Populus · LEA · Dehydrins · Cold stress · Woody plants · Cellular dehydration
Abbreviations: LEA, Late embryogenesis abundant
Dehydrins (DHNs) are Group II (D-11 family) late embryogenesis abundant (LEA) proteins. They accumulate during seed desiccation. They also accumulate in vegetative or reproductive tissues in response to water deficit induced by drought, low temperature, or salinity (Close 1996; Allagulova et al. 2003; Kosova et al. 2007; Tunnacliffe and Wise 2007). Certain DHN proteins have been assigned a vital role in bud dormancy and cold acclimation of trees (Rinne et al. 2010; HongXia et al. 2009; Rohde et al. 2007; Rorat 2006). DHNs are widely distributed across the plant kingdom, including all seed plants, nonvascular plants, and seedless vascular plants. They accumulate in different cell compartments, mostly the cytoplasm and nucleus (Battaglia et al. 2008; Tunnacliffe and Wise 2007; Allagulova et al. 2003).
The distinctive sequence feature of all DHN proteins is a conserved, Lys-rich 15-residue motif, EKKGIMDKIKEKLPG, named the K-segment, which occurs in one to 11 copies within a single protein. DHNs may also carry other, optional motifs:

- the Y-segment ([V/T]D[E/Q]YGNP), usually found in one to three tandem copies near the N-terminus;
- the S-segment, a tract of Ser residues;
- the less conserved Φ-segments, which are rich in polar amino acids and lie between K-segments (Close 1996; Allagulova et al. 2003).

The presence and arrangement of these motifs in a single protein allow DHN proteins to be classified into five subgroups: YnSK2, Kn, SKn, KnS, and Y2Kn (Rorat 2006; Allagulova et al. 2003). Some DHNs, however, can only be assigned to intermediate forms outside these five subgroups, such as the SK3S arrangement of one chickweed DHN (Z21500; Close 1996). Considerable research effort has been devoted to DHN structure and function in herbaceous model plants such as Arabidopsis, maize, and barley, but comparable in-depth studies have not yet been directed towards woody trees.
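The motif-based naming of subgroups can be illustrated with a small Python sketch. It scans a protein for Y-, S- and K-segments, orders the hits by position, and collapses consecutive repeats into a label such as "Y2SK2".

The regular expressions are hypothetical, deliberately loose patterns, not the MEME-derived motifs used in this study:

- The Y-segment pattern follows the consensus quoted above.
- The K-segment pattern tolerates some variation around EKKGIMDKIKEKLPG.
- A "tract of Ser" is assumed to mean at least five consecutive S residues.

```python
import re

# Hypothetical, loose patterns for illustration only (not the study's motifs).
# Y follows [V/T]D[E/Q]YGNP; K allows variation around EKKGIMDKIKEKLPG;
# S assumes a Ser tract means five or more consecutive S residues.
SEGMENT_PATTERNS = {
    "Y": re.compile(r"[VT]D[EQ]YGNP"),
    "S": re.compile(r"S{5,}"),
    "K": re.compile(r"KKG[IVLMF][MLVIF][DEGQ]KIK[ED]K[LI]PG"),
}


def segment_arrangement(protein):
    """Return a DHN arrangement string such as 'Y2SK2' or 'SK3S'.

    Segment hits are ordered by start position, and consecutive hits of the
    same segment type collapse into one letter plus a copy number, so
    K-segments separated only by other sequence (e.g. Phi-segments) form one run.
    """
    hits = sorted(
        (match.start(), label)
        for label, pattern in SEGMENT_PATTERNS.items()
        for match in pattern.finditer(protein.upper())
    )
    runs = []  # each entry: [segment letter, copy number]
    for _, label in hits:
        if runs and runs[-1][0] == label:
            runs[-1][1] += 1
        else:
            runs.append([label, 1])
    return "".join(label if n == 1 else f"{label}{n}" for label, n in runs)
```

For example, a toy sequence with two Y-segments, one Ser tract and two K-segments yields "Y2SK2". A real classification would rely on motif models rather than exact regular expressions.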
DHNs are encoded by multigene families (Hundertmark and Hincha 2008). Recent studies, aided by the release of complete genome sequences, have identified the full set of DHNs in individual plant genomes. Previous reports identified 12, 9, and 10 DHN genes in Arabidopsis using different methods (Hundertmark and Hincha 2008; Tunnacliffe and Wise 2007; Alsheikh et al. 2005), 13 in barley (Choi et al. 1999; Rodriguez et al. 2005), and 8 in rice (Wang et al. 2007). In poplars, by contrast, only one SK2-type DHN has been identified so far, in several poplar species, and its response to various stresses has been confirmed (Bae et al. 2009; Caruso et al. 2002). DHN genes have thus been identified in several plant species, but there is still no comprehensive, systematic characterization of all DHN genes in a single woody plant genome. To identify all genes encoding DHN proteins in poplar, we searched the complete Populus trichocarpa genome for the dehydrin domain. Here, we report the identification and analysis of DHN proteins and their genes in P. trichocarpa. To our knowledge, this is the first systematic characterization of all DHN-encoding genes in a single woody plant genome. It provides a basis for future in vivo studies of the function of each poplar DHN.
Identification and chromosomal location of poplar DHN genes
The complete protein sequence database of P. trichocarpa v1.1 was downloaded from www.jgi.doe.gov/poplar. The hidden Markov model (HMM) profile file (dehydrin.hmm) of the Pfam Dehydrin domain (PF00257) was downloaded from the Pfam database (http://pfam.sanger.ac.uk/). This profile was used as a query to search the poplar protein database with the hmmsearch command of HMMER (v3.0), which is widely used to identify homologues of a protein family of interest (Finn et al. 2010; Eddy 2009). All non-redundant hits with E-values below 0.1 were collected. Each was then searched with BLASTP against the NCBI RefSeq non-redundant protein database (http://www.ncbi.nlm.nih.gov/). Expressed sequence tags (ESTs) were retrieved by online BLASTN searches against all Populus ESTs in NCBI, using the corresponding transcript/CDS sequences from P. trichocarpa v1.1 (www.jgi.doe.gov/poplar) as queries. Matches with more than 95% identity over an alignment of at least 100 bp were considered to correspond to the dehydrin genes. These ESTs were aligned with their transcript/CDS sequences using the ClustalW program in BioEdit with default parameters (Hall 1999). The alignments were adjusted manually to maximize matching.
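The two threshold filters described above can be sketched in Python:

- keep hmmsearch hits with E < 0.1;
- keep EST hits with more than 95% identity over at least 100 bp.

The sketch assumes two things about how results were saved:

- hmmsearch was run with `--tblout` (for example `hmmsearch --tblout hits.tbl dehydrin.hmm proteins.fa`);
- BLASTN results were saved in BLAST+ tabular format (`-outfmt 6`).

The study used the NCBI web BLAST, so these local file formats are an assumption. The function names and toy records are hypothetical.

```python
def hmmsearch_hits(tblout_lines, max_evalue=0.1):
    """Return {target_id: best full-sequence E-value} for hits below max_evalue.

    Parses the whitespace-delimited --tblout table of HMMER3 hmmsearch, in
    which column 1 is the target name and column 5 the full-sequence E-value.
    Keying the result by target ID keeps the hit list non-redundant.
    """
    hits = {}
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split(maxsplit=18)  # last column (description) may contain spaces
        target, evalue = fields[0], float(fields[4])
        if evalue < max_evalue:
            hits[target] = min(evalue, hits.get(target, evalue))
    return hits


def est_matches(blast_tab_lines, min_identity=95.0, min_length=100):
    """Return (query, subject) pairs passing the identity and length cut-offs.

    Expects BLAST+ tabular output (-outfmt 6), whose first four columns are
    qseqid, sseqid, pident (percent identity) and alignment length.
    """
    matches = []
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        qseqid, sseqid, pident, length = line.split("\t")[:4]
        if float(pident) > min_identity and int(length) >= min_length:
            matches.append((qseqid, sseqid))
    return matches
```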
The 11 identified DHN genes were located in the P. trichocarpa genome using NCBI Map Viewer (http://www.ncbi.nlm.nih.gov/projects/mapview/). Duplicated regions between chromosomes were identified as described by Tuskan et al. (2006). Tandem duplicates were defined as family members lying within a 100-kb region and separated by five or fewer gene loci (Hu et al. 2010; Finn et al. 2006).
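The tandem-duplication criterion can be expressed as a small Python check over gene coordinates. It is a sketch under one interpretation of the rule: two consecutive family members on the same chromosome count as tandem duplicates if their start positions are no more than 100 kb apart and no more than five other loci lie between them.

The input layout (gene ID, chromosome, start) and all gene names in the toy data are hypothetical.

```python
from collections import defaultdict


def tandem_pairs(loci, family, max_span=100_000, max_between=5):
    """Return neighbouring family members that meet the tandem criterion.

    loci:   iterable of (gene_id, chromosome, start) for ALL annotated genes,
            so that intervening non-family loci can be counted.
    family: set of gene IDs belonging to the family of interest.
    Consecutive family members on one chromosome are reported when their
    start positions are at most max_span bp apart and at most max_between
    other loci lie between them.
    """
    by_chrom = defaultdict(list)
    for gene, chrom, start in loci:
        by_chrom[chrom].append((start, gene))
    pairs = []
    for genes in by_chrom.values():
        genes.sort()  # order loci along the chromosome
        members = [i for i, (_, gene) in enumerate(genes) if gene in family]
        for i, j in zip(members, members[1:]):
            span = genes[j][0] - genes[i][0]
            between = j - i - 1  # number of loci lying between the two members
            if span <= max_span and between <= max_between:
                pairs.append((genes[i][1], genes[j][1]))
    return pairs
```

A cluster such as the three-gene Cluster I would appear as two overlapping adjacent pairs that chain into one tandem array.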
Identification of conserved motifs
Motifs were extracted from 34 DHN protein sequences from poplar, Arabidopsis, and barley using the online version 4.6.1 of MEME (Multiple Expectation Maximization for Motif Elicitation). MEME is one of the most widely used tools for discovering new patterns in biological sequences and assessing their significance (Bailey and Elkan 1994; Bailey et al. 2006). MEME was run with the following parameters:

- number of sites per motif between 2 and 120;
- any number of repetitions of a motif per sequence;
- a maximum of 15 motifs;
- motif widths between 8 and 16 residues.
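For readers reproducing the motif search locally, the settings above correspond roughly to the MEME command line sketched below. This is a hypothetical reconstruction:

- The study used the MEME web server.
- The site-count range (2 to 120) is mapped onto `-minsites`/`-maxsites` as our interpretation.
- The input file name `dhn_34.fasta` is a placeholder.

```python
import shlex


def meme_command(fasta_path, out_dir="meme_out"):
    """Build MEME arguments approximating the motif-search settings above."""
    return [
        "meme", fasta_path,
        "-protein",                            # protein alphabet
        "-oc", out_dir,                        # output directory (overwritten if present)
        "-mod", "anr",                         # any number of repetitions per sequence
        "-nmotifs", "15",                      # report at most 15 motifs
        "-minw", "8", "-maxw", "16",           # motif width between 8 and 16 residues
        "-minsites", "2", "-maxsites", "120",  # between 2 and 120 sites per motif
    ]


if __name__ == "__main__":
    # Print the command; running it requires a local MEME installation, e.g.
    # subprocess.run(meme_command("dhn_34.fasta"), check=True)
    print(shlex.join(meme_command("dhn_34.fasta")))
```

Keeping the arguments as a list avoids shell-quoting problems when the command is passed to `subprocess.run`.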
Phylogenetic analysis and in silico microarray analysis
Multiple sequence alignments of the full-length protein sequences were performed using the ClustalW program in BioEdit with default parameters (Hall 1999). From these alignments, unrooted phylogenetic trees were constructed in MEGA 5.0 (Tamura et al. 2011). Both the Neighbor-Joining (Saitou and Nei 1987) and Minimum Evolution methods were used, with p-distance and complete deletion. Tree reliability was assessed by bootstrap analysis with 1,000 replicates.

Probe sets for each poplar DHN gene were retrieved with the online probe match tool of the NetAffx™ Analysis Center (http://www.affymetrix.com/analysis/index.affx). Relative transcript abundances of all poplar DHN genes in various tissues came from the poplar transcript abundance datasets (Wilkins et al. 2009). These datasets are available through the Populus electronic fluorescent pictograph browser (Poplar eFP browser; http://bar.utoronto.ca/efppop/cgi-bin/efpWeb.cgi), whose data originate from the NCBI Gene Expression Omnibus (accession number GSE13990). For genes with more than one probe set, the mean expression value was used; genes sharing a probe set were assigned the same transcript abundance. Expression data were normalized and hierarchically clustered in Cluster 3.0 (de Hoon et al. 2004), using average linkage based on Pearson correlation coefficients. The resulting dendrogram and heat map were visualized with the Java Tree-View 1.1 program (Saldanha 2004).
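The distance settings named above can be illustrated with a small Python sketch:

- Complete deletion drops every alignment column that contains a gap in any sequence.
- The p-distance is the proportion of differing sites among the remaining columns.
- Each bootstrap replicate resamples those columns with replacement (1,000 replicates in the study).

This sketch follows these textbook definitions and is not MEGA's code. Building a tree from each replicate is left out, and the toy alignment is hypothetical.

```python
import random


def complete_deletion(alignment):
    """Drop every column that contains a gap ('-') in any sequence."""
    seqs = list(alignment.values())
    keep = [i for i in range(len(seqs[0])) if all(s[i] != "-" for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}


def p_distance(a, b):
    """Proportion of aligned sites at which two gap-free sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(alignment):
    """All pairwise p-distances after complete deletion, keyed by (name, name)."""
    clean = complete_deletion(alignment)
    return {(m, n): p_distance(clean[m], clean[n]) for m in clean for n in clean}


def bootstrap_replicates(alignment, n_reps=1000, seed=1):
    """Yield pseudo-alignments built by resampling columns with replacement."""
    clean = complete_deletion(alignment)
    length = len(next(iter(clean.values())))
    rng = random.Random(seed)
    for _ in range(n_reps):
        cols = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(seq[c] for c in cols) for name, seq in clean.items()}
```

Each replicate would be converted to a distance matrix with `distance_matrix` and then to a tree. The bootstrap value of a clade is the percentage of replicate trees that contain it.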
Results and Discussion
Identification and characterization of DHN gene family in Populus
Table 1 All identified dehydrin genes and putative encoded polypeptides in the Populus trichocarpa genome (columns: JGI protein ID; gene and transcript products; novel simplified gene nomenclature; novel simplified nomenclature)
Thus, a total of 11 DHN genes were identified in the P. trichocarpa genome by the genome-wide survey (Table 1). This number is roughly equal to that in Arabidopsis. It does not agree with comparative genomic studies, which estimate 1.4∼1.6 putative Populus homologues for each Arabidopsis gene (Tuskan et al. 2006). In other words, the expansion seen in many Populus multigene families (Tuskan et al. 2006) does not appear to have occurred in the Populus DHN gene family. We speculate that Populus and Arabidopsis carry similar numbers of DHN genes because the two species have similar needs for these genes in their stress-related functions.
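As a rough check of this comparison, apply the 1.4∼1.6 ratio to the 9 to 12 Arabidopsis DHN genes reported in the studies cited earlier. This is our own illustration, not a calculation from the original analysis. It gives the expected range of poplar DHN genes:

$$
\begin{aligned}
N_{\text{expected}}^{\min} &= 1.4 \times 9 = 12.6,\\
N_{\text{expected}}^{\max} &= 1.6 \times 12 = 19.2.
\end{aligned}
$$

The 11 genes found here therefore fall below the lower end of the range that the genome-wide ratio would predict.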
Revision of DHN-encoded proteins and discovery of alternative splicing in poplar DHN genes
The Populus genome is still a draft (www.jgi.doe.gov/poplar): a first-draft reference set of 45,555 protein-coding gene loci was tentatively identified, and the gene set will need to be refined gradually (Tuskan et al. 2006). To validate our preliminary identification of eleven DHN genes from the JGI poplar database, their encoded proteins were compared by BLASTP against the NCBI Reference Sequence (RefSeq) database. RefSeq provides a non-redundant, validated collection of genomic, transcript, and protein sequences (Pruitt et al. 2006, 2005). Three poplar DHN proteins (PtrDHN-8, PtrDHN-11, and PtrDHN-9) had no counterparts in RefSeq (Table 1) and may therefore be truncated or incorrectly predicted. Corresponding ESTs were retrieved by online BLASTN searches to support and correct these gene models. ESTs from NCBI that matched the CDS sequences perfectly, particularly in the regions encoding the K-segment, were aligned with the corresponding transcripts/CDS from P. trichocarpa v1.1 (Electronic Supplementary Material (ESM) Figs. S1–S3). For the PtrDHN-9 (665494) transcript, many ESTs support “ATG” at positions 49∼51 as the translation start codon and “TAA” at positions 337∼339 as the stop codon (ESM Fig. S1). Accordingly, the amino acids encoded downstream of this “TAA” were removed from the original PtrDHN-9 protein sequence (ESM Fig. S1 and ESM Table S1). The predicted PtrDHN-8 (195568) protein lacked an “ATG” start codon and therefore had an incomplete N-terminus. EST alignment and comparative analyses showed that the PtrDHN-8 transcript should be extended upstream of its first codon (“GCC”) by two codons: an “ATG” initiation codon encoding Met, followed by “GCT” encoding Ala (ESM Fig. S2 and ESM Table S2).
Moreover, ESTs strongly supported “TAG” at positions 394∼396 as its translation stop codon (ESM Fig. S2). The revised CDS and encoded protein sequence of PtrDHN-8 are given in ESM Table S2. The gene PtrDHN-11 (276757) had no significant EST match, and a “TAG” stop codon at positions 196∼198 of its transcript causes early translation termination (ESM Fig. S3 and ESM Table S3). In the revised CDS, the amino acid sequence encoded upstream of this stop codon does not match any DHN domain (ESM Fig. S3 and ESM Table S3). PtrDHN-11 was therefore excluded from the 11 poplar DHN genes identified above. It was instead designated a putative DHN pseudogene because of its high sequence identity with another DHN gene, PtrDHN-1 (550802). In summary, EST support confirmed two of the three problematic transcripts (PtrDHN-9 and PtrDHN-8) with high confidence, and both were corrected to encode complete proteins. The third, PtrDHN-11 (276757), was identified as a DHN pseudogene.
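The codon-level corrections above can be checked with a small translation helper. The corrections were of three kinds: choosing an EST-supported start codon, truncating at the first in-frame stop, and prepending a missing "ATG".

This is a sketch using the standard genetic code. The function name and toy sequences are hypothetical; the actual PtrDHN-8 and PtrDHN-9 transcripts are in ESM Figs. S1 and S2 and are not reproduced here.

```python
from itertools import product

# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}


def translate_from(transcript, start):
    """Translate from a 1-based start position up to the first in-frame stop.

    Returns (protein, (stop_first, stop_last)) with 1-based stop-codon
    coordinates, or (protein, None) if no in-frame stop codon is found.
    """
    seq = transcript.upper().replace("U", "T")
    protein = []
    for i in range(start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X for codons with ambiguous bases
        if aa == "*":
            return "".join(protein), (i + 1, i + 3)
        protein.append(aa)
    return "".join(protein), None
```

For PtrDHN-9, for instance, one would call `translate_from(transcript, 49)` and expect the returned stop coordinates to be (337, 339). For PtrDHN-8, one would translate `"ATGGCT" + transcript` from position 1.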
Chromosomal location and duplication of DHN genes in Populus
Previous analysis of the Populus genome identified paralogous segments arising from a whole-genome duplication event in the Salicaceae (the salicoid duplication). This event occurred 65 million years ago and contributed substantially to the amplification of many multigene families (Tuskan et al. 2006). To determine the relationship between the DHN genes and these paralogous segments, the Populus DHN genes were mapped to the duplicated blocks of P. trichocarpa established by Tuskan et al. (2006). The distribution of DHN genes relative to the duplicated blocks is illustrated in Fig. 3. All nine mapped DHN genes (100%) are located in duplicated blocks. Two duplicated pairs (PtrDHN-1/3 and PtrDHN-2/8) each lie in a pair of paralogous blocks and can be considered direct results of the segmental duplication event (Fig. 3). Similarly, Cluster I and PtrDHN-9 lie in a pair of paralogous blocks created by the whole-genome duplication event (Fig. 3). In one pair of duplicated blocks, a DHN gene (PtrDHN-10) is present on only one block and lacks a corresponding duplicate. This suggests that its paralogous copy was lost after the segmental duplication event (Fig. 3). The finding is consistent with the observation that most gene losses in eukaryotes occur after whole-genome duplication (Abdel-Haleem 2007).
Tandem duplications have also contributed to the expansion of the DHN gene family. Linkage group XIII contains one DHN cluster (Cluster I) of three genes arranged in tandem, in the same orientation, within a 20-kb fragment (Table 1 and Fig. 3). Given their high sequence identities, the three tandem DHN genes in Cluster I were considered direct products of tandem duplication events. Their location within a duplicated block implies that the segmental duplication preceded the tandem duplications. The genomic organization of the DHN genes thus indicates that both segmental and tandem duplication events contributed to the expansion of the DHN gene family in the Populus genome. Both events have also been shown to contribute to the expansion of DHN genes in Arabidopsis (Hundertmark and Hincha 2008) and rice (Wang et al. 2007).
In our study, all mapped Populus DHN genes (100%) have been retained in duplicated blocks. Genome-wide, only about one-third of putative Populus genes are retained in the duplicated blocks resulting from whole-genome duplication (Tuskan et al. 2006). High retention rates of duplicated genes have also been documented for other Populus gene families (Hu et al. 2010; Barakat et al. 2009; Kalluri et al. 2007). In addition, segmental duplication accounts for considerably more DHN genes than tandem duplication does. This suggests that segmental duplication may have been the main mechanism for the expansion of Populus DHN genes.
Identification of conserved motifs and classification of Populus DHN proteins
Table 2 Biochemical properties of all identified poplar DHN proteins (columns include: novel simplified nomenclature; JGI accession number; number of amino acids)
Divergence within Populus DHN genes
Unrooted trees were generated from the complete protein sequences of all Populus DHNs with both the Neighbor-Joining (Saitou and Nei 1987) and Minimum-Evolution methods in MEGA 5.0 (Tamura et al. 2011). The two methods produced the same topology, supported by bootstrap values >55, which suggests that the unrooted tree topology is reliable. In this tree the 11 poplar DHNs fall into four distinct clades, Type I, Type II, Type III, and Type IV (Fig. 5a). These types largely correspond to the subgroups identified by motif analysis.

- PtrDHN-2 and PtrDHN-8, of the YnSKn subgroup, were assigned to Type II.
- PtrDHN-10, whose SKnS arrangement is an intermediate form of SKn and KnS, was assigned to Type IV.
- Type I contains two KnS DHNs (PtrDHN-6 and -9) and three Kn DHNs (PtrDHN-7, -4, and -5; Fig. 5a). These three Kn DHNs differ from the other two Kn DHNs, PtrDHN-1.1 (K9) and -1.2 (K13), by a novel repeating motif (motif-5; Fig. 5 and ESM Fig. S4e).
- PtrDHN-1.1 and -1.2, together with the SKn DHN PtrDHN-3, were assigned to Type III because they share another novel motif (motif-6; Fig. 5 and ESM Fig. S4f).

The conserved motifs shared by DHN proteins of the same type provide additional support for the tree topology. Moreover, the proteins encoded by each paralogous pair fall within the same type: PtrDHN-1/3 in Type III, PtrDHN-2/8 in Type II, and Cluster I/PtrDHN-9 in Type I. This further supports the conclusion that both segmental and tandem duplication events drove the expansion of the DHN gene family in the Populus genome.
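MEGA performed the tree building in this study. The Neighbor-Joining criterion itself (Saitou and Nei 1987) can be sketched in a few lines of Python. This minimal illustration is not MEGA's implementation:

- It takes a symmetric distance matrix, such as pairwise p-distances.
- It repeatedly joins the pair of nodes that minimizes the Q criterion.
- It returns an unrooted tree in Newick format.

Minimum Evolution and bootstrap support are not covered. The five-taxon matrix in the usage example is a standard additive textbook case, not poplar data.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Build an unrooted Newick tree from a symmetric distance matrix."""
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    d = {a: {b: matrix[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # r[i]: total distance from node i to all other current nodes
        r = {i: sum(d[i][k] for k in nodes if k != i) for i in nodes}
        # choose the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        _, i, j = min(((n - 2) * d[i][j] - r[i] - r[j], i, j)
                      for i, j in combinations(nodes, 2))
        branch_i = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        branch_j = d[i][j] - branch_i
        u = f"({i}:{branch_i:g},{j}:{branch_j:g})"  # new internal node, named by its subtree
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # join the last three nodes at a single central node
    a, b, c = nodes
    ba = (d[a][b] + d[a][c] - d[b][c]) / 2
    return f"({a}:{ba:g},{b}:{d[a][b] - ba:g},{c}:{d[a][c] - ba:g});"


names = ["a", "b", "c", "d", "e"]
matrix = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
print(neighbor_joining(names, matrix))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Because the toy matrix is additive, the branch lengths recover the distances exactly. With real protein distances, the tree is only an estimate.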
Biochemical properties of poplar DHN proteins
DHNs are generally rich in Gly and polar amino acids and lack Cys and Trp (Close 1997, 1996). Amino acid composition analysis showed that all poplar DHN proteins share this feature, with one exception: PtrDHN-10 (817405) of the SKS subgroup has a relatively high Cys content (4.6%; ESM Table S7). This composition, together with their low GRAVY values of −1.995 to −0.880 (Table 2), confirms that Populus DHN proteins are highly hydrophilic, in agreement with other plant DHNs (Kosova et al. 2007). For example, PtrDHN-7 (Kn subgroup, K3; 19.1 kDa) contains no Cys or Trp, and Gly, Gln, Lys, Glu, and Asp together account for 60.0% of its residues (Table 2 and ESM Table S7). The calculated molecular masses (MW) of the poplar DHN proteins range from 10.7 to 68.9 kDa. Most (9/11) are relatively small, at 10∼26 kDa; only two are larger, at 50.8 and 68.9 kDa (Table 2). However, because of their unusual amino acid composition, DHNs migrate anomalously on electrophoretic gels. Their apparent MW is therefore substantially higher than the MW calculated from their amino acid sequence (Close 1997; Kosova et al. 2007). For example, barley DHN5 migrates at about 84 kDa on SDS gels relative to protein markers, although its calculated MW is only 58.5 kDa (Kosova et al. 2007). Further experiments are therefore required to determine the apparent MW of each poplar DHN.
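The GRAVY, composition and calculated-MW values of the kind reported in Table 2 and ESM Table S7 can be approximated with a few lines of Python. The sketch assumes ProtParam-style definitions:

- GRAVY is the mean Kyte-Doolittle hydropathy per residue.
- MW is the sum of average residue masses plus one water molecule.

The tool actually used for Table 2 is not stated in this section, so small numerical differences are to be expected.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale, used for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Average residue masses in Da (assumed ProtParam-style average masses).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # mass of the water added back for the chain termini


def gravy(protein):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = protein.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def composition(protein):
    """Percentage of each amino acid in the sequence."""
    seq = protein.upper()
    return {aa: 100 * n / len(seq) for aa, n in Counter(seq).items()}


def molecular_weight(protein):
    """Average molecular mass in Da (divide by 1000 for kDa)."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER
```

Unknown residue codes (e.g. X) raise a KeyError here. A production version would need an explicit policy for them.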
The isoelectric point (pI) is also considered an important property for subdividing plant DHNs, because acidic and basic DHNs within the same subgroup may respond to different environmental factors (Allagulova et al. 2003). The theoretical pI values of Populus DHNs span a wide range, 5.01∼9.96, with five acidic, five basic, and one neutral DHN (Table 2). This is consistent with the pI range (5.21∼9.52) of barley DHNs (Kosova et al. 2007).
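A theoretical pI can be estimated by bisection: find the pH at which the net charge, computed from the Henderson-Hasselbalch equation, is zero. The pKa set below uses EMBOSS-style values, which is an assumption. ProtParam, for instance, uses Bjellqvist's position-dependent values, so this sketch will not reproduce Table 2 exactly. The acidic/basic/neutral cut-offs behind Table 2 are not stated here and are not encoded.

```python
# Assumed EMBOSS-style pKa values for ionizable groups.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(protein, ph):
    """Net charge at a given pH from Henderson-Hasselbalch fractions."""
    seq = protein.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free N- and C-terminus per chain
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(protein, tol=1e-4):
    """Bisect on pH in [0, 14]; net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(protein, mid) > 0:
            low = mid   # still positively charged: pI lies above mid
        else:
            high = mid
    return round((low + high) / 2, 2)
```

A sequence dominated by Asp and Glu gives a low pI and one dominated by Lys and Arg a high pI. These are the two extremes within which DHNs are classified as acidic or basic.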
Tissue location of DHN gene expression in Populus
Previous studies in several species have indicated that different types of DHN proteins can localize to the same tissues during development under normal growth conditions (Battaglia et al. 2008; Rorat 2006). Our in silico expression analysis of all poplar DHN genes is consistent with this. For example, the KnS-type PtrDHN-6 shares its tissue expression pattern with the Kn-type PtrDHN-4 and -7, and the SKnS-type PtrDHN-10 shares its pattern with the YnSKn-type PtrDHN-2 and -8 (Fig. 6). However, we also found that Populus DHN genes of the same type tend to be expressed in the same tissues under normal growth conditions. For example, PtrDHN-4, -7, -1.1, and -1.2, all of the Kn type, share similar tissue expression patterns. They are preferentially expressed in MC, FC, and YL, with little accumulation in ML, R, CL, DL, and DS. PtrDHN-2 (Y3SK2) and -8 (Y3SK), both of the YnSKn type, also share an expression pattern, with the highest transcript abundances in seedlings under specific conditions (CL, DL, and DS). This pattern is consistent with DHNs of this type in other plants, such as Indian mustard BjDHN1 (Y3SK2) and oilseed rape BnDHN1 (Y3SK2; Yao et al. 2005). The tendency of same-type DHN genes to share expression patterns across the nine tissues under normal growth conditions provides a useful data resource for exploring the correlation between DHN type and tissue localization.
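The expression analysis behind these groupings combined two steps, sketched below in Python:

1. averaging probe sets per gene;
2. average-linkage hierarchical clustering on Pearson correlation.

This is an illustrative re-implementation, not Cluster 3.0. It uses 1 − r as the distance, similar to Cluster 3.0's centered-correlation option, and omits normalization. The probe IDs, gene names and values are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def gene_profiles(probe_values, gene_to_probes):
    """Average probe-set profiles per gene, tissue by tissue.

    Genes that map to the same probe set automatically receive identical
    profiles, matching the rule used for shared probe sets.
    """
    return {
        gene: [mean(col) for col in zip(*(probe_values[p] for p in probes))]
        for gene, probes in gene_to_probes.items()
    }


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError for a flat profile."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / norm


def average_linkage(profiles):
    """Agglomerative clustering with distance 1 - Pearson r and average linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    dist = {frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
            for a, b in combinations(profiles, 2)}
    clusters = [frozenset([name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # average linkage: mean of all gene-to-gene distances between two clusters
        c1, c2, d = min(
            ((p, q, mean(dist[frozenset((a, b))] for a in p for b in q))
             for p, q in combinations(clusters, 2)),
            key=lambda t: t[2],
        )
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((c1, c2, d))
    return merges
```

The merge history corresponds to the dendrogram drawn alongside the heat map. Genes merged early have the most similar tissue profiles.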
Considerable research effort has been devoted to characterizing DHNs in herbaceous plants such as barley, rice, and Arabidopsis, but such effort has not yet been directed towards woody trees. In this work, we addressed this gap through genome-wide identification and in silico analysis of the DHN genes of poplar. This comprehensive analysis will be an important starting point for future efforts to elucidate the functional roles of all DHN proteins in poplar.
We are grateful for the financial support from the National Basic Research Priorities Programme (Grant No. 2009CB119102), the Excellent Doctor Degree Dissertation in the Northeast Forestry University (Grant No. 140-602055), and Special Fund of Forestry Industrial Research for Public Welfare of China (Grant No. 201004040). The authors declare no conflicts of interest.