Plant Molecular Biology, Volume 73, Issue 4, pp 449–465

Genome-wide analysis of helicase gene family from rice and Arabidopsis: a comparison with yeast and human

Authors

  • Pavan Umate
    • Plant Molecular Biology Group, International Centre for Genetic Engineering and Biotechnology (ICGEB)
  • Renu Tuteja
    • Plant Molecular Biology Group, International Centre for Genetic Engineering and Biotechnology (ICGEB)
  • N. Tuteja
    • Plant Molecular Biology Group, International Centre for Genetic Engineering and Biotechnology (ICGEB)

DOI: 10.1007/s11103-010-9632-5

Cite this article as:
Umate, P., Tuteja, R. & Tuteja, N. Plant Mol Biol (2010) 73: 449. doi:10.1007/s11103-010-9632-5

Abstract

Helicases are motor proteins that catalyze the unwinding of stable RNA or DNA duplexes, using mainly ATP as the source of energy. In this study we have identified complete sets of helicases from rice and Arabidopsis. The helicase gene family in rice and Arabidopsis contains 115 and 113 genes, respectively. These helicases were validated based on their annotations and supported by the organization of conserved helicase signature motifs. We have also identified homologs of 64 rice RNA and DNA helicases in Arabidopsis, yeast and human. We explored Arabidopsis oligonucleotide array data to gain functional insights into the transcriptome of helicase family members under ten different stress conditions. Our results revealed that expression of helicase genes is profoundly regulated under various stress conditions. The helicases identified in this study lay a foundation for the in-depth characterization of each helicase type.

Keywords

Abiotic stress; DEAD-box proteins; Helicase motifs; Protein alignment; Sequence LOGO; Transcriptomics

Introduction

Helicases are ubiquitous enzymes that function in diverse cellular and metabolic processes involving the separation of double-stranded nucleic acids into single strands and the removal of nucleic acid-associated proteins. They take part in a number of metabolic pathways involving nucleic acids, such as replication, recombination, transcription and translation. The efficient operation of almost all nucleic acid metabolic processes requires helicase enzymes, and therefore a number of RNA and DNA helicases with diverse functions have been found in all organisms. Helicases are motor proteins that catalyze the unwinding of an energetically stable duplex DNA or RNA molecule by using the energy of ATP hydrolysis (Tuteja and Tuteja 2004a). It is widely accepted that helicases translocate mainly unidirectionally along the bound strand, with a specific polarity: they move in either the 3′–5′ or the 5′–3′ direction on that strand.

Almost all helicases possess conserved amino acid sequence motifs, on the basis of which they are classified into families and superfamilies (Gorbalenya and Koonin 1993). This sequence-based classification has led to the definition of four superfamilies of helicases, namely SF1, SF2, SF3 and SF4 (Gorbalenya and Koonin 1993). SF2 is the best characterized superfamily and includes the SNF2, DEAH-box and DEAD (Asp-Glu-Ala-Asp)-box protein families. It is noteworthy that different substrate specificities (RNA versus DNA) and directionalities of translocation (3′–5′ or 5′–3′) may exist among helicase members of the same superfamily. A core region (~350–400 amino acids) of highly conserved sequence motifs is shared by several DNA and RNA helicases of the DEAD-box protein family (Schmid and Linder 1992; Pause et al. 1993). Almost all the helicase proteins contain nine well-defined conserved motifs, named Q, I, Ia, Ib and II to VI (Tuteja and Tuteja 2004a, b). The crucial helicase motifs are A/GxxGxGKT, DEAD, SAT and HRIGRxxR, which have been implicated in ATP binding and hydrolysis, RNA or DNA binding, and ATP hydrolysis-dependent RNA and DNA unwinding (Tuteja 2003; Tuteja and Tuteja 2004b; Cordin et al. 2006; Jankowsky and Fairman 2007).

The DEAD-box family constitutes the largest subfamily of helicases, and these enzymes unwind mainly double-stranded RNAs (de la Cruz et al. 1999). The RNA helicases have been implicated in every step of RNA metabolism, including nuclear transcription, pre-mRNA splicing, ribosome biogenesis, nucleo-cytoplasmic transport, translation, RNA decay and organellar gene expression (de la Cruz et al. 1999; Tanner and Linder 2001; Lorsch 2002). DEAD-box RNA helicases are also RNA chaperones that can actively disrupt misfolded RNA structures (Tanner and Linder 2001; Lorsch 2002). The DNA helicases unwind duplex DNA and are involved in replication, repair and recombination processes (Tuteja and Tuteja 2004b). It is interesting to note that the expression or activity of some of the DEAD-box helicases is regulated in response to changes in specific environmental conditions, including salt stress, oxygen, light or temperature (Mahajan and Tuteja 2005; Owttrim 2006; Vashisht and Tuteja 2006).

The availability of the rice (Oryza sativa) (Goff et al. 2002; Yu et al. 2002) and Arabidopsis genome sequences (http://www.arabidopsis.org; Arabidopsis Genome Initiative 2000) provides an exciting opportunity for the large-scale investigation of the functions of gene families that underlie the extensive genetic diversity among the monocots and eudicots. A comparative approach to gene families involves the establishment of relationships between different genomes (Bennetzen 2002; Pennacchio 2003; Vincentz et al. 2004). In this study, we have identified the possible complete sets of helicases in rice and Arabidopsis, which contain 115 and 113 genes, respectively. We have categorized and classified this collection of helicases from rice and Arabidopsis as RNA and DNA helicases based on their annotations and conserved helicase motifs.

The objective of the present study was to explore the complete set of helicase genes present in two sequenced model plant genomes, rice and Arabidopsis, using a comparative bioinformatics approach. The present study thus provides insights into the relationship between the composition of the helicase gene family in rice and Arabidopsis, and their corresponding homologs in yeast and human. The helicase genes from rice were compared by analysis of phylogenetic relationships and protein domains. We further explored a visualization method based on the pHMM (profile Hidden Markov Model) sequence LOGO and demonstrate its utility for studying the contribution of conserved helicase motifs in all the known members of the helicase family. The gene expression patterns of 113 helicase genes in Arabidopsis under various stress conditions (anoxia, cold, drought, genotoxic, heat, hypoxia, osmotic, oxidative, salt and wounding) were analyzed using the Affymetrix 22K ATH1 oligonucleotide array, which contains more than 22,500 probe sets (Cho et al. 2009). Utilizing the microarray-derived expression profiles of unique helicase genes can increase the efficiency of functional validation of such genes in Arabidopsis and their possible homologs in other systems. The results presented here thus shed light on several key aspects of helicase gene family phylogeny and the distribution of conserved helicase motifs, and provide insights into the structure and composition of helicase gene families in rice and Arabidopsis. The various groups identified in this study thus constitute a backbone for more detailed analysis of each helicase group.

Materials and methods

Screening of database

A putative function search was carried out for helicase genes in the Rice Genome Annotation Project (RAP) database (http://rice.plantbiology.msu.edu/) (Ouyang et al. 2007) and a list of 115 helicase genes was compiled. These candidates were further sorted into RNA helicases (73), DNA helicases (31) and others (11). For Arabidopsis, a dataset representing all helicase genes encoded by the Arabidopsis genome was generated based on putative function search in the Arabidopsis Information Resource (TAIR) database (http://www.arabidopsis.org/) (Poole 2007).

Identification of helicase genes

The helicase gene sequences were verified based on the presence of conserved helicase motifs (Tuteja and Tuteja 2004a, b). The protein sequences for rice (Oryza sativa) and Arabidopsis were downloaded from http://rice.plantbiology.msu.edu and http://www.arabidopsis.org websites, respectively.
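A motif check of this kind can be sketched in a few lines of Python. The sketch below is illustrative only and is not the procedure used in this study: it takes the nine consensus sequences quoted later in the Results, treats each "x" as any residue, considers only the DEAD form of motif II, and scans a made-up toy sequence. The names MOTIFS, motif_to_regex and find_motifs are hypothetical.

```python
import re

# Consensus motifs as quoted in the Results; 'x' stands for any residue.
# Assumption: only the DEAD variant of motif II is searched in this sketch.
MOTIFS = {
    "Q": "GFxxPxxIQ",
    "I": "AxxGxGKT",
    "Ia": "PTRELA",
    "Ib": "TPGR",
    "II": "DEAD",
    "III": "SAT",
    "IV": "FVNT",
    "V": "RGxD",
    "VI": "HRxGRxxR",
}


def motif_to_regex(motif):
    """Turn a consensus such as 'AxxGxGKT' into a compiled regex ('x' -> '.')."""
    return re.compile("".join("." if ch == "x" else re.escape(ch) for ch in motif))


def find_motifs(sequence):
    """Return {motif name: 0-based start of the first match, or None}."""
    hits = {}
    for name, motif in MOTIFS.items():
        match = motif_to_regex(motif).search(sequence)
        hits[name] = match.start() if match else None
    return hits


# Hypothetical toy sequence, built only to exercise the function.
toy = "MKGFEKPSPIQAAAQTGSGKTPTRELATPGRDEADSATFVNTRGLDHRIGRTGR"
print(find_motifs(toy))
```

Exact consensus matching is stricter than inspection of an alignment, so a real helicase with a conservative substitution in one motif would be reported as lacking it; the manual check described above remains necessary.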

Protein alignment studies and phylogenetic analysis

The sequences for 75 helicase proteins (39 RNA helicases, 25 DNA helicases and 11 others) were downloaded from the RAP database and aligned with ClustalX using default parameters. The protein domains were validated based on the ClustalX alignment comparison. Protein sequences were also checked manually for the presence of helicase motifs. Phylogenetic analysis was performed using the MEGA4 software program (Kumar et al. 2008). Phylogenies were constructed using the Neighbor-Joining (NJ) method on protein sequences (Saitou and Nei 1987). Entire protein sequences were employed for all these analyses.
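For readers unfamiliar with the NJ criterion, the first joining step can be illustrated as follows. This is a minimal sketch, not the MEGA4 implementation: the distance matrix is hypothetical, and only the selection of the first pair and its two branch lengths is shown (a full NJ run repeats this step on the reduced matrix until the tree is resolved). The function name nj_first_join is an assumption of this sketch.

```python
from itertools import combinations


def nj_first_join(names, dist):
    """Pick the first pair joined by Neighbor-Joining and its branch lengths.

    names: list of taxon labels; dist: symmetric distance matrix (list of lists).
    """
    n = len(names)
    totals = [sum(row) for row in dist]  # row sums r_i

    def q_value(pair):
        # Q(i, j) = (n - 2) * d(i, j) - r_i - r_j; the pair with the smallest Q is joined.
        i, j = pair
        return (n - 2) * dist[i][j] - totals[i] - totals[j]

    i, j = min(combinations(range(n), 2), key=q_value)
    # Branch lengths from i and j to the new internal node.
    length_i = dist[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
    length_j = dist[i][j] - length_i
    return names[i], names[j], length_i, length_j


# Hypothetical distances among five taxa (not taken from this study).
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_first_join(names, dist))
```

Note that the pair with the smallest raw distance (d and e) is not the pair joined first; the Q criterion corrects each distance for how far both taxa are from everything else.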

Generation of profile HMM Logos for DEAD-box proteins

We utilized the pHMM for building sequence LOGO for the DEAD-box helicase proteins in the Pfam database (http://pfam.janelia.org/family?acc=PF00270) (Finn et al. 2008). A pHMM specifies position-specific letter emission distributions and also position specific insertion and deletion probabilities to describe a sequence family (Schuster-Böckler et al. 2004). In this study, the sequence profiles for all homologous helicase protein sequences from the Pfam database (http://pfam.janelia.org/family?acc=PF00270) were visualized using sequence LOGO (Schneider and Stephens 1990). A sequence logo graphically represents conservation of the columns (positions) in a multiple alignment by plotting a stack of letters (nucleotides or amino acids) for each position. The total stack height is computed as the information content of the column, i.e., its relative entropy distance from an assumed background distribution. The relative height of each letter in the stack is proportional to its frequency at that position (Schuster-Böckler et al. 2004). While building a pHMM for the DEAD-box helicases, all the homologous helicase protein sequences in the Pfam database were covered.
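The stack-height calculation described above can be written out as a short Python sketch. It is a simplification under stated assumptions: it works on the observed residues of a single alignment column rather than on HMM emission distributions, it assumes a uniform background over the 20 amino acids unless another background is supplied, and it applies no small-sample correction. The function name logo_column is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def logo_column(residues, background=None):
    """Return (stack height in bits, {letter: letter height}) for one column.

    Gap characters and any symbol absent from the background are ignored.
    """
    if background is None:
        # Assumption: uniform background distribution.
        background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    counts = Counter(r for r in residues if r in background)
    total = sum(counts.values())
    if total == 0:
        return 0.0, {}
    freqs = {aa: count / total for aa, count in counts.items()}
    # Relative entropy of the observed frequencies against the background.
    height = sum(p * math.log2(p / background[aa]) for aa, p in freqs.items())
    # Each letter takes a share of the stack proportional to its frequency.
    return height, {aa: p * height for aa, p in freqs.items()}


# A fully conserved column versus a column split between D and E.
print(logo_column("DDDD"))
print(logo_column("DDEE"))
```

Under the uniform background a fully conserved column reaches log2(20), about 4.32 bits, and any mixture of residues lowers the stack.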

Arabidopsis microarray data

The Affymetrix 22K ATH1 oligonucleotide expression data were obtained from the Genevestigator Response Viewer (https://www.genevestigator.com) (Hruz et al. 2008), available as an external link in the TAIR database. The expression profiles of 113 genes were selected for cluster analysis. The fold changes were first converted to log2 and expressed relative to the mean value for normalization at P < 0.05. A heat map was generated by hierarchical clustering, implemented using the program Genesis (Eisen et al. 1998; Soukas et al. 2000; Sturn et al. 2002).
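The transformation described above can be illustrated as follows. This is a sketch under one explicit assumption: "the mean value" is taken to be the mean of a gene's log2 values across treatments, which the text does not state explicitly. The fold changes are hypothetical and the function name log2_mean_center is not from Genesis.

```python
import math


def log2_mean_center(fold_changes):
    """Convert fold changes to log2 and subtract the mean across treatments."""
    logged = [math.log2(fc) for fc in fold_changes]
    mean = sum(logged) / len(logged)
    return [value - mean for value in logged]


# Hypothetical fold changes for one gene under four treatments.
print(log2_mean_center([4.0, 1.0, 0.5, 2.0]))
```

After centering, positive values mark treatments in which the gene lies above its own average response and negative values mark those below it, which is the scale on which the heat map colors are read.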

Expression profiling, data analysis and cluster analysis

The differential expression analysis was performed on the log-transformed data. The data comprised a number of stress treatments: anoxia, cold, drought, genotoxic, heat, hypoxia, osmotic, oxidative, salt and wounding. A gene set was then created by importing the log2 fold expression values for 113 genes into the Genesis program. Cluster analysis was performed using hierarchical clustering with average linkage.
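Average-linkage hierarchical clustering can be sketched in plain Python as shown below. This is not the Genesis implementation: Euclidean distance is assumed here because the distance measure is not stated in the text, the expression profiles are hypothetical, and the function name average_linkage is an illustration only.

```python
import math
from itertools import combinations


def average_linkage(profiles):
    """Agglomerative clustering with average linkage.

    profiles: {gene: [log2 values]}. Returns the merge order as a list of
    (members_a, members_b, distance) tuples.
    """
    def euclid(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def linkage(pair):
        # Mean of all pairwise distances between members of the two clusters.
        a, b = pair
        total = sum(euclid(profiles[g], profiles[h]) for g in a for h in b)
        return total / (len(a) * len(b))

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical mean-centered log2 profiles for four genes under two treatments.
profiles = {
    "g1": [2.0, 2.0],
    "g2": [2.0, 1.0],
    "g3": [-2.0, -2.0],
    "g4": [-2.0, -0.5],
}
for merge in average_linkage(profiles):
    print(merge)
```

The merge order is what a dendrogram beside a heat map displays: genes joined at small distances respond similarly across the treatments.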

Results

A putative function search of the RAP database revealed 115 loci that encode helicase proteins in the rice genome. These loci were distributed among all 12 rice chromosomes. The listed helicases were further classified into 3 major groups: RNA helicases (73), DNA helicases (31) and others (11). Several of these genes were redundant. Iterated searches using the BLASTp algorithm (http://blast.ncbi.nlm.nih.gov) showed that these 115 helicase genes encode 75 proteins: 39 RNA helicases, 25 DNA helicases and 11 other helicases. With BLASTp searches, the homologs of 64 helicase proteins (39 RNA helicases and 25 DNA helicases) in Arabidopsis, yeast and human were also identified. We also performed a more detailed analysis of these genome data sets, taking into account gene structures and conserved helicase motifs.

RNA helicases

Table 1 lists a total of 73 RNA helicases encoded by the rice genome. The list includes prominent helicases like BAT1, DB10, DBP2-5, members of the subfamilies DDX (DDX18, DDX23, DDX41, DDX46, DDX47, DDX49, DDX51, DDX52, DDX54, DDX56) and DHX (DHX8, DHX16, DHX35, DHX36), maintenance of killer protein (MAK5), nucleolar RNA helicase (NRH2), pre-mRNA processing RNA helicases (PRP5 and PRP16), silencing defective (SDE3), tRNA-splicing endonuclease (SEN1), U5 small nuclear ribonucleoprotein (U5 snRNP 200 kDa), suppressor of PAB1 protein 4 (spb4), pre-mRNA splicing factor RNA helicases (sfRNAh, sfRNAh.F56D2.6), suppressor allel var 1 (SUV3), RNA helicase (rhlE), helicase associated with SET1 protein (has1), deficiency of ribosomal subunits protein 1 (DRS1), DEAD (DBH, ded1) and DExD/H (dhh1) box members, RNA helicase A (RNAh A) and RNA helicase (RNA h) (Table 1 and Supplementary Table 1). The genes that encode RNA helicases are distributed across all 12 chromosomes of the rice genome (Table 1). Chromosomes 3 and 1 contain the highest numbers of these helicase genes (sixteen and fourteen, respectively). Chromosomes 2 and 7 each contain eight, 4 and 6 each contain five, 8 and 11 each contain four, 12 contains three, and 5, 9 and 10 each contain two genes (Supplementary Table 1). Several of these genes were encoded by multiple loci, up to 8, in the rice genome (Table 1). Some of these loci encoded truncated proteins; therefore, after thorough analysis, only the loci that encoded the full-length protein were considered for further characterization.
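The chromosome of each locus can be read directly from its identifier, since the two digits after "LOC_Os" give the chromosome number. The following Python sketch tallies loci per chromosome; it uses only the eight DBH loci from Table 1 as input, and the names LOCUS_RE and chromosome_counts are hypothetical.

```python
import re
from collections import Counter

# LOC_Os<two-digit chromosome>g<gene number>, e.g. LOC_Os03g01830.
LOCUS_RE = re.compile(r"^LOC_Os(\d{2})g\d+$")


def chromosome_counts(locus_ids):
    """Count loci per chromosome from identifiers such as LOC_Os03g01830."""
    counts = Counter()
    for locus in locus_ids:
        match = LOCUS_RE.match(locus.strip())
        if match is None:
            raise ValueError(f"unrecognised locus ID: {locus!r}")
        counts[int(match.group(1))] += 1
    return counts


# The eight DBH loci listed in Table 1.
dbh_loci = [
    "LOC_Os03g01830", "LOC_Os01g73900", "LOC_Os03g12000", "LOC_Os04g40970",
    "LOC_Os05g01990", "LOC_Os09g21520", "LOC_Os11g32880", "LOC_Os12g41715",
]
print(sorted(chromosome_counts(dbh_loci).items()))
```

Applying the same tally to every locus in Table 1 gives the per-chromosome distribution summarized in the paragraph above.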
Table 1

A list of 73 RNA helicases with their corresponding gene symbol, the number of encoding loci, and Locus IDs from the rice genome

| RNA helicases type | Gene symbol | No. of encoding loci | Locus ID |
|---|---|---|---|
| HLA-B Associated Transcript 1 | BAT1 | 2 | LOC_Os01g36890, LOC_Os01g36920 |
| DEAD-Box | DB10 | 2 | LOC_Os01g07740, LOC_Os01g36860 |
| DEAD-Box Protein 2 | dbp2 | 2 | LOC_Os01g68320, LOC_Os01g10050 |
| DEAD-Box Protein 3 | DBP3 | 1 | LOC_Os07g20580 |
| DEAD-Box Protein 4 | dbp4 | 2 | LOC_Os07g33340, LOC_Os02g57980 |
| DEAD-Box Protein 5 | DBP5 | 1 | LOC_Os03g06220 |
| DEAD-box family | DDX18 | 1 | LOC_Os06g34420 |
| DEAD-box family | DDX23 | 1 | LOC_Os03g50090 |
| DEAD-box family | DDX41 | 2 | LOC_Os06g48210, LOC_Os02g05660 |
| DEAD-box family | DDX46 | 1 | LOC_Os08g05810 |
| DEAD-box family | DDX47 | 2 | LOC_Os03g46610, LOC_Os07g46580 |
| DEAD-box family | DDX49 | 1 | LOC_Os07g43980 |
| DEAD-box family | DDX51 | 1 | LOC_Os02g55260 |
| DEAD-box family | DDX52 | 1 | LOC_Os07g45360 |
| DEAD-box family | DDX54 | 1 | LOC_Os08g32090 |
| DEAD-box family | DDX56 | 1 | LOC_Os03g51900 |
| DEAD-Box Helicase | DBH | 8 | LOC_Os03g01830, LOC_Os01g73900, LOC_Os03g12000, LOC_Os04g40970, LOC_Os05g01990, LOC_Os09g21520, LOC_Os11g32880, LOC_Os12g41715 |
| DEAD-Box Helicase | ded1 | 4 | LOC_Os03g59050, LOC_Os07g10250, LOC_Os06g40020, LOC_Os11g38670 |
| DExD/H-box protein | dhh1 | 3 | LOC_Os10g35990, LOC_Os02g42860, LOC_Os04g45040 |
| DEAH-box family | DHX16 | 3 | LOC_Os03g17432, LOC_Os05g32370, LOC_Os08g24760 |
| DEAH-box family | DHX35 | 1 | LOC_Os01g11370 |
| DEAH-box family | DHX36 | 2 | LOC_Os01g02884, LOC_Os03g53760 |
| DEAH-box family | DHX8 | 1 | LOC_Os06g09280 |
| Deficiency of Ribosomal Subunits protein 1 | DRS1 | 1 | LOC_Os12g29660 |
| Pre-mRNA Splicing Factor RNA Helicase | sfRNAh.F56D2.6 | 1 | LOC_Os11g20554 |
| Helicase Associated with SET1 protein | has1 | 3 | LOC_Os01g43120, LOC_Os01g43130, LOC_Os03g58810 |
| Maintenance of Killer protein 5 | MAK5 | 1 | LOC_Os04g43140 |
| Nucleolar RNA Helicase 2 | NRH2 | 3 | LOC_Os03g61220, LOC_Os07g05050, LOC_Os09g34910 |
| Pre-mRNA Processing RNA helicase | PRP16 | 1 | LOC_Os07g32430 |
| Pre-mRNA Processing RNA helicase | PRP5 | 3 | LOC_Os02g10770, LOC_Os03g19530, LOC_Os08g06344 |
| RNA Helicase | rhlE | 1 | LOC_Os01g08930 |
| RNA Helicases | RNA h | 2 | LOC_Os12g05230, LOC_Os11g09060 |
| RNA Helicase A | RNAhA | 2 | LOC_Os01g56190, LOC_Os04g45980 |
| Silencing Defective | SDE3 | 1 | LOC_Os03g06440 |
| tRNA-Splicing Endonuclease | SEN1 | 1 | LOC_Os10g02930 |
| U5 Small Nuclear RiboNucleoProtein | U5.snRNP 200kDa | 3 | LOC_Os02g01740, LOC_Os02g01880, LOC_Os03g53220 |
| Suppressor of PAB1 protein 4 | spb4 | 1 | LOC_Os01g07080 |
| Pre-mRNA Splicing Factor RNA Helicase | sfRNAh | 3 | LOC_Os02g19860, LOC_Os03g19960, LOC_Os06g23530 |
| Suppressor allel var1 | SUV3 | 2 | LOC_Os03g53500, LOC_Os04g38630 |

In this study, we have identified the possible non-redundant complete set of RNA helicases in rice. Of the 73 RNA helicases listed, 39 were each encoded by a single locus and the rest were redundant (Table 1). Within these 39 RNA helicases, we searched for the presence of the nine conserved helicase motifs (Q, I, Ia, Ib and II to VI) based on protein alignment studies. The amino acid sequences of these nine motifs are as follows: GFxxPxxIQ, AxxGxGKT, PTRELA, TPGR, DEAD or DExD/H, SAT, FVNT, RGxD and HRxGRxxR. Alignment of the entire protein sequences showed highly variable N- and C-terminal regions and a conserved core of ~350–400 amino acids that contained the helicase signature motifs. The protein alignment studies revealed the presence of all nine conserved helicase motifs in 26 of the listed RNA helicases (BAT1, DB10, DBP2, DBP3, DBP4, DBP5, DDX18, DDX23, DDX41, DDX46, DDX47, DDX49, DDX51, DDX52, DDX54, DDX56, DBH, ded1, dhh1, DRS1, has1, MAK5, NRH2, PRP5, rhlE and spb4) (Fig. 1a). Of these, BAT1, DDX51, DDX52, PRP5 and DBH contain the DExD motif, while the rest of the helicases contain a well conserved DEAD motif (Fig. 1a). The SAT motif is also highly conserved and is present in almost all the helicases except DB10, rhlE, DBH and DDX46 (Fig. 1a). We performed phylogenetic analysis on these 26 RNA helicases that showed the presence of all nine helicase signatures (Fig. 1b). The Neighbour-Joining (NJ) analysis on the whole protein sequences of the set of 26 RNA helicases showed that DDX47 and DDX49, BAT1 and dhh1, and DB10 and DBP2 along with DBP3 were closely related (Fig. 1b).
Fig. 1

Amino acid alignment and phylogenetic analysis of RNA helicases from rice. a Amino acid alignment of 26 RNA helicases with 9 helicase signatures is shown. The conserved helicase motifs are indicated on top of aligned sequences. The alignment was made using ClustalX on entire protein sequence for each helicase type. RNA helicases are identified on left and numbers of amino acid residues are given on right. Asterisks indicate conserved amino acids while dots represent conservative amino acid changes. Gaps in the amino acid sequences are introduced to improve the alignment. b Phylogenetic relationship of 26 RNA helicases built on protein sequences. Each node is represented by a number indicating the bootstrap value for 100 replicates. The un-rooted tree was generated using Neighbour-Joining (NJ) method and viewed using MEGA4 software. Scale bar shows 0.1 substitutions per sequence position

The other 5 helicases (DHX8, DHX16, PRP16, DHX35 and DHX36) belong to the category of DEAH-box helicases (Fig. 2). In these members, we were able to identify 8 helicase motifs (GxxGxGKT, TQPRRV, TDGML, DEAH, SAT, FLGT, TNIAEA and QRAGRAGR) (Fig. 2). All of them contain DEAH in motif II except DHX36, which contains DEIH (Fig. 2).
Fig. 2

Protein alignment of RNA helicases encoded by the rice genome. Alignment was generated using ClustalX for 5 representatives of RNA helicases where 8 helicase motifs (GxxGxGKT, TQPRRV, TDGML, DExH, SAT, FLTG, TNIAEA and QRAGRAGR) can be identified. Note that a glycine (G) residue is found in these helicases instead of commonly occurring alanine (A) in the motif-I. The names of helicase proteins are indicated on left and amino acid positions on right. Only a part of entire protein alignment is shown where asterisks and dots indicate conserved, and conservative amino acid changes, respectively. Gaps in the amino acid sequences are introduced to improve the alignment

DNA helicases

A total of 31 DNA helicases are encoded by the rice genome (Table 2, Supplementary Table 2). This list includes homologues of well-characterized DNA helicases like RecQ, Bloom (BLM), Werner (WRN), DNA helicase 2 subunit 1 (Ku80), minichromosome maintenance proteins (MCMs), nucleolin (NCL), ATP-binding protein (ABP), recombination defective (recG), sucrose non-fermenting protein 2 (SNF2), superfamily II DNA helicase (SF2), DNA repair helicase (XPB2) and ultraviolet hypersensitive 6 (UVH6) (Table 2, Supplementary Table 2). Several of these DNA helicases are encoded by single loci, with the exception of MCM3, which is encoded by 2 loci, and the Werner helicase protein, which is encoded by 4 loci (Table 2). Chromosomes 5 and 1 contain the highest numbers of these helicase genes (six and five, respectively). Chromosomes 2 and 4 each contain four, 6, 7 and 11 each contain three, and 3, 9 and 12 each contain one gene (Supplementary Table 2). The list of DNA helicases includes 6 RecQ family members [RecQ1 or BLM helicase, RecQ2 or RecQ4A, RecQl3, RecQ4B and RecQSim] (Table 2, Supplementary Table 2). RecQSim, which is found exclusively in plants, is also present in rice and contains an insertion of ~100 amino acids in the helicase domain. All members of the hexameric MCM DNA helicase (MCM2-7) and also the homologs of MCM8, MCM9 and MCM10 were identified in rice (Table 2).
Table 2

A list of 31 DNA helicases with their corresponding gene symbol, the number of encoding loci, and Locus IDs from the rice genome

| DNA helicases type | Gene symbol | No. of encoding loci | Locus ID |
|---|---|---|---|
| DEAD-Box Protein (RecQ-like 3) | DBP (RecQl3) | 1 | LOC_Os02g54020 |
| ATP Binding Protein | ABP | 1 | LOC_Os06g33520 |
| RECQ4B | ATSGS1 | 1 | LOC_Os04g35420 |
| Bloom (RECQ1) helicase | BLM | 1 | LOC_Os11g44910 |
| DNA helicase 2 subunit 1 | ku80 | 1 | LOC_Os07g08729 |
| DNA helicase Q1 (RECQ4A = RECQ2) | RECQ4A | 1 | LOC_Os11g48090 |
| Eukaryotic Initiation Factor 4A | eIF4A | 2 | LOC_Os02g05330, LOC_Os06g48750 |
| Eukaryotic Initiation Factor 4A-3 | eIF4A-3 | 2 | LOC_Os01g45190, LOC_Os03g36930 |
| Mini Chromosome Maintenance (MCM) | MCM2 | 1 | LOC_Os11g29380 |
| Mini Chromosome Maintenance (MCM) | MCM3 | 2 | LOC_Os05g08100, LOC_Os05g39850 |
| Mini Chromosome Maintenance (MCM) | MCM4 | 1 | LOC_Os01g36390 |
| Mini Chromosome Maintenance (MCM) | MCM5 | 1 | LOC_Os02g55410 |
| Mini Chromosome Maintenance (MCM) | MCM6 | 1 | LOC_Os05g14590 |
| Mini Chromosome Maintenance (MCM) | MCM7 | 1 | LOC_Os12g37400 |
| Mini Chromosome Maintenance (MCM) | MCM8 | 1 | LOC_Os05g38850 |
| Mini Chromosome Maintenance (MCM) | MCM9 | 1 | LOC_Os06g11500 |
| Mini Chromosome Maintenance (MCM) | MCM10 | 1 | LOC_Os09g36820 |
| Nucleolin | NCL | 1 | LOC_Os04g52960 |
| Recombination-defective | recG | 1 | LOC_Os02g48100 |
| RecQSim | RecQSim | 1 | LOC_Os05g05810 |
| Sucrose Non-Fermenting protein 2 | SNF2 | 1 | LOC_Os07g46590 |
| SuperFamily II DNA helicase | SF2 | 1 | LOC_Os07g48360 |
| Werner helicase | WRN | 4 | LOC_Os01g19430, LOC_Os01g53580, LOC_Os04g03990, LOC_Os04g14810 |
| Xeroderma Pigmentosa group B2 | XPB2 | 1 | LOC_Os01g49680 |
| UltraViolet Hypersensitive 6 | UVH6 | 1 | LOC_Os05g05260 |

Of the 31 loci listed for DNA helicases, 25 were unique (Table 2). The protein alignment studies indicate that 5 RecQ family helicases in rice (RecQ1 or BLM helicase, RecQ2 or RecQ4A, RecQl3, RecQ4B) contain the DEAH motif, similar to the human BLM syndrome protein, while RecQSim contains the DEVH motif (Fig. 3a). The conserved motifs in these helicases were MPTGGGKSL, VISPLRSL, DExH, TATA, SGIIYC, TIAFGMGID and QESGRAGR (Fig. 3a). The un-rooted phylogenetic tree constructed with the NJ method revealed interesting relationships among the different RecQ DNA helicases (Fig. 3b). RecQ4B and RecQ4A were closely related, whereas rice BLM, RecQ1 and human BLM emerged as a single clade (Fig. 3b). The homology-based modeling of RecQSim using 1oywA (E. coli RecQ catalytic core) (Bernstein et al. 2003) as the template is shown in Fig. 3c(i–iii). The structural predictions were made using the Swiss-Model server (Arnold et al. 2006). The two proteins share only 25% sequence identity. Molecular graphics images were produced using the UCSF Chimera package from the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco (supported by NIH P41 RR-01081) (Pettersen et al. 2004). The ribbon diagram of the template is shown in Fig. 3c(i) and the predicted structure is shown in Fig. 3c(ii). The superposition shows that the structures partially overlap (Fig. 3c(iii)).
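A sequence identity figure such as the one quoted above depends on how the aligned positions are counted. The sketch below shows one common convention, which is an assumption here and not necessarily the definition used by the modeling server: identical residues divided by the number of aligned columns in which neither sequence has a gap. The toy fragments and the function name percent_identity are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Two hypothetical aligned fragments differing at one ungapped position.
print(percent_identity("DE-VH", "DEQAH"))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so identity values from different tools are not always directly comparable.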
Fig. 3

Protein alignment of rice RecQ DNA helicases and their phylogenetic analysis. a The figure shows distribution of helicase motifs in the RecQ members of rice aligned with human BLM syndrome protein (NP_000048), also a member of RecQ family. Conserved motifs are indicated on top of the plot. The names of RecQ helicases are given on left and positions of amino acids on right. Only a part of entire protein alignment is shown where asterisks and dots indicate conserved and conservative amino acid changes, respectively. Gaps in the amino acid sequences are introduced to improve the alignment. b Phylogenetic analysis of RecQ DNA helicases constructed on protein alignment using ClustalX and viewed with MEGA4 software. The un-rooted tree was generated based on NJ-method. The numbers on each node are bootstrap values for 100 replicates. Scale bar shows 0.05 substitutions per sequence position. c The modeled structures of the template (i), rice RecQsim (ii) and the superimposition (iii) of the structures are shown. The details are described in text

Other types of helicases

The list of helicases that belong to the category of ‘others’ includes members that have not been annotated as RNA or DNA helicases (Supplementary Table 3). These include 11 members like helicase conserved C-terminal domain containing protein (CTD 1–5), regulator of telomere elongation helicase 1, ATP-dependent helicase yprA, helicase associated domain family protein (expressed HADfp), ATP binding protein, ATP-dependent helicase C582.10c, and helicase C6F12.16c (Supplementary Table 3).

Helicase gene family members in Arabidopsis

Based on putative function search and the presence of conserved helicase motifs we have identified 113 helicase genes encoded by the Arabidopsis genome (Supplementary Table 4). These include several prominent RNA and DNA helicases that were also identified in rice, like the batch of 58 DEAD/DEAH-box RNA helicases (RH1 to RH58), PRH75, 6 of the RecQ helicases (RecQL1, RecQL2, RecQ-like 3, AtRecQ4A, AtRecQ4B and RecQSim), HEN2 helicase, DNA repair helicases XPB1 and XPB2, chromatin remodeling helicases, RNA helicase LOS4, DNA helicase MER3, DNA helicase homolog PIF1, eukaryotic translation initiation factors (eIF4A and eIF4A1), helicase domain-containing proteins, Ku80 and Ku70 heterodimer, MCM family proteins, nucleolin, SNF2 domain containing protein, U5.snRNA, stress response suppressor (STRS) 1 and 2, and SDE3 helicase (Supplementary Table 4).

Identification of homologs for rice helicase in Arabidopsis, yeast and human

In this study, we focused on computational search for novel rice helicase homologs in Arabidopsis, yeast and human. With BLASTp searches close homologs of rice RNA and DNA helicases in Arabidopsis, yeast and humans were identified (Tables 3, 4). A total of 64 unique helicase proteins (39 RNA helicase and 25 DNA helicases) from rice were selected for this study. Based on homology search with rice helicase protein sequences as queries, we identified rice homologs for non-redundant sets of 45 Arabidopsis, 42 yeast and 53 human helicase proteins. For Arabidopsis, proteins like DHX8, MCM3, pre-mRNA splicing factor RNA helicase, MCM7, RH2, RH13, RH20, RH30, RH31, RH41 and RH46 were obtained as redundant search hits. All homologs were procured using Swiss-Prot (www.expasy.ch/sprot) except for DNA helicase 2 subunit 1 and nucleolin where their corresponding Arabidopsis counterparts were traced in the UniProt web site (www.uniprot.org). The BLASTp searches did not yield a possible rice MCM10 homolog in Arabidopsis (Table 3) although this gene is present in Arabidopsis (Locus ID AT2G20980). For yeast, the redundant Swiss-Prot entries were for CDC54, DBP1, DBP2, DBP3, DBP5, HAS1, MCM6, PRP22, SGS1 and YLR419W (Table 3). For human, the following proteins were obtained as redundant hits – BLM syndrome protein, DDX8, DDX10, DDX18, DDX5, and Werner syndrome helicase (Tables 3, 4). The homologs of several prominent subfamilies of rice helicases like DDX, DHX, MCM, DBP and RecQ were identified in Arabidopsis, yeast and human (Tables 3, 4).
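Redundant search hits of the kind described above, where one database entry is returned for several rice queries, can be found with a simple grouping step. The Python sketch below uses a handful of rice-to-Arabidopsis pairs copied from Table 3; the function name redundant_hits is hypothetical and the subset is chosen only for illustration.

```python
from collections import defaultdict


def redundant_hits(best_hit):
    """Group queries by hit and keep the hits returned for more than one query."""
    by_hit = defaultdict(list)
    for query, hit in best_hit.items():
        by_hit[hit].append(query)
    return {hit: queries for hit, queries in by_hit.items() if len(queries) > 1}


# Rice query -> Swiss-Prot ID of the Arabidopsis hit (a few rows from Table 3).
arabidopsis_hits = {
    "MCM2": "Q9FL33",
    "MCM3": "Q9FL33",
    "MCM4": "P43299",
    "MCM5": "Q9FL33",
    "BAT1": "Q9LFN6",
    "dhh1": "Q8RXK6",
}
print(redundant_hits(arabidopsis_hits))
```

In this subset the Arabidopsis MCM3 homolog (Q9FL33) is returned for three rice MCM queries, which is why the number of non-redundant homologs is smaller than the number of rice queries.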
Table 3

A list of rice helicases and their corresponding homologs in Arabidopsis, yeast and human

| S. no. | Rice helicase | Arabidopsis | Yeast | Human |
|---|---|---|---|---|
| 1 | ATP-binding protein | RH31 (Q9FFQ1) | HAS1 (Q03532) | DDX10 (Q13206) |
| 2 | BAT1 | RH56 (Q9LFN6) | SUB2 (Q07478) | UAP56 (Q13838) |
| 3 | BLM.RecQ1 | RH46 (Q9LYJ9) | SGS1 (P35187) | Bloom syndrome protein (P54132) |
| 4 | DB10 | RH46 (Q9LYJ9) | DBP2 (p68-like protein) (P24783) | DDX5 (RNA helicase p68) (P17844) |
| 5 | DBH | RH50 (Q8GUG7) | HAS1 (Q03532) | DDX18 (Q9NVP1) |
| 6 | DBP2 | RH30 (Q8W4R3) | DBP2 (p68-like protein) (P24783) | DDX5 (RNA helicase p68) (P17844) |
| 7 | DBP3 | RH5 (Q9C551) | DBP3 (P20447) | DDX5 (RNA helicase p68) (P17844) |
| 8 | DBP4 | RH32 (Q9FFT9) | DBP4 (P20448) | DDX10 (Q13206) |
| 9 | DBP5 | RH38 (Q93ZG7) | DBP5 (P20449) | DDX19A (Q9NUU7) |
| 10 | DED1 | RH37 (Q84W89) | DBP1 (P24784) | DDX3Y (O15523) |
| 11 | dhh1 | RH8 (Q8RXK6) | DHH1 (P39517) | DDX6 (P26196) |
| 12 | DNA helicase 2 subunit 1 | Ku70-like protein (Q9FQ08)* | DNA helicase II subunit 1 (P32807) | DNA helicase 2 subunit 1 (P12956) |
| 13 | DRS1 | RH28 (Q9ZRZ8) | DRS1 (P32892) | DDX27 (Q96GQ7) |
| 14 | eIF4A | eIF4A-1 (RH4) (P41376) | eIF4A (P10081) | eIF4A-1 (P60842) |
| 15 | eIF4A3 | RH2 (Q94A52) | FAL1 (Q12099) | eIF4A-III (P38919) |
| 16 | HAS1 | RH31 (Q9FFQ1) | HAS1 (Q03532) | DDX18 (Q9NVP1) |
| 17 | MAK5 | RH13 (Q93Y39) | MAK5 (P38112) | DDX24 (Q9GZR7) |
| 18 | MCM2 | MCM3 homolog (Q9FL33) | MCM2 (P29469) | MCM2 (P49736) |
| 19 | MCM3 | MCM3 homolog (Q9FL33) | MCM3 (P24279) | MCM3 (P25205) |
| 20 | MCM4 | PROLIFERA (P43299) | CDC54 (P30665) | MCM4 (P33991) |
| 21 | MCM5 | MCM3 homolog (Q9FL33) | MCM5 (P29496) | MCM5 (P33992) |
| 22 | MCM6 | PROLIFERA (P43299) | MCM6 (P53091) | MCM6 (Q14566) |
| 23 | MCM7 | PROLIFERA (P43299) | CDC47 (P38132) | MCM7 (P33993) |
| 24 | MCM8 | PROLIFERA (P43299) | MCM6 (P53091) | MCM8 (Q9UJA3) |
| 25 | MCM9 | MCM3 homolog (Q9FL33) | CDC54 (P30665) | MCM9 (Q9NXL9) |
| 26 | MCM10 | n/f | MCM10 (P32354) | MCM10 (Q7L590) |
| 27 | NRH2 | RH3 (Q8L7S8) | DBP1 (P24784) | Nucleolar RNA helicase 2 (DDX21) (Q9NR30) |
| 28 | Nucleolin | Nucleolin (Q1PEP5)* | PUB1 (P32588) | RNA-binding protein 34 (P42696) |
| 29 | PRP5 | RH41 (Q3EBD3) | DED1 (P06634) | DDX59 (Q5T1V6) |
| 30 | PRP16 | DHX8 (Q38953) | PRP16 (P15938) | PRP16 (Q92620) |
| 31 | recG | RH2 (Q94A52) | DBP5 (P20449) | DDX17 (Q92841) |
| 32 | RecQ4A | RH20 (Q9C718) | SGS1 (P35187) | DNA helicase Q1 (RECQ1) (P46063) |
| 33 | RecQ4B | RH20 (Q9C718) | SGS1 (P35187) | Bloom syndrome protein (P54132) |
| 34 | RecQl3 | RH14 (Q8H136) | SGS1 (P35187) | DNA helicase Q5 (RECQ5) (O94762) |
| 35 | RecQsim | RH41 (Q3EBD3) | SGS1 (P35187) | Werner syndrome helicase (Q14191) |
| 36 | rhlE | RH39 (Q56X76) | DBP2 (P24783) | DDX4 (Q9NQI0) |
| 37 | RNA helicase A | DHX8 (Q38953) | YLR419W (Q06698) | YTHDC2 (Q9H6S0) |
| 38 | RNA helicase | RH30 (Q8W4R3) | DBP3 (P20447) | DDX43 (Q9NXZ2) |
| 39 | SDE3 | SDE3 (Q8GYD9) | NAM7 (P30771) | MOV-10 (Q9HCE1) |
| 40 | SEN1 | Helicase UPF1 (Q9FJR0) | SEN1 (Q00416) | RENT1 (Q92900) |
| 41 | SF2 | RNA helicase 3 (Q8L7S8) | SGS1 | Werner syndrome helicase (Q14191) |
| 42 | SNF2 | CHD3-type PICKLE (Q9S775) | CHD (P32657) | CHD protein 2 (O14647) |
| 43 | sf.RNAh | DHX15 (O22899) | PRP22 (P24384) | DHX8 (Q14562) |
| 44 | sfRNAh.F56D2.6 | DHX8 (Q38953) | PRP22 (P24384) | DHX8 (Q14562) |
| 45 | spb4 | RH18 (Q9FLB0) | SPB4 (P25808) | DDX55 (Q8NHQ9) |
| 46 | SUV3 | RH53 (Q9LUW5) | SUV3 (P32580) | SUPV3L1 (Q8IYB8) |
| 47 | U5.snRNP | RH13 (Q93Y39) | BRR2 (P32639) | U5.snRNP 200 kDa helicase (O75643) |
| 48 | UVH6 | DNA repair helicase UVH6 (Q8W4M7) | DNA repair helicase RAD3 (P06839) | TFIIH helicase subunit (P18074) |
| 49 | XPB2 | XPB2 (Q9FUG4) | DNA repair helicase RAD25 (Q00578) | TFIIH helicase XPB subunit (P19447) |
| 50 | 3′–5′ exonuclease | Werner Syndrome-like exonuclease (Q84LH3)* | SGS1 (P35187) | Werner syndrome (Q14191) |

The Swiss-Prot IDs are given in brackets

* UniProt IDs; n/f—not found

Table 4

A list of rice DDX and DHX helicases and their homologs in Arabidopsis, yeast and human

S. No. | Rice helicase | Arabidopsis | Yeast | Human
1 | DDX18 | RH51 (Q9LIH9) | HAS1 (Q03532) | DDX18 (Q9NVP1)
2 | DDX23 | RH21 (P93008) | PRP28 (Helicase CA8) (P23394) | DDX23 (Q9BUQ8)
3 | DDX41 | RH35 (Q9LU46) | DBP2 (p68-like protein) (P24783) | DDX41 (Q9UJV9)
4 | DDX46 | RH42 (Q8H0U8) | PRP5 (P21372) | DDX46 (Q7L014)
5 | DDX47 | RH10 (Q8GY84) | RRP3 (P38712) | DDX47 (Q9H0S4)
6 | DDX49 | RH36 (Q9SA27) | DBP8 (P38719) | DDX49 (Q9Y6V7)
7 | DDX51 | RH1 (Q7FGZ2) | DBP6 (P53734) | DDX51 (Q8N8A6)
8 | DDX52 | RH57 (Q84TG1) | ROK1 (P45818) | DDX52 (Q9Y2R4)
9 | DDX54 | RH29 (O49289) | DBP10 (Q12389) | DDX54 (Q8TDD1)
10 | DDX56 | RH16 (Q9SW44) | DBP9 (Q06218) | DDX56 (Q9NY93)
11 | DHX8 | pre-mRNA-splicing (O22899) | PRP22 (P24384) | DHX8 (Q14562)
12 | DHX16 | DHX8 (Q38953) | PRP22 (P24384) | DHX8 (Q14562)
13 | DHX35 | DHX8 (Q38953) | PRP22 (P24384) | DHX35 (Q9H5Z1)
14 | DHX36 | pre-mRNA-splicing (O22899) | YLR419W (Q06698) | DHX36 (Q9H2U1)

The Swiss-Prot IDs are given in brackets

Profile HMM LOGO for helicase proteins

Because sequence identity among helicase proteins varies widely, a conventional multiple-sequence alignment can misidentify conserved residues. We therefore used a profile hidden Markov model (pHMM) to obtain a reliable multiple-sequence alignment. The pHMM was constructed from a sequence alignment based on structural superposition of all helicase proteins in the Pfam database. To validate the multiple-sequence alignments obtained with the pHMM, the conservation and motif composition of the helicase proteins were examined. A sequence LOGO view was generated based on the pHMM (Fig. 4). In the sequence LOGO view, the height of the letter (amino acid) at each position represents the degree of conservation. The sequence LOGO view of the consensus helicase motif signatures revealed a high degree of conservation for the following amino acid residues: ‘Q’ in motif-Q (GFxxPxxIQ), ‘GK’ in motif-I (AxxGxGKT), ‘P’ in motif-Ia (PTRELA), ‘T’ in motif-Ib (TPGR), ‘DE’ in motif-II (DEAD), and ‘AT’ in motif-III (SAT) (Fig. 4).
https://static-content.springer.com/image/art%3A10.1007%2Fs11103-010-9632-5/MediaObjects/11103_2010_9632_Fig4_HTML.gif
Fig. 4

Sequence LOGO view of the consensus helicase motifs based on pHMM. The height of letter (amino acid) at each position represents the degree of conservation. The helicase motifs are indicated as bars on top of the plot
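
The relationship between letter height and conservation can be illustrated with a minimal sketch. It assumes a simple Shannon-entropy measure over ungapped alignment columns, which is a simplified stand-in for the pHMM-based LOGO and not the procedure used to produce Fig. 4. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column, ignoring gaps.

    Assumption: conservation = log2(alphabet size) - Shannon entropy,
    with no small-sample or background-frequency correction.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy

def logo_heights(alignment):
    """Per-position information content for equal-length aligned sequences."""
    return [column_information(col) for col in zip(*alignment)]

# Toy alignment of motif-II-like variants (illustrative only, not data from the paper)
toy = ["DEAD", "DEAH", "DEAD", "DECH"]
heights = logo_heights(toy)
```

In this toy case the invariant ‘D’ and ‘E’ columns reach the maximum of log2(20) bits, while the variable third and fourth columns score lower, mirroring the tall ‘DE’ letters described for motif-II.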

Expression profiling of Arabidopsis helicase genes

The log2 fold expression values for 113 genes under ten stress conditions, namely anoxia, cold (3 independent replicates), drought, genotoxic, heat (2 independent replicates), hypoxia, osmotic, oxidative, salt and wounding, were imported into the Genesis software. The hierarchical clustering of 113 different transcriptomes revealed expression patterns for helicase genes under the ten stress conditions (Supplementary Table 4). A dendrogram was constructed by grouping genes (rows) with similar expression into clusters (Fig. 5). The heat map resulting from the clustering analysis showed high expression of a large set of helicases under anoxia, cold and heat stresses (Fig. 5). The expression analysis revealed overexpression of SDE3, RH55, chromatin remodeling 31 (CHR31), 3 genes for helicase domain containing proteins, RH18 and RH11 in drought stress; RecQl3, helicase-related, CHR31 and MCM8 in genotoxic stress; and MEE29, RH42, helicase domain containing protein, SNF2, RH55 and MER3 in hypoxia (Fig. 5). In osmotic stress, MEE29, RH55, CHR31 and RH45 showed increased expression, while in oxidative stress SDE3, helicase domain containing protein, RH28, RNA helicase DRH1 and RH37 were overexpressed (Fig. 5). The genes that showed high expression in salt stress were MEE29, SNF2 domain containing protein, RH55, CHR31, CHR9, EDA16, RH30, RH40 and RNA helicase DRH1 (Fig. 5). In wounding stress, SNF2, CHR42, MER3 and PIF1 showed increased expression levels (Fig. 5).
https://static-content.springer.com/image/art%3A10.1007%2Fs11103-010-9632-5/MediaObjects/11103_2010_9632_Fig5_HTML.gif
Fig. 5

The expression profile of helicase gene family in Arabidopsis. The heat map generated following hierarchical clustering of 113 different transcriptomes revealed expression patterns for helicase genes under 10 different stress conditions
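
The clustering step can be sketched as follows. This is an illustration only: the text does not state which distance measure or linkage rule was used in Genesis, so Euclidean distance and average linkage are assumptions here, and the gene names and log2 fold values are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(profiles):
    """Agglomerative clustering; returns the dendrogram as nested tuples of names."""
    # Each cluster is (subtree, list of member names)
    clusters = [(name, [name]) for name in profiles]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters
                dists = [euclidean(profiles[a], profiles[b])
                         for a in clusters[i][1] for b in clusters[j][1]]
                d = sum(dists) / len(dists)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

# Hypothetical log2 fold values (columns: cold, heat, salt); not data from the paper
toy_profiles = {
    "geneA": [2.0, 1.8, -0.1],
    "geneB": [1.9, 2.1, 0.0],
    "geneC": [-1.0, -0.8, 2.2],
    "geneD": [-1.2, -1.1, 2.0],
}
tree = average_linkage(toy_profiles)
```

Genes with similar responses (here, the two cold- and heat-induced genes and the two salt-induced genes) are joined first, which is what places them in adjacent rows of a heat map.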

Discussion

Helicases are a ubiquitous family of DNA and RNA strand-separating enzymes found in organisms ranging from prokaryotes to mammals. We have systematically classified the helicase gene family from rice and Arabidopsis into RNA and DNA helicases and compiled the expression pattern data for Arabidopsis helicase genes. The helicases from both these plants represent a large gene family. The presence of a much larger helicase gene family in plants suggests a predominant role for members of this gene family in modulating environmental responses. The DEAD-box RNA helicase family in Arabidopsis (Aubourg et al. 1999) and the evolution of the intron/exon structure of DEAD helicase family genes in Arabidopsis, Caenorhabditis and Drosophila (Boudet et al. 2001) have been studied previously. To the best of our knowledge, the complete analysis and classification of the various families of helicase genes in rice and Arabidopsis described in this work has not been reported previously.

Helicase proteins appear to be under-represented in rice (64) compared with Arabidopsis (113). Notably, a number of the helicase genes are redundant in rice, possibly as a result of the tandem or segmental duplications that have produced a large number of duplicated regions in the rice genome (Yu et al. 2005). As in other organisms, the rice helicase proteins are highly diverse in their N- and C-terminal regions, whereas the sequences of the catalytic motifs are relatively well conserved. Alignment of the 64 rice proteins showed that the 39 RNA helicases can be classified into 4 structural groups: 26 members with nine helicase motifs; 8 members (sfRNAh, DHX35, DHX8, PRP16, sfRNAh.F56D2.6, DHX16, DHX36 and RNA helicase A) with eight motifs (GxxGxGKT, TQPRRV, TDGML, DEAH, SAT, FLGT, TNIAEA and QRAGRAGR); 2 members (U5.snRNP and SUV3) with conserved AxxGxGKT and DExD/H motifs, of which SUV3 lacks the SAT motif; and 3 members (RNAh, SDE3 and SEN1) in which no helicase motifs were identified. SDE3 (Dalmay et al. 2001) and SEN1 (Ursic et al. 1997) have nevertheless been characterized as helicases. The helicase RNAh is encoded by 2 loci (LOC_Os11g09060 and LOC_Os12g05230) and is annotated as an ATP-dependent RNA helicase in the RAP database.
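
Grouping proteins by the motifs they carry can be approximated with a simple pattern scan. The sketch below is not the alignment-based procedure used in this study: it assumes that each consensus can be read as a regular expression in which ‘x’ matches any residue, it merges GxxGxGKT and AxxGxGKT into one pattern for convenience, and the peptide sequence is made up.

```python
import re

# Consensus motifs quoted in the text; lowercase 'x' stands for any residue.
# Combining the G- and A-initial forms of motif I is an assumption of this sketch.
MOTIFS = {
    "I": "[AG]xxGxGKT",
    "II": "DEx[DH]",
    "III": "SAT",
}

def to_regex(consensus):
    """Turn a consensus string into a compiled regular expression."""
    return re.compile(consensus.replace("x", "."))

def find_motifs(sequence):
    """Return {motif name: 0-based start of the first match, or None}."""
    hits = {}
    for name, consensus in MOTIFS.items():
        match = to_regex(consensus).search(sequence)
        hits[name] = match.start() if match else None
    return hits

# Made-up peptide for illustration only
toy_seq = "MKLAQTGSGKTLAFVVDEADRMLSATLP"
hits = find_motifs(toy_seq)
```

A protein such as SUV3, which is reported above to lack the SAT motif, would return `None` for motif III in a scan of this kind, although short patterns like SAT can also match by chance and would need confirmation from the alignment.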

Compared with other systems, the RNA helicase family in plants is larger and more diverse, and it has been implicated in a variety of physiological functions (Linder and Owttrim 2009). Disruption of RNA helicases severely affects plant growth and development and inhibits processes involving RNA maturation and turnover and post-transcriptional gene silencing (Linder and Owttrim 2009). To date, 13 RNA helicases have been linked with the metabolism of aberrant and silencing RNA and have been reported to affect cell fate, plant development, stress responses, gene silencing at the transcriptional and post-transcriptional levels, detection and decay of aberrant RNA, RNA maturation and protein quality control (reviewed in Linder and Owttrim 2009). The DExH-box RNA helicase Carpel factory/Dicer-like 1 (DCL1) (Park et al. 2002) and the DEAD-box RNA helicases LOS4, STRS1 and STRS2 (Kant et al. 2007) and AvDH1 (Liu et al. 2008) have been shown to be involved in plant stress responses. The los4-1 mutant is more sensitive to cold stress (Gong et al. 2002), whereas the los4-2 mutant is more tolerant of freezing stress but more sensitive to heat stress (Gong et al. 2005).

The abundant protein eIF4A, a well-characterized RNA/DNA helicase, has long been considered the canonical helicase for eukaryotic translation initiation (Linder 2006). Rice has 2 such factors, eIF4A and eIF4A-3. It was shown earlier that overexpression of pea DNA helicase 45 (PDH45), which shares striking homology (>80% at the amino acid level) with eIF4A-3, imparts salinity tolerance in tobacco (Sanan-Mishra et al. 2005). It has also been reported that the cold- and salinity stress-induced bipolar, dual pea DNA helicase 47 (PDH47) is involved in protein synthesis and that its activity is stimulated by phosphorylation by protein kinase C (Vashisht and Tuteja 2005; Vashisht et al. 2005). Expression of the Arabidopsis RNA helicases AtRH9 and AtRH25 was up-regulated in response to cold stress, whereas their transcript levels were down-regulated by salt or drought stress (Kim et al. 2008).

Several of the known DNA helicases, such as the RecQ family of helicases, are involved in DNA repair and maintenance of genome stability. The RecQ helicases are a family of DNA-unwinding enzymes that are conserved from prokaryotes to mammals (Pike et al. 2009). Interestingly, the number of RecQ family members differs between organisms, and multicellular organisms sometimes contain several homologues of these helicases. Similar to rice, the RecQ helicase family has 6 representative members in Arabidopsis (Hartung et al. 2000). RecQsim, a RecQ helicase with an interrupted helicase domain, is also present in rice. Our studies in rice revealed a common structural feature in this family of enzymes: a central domain of ~350–400 amino acids that contains the nine signature helicase motifs, including the DExH-box. Five true RecQ helicases have been characterized in Arabidopsis (Hartung et al. 2000). Arabidopsis RecQ2 has been shown to disrupt D-loop structures and the Holliday junction in vitro (Kobbe et al. 2008). In Arabidopsis, RecQ2 catalyzes Holliday junction branch migration and replication fork regression, whereas RecQ3 cannot act on intact Holliday junctions, suggesting that these RecQ helicases perform different functions in the cell (Kobbe et al. 2009). Moreover, AtRecQ3 differs in its biochemical properties from all other eukaryotic RecQ helicases characterized so far (Kobbe et al. 2009).

Another group of DNA helicases is the MCMs, which are implicated in the initiation and elongation steps of DNA replication (Brewster et al. 2008). In eukaryotes, the MCM proteins include a subgroup of six homologous proteins (MCM2-7) that belong to the AAA+ ATPase family (Neuwald et al. 1999). In rice, nine MCM proteins are present: MCM2-7, MCM8, MCM9 and MCM10. MCM8 and MCM9 are widely distributed in eukaryotes, while MCM10 is present in most eukaryotes (Liu et al. 2009). The MCM genes from Arabidopsis (Springer et al. 1995, 2000) and maize (Sabelli et al. 1996) are preferentially expressed in young tissues (Shultz et al. 2009). Homozygous Arabidopsis mutants of the MCM7 homolog prolifera (PRL) are embryonic lethal (Springer et al. 2000), and heterozygous mutants display improper cytokinesis (Holding and Springer 2002). It was recently shown that the subunits of the MCM2-7 complex are coordinately expressed during Arabidopsis development and that the complex in plants remains in the nucleus throughout most of the cell cycle and is dispersed only in mitotic cells (Shultz et al. 2009). Our recent unpublished observations indicate that the single subunit MCM6 from pea acts as a DNA helicase on its own and also promotes salinity stress tolerance without yield penalty in planta (Tuteja group, unpublished data).

The rapidly expanding sequence databases provide a powerful tool for identifying putative homologous proteins from different organisms by database searches. In this report, homologs of yeast and human helicases in rice and Arabidopsis have been identified in silico. Our search results are consistent with those of a recent study in which the helicase family of the malaria parasite, Plasmodium falciparum, was analyzed and homologs of its members were identified in yeast and human (Tuteja 2010).

One approach to studying the functions of gene family members is to examine the expression profiles of all genes within the family. Microarray-derived expression data for individual genes are highly relevant for increasing the efficiency of such functional analyses. We studied the transcriptome of the entire helicase gene family in Arabidopsis. This study provides information on sets of candidate genes involved in specific stress responses. Our results reveal that expression of helicase genes is controlled by diverse environmental stimuli and that co-expression analysis is an effective strategy for elucidating the functions of gene families in plants. Initial in silico analysis has revealed that the helicase family members are differentially regulated under stress conditions in rice as well. Our studies lay the foundation for an in-depth and detailed analysis of this important gene family in plants.

Acknowledgments

Work on helicases and plant stress tolerance in N.T.’s laboratory is supported partially by the Department of Science and Technology (DST), Government of India and Department of Biotechnology (DBT), Government of India.

Supplementary material

Supplementary material 1 (DOC 269 kb)

Copyright information

© Springer Science+Business Media B.V. 2010