Introduction

Abiotic stresses such as drought, salinity, cold and high temperature are the predominant environmental factors limiting the productivity of crop plants (Breshears et al. 2005; Schroter et al. 2005). Plants respond to these environmental cues at the molecular level by altering the expression of different sets of genes (Qureshi et al. 2007; Tran et al. 2007). Expression of such genes is regulated mainly through transcriptional control, although post-transcriptional and post-translational processes also play a crucial role. The transcriptional control machinery appears to be conserved among plant species (Hakimi et al. 2000; Hirt et al. 1990). Experiments over the past decades have established that promoters containing a particular cis-element respond to a specific trigger (Chinnusamy et al. 2003; Viswanathan and Zhu 2002; Yamaguchi-Shinozaki and Shinozaki 2005; Zhou et al. 2007). Combinatorial interactions of cis-acting DNA elements in promoters with trans-acting protein factors are key processes governing spatio-temporal gene expression (Bustos et al. 1991; Hartmann et al. 2005; Hauffe et al. 1993). At the organism level, a vast array of molecular genetic networks operates in a complex and dynamic mode. A complete understanding of these networks is a long-cherished goal of systems biologists (Chinnusamy et al. 2004; Li et al. 2006a). Targeted modification of molecular genetic networks has tremendous potential for engineering tailor-made elite genotypes “by-design.”

The availability of the genome sequence of a crop plant such as rice offers new challenges and opportunities to explore the genetic mechanisms that regulate gene expression in response to various developmental and environmental cues. Rice is also closely related to other crop plants such as wheat, maize, barley, sugarcane, oat and sorghum. A high degree of genomic synteny is conserved across member species of the family Gramineae (Buell et al. 2005; Goff et al. 2002). Hence, rice is an ideal model in which to study complex gene regulation data coupled with comparative sequence information using computational tools. Such studies can help to map, predict and decipher gene regulation mechanisms and to classify genes functionally. Recently, a fair number of large-scale gene expression datasets have become available for rice in response to various stresses (Rabbani et al. 2003; Yazaki et al. 2003). However, the available data are not yet sufficient for meta-analysis. Nevertheless, rice gene expression data can be integrated with promoter structure to identify possible correlations between them (Benedict et al. 2006; Li et al. 2006b).

The roles of ABA in physiological, developmental and adaptive processes in plants are well known. Endogenous ABA levels rise in response to various biotic (pathogen attack) and abiotic stresses (Fujita et al. 2006; Verslues and Zhu 2007). Exogenous application of ABA to plants mimics various stresses in terms of co-expression of different sets of genes (Destefano-Beltran et al. 2006; Loik and Nobel 1993). Gene regulation in response to elevated levels of ABA is modulated mainly by transcriptional control. Various studies suggest that co-expressed genes are likely to be involved in a common biological process. Integration of co-expression data with promoter structure in plants shows that promoters of co-expressed genes share common cis-regulatory elements (Kim and Kim 2006; Reiss et al. 2006; Werner 2001). Various algorithms, such as expectation maximization (MEME) and Gibbs sampling, have been used to search for motifs that are over-represented within a set of related biological sequences (Bailey and Elkan 1994; Bailey et al. 2006; Thompson et al. 2003). Evidence from promoter dissection and transcription factor binding experiments is the main reference for evaluating the strength and confidence of computational methods. A handful of molecular dissection experiments, such as deletion and linker scanning analyses, have pinpointed the ABA responsive elements (ABREs), also termed G-box, C-box or G-box/C-box hybrids, within promoters of ABA responsive genes. The ABRE contains ACGT as a core nucleotide sequence, which acts as a binding site for bZIP family transcription factors governing transcriptional regulation of ABA responsive genes (Guiltinan et al. 1990; Hattori et al. 2002; Mundy et al. 1990; Ross and Shen 2006; Shen and Ho 1995). ABREs are also coupled to non-ACGT coupling elements such as CE1, CE3, DRE, O2S and motif III, or to another ACGT-core-containing ABRE (Hobo et al. 1999; Shen et al. 1996; Singh 1998). As ABA plays a crucial role in various signaling processes, it is logical to expect that other stress responsive integration points within ABA responsive promoters also govern gene regulation. In rice, genome-wide binding experiments such as chromatin immunoprecipitation coupled with microarray (ChIP–chip) are lacking. Several databases, including PLACE and PlantCARE, provide experimental evidence regarding cis-elements and transcription factors in plants (Higo et al. 1999; Lescot et al. 2002).

Here we present a targeted gene finding approach on a genome-wide scale in rice. Our prediction is based on an over-represented, ACGT-core-containing consensus motif found in the promoters of co-expressed ABA responsive genes in vegetative tissues. Experimental verification by RT-PCR indicated a high accuracy (80%) for our integrated prediction method in the rice genome. Database mining suggested that the predicted genes are also expressed in response to abiotic stresses. Among the diverse functional categories of genes, GO analysis showed enrichment of stress related and ABA signaling pathway genes among the genes predicted in this study.

Materials and methods

ABA responsive genes and random sequence datasets

From published microarray data, 105 genes showing two-fold or greater up-regulation in rice seedlings in response to ABA treatment were identified (Rabbani et al. 2003; Yazaki et al. 2003). Sequences of these genes were downloaded from the NCBI database (http://www.ncbi.nlm.nih.gov/) and searched by BLAST against TIGR rice cDNA sequences, and the corresponding loci were listed. The 1 kb upstream sequences (promoters) from the translational start site (ATG) were downloaded from the TIGR (http://www.tigr.org/plantProjects.shtml) Oryza sativa annotation (Release 4.0; January 12, 2006) (Ouyang et al. 2007). Similarly, other genomic sequences of rice and Arabidopsis used here were retrieved from the TIGR and TAIR databases, respectively. ABA up-regulated genes (692) in Arabidopsis identified by Li et al. (2006a, b) were searched for the presence of the predicted CGMCACGTGB motif (Li et al. 2006b). Shuffled sequences were generated by randomly selecting five ABA responsive promoters and shuffling them 100 times using the “Sequence Manipulation Suite” (http://www.bioinformatics.org/sms2/shuffle_dna.html). Other random sequence data sets used here were also generated using the “Sequence Manipulation Suite” (http://www.bioinformatics.org/sms2/random_dna.html).
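The shuffled control set described above can be illustrated with a minimal Python sketch. It is not the Sequence Manipulation Suite implementation, whose exact algorithm is not reproduced here. The sketch assumes that each selected promoter is permuted independently a fixed number of times, which preserves base composition while destroying motif order. The function names, the seed and the toy sequences are hypothetical.

```python
import random


def shuffle_dna(seq, rng):
    """Return a random permutation of seq (same base composition)."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def shuffled_controls(promoters, n_shuffles=100, seed=0):
    """Shuffle every promoter n_shuffles times (assumed reading of the protocol)."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    return [shuffle_dna(p, rng) for p in promoters for _ in range(n_shuffles)]


# Hypothetical toy promoters; real input would be the 1 kb upstream sequences.
toy_promoters = ["ACGTGGCACGTGTC", "TTTTACGTGAAACC"]
controls = shuffled_controls(toy_promoters, n_shuffles=3)
```

Because only the order of bases changes, any motif enrichment that survives shuffling would reflect base composition rather than promoter structure.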

Motif discovery and motif search

Among the several algorithms available, we chose the expectation maximization method MEME (Version 3.5.7) (http://meme.nbcr.net/meme/intro.html) for motif discovery, with the minimum and maximum width of a single motif both set to 10 and otherwise default settings. The relevance of the discovered motifs was analyzed using PLACE (http://www.dna.affrc.go.jp/PLACE/) (Bailey and Elkan 1994; Higo et al. 1999). The motifs obtained from the analyzed sequences were plotted according to their positions within the regions, and their consensus sequences were graphed using WebLogo3: Public Beta (http://weblogo.berkeley.edu/logo.cgi) (Crooks et al. 2004). Perl and JavaScript were used to search for perfect matches of the target motifs within the rice, Arabidopsis, random and shuffled sequences. Only one orientation was considered when searching for motifs.
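The perfect-match search can be sketched as follows. The original scripts were written in Perl and JavaScript and are not shown in the text, so this Python version is only an illustration under stated assumptions: the degenerate letters of the motif are read as standard IUPAC nucleotide codes (M = A or C; B = C, G or T), only the given strand is scanned, and the function names, locus labels and toy sequences are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Compile an IUPAC motif; the lookahead lets overlapping hits be reported."""
    body = "".join(IUPAC[code] for code in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_motif(seq, motif="CGMCACGTGB"):
    """Return 0-based start positions of perfect matches on the given strand only."""
    return [m.start() for m in motif_to_regex(motif).finditer(seq.upper())]


def loci_with_motif(promoters, motif="CGMCACGTGB"):
    """Return the loci whose promoter contains at least one perfect match."""
    return [locus for locus, seq in promoters.items() if find_motif(seq, motif)]


# Hypothetical toy promoters keyed by made-up locus names.
toy = {"locus_A": "TTTCGACACGTGCAAA", "locus_B": "TTTACGTGTTTT"}
hits = loci_with_motif(toy)
```

In this toy input only `locus_A` carries the full decamer; `locus_B` contains the ACGT core but not the flanking consensus, which is the kind of sequence a core-only search would also pick up.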

Analysis of gene ontology of predicted genes

To determine whether the occurrence of the discovered motifs was associated with specific gene functions, we retrieved the Plant GOSlim assignment of rice proteins from the TIGR database (http://www.tigr.org/tdb/e2k1/osa1/GO.retrieval.shtml) and correlated the annotated molecular function, biological process or cellular component of the predicted and control data sets of rice genes.

Plant growth conditions, RNA sampling and RT-PCR analysis

Seedlings of rice (Oryza sativa) cultivar Nagina-22 were grown from seed for 14 days at 28°C, 80% RH, and a 12/12 h light/dark period in a phytotron glass house. Sterile absorbent cotton soaked with Hoagland’s solution was used as the seed bed. Plants received supplemental irrigation with Hoagland’s solution at 3-day intervals.

Leaf and root samples were collected from seedlings after 3, 5, and 24 h of treatment with 100 μM ABA. ABA was sprayed again at 2 h intervals after the first spraying. For root treatment, plants were submerged up to 1 cm above seed bed level in 100 μM ABA for 3, 5, and 24 h. RNA was extracted using TRI Reagent (Ambion, Inc. USA), pooled from at least 20 independent control and treated plants, respectively, and treated with DNase-I (QIAGEN GmbH, Germany). RNA cleanup was subsequently carried out using the RNeasy Plant Mini Kit (QIAGEN GmbH, Germany).

For RT-PCR analysis, first strand cDNA was synthesized from 2 μg of total RNA using Superscript-III reverse transcriptase (Invitrogen, USA) with oligo(dT)20 primer following the manufacturer’s instructions. Two microliters of cDNA were used in a 25 μl reaction volume with the following PCR conditions to study gene expression: 30 cycles of 94°C for 1 min, annealing for 1 min at a temperature chosen according to the melting temperature of the primers, and 72°C for 1 min, followed by a final extension at 72°C for 10 min. The primer sets used in the study are listed in the supplemental table (Additional Table 1). Of the 402 predicted genes, 45 were randomly selected for RT-PCR analysis. These genes were not in the initial expression datasets (except LOC_Os01g02120 and LOC_Os02g43330, which were used as test controls); hence their responsiveness to ABA was virtually unknown. Similarly, 15 genes lacking the CGMCACGTGB motif were used as negative controls. The RT-PCR amplicon on the gel was quantified as integrated density value (IDV) using AlphaEaseFC software. The accuracy percentage of our prediction was calculated as follows:

Accuracy (%) = (Number of genes responsive to ABA detected through RT-PCR/total number of genes tested for RT-PCR) × 100.
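As a worked illustration of this formula, the short Python sketch below computes the accuracy percentage. The function name is hypothetical, and the count of 36 responsive genes is not stated in the text; it is inferred here from the reported 80% accuracy among the 45 genes tested and should be read as an assumption.

```python
def accuracy_percent(n_responsive, n_tested):
    """Accuracy (%) = responsive genes / tested genes x 100."""
    if n_tested <= 0:
        raise ValueError("n_tested must be a positive number of genes")
    return 100.0 * n_responsive / n_tested


# Assumed count: 36 of 45 tested genes responsive, consistent with the reported 80%.
predicted_set_accuracy = accuracy_percent(36, 45)
```

The same function applies to any tested subset, so accuracies from gene sets of different sizes can be compared on one scale.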

Stress related expression data mining and phenotype searching

To analyze the expression of the predicted genes in response to cold, drought and salt stresses, the physical positions of these loci in the rice pseudomolecules were retrieved and searched against the rice data in the PlantQTL-GE database (http://www.scbit.org/qtl2gene/new/plantqtl-ge.html) (Zeng et al. 2007). All of these genes were also searched in the Rice Tos17 Insertion Mutant Database (http://tos.nias.affrc.go.jp/) for insertion mutations and any associated phenotypes (Miyao et al. 2007).

Results

ABRE consensus motif discovery and gene prediction

The response of plants to a particular trigger might be mediated by a common transcriptional regulatory mechanism, hardwired by cis-acting elements, as shown in other model species (GuhaThakurta et al. 2002; Wolfsberg et al. 1999; Zhang et al. 2005). cis-acting DNA elements are generally degenerate and difficult to distinguish from the background; however, the ACGT-core-containing ABRE was defined as ACGTGKC, which matched very well with the consensus derived from sequence comparison of ABA-responsive promoters in rice (Hattori et al. 2002). A genome-wide computational prediction has successfully classified ABA responsive genes in Arabidopsis (Zhang et al. 2005). The modular arrangement of the ABRE with one of its coupling elements, CE3, shows a clear pattern of divergence between the Arabidopsis and rice genomes (Gomez-Porras et al. 2007). The ACGT core of the ABRE is conserved in promoters of ABA responsive genes across monocot and dicot plant species. For example, reported ABREs for rice, maize and Arabidopsis are CGTACGTGTC, GACGTG, and CCACGTGG, respectively. However, this is not an exhaustive list of ABREs in these species. For promoter analysis, we considered ABA responsive genes showing two-fold or greater up-regulation in the published expression profiling data (Rabbani et al. 2003; Yazaki et al. 2003). These genes were aligned to the TIGR gene models, and loci showing a perfect match (identity = 100%) were retained (Additional Table 2). The 1 kb upstream region from the translational start site (ATG) of each selected gene was analyzed individually by PLACE; among many other putative stress responsive cis-elements, ACGT-core-containing ABREs were the most abundant (data not shown). The occurrence of PLACE-derived ABREs was highest within the 400 bp upstream of the translational start site (Fig. 1). PLACE has documented 39 ABREs so far. Additional Fig. 1 shows the PLACE-derived ABRE consensus, which indicates that it is difficult to derive a clear-cut consensus beyond the ACGT core from the ABREs reported in PLACE. We discovered over-represented motifs in the promoters of co-expressed genes using the expectation maximization algorithm MEME, which is considered one of the best motif-sampling tools (Bailey and Elkan 1994). Using MEME, the ACGT-core-containing CGMCACGTGB motif was discovered as the top motif and the one best suited for genome-wide prediction of ABA responsive genes by a partial interactive approach from the data sets used in this study (Fig. 2). PLACE analysis of this motif revealed that it encompasses ABREs described in PLACE for Arabidopsis thaliana, Oryza sativa, Lycopersicon esculentum, Triticum aestivum, Zea mays, Brassica napus, and Phaseolus vulgaris (Table 1). Data mining from the literature confirmed the sampled motif to be a strong ABRE (Busk et al. 1999; Shen and Ho 1995). The ABRE motifs sampled from promoters of co-expressed genes were pooled and plotted according to their positions within the motif regions, and their consensus was derived and plotted using WebLogo (Additional Fig. 2) (Crooks et al. 2004).

Fig. 1

Distribution of the PLACE-derived ABRE motif. The distribution of the PLACE-derived ABRE motif (ACGTG) in promoters (1 kb upstream of ATG) of co-expressed genes compared to predicted genes, 1 kb shuffled ABA responsive promoter sequences, 1 kb rice coding DNA sequences (CDS), and 1 kb randomly sampled Arabidopsis promoters

Fig. 2

MEME-generated ABRE motif. The ACGT-core-containing ABRE motif sampled by MEME from the promoters of co-expressed genes

Table 1 PLACE description of ABRE motifs in the consensus MEME derived ABRE

Here we explored the consensus of nucleotides flanking the ACGT core; the resulting decamer contains a typical G-box (CACGTG) at positions 4–9 of the WebLogo (Additional Fig. 2). Using this top motif, CGMCACGTGB, and a Perl script, we predicted 402 protein coding genes as potential ABA responsive genes in the TIGR Rice Annotation (Release-4) gene models. This prediction strategy was stringent, being based only on a perfect match to the cis-element. Of the 402 predicted genes, 392 were unique and independent of the initial co-expressed gene data set. Table 2 shows the MEME-generated motifs from co-expressed genes, the PLACE description of these motifs, and their occurrence among the 402 predicted ABA responsive genes. Two other top motifs, WWTTTTTYTW and SSSYGGCGSC, were sampled as over-represented and were present in at least one-third of the predicted genes (Table 2). However, WWTTTTTYTW and SSSYGGCGSC appear to be common features of the rice genome, as these motifs were also sampled frequently from control data sets such as randomly generated sequences, introns, coding DNA sequences (CDS) and intergenic regions of rice (data not shown). Hence we did not consider WWTTTTTYTW and SSSYGGCGSC as real motifs for targeted gene prediction here.

Table 2 MEME generated top three motifs from the co-expressed genes

Analysis of structural and occurrence rareness of the ABRE

To investigate the structural and occurrence rareness of the ABRE motif, we analyzed different sets of sequences: 1 kb upstream sequences of co-expressed genes and of predicted genes, 1 kb shuffled ABA responsive promoter sequences, 1 kb rice CDS, and 1 kb randomly sampled Arabidopsis promoters. The distribution of the ABRE (ACGTG) was similar between the initial co-expressed genes used to derive the ABRE consensus and the genes predicted from that consensus, whereas, as expected, the distribution pattern differed in the 1 kb shuffled ABA responsive promoter sequences of rice, the 1 kb rice CDS and the 1 kb randomly sampled Arabidopsis promoters. The enriched PLACE-derived ABRE motif (ACGTG) thus shows a biased distribution in the promoters of predicted genes, with a pattern similar to that in the co-expressed promoter set. The occurrence of ABRE motifs was higher within the 400 nucleotides upstream of the translational start site (Fig. 1). We then checked the occurrence of the predicted motif within the 5′-UTR of the co-expressed and predicted genes. No CGMCACGTGB motif was found within the 5′-UTR of the co-expressed genes, and only 7 predicted genes (1.7%) contained this motif in the 5′-UTR. This represents another structural similarity between co-expressed and predicted genes. We also analyzed the 1 kb upstream region from the translational start site (ATG) of 692 published ABA up-regulated Arabidopsis genes (Li et al. 2006b) for enrichment of the predicted CGMCACGTGB motif. Only 7.5% of the ABA up-regulated Arabidopsis genes contained the motif predicted for ABA responsive genes in rice. To further confirm the rareness of the ABRE (CGMCACGTGB) motif, sets of 402 sequences of 1 kb each were analyzed, comprising randomly generated sequences, introns, CDS and intergenic regions of rice. Compared with the predicted promoters (100%), the occurrence of the ABRE (CGMCACGTGB) was 0.4, 0.4, 0.7, and 1.7% within the sets of randomly generated sequences, introns, CDS and intergenic regions of rice, respectively. Hence, this analysis indicates that the ABRE (CGMCACGTGB) cis-element is specific to promoters of ABA-responsive genes and is not simply a common feature of the rice genome.
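The low background occurrence of the decamer can be put in context with a back-of-the-envelope calculation, sketched below in Python. This is not part of the original analysis. It assumes equal base frequencies, independence between positions, a search on one strand only, and standard IUPAC degeneracy (M matches 2 bases, B matches 3); the function names are hypothetical.

```python
# Number of bases matched by each IUPAC nucleotide code.
IUPAC_SIZE = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3, "N": 4,
}


def per_site_probability(motif):
    """Chance that a random site matches the motif, assuming uniform bases."""
    p = 1.0
    for code in motif.upper():
        p *= IUPAC_SIZE[code] / 4.0
    return p


def prob_at_least_one(motif, seq_len):
    """Approximate chance of one or more matches in a random sequence of seq_len."""
    n_sites = seq_len - len(motif) + 1  # number of possible start positions
    if n_sites <= 0:
        return 0.0
    # Treats start positions as independent, which is only an approximation.
    return 1.0 - (1.0 - per_site_probability(motif)) ** n_sites


p_site = per_site_probability("CGMCACGTGB")    # equals 6 / 4**10
p_1kb = prob_at_least_one("CGMCACGTGB", 1000)  # one strand, 1 kb sequence
```

Under these simplifying assumptions a 1 kb random sequence carries the motif with a probability on the order of half a percent, the same order of magnitude as the occurrences reported for the random, intron and CDS control sets. Real rice sequences are not uniform in base composition, so this figure is only a rough guide.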

Expression analysis of the predicted genes

To assess the accuracy of our prediction of ABA responsive genes, we randomly selected 45 of the 402 predicted genes whose response to ABA was unknown (except LOC_Os01g02120 and LOC_Os02g43330, which were used as test controls) for expression analysis by RT-PCR. As many as 80% of the predicted ABA responsive genes tested showed induction in response to exogenous ABA (Table 3, Additional Fig. 3). A set of 15 genes lacking the predicted ABRE in their promoters was tested as a negative control for prediction accuracy (Table 3). Ubiquitin was used as the internal control for RT-PCR. Expression of each gene was confirmed in at least three independent RT-PCR reactions. These results suggest that the presence of the predicted ABRE (CGMCACGTGB) cis-element is important for ABA induction. A differential expression pattern was observed in different tissues at different time points of ABA treatment (Table 3). Although exogenous application of ABA mimics the expression of several stress and endogenous ABA responsive genes, plants are less sensitive to exogenous ABA under normal growth conditions than to endogenous ABA during stress (Sharp 2002). Expression analysis at different time points (3, 5, and 24 h) and in different tissues (leaf and root) illustrates that plants respond differentially to exogenous ABA. Mining of the PlantQTL-GE database (Additional Table 3) showed that many of the ABA responsive genes predicted in this study are also expressed under abiotic stresses such as cold, salt and osmotic stress, which induce ABA accumulation (Zeng et al. 2007).

Table 3 RT-PCR analysis of predicted ABA responsive genes at different time points and in vegetative tissues of rice

Functional classification and ontology analysis of genes

ABA is the most versatile plant hormone involved in regulation of varied groups of genes. Functional annotation among these 402 predicted ABA responsive genes as per TIGR revealed diverse functional categories including important stress related signaling components (Additional Fig. 4). Details of these annotations are in the supplemental table (Additional Table 4).

Diverse gene ontology (GO) categories were enriched among the predicted ABA inducible genes (Additional Table 5). This is consistent with the diverse roles of ABA in regulating biological processes (Ashburner et al. 2000; Hirayama and Shinozaki 2007). Among the important GO categories, considerable enrichment was obtained in different functional classes (Fig. 3). A set of 402 randomly chosen genes, distinct from the predicted genes, was subjected to the same GO analysis and showed less GO enrichment (Additional Fig. 5). However, this study does not rule out the enrichment of GO functional categories in genes co-expressed under other environmental conditions. These results highlight the importance of conservation of ABA responsive genes and signaling pathways.

Fig. 3

Important GO categories among predicted ABA-responsive genes. GO categories enriched among the plant GO terms. GOSlim ID and GO name type were obtained from TIGR plant GOSlim assignment of rice proteins

Discussion

The novel rice-specific ABRE consensus generated in this study, CGMCACGTGB, extends beyond the ACGT core and is distinct from the PLACE-derived ABRE motif for ABA responsive genes. This motif can be considered for finding ABA responsive genes in related species. In a related study, cis-elements were discovered using correlated expression and sequence conservation between Arabidopsis and Brassica oleracea (Haberer et al. 2006). The ABA responsive genes predicted here are not exhaustive, but represent a considerable number of genes with a similar cis-regulatory element. The variability in the organization of the ABRE and other over-represented motifs might provide multiple signal integration points for combinatorial cis–trans interaction and versatile gene action under varied conditions (Suzuki et al. 2005). In our study, prediction using only the ABRE (CGMCACGTGB) cis-element showed 80% accuracy among the 45 genes randomly selected from the 402 predicted genes. By comparison, RT-PCR expression analysis of 27 of the top 40 genes predicted using the ABRE–CE module in Arabidopsis showed that only 63.0% (17 genes) were responsive to exogenous ABA (Zhang et al. 2005). Thus, the ABRE motif identified in this study appears to be a better predictor of ABA-responsive genes in rice. ABA induction of the remaining 20% of the genes tested cannot be ruled out, as they may be expressed in different tissues, at different developmental stages or time points, or at different ABA concentrations. The differential ABA responsiveness of genes in different tissues and at different time points, as revealed by RT-PCR analysis (Table 3, Additional Fig. 3), suggests the possible involvement of tissue-, duration- and developmental stage-specific ABRE-interacting cis-elements or trans-acting factors in gene regulation. GO analysis showed enrichment of signal transduction, stress-related and development-related genes, among other categories, in the predicted ABA regulated genes (Additional Table 5), reflecting the diverse roles of ABA responsive genes.

Previous studies in different model systems have demonstrated the importance of over-represented cis-motifs in gene regulation. Integrating expression profile data with cis-motif consensus patterns gave much higher selectivity than consensus pattern and matrix-based searches alone for predicting cis-acting transcriptional regulatory sequences (Fujibuchi et al. 2001). The influence of external and internal cues other than the treatment under study cannot be disregarded. Fine-tuning of transcriptional regulation under a multitude of conditions is a most important aspect of plant adaptation. Combinatorial control of transcription by multiple transcription factors has been reported in plant systems (Lara et al. 2003; Narusaka et al. 2003). Integrating ABA responsive, tissue-specific gene expression with promoter structure remains a challenge for understanding universal and organism-level molecular networks (Ma and Bohnert 2007). Binding experiments at different time points and developmental stages, used to design and verify system models, might in turn provide direct evidence on the dynamics of the molecular genetic network. Large-scale cloning and characterization of predicted ABA-induced genes will help to unravel the role of ABA-regulated genes in genome-wide chromatin structure, transcription, protein–protein/DNA–protein interaction, and post-transcriptional and post-translational regulation.

We found direct Tos17 insertions in two of the predicted ABA responsive loci and searched for their phenotypic impact using the Tos17 mutant panel database (Miyao et al. 2007). Disruption of the predicted ABA regulated MYB-like DNA-binding domain gene, Os01g09280, by Tos17 insertion resulted in a dwarf and late flowering phenotype, supporting a key role for ABA-dependent components in plant development. Mutation of another predicted ABA regulated gene, Os02g06130, which encodes a leucine rich repeat family protein, gives an encouraging agronomic phenotype with comparatively high yield. We presume that Os02g06130 might have some adverse impact on yield under normal growth conditions. To validate the role of these loci in yield and plant development, we are developing siRNA knock-out lines and overexpression lines.

In the post-genomic era, the ability to deduce genome function has become an increasingly important task. For many genomes, the functional annotation immediately available will be based on computational predictions and comparisons with functional elements in related species. Targeted prediction of genes based on a cis-motif is quite effective for functional categorization of genes that are likely to be involved in a common molecular genetic network (Goda et al. 2004; Wang et al. 2007). The role of a particular cis-regulatory element (CRE) in a functional category has been well demonstrated in yeast by genome sequencing and comparison (Kellis et al. 2003). This method will help in the functional annotation of genes predicted through the ORF based approach, as ORF based gene prediction does not classify genes into functional categories. Moreover, knowledge gained by sampling over-represented cis-motifs from co-expressed genes responsive to a particular signal is useful for designing genome-wide binding studies such as ChIP–chip, which in turn will help to unravel the complete molecular genetic network in biological systems. Genome-level experimental knowledge of accurate, dynamic spatio-temporal gene regulation integrated with promoter architecture is not yet available for ABA regulated genes. Computational prediction provides a viable way to design suitable experiments and to understand the dynamics of complex molecular genetic networks (Additional Fig. 6).

Conclusions

Identifying the key cis-elements and promoter architecture that regulate the expression of plant genomes is a complex task that will require a series of complementary approaches: prediction, extensive experimental validation and a proper understanding of the role of cis-elements in the combinatorial control of plant gene expression. The ABRE (CGMCACGTGB) identified in this study is novel and rice-specific, and can be used for functional classification of ABA responsive genes in related species. This cis-element based targeted gene finding approach can serve as a supplemental tool to the classic ORF based gene prediction method for functional classification of genes. We suggest that the overall strategy will be cost-effective and efficient for application in related plant species for which information is primarily limited to Genomic Survey Sequences (GSS).