Background

Tea is produced from the plant Camellia sinensis (L.) O. Kuntze in the family Theaceae. Tea is one of the most popular beverages worldwide, and tea leaves represent an important source of many biologically active metabolites such as flavonoids, theanine and caffeine [1, 2]. Flavonoids mainly comprise flavan-3-ols (catechins), epicatechin (EC), gallocatechin (GC), epigallocatechin (EGC), catechin (C) and their respective gallate esters, such as epigallocatechin gallate (EGCG) and epicatechin gallate (ECG) [3]. Tea leaves, which contain various secondary metabolites, are usually used as the raw material for tea production. However, the molecular mechanisms that regulate the biosynthesis of catechins, theanine and caffeine in tea remain elusive.

Great effort has focused on elucidating the molecular mechanisms underlying plant growth, development [4, 5] and secondary metabolite production [6] in tea. Most of these studies have focused on characterizing genes related to secondary metabolism, most of which were revealed through EST sequencing [7] and analysis of the transcriptomes from various tissues of tea plants [8, 9] or under different stress conditions [10, 11]. More recently, Shi et al. discovered novel candidate genes involved in pathways in tea by analyzing transcriptome data [12]. Liu et al. reported the discovery of a novel enzyme involved in galloylated catechin biosynthesis in tea plants [13]. However, the lack of genomic information has become an obstacle to exploring the molecular mechanisms underlying secondary metabolite biosynthesis in tea. Transcriptome sequencing represents an efficient approach to obtaining functional genomic information.

RNA-Seq is a rapid technique for genome-wide gene expression analysis that is widely used to determine gene structures and expression profiles in model organisms. De novo assembly of RNA-Seq data makes it possible to conduct gene analysis in the absence of reference genomes [1416]. Comparative transcriptomic studies have been performed to identify differential gene expression in several organisms [1720].

Another widely known member of Theaceae is oil tea, Camellia oleifera Abel, a tree serving as an important source of edible oil that is grown specifically in China. Oil tea was genetically closely to tea, and they both belonged to genus Camellia. Here, we performed RNA-Seq on buds and second leaves of tea and oil tea to characterize differences in gene expression between these two plants. This comparative transcriptomic analysis provides important insights into the molecular mechanisms underlying secondary metabolite biosynthesis in tea, as well as the phytochemical characteristics of its main metabolites.

Results

Analysis of the contents of catechins, theanine and caffeine

HPLC analyses were conducted to determine the contents of catechins, theanine and caffeine, and related intermediates in buds and five leaves of tea and oil tea (Fig. 1). All standard compounds showed good linearity (R2 > 0.9991) in a relatively wide concentration range. Compared to oil tea, most of these metabolites were present at higher concentrations in tea (Fig. 1b). The average contents of three characteristic components (total catechins, theanine and caffeine) in tea leaves were 1.5- to 173-fold higher than those in oil tea leaves. In particular, tea contained over a 180 mg/g of total catechins in its leaves and buds. The only exception is that the anthocyanin content in oil tea leaves was 32-fold higher than that in tea leaves. These results confirm that tea is rich in catechins, theanine and caffeine (Table 1).

Fig. 1
figure 1

Determined contents of compounds involved in three main secondary metabolic pathways. a Samples examined. Buds and five initial leaves were collected from tea and oil tea. Bud, L1, L2, L3, L4 and L5 indicate the bud, first, second, third, fourth and fifth leaf, respectively. b Variation in the contents of compounds biosynthesized by catechin-, theanine- and caffeine-related metabolic pathways

Table 1 Variation in the contents of major compounds in tea and oil tea

Moreover, the contents of these characteristic constituents varied during the period from the appearance of buds to the appearance of the five leaves. The levels of GC, EGC and EC increased from the bud to the second or third leaves in tea, whereas a general decline in caffeine, total catechins, ECG and EGCG levels was observed in tea leaves.

A steady decrease in theanine levels was observed from the first leaf to the fifth leaf in tea, and the levels of this compound were almost seven-fold greater in buds than in leaves. A similar variation was detected in oil tea, but the absolute contents were much lower. Due to the variation in the contents of most compounds (EC, EGC, GC and ECG) in the three initial leaves, we selected the second leaves and buds of tea and oil tea for RNA-Seq.

De novo assembly and comparative analyses of RNA-Seq data

We utilized Illumina RNA-Seq technology to sequence the buds and second leaves of tea and oil tea. After removing adaptor sequences, duplication sequences, ambiguous reads and low-quality reads, a total of 23.4 Gb of clean reads was generated, with an average of 5.85 Gb clean reads per sample (Table 2).

Table 2 Statistics from the generated RNA-Seq reads

The final assembly of tea had 81,826 unigenes with an N50 length of 1,265 bp (Table 3). Functional annotation revealed 53,786, 49,174, 34,636, 31,024, 18,748 and 40,838 unigenes with alignments to the NR (Non-redundant protein database), NT (Non-redundant nucleotide database), Swiss-Prot (Annotated protein sequence database), KEGG (Kyoto encyclopedia of genes and genomes), COG (Clusters of orthologous groups of protein) and GO (Gene ontology) databases, respectively. The final assembly of oil tea consisted of 78,863 unigenes with an N50 length of 1,254 bp. Of these, 54,115, 49,009, 34,682, 30,990, 19,126 and 41,325 unigenes were annotated by alignment against the NR, NT, Swiss-Prot, KEGG, COG and GO databases, respectively (Additional file 1). Sequence comparisons revealed that 17,459 genes are shared by both tea and oil tea, 9,725 of which were mapped to KEGG pathways (http://www.genome.jp/kegg/) [21]. High amino acid sequence identity was found in the homologous genes between tea and oil tea, as 64 % of the genes shared over 70 % identity. We also detected 64,826 specific transcripts in tea and 61,863 in oil tea.

Table 3 Summary of assemblies of RNA-Seq data

Analysis of the differentially expressed genes (DEGs)

The DEGs were identified by comparing FPKM (Fragment Per Kilobase of exon model per Million mapped reads) values [22] between different libraries under the thresholds of log2 (Fold-change) over 1 and FDR less than 0.001 (Fig. 2 and Additional file 2). The results indicated that both tea and oil tea had more genes with higher transcription levels in the second leaves than in buds. Compared with oil tea, tea contained more DEGs (3,787 in buds and 4,042 in leaves) with increased expression in both buds and leaves than oil tea (3,359 in buds and 3,302 in leaves). Next, we analyzed the DEGs using KEGG pathway analysis, which assigned 4,226 DEGs derived from tea buds versus oil tea buds (TBvsOTB), 4,174 from tea buds versus tea leaves (TBvsTL), 4,334 from tea leaves versus oil tea leaves (TLvsOTL) and 3,418 from oil tea buds versus leaves (OTBvsOTL). High proportions of these DEGs are involved in secondary metabolite pathways, including 483 DEGs (11.43 %) from TBvsOTB, 503 (11.61 %) from TL2vsOTL2, 594 (14.23 %) from TBvsTL and 482 (14.1 %) from OTBvsOTL, respectively. The estimated rich factors (number of DEGs mapped to a certain pathway/total number of genes mapped to this pathway) of secondary metabolism were 0.4–0.7 in TBvsOTB and TLvsOTL (Fig. 3a and b), whereas they were 0.1–0.3 in TBvsTL and OTBvsOTL (Fig. 3c and d). The DEGs identified through comparisons between tea and oil tea were clustered in the pathway secondary metabolism, suggesting that there are different secondary metabolism pathways in these two species. A lower rich factor between two stages for either of two species implies that steady metabolism occurs during this period (Additional file 3).

Fig. 2
figure 2

Number of identified DEGs identified by comparing gene expression levels between any two tissues. The numbers of significantly up-regulated genes (log2(Fold-change) > 1; FDR < 0.001) between two tissues (indicated by double-headed arrows). Arrows indicate up-regulated genes associated with each number

Fig. 3
figure 3

Pathway enrichment analysis involving comparisons between any two tissues. The rich factors indicate the ratio of the number of DEGs mapped to a certain pathway to the total number of genes mapped to this pathway. Greater rich factor means greater intensiveness. The Q value was calculated using hypergeometric test through Bonferroni Correction. Q value is coorrected p value ranging from 0-1, and less Q value means greater intensiveness. Gene number means number of DEGs mapped to a certain pathway

Based on alignments against the Swiss-Prot, COG and KEGG databases with an e-value cutoff of less than 1 × 10−30, 117, 51 and 18 tea genes and 110, 52 and 20 oil tea genes were found to be involved in the biosynthesis of catechins, theanine and caffeine, respectively (Additional file 4). We detected over 200 homologous genes in tea and oil tea encoding enzymes potentially involved in catalyzing these reactions. Tea and oil tea contain a similar number of genes encoding most enzymes in the assembled gene models, but their transcription levels are considerably different (Table 4).

Table 4 Transcription levels of genes involved in the biosynthesis of catechins, theanine and caffeine

Identification of DGEs involving in characteristic metabolic pathways in tea

We used qRT-PCR to confirm the differential expression levels of 34 DEGs involved in the biosynthesis of catechins, theanine and caffeine and quantified their maximum transcription levels in tea and oil tea (Fig. 4 and Additional file 5). Of these genes, the data from 25 (74 % of 34) matched the RNA-Seq data. As determined from the published flavonoid pathways [23], catechin biosynthesis occurs via successive enzymatic reactions (Fig. 4a). Interestingly, PAL (phenylalanine ammonia-lyase) and CHI (chalcone isomerase) genes, which are employed in the upstream phenylpropanoid pathway, were more highly expressed in oil tea than in tea. However, in the downstream biosynthetic pathway of catechins, the F3H (flavanone 3-hydroxylas), DFR (dihydroflavonol 4-reductase) and ANR (anthocyanidin reductase) genes were more highly expressed in tea. Notably, the ANR gene encodes an enzyme that catalyzes the transfer of anthocyanidins to 2,3-cis-flavan-3-ol, which is an intermediate in the final step of esterified catechin synthesis. Both RNA-Seq and qRT-PCR analyses revealed considerable activation of the ANR gene in tea but not in oil tea, which is consistent with the data from HPLC analyses of EC, EGC, C and GC contents. The DFR, LAR and ANR genes in tea are responsible for the biosynthesis of nongalloylated catechins [24]. The differential expression levels of F3H, DFR and ANR genes might be responsible for the differences detected in the levels catechin components between tea and oil tea.

Fig. 4
figure 4

Pathways of the three main secondary metabolites in tea and oil tea. a Core reactions of flavonoid biosynthesis involved in the biosynthesis of catechins. Compound names are shown below each arrow. Abbreviations beside the arrows indicate the enzymes catalyzing the transfer. The gene expression levels detected by qRT-PCR are shown in the histograms within the corresponding circles. Transcription level is indicated as the mean (2ΔCt) ± SD. b Core reactions of the theanine biosynthesis pathway. (C) Core reactions of the caffeine biosynthesis pathway

Tea buds and leaves contain theanine at levels as much as 252-fold and 86-fold those of oil tea (Fig. 1), respectively. However, we did not identify genes encoding the enzyme responsible for the final reaction in theanine biosynthesis. The qRT-PCR analysis revealed that the GS (glutamine synthetase) and GDH (glutamate dehydrogenase) genes were more highly expressed in tea than in oil tea (Fig. 4b). Previous studies suggest that theanine is synthesized from glutamic acid and ethylamine by TS (theanine synthetase), which is highly homologous to glutamine GS [25]. Phytochemical analysis revealed a much higher content of theanine in tea buds and leaves than in oil tea, suggesting a potential connection between the activation of GS genes and high theanine levels in tea. In our transcriptomic data, five GS unigenes were found in tea and seven in oil tea. Whether they are functional copies of TS genes remains to be confirmed by further analysis of enzymatic reactions.

There are three key enzymes in the caffeine biosynthesis pathway: TCS (tea caffeine synthase), IMPDH (inosine-5′-monophosphate dehydrogenase) and SAMS (S-adenosylmethionine synthetase) [26]. We detected homologous genes that are involved in four steps of the caffeine pathway. TCS catalyzes the final step in caffeine biosynthesis. The TCS gene was much more highly expressed in tea buds and leaves (by over 45-fold) than in oil tea, although the genes responsible for the upstream reactions had higher transcription levels in oil tea, which was confirmed by qRT-PCR (Fig. 4c).

Taken together, our investigation of gene expression in tea revealed the activation of related metabolic pathways compared to oil tea. Most genes exhibited slightly higher expression levels in buds than in leaves (Table 4). These findings are potentially related to the differences in metabolic components revealed by HPLC.

Discussion

In this study, we observed differences in the contents and gene expression patterns of the characteristic compounds in tea compared to oil tea. We found that tea contains more beneficial nutrients, such as catechins, theanine and caffeine, in its buds and leaves because the pathways related to these metabolites were considerably more active in tea than in oil tea. Theanine is a unique non-protein amino acid that was first discovered in tea. There are trace amounts of this compound in two other Camellia species (C. japonica and C. sasanqua) and in one species of mushroom (Xerocomus badius) [27].

Of the phenolic compounds, high flavonoid levels are present in oil tea, as revealed by HPLC (140.06 mg/g dry material) [28]. Flavonoids are a class of important secondary metabolites including flavanones, flavones, dihydroflavonols, flavonols and flavan-3-ols (catechins). These compounds are important for tea quality and are beneficial for human health (especially catechins) [29]. Catechins, theanine and caffeine are the main characteristic compounds in tea, and the results of our analysis of these compounds are in accordance with recent reports [30, 31]. Oil tea is genetically closely to tea, but no theanine and caffeine were reported except flavonoids in oil tea leaves in previous study [32, 33]. We chose tea and oil tea buds and leaves of plants from the same environment for analysis to reveal the mechanism behind the high levels catechins, theanine and caffeine in tea. Our results indicated that the catechins, theanine and caffeine in tea were also present in oil tea, but in much lower amounts. We detected increased expression of some key genes in these three metabolic pathways in tea compared to oil tea, which might lead to the differences in their contents.

Our results indicated that the genes encoding F3H, DFR and ANR in the flavonoid pathway were more highly expressed in tea than in oil tea. On the contrary, the expression levels of PAL and CHI genes were lower in tea than in oil tea. These observations were consistent with previous results [34]. High PAL activity was associated with the accumulation of flavonoids and other phenolic compounds [35, 36], and DFR, ANR and LAR played an important role in the formation of catechins [3]. Xiong et al. found that stable expression of F3H insured the formation of dihydrokaempferol, the precursor of individual catechins [37]. In the current study, we did not observe a difference in the expression levels of the C4H gene between tea and oil tea.

Our analysis of the DEGs related to flavonoid, theanine and caffeine metabolism in tea and oil tea suggests that these two species share common pathways, but the expression levels of some key genes in these pathyways might result in differential biosynthesis of catechins, theanine and caffeine.

Since tea is self-incompatible and recalcitrant to genetic manipulation, little genetic or genomic information is currently available for this species. Therefore, instead of providing a comprehensive in-depth investigation of the tea transcriptome, our experiment was designed to generate a quick view of the landscape. Moreover, since there were significant differences in the contents of the major components from one bud and five leaves of tea versus oil tea, we used the transcriptome data to search for key genes in these metabolic pathways and to uncover the factors underlying this divergence. The quality of tea in large part depends on its metabolic profiles. We therefore performed additional analyses of catechin, theanine and caffeine biosynthesis. We were able to detect almost all genes in these metabolic pathways. Many of these genes appeared to form multigene families, implying that the tea genome, like the genomes of many other higher plants, had undergone one or more rounds of genome duplication during evolution [38], which might explain why higher levels of gene expression did not always lead to higher enzyme activity in the present study. In our annotated tea and oil tea transcriptome dataset, multiple transcripts encoding all DEGs involved in flavonoid, theanine and caffeine biosynthesis pathways were identified.

Using a reciprocal best hit (RBH) method with relatively strict filters, 13,025 putative ortholog pairs were identified between tea and oil tea. We calculated their Ka (non-synonymous) /Ks (synonymous) ratios to estimate the rate of gene evolution [39, 40]. Of these ortholog pairs, 12,400 (95.2 % of 13,025) had a Ka/Ks value of 1 or less than 1, while 625 (4.8 % of 13,025) had a Ka/Ks value of over 1 (Additional file 6), suggesting that they were under positive selection (PS). Functional GO analysis revealed that most genes under PS were grouped into GO terms cell, cell part, binding and metabolic process (Fig. 5). Of the 625 PS genes, 68 exhibited differential expression among tissues (Additional file 7). Notably, some PS orthologs encode CHI and DFR in the flavonoid pathway. CHI is a rate-limiting enzyme, and DFR is key enzyme, in the catechin-producing branch of the flavonoid biosynthesis pathway [41, 42]. Since the Ka/Ks ratio is widely used to detect selective pressure acting on protein-coding sequences [43, 44], rapid evolution of the CHI and DFR genes might be associated with adaptive selection in plants. No PS ortholog was assigned to the theanine or caffeine pathway. Environmental factors might play an important role in the evolution of the flavonoid pathway. Indeed, the highest quality green tea from Japan (a fine powder made from tencha) was grown in the shade and contains high levels of amino acids but low levels of catechins [45].

Fig. 5
figure 5

GO classification of orthologs between tea and oil tea under positive selection (Ka/Ks > 1). GO analysis of orthologous genes between tea and oil tea under positive selection based on biological process, cellular component and molecular function categories. The y-axis indicates the number of genes assigned to the same GO terms

Conclusions

In this study, we examined the levels of characteristic metabolites in tea compared to oil tea, revealing (for the first time) trace amounts of theanine in oil tea. The contents of major metabolites were higher in tea than in oil tea. The genes involved in most of these pathways were more highly expressed in tea than in oil tea, especially key enzymes that function at branch points in these pathways, which might explain the differential biosynthesis of metabolites (resulting in different components) in tea versus oil tea. Comparative transcriptome analyses demonstrated the connection between gene expression and the biosynthesis of catechins, theanine and caffeine. Comparative transcriptome analyses comparing the levels of metabolites between tea and oil tea not only enabled us to provide a preliminary description of the gene expression profiles, but it also helped elucidate the molecular mechanisms underlying the biosynthesis of characteristic biochemicals in tea. The transcriptome data obtained in this study will serve as an invaluable platform for further studies of the molecular biology and genomes of tea and oil tea.

Methods

Plant materials

The six-year-old tea plants (Camellia sinensis [L.] O. Kuntze) and oil tea plants (Camellia oleifera Abel.) used in this study were grown in De Chang fabrication base in Anhui, China. One bud and five leaves were collected from each plant in the summer of 2013 (Fig. 1).

Extraction and HPLC analysis of catechins, theanine and caffeine

Catechins and caffeine were extracted from the samples according to the method described by Shan et al. [46] with minor modifications. Briefly, 0.1 g of freeze-dried tea leaf tissue was ground in liquid nitrogen with a mortar and pestle and extracted with 3 mL 80 % methanol in an ultrasonic sonicator for 10 min at 4 °C. After centrifugation at 6,000 rpm for 10 min, the residues were re-extracted twice as described above. The supernatants were combined and diluted with 80 % methanol to a volume of 10 mL. The obtained supernatants were filtered through a 0.22 μm organic membrane before HPLC analysis.

The catechin and caffeine contents in the extracts were measured using a Waters 2695 HPLC system equipped with a 2489 ultraviolet (UV)-visible detector. A reverse-phase C18 column (Phenomenex 250 mm × 4.6 mm, 5 micron) was used at a flow rate of 1.0 mL/min. The detection wavelength was set to 278 nm, and the column temperature was 25 °C. The mobile phase consisted of 0.17 % (v/v) acetic acid (A) in water, 100 % acetonitrile (B), and the gradient elution was as follows: B 6 % from 0 to 4 min, to 14 % at 16 min, to 15 % at 22 min, to 18 % at 32 min, to 29 % at 37 min, to 45 % at 45 min, to 45 % at 50 min, to 6 % at 51 min and to 6 % at 60 min. Then, 10 μL of the filtrate was injected into the HPLC system for analysis. The filtered sample (10 μL) was injected into the HPLC system for analysis. Samples from each stage of leaf development were analyzed in triplicate.

Amino acids were extracted with hot water [47, 48]. Specifically, 0.15 g of freeze-dried tea leaves was ground in liquid nitrogen with a mortar pestle and extracted with 5 mL deionized water for 20 min in a water bath at 100 °C. After centrifugation at 6,000 rpm for 10 min, the residues were re-extracted once as described above. The supernatants were combined and diluted with water to a volume of 10 mL. The supernatants were also filtered through a 0.22 μm membrane before HPLC analysis. Theanine in tea was detected using a Waters 600E series HPLC system equipped with a quaternary pump and a 2489 ultraviolet (UV)-visible detector. A reverse-phase C18 column (Phenomenex 250 mm × 4.6 mm, 5 micron) was used at a flow rate of 1.0 mL/min. The column oven temperature was set to 25 °C. The detection wavelength was set to 199 nm for analysis [49]. The mobile phase consisted of 0.05 % (v/v) trichloroacetic acid (A) in water, 50 % acetonitrile (B), and the gradient elution was as follows: B 0 % (v/v) to 100 % at 40 min, to 100 % at 45 min and to 0 % at 60 min [31]. Then, 5 μL of the filtrate was injected into the HPLC system for analysis.

Amino acids in tea were detected using a Waters 600E series HPLC system equipped with a quaternary pump, a 2475 fluorescence detector and a 2489 ultraviolet (UV)-visible detector. The Waters AccQ•Tag method [50] with a Waters AccQ•Tag column (Nova-Pak C18, 4 μm, 150 mm × 3.9 mm) was employed to detect various amino acids according to the protocol of the AccQ•Fluor Reagent Kit [51, 52]. To determine the linearity of the chromatographic techniques, calibration plots of standards were constructed based on peak areas (y) using solutions of various concentrations (x). All plots were linear in the examined ranges; the linear ranges for different concentrations of standard compounds are shown in the plots (μg mL−1). The R2 value refers to the correlation coefficient of the equation for calculating the content of a compound. The standard compounds C, EC, EGC, ECG, EGCG, GC, theanine and caffeine were purchased from Shanghai Winherb Medical Technology, Ltd., China.

Anthocyanin was extracted as follows: 0.1 g freeze-dry tea leaf tissue was ground in liquid nitrogen and extracted with 5 mL extraction solution (80 % methanol: 1 % hydrochloric acid [HCl]) using an ultrasonic sonicator for 10 min at room temperature. After centrifugation at 6,000 rpm for 10 min, the residues were re-extracted twice as described above. The supernatants were combined and diluted with extraction solution to 10 mL, followed by extraction with trichloromethane. The anthocyanin content was determined by colorimetry at 525 nm [53].

RNA extraction, library construction and RNA-Seq

Total RNA from tea and oil tea was extracted separately using the modified CTAB method [54]. The RNA integrity was measured using gel electrophoresis and spectrophotometry (Nanodrop). Equal amounts of RNA from three biological replicates were pooled prior to cDNA preparation. Enrichment of mRNA, fragment interruption, addition of adapters, size selection, PCR amplification and RNA-Seq were performed by staff at Beijing Genome Institute (BGI; Shenzhen, China). First, mRNA was enriched from 20 μg total RNA using magnetic beads with Oligo (dT) 25 (Invitrogen) and cleaved into short fragments. Second, using these short fragments as templates, first-strand cDNA synthesis was carried out with random primers (Japan, Takara) to produce double-stranded cDNA. Third, the ends of double-stranded cDNA fragments were further modified with T4 DNA polymerase, Klenow DNA polymerase and T4 polynucleotide kinase (Britain, NEB), and adapters were ligated to the short fragments using T4 DNA ligase (Invitrogen, USA). After the end repair process and ligation of adapters, the products were enriched by PCR to construct the final cDNA library. The cDNA library was examined using an Agilent 2100 Bioanalyzer. Finally, the four libraries were sequenced on an Illumina HiSeq™ 2000.

De novo assembly of RNA-Seq reads

Clean reads from four samples were obtained after quality control. Of these, two were from tea and two were from oil tea, which were combined and assembled separately using the transcriptome assembler Trinity [55]. The total and average lengths of assembled contigs were important criteria for transcriptome quality. Unigenes were defined after removing redundancy and short contigs from the assembly. Unigenes from tea and oil tea were aligned to each other iteratively using BLAST to identify homologous genes in the two species; more than 80 % of the length of each gene in a pair of homologous genes was strictly aligned.

qRT-PCR analysis of the selected genes

To validate the accuracy of unigenes obtained from the assembled transcriptome and profiling of gene expression via RNA-Seq, qRT-PCR analysis was performed. RNA samples were extracted from the samples, and single-stranded cDNAs used for real-time PCR analysis were synthesized from the RNAs using a Prime-Script™ 1st Strand cDNA Synthesis Kit (TaKaRa, Dalian, China). The expression patterns of 34 transcripts were monitored. Detailed information about the selected transcripts, including their unigene IDs and the primer pairs designed in this study, is presented in Additional file 8. An IQ5 real-time PCR detection system (Bio-Rad) was utilized as previously described. The glyceraldehyde-3-phosphate dehydrogenase (GAPDH) gene was used as an internal reference gene, and relative expression was calculated using the 2ΔCt method [56]. All qRT-PCR analyses were performed in three biological and three technical replications.

Unigene functional annotation and classification

The unigenes were aligned to the protein sequence database NR, the Swiss-Prot protein database and COG [57] by Blastx with an E-value threshold of 1 × 10−5. The unigenes were mapped to the KEGG metabolic pathway database [58]. Using KEGG annotation, metabolic pathway annotations of unigenes can be obtained, which helps elucidate the complex biological behaviors of genes. Using the COG database, orthologous gene products can be classified, and the possible functions of unigenes can be predicted. Based on NR annotation, GO classifications of unigenes were obtained using WEGO software [59] (http://wego.genomics.org.cn/cgi-bin/wego/index.pl) after annotation by the Blast 2 GO program (Version 2.3.4) [60] to elucidate the distribution of gene functions of a species at the macro level.

Comparison of nucleotide and protein sequence in tea and oil tea

Protein sequences from tea and oil tea were compared by BLAST and MUMmer (http://mummer.sourceforge.net/), and sequences with homology ≥70 % were retained.

Differentially expressed genes related to major secondary metabolism

KEGG pathway analysis was carried out to identify genes with different expression levels. Unigene expression was calculated using the FPKM method. The identification of differentially expressed genes (DEGs) was performed according to “The significance of digital gene expression profiles” [61], which was modified using a rigorous algorithm. FDR ≤ 0.001 and the absolute value of log2Ratio ≥ 1 were chosen as the thresholds for judging the significance of differential expression of each gene. For a given unigene, four FPKM values were generated from the four transcriptomes, respectively. Hierarchical clustering was performed using Cluster 3. DEGs that may play important roles in major secondary metabolism (catechins, theanine and caffeine metabolism) were identified by ggplot2 (http://docs.ggplot2.org/current/geom_point.html) [62]. After further investigating metabolic pathways, several representative pathways were selected for more detailed analyses, including flavonoid metabolism, theanine metabolism and caffeine metabolism.

Identification of orthologous genes between tea and oil tea

To identify genes that are putatively orthologous between tea and oil tea, a RBH [63] method based on the Blastn program was used. Similar methods and criteria have been used in previous studies [64, 65]. Furthermore, GO classifications of putative orthologs under negative selection (Ka/Ks < 1) and under positive selection (Ka/Ks > 1) were compared using WEGO.

Estimation of synonymous and non-synonymous substitution rates between orthologous gene pairs

To calculate Ks and Ka substitution rate for each orthologous gene pair, the equivalent of a biological measurement of the nonsynonymous to synonymous substitution ratio Ka/Ks [66] was introduced, in which Ks and Ka were estimated by codeml (of the PAML package) using the likelihood method [67, 68].

Availability of supporting data

The Illumina RNA-seq data generated from buds and leaves of Camellia sinensis and Camellia oleifera are available in the NCBI SRA (http://trace.ncbi.nlm.nih.gov/Traces/sra) with accessions SRR1928149 and SRR1928150.