Introduction

Plants produce a diverse array of terpenes, the largest class of plant-derived natural products. They range from simple flavor and fragrance compounds, such as limonene, to complex triterpenes, and have numerous potential applications across the food and beverage, pharmaceutical, cosmetic, and agriculture industries. However, their large-scale use is limited because they are naturally produced in small quantities and their extraction and/or purification requires intricate steps. In addition, their complex structures make chemical synthesis under stereochemical control inherently difficult.

A large proportion of terpenes are cyclic compounds possessing several chiral centers, and the cyclization catalyzed by terpene synthases (TPSs) is the first step in creating the diverse structures of terpenes. Therefore, the discovery and characterization of TPSs may provide important information about terpene synthesis in plants. In addition, this information could be used to engineer metabolic pathways that produce large quantities of terpenoids under stereochemical control in a tractable microorganism1,2,3.

Sweet broomweed (Scoparia dulcis L.) is a perennial herb widely distributed in the torrid zone, and has recently been placed in the Plantaginaceae family (formerly Scrophulariaceae)4. In these regions, the plant has been used as a medication for stomach disorders, diabetes, hypertension, bronchitis, and insect bites5, and a clinical trial of S. dulcis leaf extract has recently been performed in Sri Lanka6. Phytochemical studies revealed that this plant produces various unique diterpenes in its leaves: (1) labdane type: scoparic acid A; (2) scopadulane type: scopadulcic acid B (SDB) and scopadulciol; (3) aphidicolane type: scopadulin7. Among these diterpenes, SDB was found to possess various biological activities such as antiherpetic activity and inhibitory effects on gastric H+,K+-ATPase8,9. In addition to these diterpenes, miscellaneous biologically active diterpenes have also been isolated from S. dulcis10,11,12,13. Furthermore, numerous triterpenoids have been isolated from S. dulcis as bioactive substances14. Thus, S. dulcis might be an important medicinal resource providing unique bioactive terpenoids. Due to the unique carbon skeleton and biological activities of SDB, the scopadulane-type diterpenes were selected as attractive targets for chemical synthesis, and their total syntheses were accomplished by several groups15,16,17. However, the synthetic routes included numerous steps and produced SDB as a racemic mixture.

When we evaluated the metabolites produced by S. dulcis from a different perspective, it seemed likely that S. dulcis harbors unique biosynthetic enzymes to produce terpenoids. The putative biosynthetic machinery of the unique diterpenes, in particular SDB, can be divided into four stages: (1) synthesis of isopentenyl diphosphate (IPP) and its isomer dimethylallyl diphosphate (DMAPP); (2) formation of the diterpene precursor, geranylgeranyl diphosphate (GGPP), from IPP and DMAPP; (3) cyclization of GGPP to syn-copalyl diphosphate (syn-CPP), then a second cyclization to produce intermediate species possessing the scopadulane skeleton; (4) redox modification and esterification by CYP450s and benzoyl-CoA transferase, respectively. In the first stage, there are two biosynthetic routes, the 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway and the mevalonic acid (MEV) pathway, and SDB has been shown to be produced via the MEP pathway by [13C]-glucose and inhibitor feeding experiments18,19. In the second stage, two GGPP synthases (GGPPSs) from S. dulcis have been cloned and functionally characterized20,21. In addition, our previous study revealed that SdGGPS1 and SdGGPS2 were expressed as constitutive and inducible homologous genes, respectively22. However, the enzymes involved in cyclization and modification to produce SDB at the third and fourth stages described above are still unknown, although an ent-copalyl diphosphate synthase (SdCPS1) has been cloned and functionally characterized23.

Recent progress in next-generation sequencing techniques has facilitated the discovery of novel gene candidates in non-model organisms. This progress prompted us to carry out transcriptome analysis to discover gene candidates and to elucidate the mechanisms of terpenoid metabolism of S. dulcis using Illumina RNA-Seq technology. Here we report transcriptome analysis of different organs of S. dulcis in response to methyl jasmonate treatment. Combined with quantitative RT-PCR and phylogenetic analysis, our results showed novel aspects of terpenoid metabolism in S. dulcis.

Results

De novo assembly of the S. dulcis transcriptome

To establish a transcriptome catalogue of S. dulcis, we used MiSeq paired-end sequencing, which enabled us to create an assembly of contigs. To achieve this, we prepared four cDNA libraries from three tissues: leaves with or without MeJA treatment, young leaves, and roots. We chose these samples for preparation of total RNA because SDB production in S. dulcis leaf tissue is known to be rapidly and transiently stimulated by MeJA as an elicitor24, and SDB contents in young leaves are higher than those in adult leaves25. In total, there were approximately 20.6 million raw reads from S. dulcis (Supplementary Table S1). The raw sequencing data have been submitted to the DDBJ Sequence Read Archive (DRA) under the accession number DRA004058. Quality trimming and filtration resulted in 20.5 million cleaned reads, which were assembled using Trinity to generate 60,012 transcripts with an average length of 938 bp and an N50 of 1,430 bp. The sequences were clustered using CD-HIT-EST to remove redundant sequences. After clustering at 95% identity, 46,332 transcripts with an average length of 934 bp and an N50 of 1,464 bp were obtained (Table 1).
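The N50 values quoted above summarize assembly contiguity. As a minimal sketch of how such a statistic is usually computed (this is the common definition, not necessarily the exact script used by the assembler; the function name and the toy lengths are hypothetical), N50 is the length of the shortest transcript in the smallest set of longest transcripts that together cover at least half of the total assembled bases:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (bp).

    Lengths are sorted from longest to shortest and accumulated until the
    running sum reaches half of the total; the length at which this happens
    is reported as the N50.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly, not data from this study.
toy_lengths = [200, 300, 500, 800, 1200, 2000]
print(n50(toy_lengths))
```

Because N50 is weighted by sequence length, removing short redundant transcripts by clustering can raise the N50 slightly even when the average length barely changes, as seen in the values above.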

Table 1 Length distribution of assembled transcripts and unigenes.

Functional annotations of transcripts

To functionally annotate and classify the assembly, all unigenes were searched against public databases including the non-redundant protein (Nr), non-redundant nucleotide sequence (Nt), UniProt/Swiss-Prot, and Cluster of Orthologous Groups of proteins (COG) databases. The best hits were selected from the matches with an E-value of less than 10^−5.
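The best-hit selection described above can be sketched as follows. This is an illustration only: it assumes the search results have already been parsed into (query, subject, E-value, bit score) tuples, and the function name, the tie-breaking by bit score, and the toy hits are assumptions rather than details taken from the study.

```python
def best_hits(rows, max_evalue=1e-5):
    """Keep, for each query, the hit with the lowest E-value below the cutoff.

    rows: iterable of (query_id, subject_id, evalue, bitscore) tuples.
    Ties on E-value are broken by the higher bit score (an assumption).
    """
    best = {}
    for query, subject, evalue, bitscore in rows:
        if evalue >= max_evalue:
            continue  # the text requires an E-value of less than the cutoff
        current = best.get(query)
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical hits, not results from this study.
toy_hits = [
    ("unigene_1", "protA", 1e-30, 210.0),
    ("unigene_1", "protB", 1e-50, 320.0),
    ("unigene_2", "protC", 1e-3, 40.0),   # fails the cutoff
    ("unigene_3", "protD", 1e-5, 55.0),   # equal to the cutoff, so excluded
]
for query, (subject, evalue, bitscore) in best_hits(toy_hits).items():
    print(query, subject, evalue, bitscore)
```

Unigenes without any hit below the cutoff simply do not appear in the result, which corresponds to the unannotated fraction of the assembly.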

Based on BLASTx searches (cut-off E-value 10^−5) of the Nr, Nt, Swiss-Prot, and COG databases, 30,471 (65.8%), 17,663 (38.1%), 22,485 (48.5%), and 10,105 (21.8%) unigenes were annotated, respectively (Supplementary Figure S1). In total, 30,872 annotated sequences were identified, as shown in Supplementary Table S2. Among these annotated unigenes, the species with the highest numbers of best hits were sesame (Sesamum indicum, 65.5% of matched genes) and common monkey-flower (Erythranthe guttata, formerly Mimulus guttatus, 13.6% of matched genes) (Supplementary Table S3). These findings are reasonable because sesame and common monkey-flower both belong to Lamiales and have sequenced genomes.

Gene Ontology (GO) terms were subsequently assigned to S. dulcis unigenes based on their sequence matches to known protein sequences using the Blast2GO program with Nr annotation. Unigenes were classified into 47 groups that could be categorized into three main classifications: “biological process”, “cellular component”, and “molecular function” (Fig. 1). In the biological process category, cellular process (12,804 unigenes) and metabolic process (14,704 unigenes) represented the major contributors. In the molecular function category, binding (12,720 unigenes) and catalytic activity (11,834 unigenes) represented the major contributors. In addition, the gene classification within “metabolic process” is shown in Supplementary Figure S2.

Figure 1: Histogram of GO classifications of assembled Scoparia dulcis unigenes.

The results are grouped into three main categories: biological process, cellular component, and molecular function.

Functional classification and pathway assignment were also performed using the Kyoto Encyclopedia of Genes and Genomes (KEGG). In total, 142 KEGG pathways, including 10,328 unigenes, were found in this study (Supplementary Table S2). Among them, 381 unigenes were involved in the biosynthesis of secondary metabolites, of which 156 were for terpenoids, 103 for phenylpropanoids, 25 for flavonoids, 35 for alkaloids, and 34 for other metabolites.

Prediction of genes involved in terpenoid biosynthesis

To investigate terpenoid metabolism in S. dulcis, we conducted tissue-specific expression analysis of candidate unigenes responsible for terpenoid biosynthesis. In addition to the data obtained using the Blast2GO software, BLAST (tblastn) and HMMER approaches were used to predict the candidates. In the tblastn search, we selected corresponding protein sequences from Arabidopsis thaliana and Oryza sativa as queries, and searched them against an in-house transcript database of S. dulcis. In the present study, we detected gene candidates involved in the MEV and MEP pathways, which are the biosynthetic pathways providing the general isoprenoid precursors isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP). As shown in Fig. 2, almost all of the gene candidates involved in the MEV pathway were predominantly expressed in root tissue, whereas those involved in the MEP pathway were predominantly expressed in leaves, in particular young leaf tissue. In addition, distinctive expression patterns were observed for some of the genes. 3-Hydroxy-3-methylglutaryl-CoA reductase 1 (HMGR1) was specifically expressed in root tissue, whereas HMGR2 was expressed in young leaf tissue. Moreover, of the four 1-deoxy-D-xylulose 5-phosphate synthase (DXS) genes analyzed, DXS1 and DXS2, which were induced by MeJA, were expressed in leaf tissues, whereas DXS3, which was not induced by MeJA, was specifically expressed in root and young leaf tissues. The expression pattern of DXS4 differed from that of the other DXSs; DXS4 was expressed predominantly in young leaf tissue and was induced by MeJA stimulation.

Figure 2: Prediction of terpenoid metabolic pathways and differentially expressed orthologous genes in Scoparia dulcis.

Heatmap depicting the expression profile of isoprenoid and terpene metabolism-related genes in young leaf, leaf with (+) or without (−) MeJA treatment, and root tissues of S. dulcis. MeJA-inducible genes are shown in boldface. The color gradient indicates the Z-score of the gene expression values, calculated from the FPKM values.
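The Z-score scaling mentioned in the legend can be sketched as a per-gene standardization across the four samples. The legend does not say whether FPKM values were log-transformed first or whether the population or sample standard deviation was used; the sketch below assumes untransformed values and the population standard deviation, and the function name and FPKM numbers are hypothetical.

```python
from statistics import mean, pstdev


def row_zscores(fpkm_values):
    """Standardize one gene's FPKM values across samples.

    Each value is centered on the gene's mean and divided by the population
    standard deviation, so colors compare samples within a gene rather than
    absolute expression between genes.
    """
    mu = mean(fpkm_values)
    sigma = pstdev(fpkm_values)
    if sigma == 0:
        # A gene with identical expression in all samples has no variation.
        return [0.0 for _ in fpkm_values]
    return [(value - mu) / sigma for value in fpkm_values]


# Hypothetical FPKM values in the order: young leaf, leaf (-), leaf (+), root.
toy_gene = [40.0, 10.0, 25.0, 5.0]
print([round(z, 2) for z in row_zscores(toy_gene)])
```

With this scaling, a root-specific gene and a leaf-specific gene both span the full color range even if their absolute FPKM values differ by orders of magnitude.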

Then, we identified and examined the expression profiles of genes involved in terpenoid precursor biosynthesis such as geranyl diphosphate synthase (GPPS), farnesyl diphosphate synthase (FPPS), geranylgeranyl diphosphate synthase (GGPPS), and squalene synthase (SQS). As shown in Fig. 2, FPPSs and SQS were predominantly expressed in root tissue, whereas GPPS was expressed in both young leaf and root tissue. On the other hand, GGPPSs were abundantly expressed in leaves when compared with expression in roots. In addition, the present result is consistent with previous data that GGPPSs might be composed of several homologous genes22.

Next, we attempted to extract gene candidates responsible for the formation of terpene skeletons in S. dulcis using a HMMER search of the translated amino acid sequences of transcripts against the Pfam database. From our RNA-seq data, 26 unigenes contained conserved domains [terpene synthase N-terminal domain (PF01397) and terpene synthase metal-binding domain (PF03936)], as summarized in Table 3. Among them, 20 genes (SdTPS1 to 20) were suggested to be mono- and sesquiterpene synthase genes based on sequence homology to functionally characterized TPSs. As shown in Fig. 2, some SdTPSs were induced by MeJA, whereas five SdTPSs, SdTPS4, SdTPS9, SdTPS16, SdTPS17, and SdTPS18, were suggested to be expressed constitutively in roots. On the other hand, the other five genes were suggested to be involved in diterpene biosynthesis, and two of them were identical to previously isolated genes, ent-copalyl diphosphate synthase (SdCPS1) and kaurene synthase (SdKS). The remaining three homologous gene candidates were named putative syn-CPS (SdCPS2) and kaurene synthase-like genes (SdKSLs). SdKSL1 encodes an almost full-length cyclase (804 amino acids in length); however, the longest ORF of the SdKSL2 sequence was not long enough to encode a class I terpene cyclase (only 519 amino acids in length). This indicates that only SdKSL1 can be reliably categorized as a class I diterpene cyclase.

Table 3 List of Full-length TPSs of S. dulcis.

The biosynthetic route of SDB has been predicted, as illustrated in Fig. 2. Previous studies have indicated that SDB accumulates in the aerial part of the plant, in particular in young leaves25, and that the biosynthesis of SDB might depend on leaf differentiation26. Because this suggests that the biosynthetic genes for SDB are expressed in leaf tissue, we identified candidate genes involved in diterpene metabolism in S. dulcis on the basis of their expression patterns. In contrast to the SdTPS genes, the diterpene synthase genes showed clearly tissue-specific expression. SdCPS1 and SdKS were specifically expressed in roots, whereas SdCPS2 and SdKSLs were expressed in young leaves. In addition, SdKO1, SdKO2, and SdKAO1, which are involved in gibberellin biosynthesis, were also predominantly expressed in root tissues. Therefore, it was suggested that gibberellin biosynthesis was active in root tissue, probably in the meristem, at the stage when the transcripts were obtained (8-week-old plants).

Phylogenetic analyses of TPS protein sequences have recognized seven major clades, and the function and distribution of plant TPS subfamilies have been summarized27. Thus, phylogenetic comparison of the translated sequences of TPSs might help to predict their function. As shown in Fig. 3, we subjected nine TPSs containing full-length ORFs to phylogenetic analyses with known TPSs, summarized in Supplementary Table S4, and categorized them into appropriate clades. SdTPS8, SdTPS9, SdTPS10, and SdTPS11 were placed into the TPS-a subfamily, which is reported to be involved in sesquiterpene synthesis. SdTPS10 was closely related to LdTPS1 (δ-cadinene synthase) and LdTPS5 (bicyclogermacrene synthase), and SdTPS8 was closely related to PcGAS (germacrene A synthase) and NtEAS (epi-aristolochene synthase). SdTPS9 and SdTPS11 showed close phylogenetic relationships with PcTPSA (γ-curcumene synthase), ObCDS (γ-cadinene synthase), and MpFS (β-farnesene synthase). Therefore, these four SdTPSs were suggested to be involved in sesquiterpene biosynthesis in S. dulcis.

Figure 3: Phylogenetic analyses of TPSs from S. dulcis.

The maximum likelihood tree illustrates the phylogenetic relatedness of S. dulcis terpene synthases to terpene synthases of other species. The ancestral Physcomitrella patens ent-kaurene/kaurenol synthase was used to root the tree. Descriptions of terpene synthases used in the phylogeny are listed in Table 3 and Supplementary Table S4. Red- and tangerine-marked enzymes are terpene synthases from S. dulcis and gymnosperms, respectively.

SdTPS1, SdTPS2, SdTPS3, SdTPS5, and SdTPS7 were assigned to the TPS-b subfamily, which predominantly contains monoterpene synthases from angiosperms. Among them, SdTPS1 was placed into a sub-clade consisting of monoterpene cyclases, and SdTPS7 was closely related to acyclic oxygenated monoterpene synthases such as ObGES (geraniol synthase) and ObLIS (R-linalool synthase). SdTPS3 and SdTPS5 showed a close relationship with MrTPS4 (β-ocimene synthase). Therefore, these four terpene synthases were suggested to be involved in monoterpene metabolism. On the other hand, SdTPS2 belonged to the sub-clade consisting of monocyclic sesquiterpene synthases such as ObZIS (α-zingiberene synthase) and LdTPS7 (trans-α-bergamotene synthase). These data suggest that SdTPS2 might be a monocyclic sesquiterpene synthase.

SdCPS1 and SdCPS2 were placed into the TPS-c clade, whereas SdKS and SdKSL1 were placed into the TPS-e/f clade. The TPS-c and TPS-e/f clades contain exclusively monofunctional class II and class I enzymes, respectively. SdCPS1 has previously been functionally annotated as an ent-CPS23, and this enzyme showed close relationships with other ent-CPSs from Lamiales. On the other hand, SdCPS2 was placed into a sub-clade distinct from those of ent-CPSs, and it was more closely related to diTPSs involved in specialized metabolism, such as oxygenating diterpene synthases (SsLPPS, labda-13-en-8-ol diphosphate synthase) and (+)-CPSs like MvCPS3 and SmCPS1. In addition, sequence alignment revealed that SdCPS2 could be distinguished from ent- and (+)-CPSs (Supplementary Figure S4). Potter et al. have reported that the H263 and N322 residues form the key catalytic base dyad in the A. thaliana CPS28, and that they are well conserved in ent-CPSs. In the case of SmCPS1, the corresponding residues were F256 and H315, and these are well conserved in (+)-CPSs. However, sequence alignment showed that the corresponding residues of SdCPS2 were F279 and P340, which did not agree with those in ent- and (+)-CPSs. Therefore, the enzymatic activity of SdCPS2 was suggested to differ from that of ent- and (+)-CPSs.
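The residue comparison in this paragraph amounts to a simple lookup at the two aligned dyad positions. The sketch below encodes only the residue pairs quoted in the text (H/N for ent-CPSs, F/H for (+)-CPSs); the function name and labels are hypothetical, and a real analysis would read the residues from the multiple sequence alignment rather than take them as arguments.

```python
# Residue pairs at the two aligned catalytic-base positions, as quoted in the text.
KNOWN_DYADS = {
    ("H", "N"): "ent-CPS-like",   # H263/N322 in the A. thaliana enzyme
    ("F", "H"): "(+)-CPS-like",   # F256/H315 in SmCPS1
}


def classify_dyad(first_residue, second_residue):
    """Label a class II diTPS by the residues found at the two dyad positions.

    Returns "neither" when the pair matches neither conserved pattern.
    """
    key = (first_residue.upper(), second_residue.upper())
    return KNOWN_DYADS.get(key, "neither")


print(classify_dyad("H", "N"))  # ent-CPS-like
print(classify_dyad("F", "H"))  # (+)-CPS-like
print(classify_dyad("F", "P"))  # neither (the F279/P340 pair of SdCPS2)
```

A "neither" result does not assign a product stereochemistry by itself; it only flags that the enzyme does not carry either of the two conserved dyads.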

Phylogenetic analysis also suggested that the two class I diTPSs (SdKS and SdKSL1) may be functionally distinct, since SdKS showed a close relationship with ent-kaurene synthases, whereas SdKSL1 was closely related to known diTPSs with specialized functions such as MvELS (9,13-epoxy-labd-14-ene synthase) and SsSS (sclareol synthase). Thus, SdKSL1 was deduced to catalyze the cyclization step of syn-CPP in the pathway of unique and specialized diterpene metabolism in S. dulcis.

Finally, SDB and other unique diterpenes produced by S. dulcis are substituted with a benzoyl unit at the C-6 position. Thus, it was suggested that benzoyl-CoA transferase (BCT) also plays an important role in diterpene metabolism in S. dulcis. To predict the responsible gene(s), benzoyl-CoA:taxane 2α-O-benzoyltransferase from Taxus cuspidata (AF297618) was used to search against an in-house transcriptome database using the TBLASTN approach. As a result, two putative candidates, SdBCT1 and SdBCT2, were obtained that were expressed in young leaf tissue. In addition, they were found to be induced by MeJA, as shown in Fig. 2. Thus, these genes were suggested to be involved in unique diterpene biosynthesis in S. dulcis.

Prediction and classification of CYP450 genes in S. dulcis

Terpene diversification is driven by the machinery consisting of TPSs and cytochrome P450-dependent monooxygenases (CYP450s). The latter are important for modifying and diversifying the terpenoid scaffolds by redox modification. Therefore, we examined the CYP450s responsible for terpenoid biosynthesis in S. dulcis. By searching transcripts against the Pfam-A database for the cytochrome P450 domain (PF00067), 341 candidates were detected. After detecting ORFs, we found 87 full-length CYP450 genes in S. dulcis. In addition, four CYP gene fragments, which were identical to those previously isolated from S. dulcis, were added to the candidates. Subsequently, these CYP450 ORFs were classified by comparison with amino acid sequences of typical plant CYP450s. As shown in Table 2, the CYP450s ranged from 403 to 544 amino acids in length, and most of them (62/87) were suggested to enter the secretory pathway, i.e., to be inserted into the ER membrane, since they contained signal peptides.

Table 2 List of Full-length CYP450s of S. dulcis.

We then compared the SdCYPs with the CYP450s of Salvia miltiorrhiza. The sequences of 119 S. miltiorrhiza CYP450 proteins (SmCYPs) and 91 SdCYPs were used to construct a maximum likelihood tree (Fig. 4). As a result, 52 SdCYP450s were A-type and distributed into 12 families, whereas 39 were non-A-type and belonged to 16 families and 7 clans. Among them, genes belonging to the CYP71 clan have been reported to be involved in secondary metabolism29,30. Moreover, TPS genes were predominantly found in combination with CYP71 clan genes, such as CYP71, CYP76, and CYP99 families, in angiosperms31,32,33. CYP71D16 from a tobacco plant and CYP71D51 from a tomato plant have been reported to catalyze hydroxylation of cembrenediol and lycosantalene, respectively33,34. CYP76 members in rice have been shown to be involved in diterpene metabolism35,36,37,38,39. In addition, CYP76AH and CYP76AK sub-family members are responsible for diterpene hydroxylation in Lamiales40,41,42,43,44. In the present study, fourteen CYP71 and nine CYP76 genes were obtained from S. dulcis (Table 2). Among the CYP71 family genes, CYP71D and CYP71CV genes were assigned to the same clan, as shown in Fig. 4. Therefore, the CYP71CV, CYP71D, and CYP76 families were suggested to be candidates involved in unique diterpene metabolism in S. dulcis.

Figure 4: Phylogenetic analyses of CYP450s from S. dulcis, S. miltiorrhiza, and A. thaliana.

The unrooted maximum likelihood tree illustrates the phylogenetic relatedness of CYP450s from S. dulcis (red), Salvia miltiorrhiza (black), and A. thaliana (tangerine). Descriptions of CYP450s are listed in Table 2, Supplementary Table S5, and Supplementary Table S6, respectively.

On the other hand, CYP716 and CYP51 family genes might be involved in triterpene biosynthesis31,45,46. As shown in Fig. 4, three genes and one gene were found to belong to the CYP716 and CYP51 families, respectively. S. dulcis has been reported to produce several triterpenes, such as betulinic acid; therefore, these CYPs may be involved in triterpene metabolism.

Real-time PCR analysis of putative genes involved in diterpene biosynthesis

As described above, we discovered novel candidate genes involved in the biosynthesis of unique diterpenes in S. dulcis. To further clarify their function, we examined their expression levels following MeJA stimulation. As shown in Fig. 5, SdCPS2 and SdKSL1 were immediately induced by MeJA stimulation and their strong induction persisted until 12 h after treatment, whereas the expression patterns of SdCPS1 and SdKS differed from those of SdCPS2 and SdKSL1. Notably, MeJA treatment did not alter the relative expression level of SdKS.

Figure 5: qRT-PCR analysis of the expression level of diTPSs and selected SdCYPs by MeJA treatment.

Leaves were harvested at indicated time points after treatment with 0.1 mM MeJA. 18S rRNA gene was used for normalization. The transcript levels of each gene in the leaf at 0 hr were set to 1.0. CYPs were grouped into four patterns (green, yellow, orange, and purple) based on their expression patterns. Data are shown as mean ± SD (n = 3). Asterisks indicate significant differences from the control (*p < 0.05, **p < 0.01, and ***p < 0.001).
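The normalization described in this legend (18S rRNA as reference, 0 h set to 1.0) matches the widely used comparative Ct calculation. The text does not state which quantification model was applied, so the sketch below is an assumption: it uses the 2^(−ΔΔCt) formula with an assumed amplification efficiency of 100%, and the function name and Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference, ct_target_0h, ct_reference_0h):
    """Fold change of a target gene relative to the 0 h sample (2^-ddCt).

    Each Ct is first normalized to the reference gene (dCt), then the 0 h
    dCt is subtracted (ddCt). Assumes the amount of product doubles every cycle.
    """
    delta_sample = ct_target - ct_reference
    delta_control = ct_target_0h - ct_reference_0h
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: the target crosses the threshold 3 cycles earlier
# after treatment while the reference gene is unchanged.
print(relative_expression(22.0, 10.0, 25.0, 10.0))  # 8.0
# The 0 h sample compared with itself is 1.0 by construction.
print(relative_expression(25.0, 10.0, 25.0, 10.0))  # 1.0
```

Under this model, an induction of roughly 250-fold corresponds to a ΔΔCt of about −8 cycles.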

To detect SdCYPs induced coordinately with SdCPS2 and SdKSL1, we selected twelve genes based on their expression patterns in tissues, as shown in Supplementary Figure S3. Several SdCYPs belonging to the CYP71 and CYP76 families were analyzed following treatment with MeJA. As shown in Fig. 5, the expression patterns of the SdCYPs could be classified into four groups. Four SdCYPs (shown with green bars), namely SdCYP71CV1, SdCYP71D489, SdCYP76B72, and SdCYP76B73, were up-regulated at 3 h post-treatment with MeJA. Notably, their expression patterns were consistent with those of SdCPS2 and SdKSL1; therefore, they were considered to be expressed coordinately with these TPSs. On the other hand, the expression levels of SdCYP76S20, SdCYP76S21, and SdCYP71CV2 (shown with orange bars) increased ca. 175-fold, ca. 250-fold, and ca. 50-fold, respectively, compared with those before MeJA treatment, and appeared to be induced strongly and transiently by the stimulus. However, these expression patterns were quite different from those of SdCPS2 and SdKSL1. Expression of SdCYP71D175, SdCYP71D493, and SdCYP71A70 (shown with purple bars) increased slowly and reached a maximum at 6 to 12 h post-treatment with MeJA. In addition, SdCYP71D491 and SdCYP76S18 (shown with yellow bars) showed a bimodal expression pattern.

Discussion

Recent progress in next-generation sequencing technologies has expanded the capabilities for studying non-model plants. Therefore, we utilized these methodologies to sequence the transcriptome in the present study, and identified a large number of novel genes in S. dulcis. Consequently, we could postulate the mechanisms of terpenoid metabolism in S. dulcis by identification of gene candidates for terpene biosynthesis. Recently, interest in the biosynthesis of terpenes, in particular diterpenes, has gradually increased due to their industrial and scientific importance. Therefore, our present study provides important information for plant sciences and/or natural products chemistry.

The Lamiales include a large number of economically important plants, and most of them produce a huge number of terpenes. Several Lamiales have been used as medicinal plants, and their bioactive principles are unique diterpenes. For example, Isodon plants produce a large array of ent-kaurene-type diterpenes47 and S. miltiorrhiza biosynthesizes miltiradiene, an intermediate of tanshinone biosynthesis48. When considering the biosynthetic machinery of these diterpenes, careful attention must be paid to their stereochemistry. Briefly, the Isodon diterpenes are synthesized via ent-CPP by ent-CPS, whereas miltiradiene is biosynthesized via (+)-CPP by (+)-CPS. Therefore, these enzymes might be important for the diversification of diterpenes in nature. Indeed, it is suggested that SDB might be synthesized via syn-CPP in S. dulcis because of its stereochemical configuration. To date, syn-CPS has been isolated solely from O. sativa49,50, and the rice syn-CPS (OsCPS4) has been implicated in the biosynthesis of the phytoalexins momilactones and oryzalexin S. Although the rice syn-CPS is well studied, a syn-CPS has not yet been identified from dicots because few diterpenes are synthesized via syn-CPP. When we phylogenetically analyzed the TPSs predicted by de novo assembly of transcripts, SdCPS2 was placed into a sub-clade consisting of (+)-CPSs and oxygenating diterpene synthases, which was relatively far from the sub-clade consisting of rice CPSs (Fig. 3). Furthermore, amino acid sequence alignment revealed differences from the rice syn-CPS, OsCPS451, in important catalytic base residues (Supplementary Figure S4). OsCPS4 contains H251 and C310 residues at the positions proposed to form the catalytic base dyad in ent-CPS; however, alanine substitution of these residues did not significantly alter its activity. In a previous report, Potter et al. showed that an H501 residue present in the active cavity is an important catalytic base for producing syn-CPP51. In SdCPS2, Y528 corresponds to H501 of OsCPS4, and this tyrosine is invariably conserved in plant CPSs. Thus, the control of the stereochemically unique reaction is suggested to differ between SdCPS2 and OsCPS4, although the enzymatic reaction is hypothesized to be the same. We are currently focusing on characterizing the enzymatic reaction of SdCPS2. Similarly, SdKSL1 was also deduced to be involved in specialized diterpene metabolism, as described above, since it showed close relationships with MvELS and SsSS.

Genes associated with the same metabolic pathway are often co-expressed so that they can catalyze a linear chain of reactions52. In the present study, we used criteria based on differential expression patterns and qPCR analyses to choose gene candidates involved in diterpene metabolism. We found distinct patterns between the SdCPS1-SdKS and SdCPS2-SdKSL1 lineages (Fig. 5). It has been shown that SDB synthesis is significantly induced by exposure to MeJA24. Putative SdCPS2 and SdKSL1 were induced at 3 h post-treatment with MeJA and their expression persisted even at 12 h post-treatment. In addition, we selected four CYP450 candidates, SdCYP71CV1, SdCYP71D489, SdCYP76B71, and SdCYP76B72, as possibly involved in SDB biosynthesis. As shown in Fig. 2, it was suggested that three CYPs might be involved in SDB biosynthesis, for example in the hydroxylation of C-6 and carboxylation of C-18. Therefore, these CYP genes might be the most likely candidates for redox modification of the diterpene precursor in S. dulcis.

While SdCPS2 has not yet been characterized, it seems likely to produce the syn-CPP intermediate required for SDB biosynthesis, which would provide the first example of a syn-specific CPS from dicots. Further functional investigations of SdCPS2 and SdKSL1 have already begun, and the results will be published in the near future. The transcriptome sequences and gene expression profiles provide a solid foundation for functional genomic studies of S. dulcis in the future and will facilitate a better understanding of the molecular mechanisms of diterpene (SDB) biosynthesis.

Conclusion

The present paper showed that transcriptome analyses provide useful information for novel gene discovery. We revealed gene candidates involved in terpene metabolism in S. dulcis. Among the identified genes, SdCPS2 is a candidate for the first syn-copalyl diphosphate synthase gene identified in dicots. In addition, SdKSL1 was suggested to participate in the SDB biosynthetic pathway. Other candidate genes involved in SDB biosynthesis were also identified from the results of our RNA-seq analysis. qPCR analyses provided evidence that CYP450s participate in diterpene metabolism. Therefore, the identified genes associated with diterpene biosynthesis will facilitate research and genetic engineering of diterpene metabolism in S. dulcis.

Methods

Plant material and MeJA treatment

S. dulcis plants were grown under sterile conditions on 1/2 Murashige and Skoog (MS) agar medium under constant light at 25 °C. Eight-week-old plants were used for MeJA treatment.

For the RNA-seq libraries, the plants were treated with or without 0.1 mM MeJA (Sigma-Aldrich, MO, USA) by spraying, and leaves were harvested after 24 h. At this time point, we confirmed the enhanced production of SDB by HPLC using a previously established method24. Various tissues, namely young leaves (first and second leaf set from the top), mature leaves (third leaf set from the top, treated with or without MeJA), and roots, were harvested, frozen immediately in liquid nitrogen, and stored at −80 °C until RNA extraction. For qRT-PCR, plants were kept for 0, 3, 6, 12, and 24 h at 25 °C after MeJA treatment. At each time point, samples were collected from three or four separate plants and directly frozen in liquid nitrogen.

RNA-seq library construction

Total RNA was isolated using TRIzol reagent (Invitrogen, CA, USA). The integrity of the total RNA was checked using an Agilent 2100 Bioanalyzer. The mRNA was isolated from total RNA using the PolyATtract® mRNA Isolation System (Promega, WI, USA), and the RNA-seq libraries were constructed using the SMARTer® stranded RNA-Seq kit (Clontech, CA, USA). The libraries were sequenced using an Illumina MiSeq sequencer (Illumina, CA, USA) after checking their quality with an Agilent 2100 Bioanalyzer.

Data processing, assembly and annotation

The raw reads were cleaned by removing reads containing adapter sequences, reads containing poly-N, and low-quality reads using the FASTX Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) and PRINSEQ53. Sequence quality was examined using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). De novo assembly of clean reads was performed using Trinity54. The resulting de novo assembly was clustered using CD-HIT with 95% global sequence identity55.

All the assembled unigenes were searched against the Nr database to identify the putative mRNA functions using an E-value cut-off of 10^−5. Functional annotation and Gene Ontology analysis were carried out using the Blast2GO software56.

Abundance estimation and differential expression analysis

Gene expression analysis was carried out with RSEM57 bundled with the Trinity package. Differentially expressed transcripts across the tissues were identified and clustered according to expression profiles using the edgeR Bioconductor package58 in the R statistical environment.
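The FPKM values used in the heatmap of Fig. 2 follow the standard definition of fragments per kilobase of transcript per million mapped fragments. The sketch below shows that arithmetic only; it is not the RSEM implementation, which estimates fragment counts and effective lengths with its own statistical model, and the function name and numbers are hypothetical.

```python
def fpkm(fragment_count, effective_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    The count is divided by the transcript length in kilobases and by the
    library size in millions, which combines into a factor of 1e9.
    """
    if effective_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragment_count * 1e9 / (effective_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 1,000 bp transcript in a library
# of 10 million mapped fragments.
print(fpkm(500, 1000, 10_000_000))  # 50.0
```

Normalizing by both length and library size is what allows expression to be compared between transcripts of different lengths and between the four libraries.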

Computational prediction of TPS and CYP450 genes in S. dulcis

Computational prediction of TPS and CYP450 genes was performed under the following criteria. Coding regions of transcripts were extracted using TransDecoder and searched with HMMER against the Pfam-A database with an E-value cutoff of 1e-5. The ORFs matching the HMM models (PF00067, or PF01397 and PF03936) were selected as CYP450 or TPS candidates, respectively. The candidate genes were then searched against the CYPED database59 and the Swiss-Prot database with an E-value cutoff of 1e-5.
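The selection criteria above can be sketched as a filter over parsed domain hits. This is an illustration under stated assumptions: the hits are taken to be already parsed into (ORF, Pfam accession, E-value) tuples, a TPS candidate is read as requiring both PF01397 and PF03936, the cutoff is treated as inclusive, and the function name, ORF identifiers, and accession version suffixes are hypothetical.

```python
CYP_DOMAIN = "PF00067"
TPS_DOMAINS = {"PF01397", "PF03936"}


def classify_orfs(domain_hits, max_evalue=1e-5):
    """Assign ORFs to CYP450 or TPS candidates from Pfam domain hits.

    domain_hits: iterable of (orf_id, pfam_accession, evalue) tuples.
    Accession version suffixes such as ".22" are ignored.
    """
    domains_by_orf = {}
    for orf_id, accession, evalue in domain_hits:
        if evalue > max_evalue:
            continue  # hit does not pass the E-value cutoff
        base_accession = accession.split(".")[0]
        domains_by_orf.setdefault(orf_id, set()).add(base_accession)

    candidates = {}
    for orf_id, domains in domains_by_orf.items():
        if CYP_DOMAIN in domains:
            candidates[orf_id] = "CYP450"
        elif TPS_DOMAINS <= domains:  # both TPS domains must be present
            candidates[orf_id] = "TPS"
    return candidates


# Hypothetical domain hits, not results from this study.
toy_hits = [
    ("orf1", "PF00067.22", 1e-80),
    ("orf2", "PF01397.21", 1e-40),
    ("orf2", "PF03936.16", 1e-60),
    ("orf3", "PF01397.21", 1e-30),  # only one of the two TPS domains
    ("orf4", "PF00067.22", 1e-2),   # fails the cutoff
]
print(classify_orfs(toy_hits))
```

Requiring both TPS domains is the stricter reading of the criteria; a truncated ORF that retains only one domain, such as a partial transcript, would be dropped by this filter.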

To perform phylogenetic analysis, multiple sequence alignments were performed on the TPS or CYP homologs. The MAFFT program was used in these alignments by employing a highly accurate method: L-INS-I60. Maximum likelihood (ML) trees were built on the datasets using RAxML61. RAxML analyses were conducted with the JTT model and 500 replicates of bootstrap analyses, and the obtained phylogeny was displayed using FigTree (http://tree.bio.ed.ac.uk/software/figtree).

qPCR analysis of selected candidate genes responsible for diterpene biosynthesis

First-strand cDNAs were synthesized using a PrimeScript™ II 1st strand cDNA Synthesis Kit (Takara Bio Inc., Shiga, Japan). The resulting first-strand cDNAs were used as templates for qPCR. Real-time PCR was performed using Brilliant III Ultra-Fast SYBR® Green QPCR Master Mix on an Mx3005p real-time QPCR system (Agilent Technologies). The S. dulcis 18S rRNA gene (JF718778) was used for normalization. The sequences of the primers used in this study are listed in Supplementary Table S7.

Additional Information

How to cite this article: Yamamura, Y. et al. Elucidation of terpenoid metabolism in Scoparia dulcis by RNA-seq analysis. Sci. Rep. 7, 43311; doi: 10.1038/srep43311 (2017).