Introduction

The application of synthetic biology approaches to engineering plant systems has facilitated advances in the control and expression of biosynthetic pathways, enabling plants to serve as an alternative biochemical production chassis1,2. A relative of tobacco, N. benthamiana3,4 has emerged as a favored species for plant-based production of pharmaceutical proteins5 and metabolic pathway reconstitution2. Successes include gram-scale production of triterpenoids6 and milligram-scale production of etoposides7. However, several studies have reported the accumulation of unintended side products, presumably produced by the off-target activities of endogenous N. benthamiana enzymes such as oxidases and glycosyltransferases8,9,10,11,12,13,14,15,16,17,18. Although the activity of endogenous enzymes has been exploited in some studies to produce novel molecules9 or to compensate for the lack of a known enzyme19, derivatization of molecules is for the most part a disadvantage, reducing the potential purity and yield of the target compound.

Monoterpene indole alkaloids (MIAs) are a large group of plant-produced natural products of which over 3000 have been identified20. This class of molecules includes many medicinally valuable compounds used to treat addiction, heart conditions, dementia, pain, cancer, malaria, and diabetes. The best characterized MIA-producing plant is Catharanthus roseus (Madagascar periwinkle), which makes over 130 MIAs including the bioactive vinblastine and vincristine, which are used as chemotherapies. However, these valuable molecules are present in low concentrations in C. roseus (0.0005% dry weight)21, which limits their availability. Mass cultivation of C. roseus cells is feasible, but a cell line that consistently produces these anti-cancer molecules has yet to be reported22. Although methods for transient expression23 and stable genetic transformation24 of C. roseus plants have been reported, genetically engineering the native plant host to increase yields of these compounds remains technically difficult. Furthermore, the structural complexity of many MIAs means chemical synthesis is often challenging25,26. Consequently, alternate routes for production are desirable and the recent discovery of missing steps in the vinblastine pathway27,28 makes pathway reconstruction in a heterologous host an increasingly attractive option.

Achieving the production of therapeutically useful amounts of MIAs requires pathway engineering to maximize metabolic flux through the early parts of the pathway. Strictosidine is the last common biosynthetic intermediate from which all 3000+ known MIAs derive. Reconstitution of its ~11 step biosynthetic pathway in microorganisms can require extensive tuning of enzyme expression conditions and strain optimization29; for example, poor expression of geraniol 8-hydroxylase (G8H) has hampered strictosidine production in yeast30. Obtaining useful yields of molecules such as vinblastine, which would require the expression of a further 16+ enzymes beyond strictosidine, is likely to require substantial engineering effort, though yeast has recently been engineered to produce ajmalicine via the genomic integration of 29 expression cassettes, demonstrating the potential for heterologous reconstruction of plant natural product biosynthetic pathways31.

While yeast remains a promising host for heterologous expression of metabolic pathways, plant-derived proteins frequently require less optimization and engineering for successful expression in a plant host. Expression in N. benthamiana was used by Miettinen and co-workers to reconstitute the C. roseus iridoid pathway, enabling elucidation of the remaining four missing steps of the pathway26. However, they encountered a metabolic bottleneck midway through the 13-step pathway requiring reconstitution in two phases with the latter part requiring the provision of an intermediate substrate (iridotrial) in order to obtain strictosidine27. Moreover, the rich endogenous metabolism of N. benthamiana acted on the early, hydrophobic intermediates of strictosidine biosynthesis to produce a variety of derivatized, dead-end products.

Here we successfully engineer de novo production of strictosidine in N. benthamiana. We identify endogenous glycosyltransferases that derivatize early pathway intermediates and demonstrate that fewer derivatives accumulate in plant lines with loss-of-function mutations in the genes encoding these enzymes. We utilize additional genes that enhance the production of nepetalactol and enable high levels of strictosidine to be produced from the central metabolism of N. benthamiana without the need for supplementation of any metabolite precursors or intermediates. Overall, this study demonstrates the potential of N. benthamiana as a bioproduction chassis for small molecules.

Results

Metabolites of the early iridoid pathway are derivatized by the endogenous metabolism of N. benthamiana

Strictosidine is derived from the terpenoid secologanin. Secologanin belongs to the iridoid class of monoterpenes, which are derived from the oxidation, reductive cyclization and substantial derivatization of geraniol. After formation, secologanin undergoes a condensation and cyclization with tryptamine to form strictosidine. Previous studies aiming at the heterologous expression of low molecular weight terpenoid biosynthesis pathways in N. benthamiana have reported the accumulation of derivatized biosynthetic intermediates29. To investigate derivatization in the early iridoid pathway steps, we transiently expressed the early strictosidine pathway enzymes by co-infiltrating N. benthamiana with A. tumefaciens strains containing plasmids encoding pathway steps up to cis-trans nepetalactol (Fig. 1). To enhance the pool of geranyl pyrophosphate (GPP) substrate from the plastidial 2-C-methyl-D-erythritol 4-phosphate/1-deoxy-D-xylulose 5-phosphate (MEP/DOXP) pathway, we included a bifunctional geranyl/geranylgeranyl pyrophosphate synthase from Picea abies (PaGPPS) and 1-deoxy-D-xylulose 5-phosphate synthase (DXS) from C. roseus (CrDXS), both targeted to the plastid. To produce geraniol, these were co-expressed with geraniol synthase from C. roseus (CrGES), previously shown to enable the heterologous production of geraniol in plants32. We then added the first three dedicated strictosidine pathway steps: geraniol 8-oxidase (CrG8H), 8-hydroxygeraniol oxidoreductase (CrGOR) and iridoid synthase (CrISY) to produce cis-trans nepetalactol and performed high-resolution mass spectrometry of transiently infected N. benthamiana leaves to identify the enzymatic products. As reported by Dong and coworkers32, we found that transient expression of CrDXS, PaGPPS and CrGES produces a range of non-volatile glycosylated and oxidized derivatives of geraniol (Fig. 1 and Supplementary Table 1). The addition of later pathway steps, through to cis-trans nepetalactol, further modified the profile of derivatized products. In general, fewer derivatives accumulated as more pathway steps were added, suggesting that the strong, constitutive expression enabled the pathway enzymes to outcompete endogenous enzymes for substrates (Fig. 1).

Fig. 1: Metabolites of the early iridoid pathway are derivatized by endogenous enzymes from N. benthamiana.

A common modification was the addition of pentose and hexose sugars, e.g., Peak 1, hexosyl hydroxygeraniol [M + HCOOH-H], 3.84 min, m/z 377.1817; Peak 2, hexosyl hydroxycitronellal [M + HCOOH-H], 4.12 min, m/z 379.1974; Peak 3, trihexosyl geranic acid [M + HCOOH-H], 4.35 min, m/z 699.2703; Peak 4, pentosyl hexosyl geraniol [M + HCOOH-H], 5.32 min, m/z 493.2288; Peak 5, acetyl dihexosyl geraniol [M-H], 5.454 min, m/z 519.2445. DXS, 1-deoxy-D-xylulose 5-phosphate synthase; GPPS, geranyl diphosphate synthase; GES, geraniol synthase; G8H, geraniol 8-oxidase; GOR, 8-hydroxygeraniol oxidoreductase; ISY, iridoid synthase.
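The putative peak assignments above can be sanity-checked by computing theoretical negative-mode m/z values from elemental compositions. The sketch below is illustrative only: the molecular formulae are assumptions inferred from the compound names (they are not stated in the text), and the function and variable names are hypothetical.

```python
# Monoisotopic masses (u) of the elements and of the proton.
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
PROTON = 1.00727646


def mono_mass(c, h, o):
    """Monoisotopic mass of a neutral CcHhOo molecule."""
    return c * MONO["C"] + h * MONO["H"] + o * MONO["O"]


FORMIC_ACID = mono_mass(1, 2, 2)  # HCOOH


def mz_negative(c, h, o, formate_adduct):
    """m/z of [M-H]- or, if formate_adduct is True, of [M+HCOOH-H]-."""
    mass = mono_mass(c, h, o)
    if formate_adduct:
        mass += FORMIC_ACID
    return mass - PROTON


# ASSUMED formulae (C, H, O) derived from the names in the legend,
# paired with whether the reported ion is the formate adduct.
ASSUMED_FORMULAE = {
    "hexosyl hydroxygeraniol": ((16, 28, 7), True),
    "hexosyl hydroxycitronellal": ((16, 30, 7), True),
    "trihexosyl geranic acid": ((28, 46, 17), True),
    "pentosyl hexosyl geraniol": ((21, 36, 10), True),
    "acetyl dihexosyl geraniol": ((24, 40, 12), False),
}

for name, ((c, h, o), adduct) in ASSUMED_FORMULAE.items():
    print(f"{name}: {mz_negative(c, h, o, adduct):.4f}")
```

Under these assumed formulae, the calculated values fall within about 2 mDa of the m/z values reported in the legend.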

One of the common features of derivatization was the addition of pentose and hexose sugars. The enzymes that are primarily responsible for glycosylation of plant natural products, including monoterpenes, are the UDP-glycosyltransferases (UGTs) of glycosyltransferase Family 133. We hypothesized that many of these UGTs are likely to be involved in the biosynthesis of endogenous secondary metabolites used, for example, in defense. Alternatively, these UGTs may be involved in the detoxification of a variety of metabolites. We therefore considered that introducing loss-of-function mutations into the genes encoding them would be unlikely to have substantial impacts on the development of plants grown in the controlled environments used for agroinfiltration. However, while large numbers of UGTs can be readily identified by searching the genomes of all vascular plants for appropriate homologues, predicting their substrate specificities and identifying the individual genes responsible for modifying specific metabolites is challenging34.

The expression of endogenous UGTs is altered by the early iridoid pathway

To investigate if changes in gene expression correlated with the chemical modifications observed, we conducted transcriptome analysis of leaf samples infiltrated with the iridoid pathway. As a substantial transcriptional response to agroinfiltration has previously been observed35, we compared changes in expression to both non-infiltrated controls and to control plants infiltrated with a constitutively expressed green fluorescent protein (GFP) (see methods). We observed that the expression of some UGTs increased in response to infiltration, while others increased specifically in response to expression of the pathway to geraniol or to nepetalactol (Supplementary Table 2). We also performed a phylogenetic analysis of all Family 1 UGTs identified in the N. benthamiana leaf transcriptome (see methods), investigating their relationship to those found in Arabidopsis thaliana and to previously characterized plant UGTs known to be active on geraniol and nepetalactol substrates (Fig. 2 and Supplementary Fig. 1).

Fig. 2: Maximum likelihood (RAxML) phylogenetic comparison of 193 Family 1 UDP-glycosyltransferases (UGTs) from N. benthamiana and A. thaliana and nine UGT sequences previously shown to be active on geraniol or iridoid substrates.

Groups A-P are annotated according to the nomenclature used by Caputi et al. 201233. Labeled taxa indicate enzymes in which Cas9-mediated targeted mutations were subsequently introduced. Filled gray circles at nodes indicate bootstrap supports >95. Scale bar represents the number of substitutions per site. A tree with all taxa and bootstrap values is provided in Supplementary Fig. 1.

Production of N. benthamiana lines with Cas9-mediated targeted mutations in UGTs

We hypothesized that one way to improve the N. benthamiana chassis would be to remove these endogenous UGTs by introducing mutations into the genes encoding them. Using both the expression profiles and the predicted substrate selectivity, we selected genetic targets for Cas9-mediated targeted mutagenesis. We then constructed eight plasmids targeting a total of 25 UGTs (Table 1). We first designed three binary vectors for the stable transformation of wild-type N. benthamiana plants (pEPQDPKN0720, pEPQDPKN0724, pEPQDPKN0361) (Supplementary Figs. 2–4). These constructs encoded sgRNAs targeting genes encoding UGTs in Groups D, E and G previously shown to be active on geraniol36,37, as well as uncharacterized UGTs from the same groups, reasoning that they may be active on the same substrate (Table 1 and Supplementary Table 3). As the UGTs targeted by each construct were closely related, some sgRNAs were able to target more than one gene. Consequently, each construct encoded six sgRNAs targeting the first or second exon of either four Group D UGTs, five Group E UGTs or three Group G UGTs (Table 1 and Supplementary Figs. 2–4). These constructs were delivered to wild-type plants and the genotypes of regenerated transgenic (T0) plants were determined. We discarded lines in which the genotype was unclear or indicated potential genetic chimerism and selected lines with homozygous, heterozygous or biallelic mutations at target locations. The genotypes were confirmed in T1 plants, together with the presence or absence of the T-DNA. All 18 sgRNAs introduced mutations in at least one T0 plant. Notably, we were unable to recover any plants with frameshift mutations in both NbUGT72B58 and NbUGT72B35; however, we were able to recover lines with mutations in each of these genes in combination with other Group E UGTs (Table 1). We selected six T1 plants with mutations in different combinations of 12 individual genes (Table 1). All lines were morphologically normal, with no obvious changes in growth and development, and matured at the same time as control lines.
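The observation that a single sgRNA can target several closely related genes can be illustrated with a short search for shared protospacers. This is a minimal sketch and not the design procedure used in the study: it assumes 20-nt protospacers followed by an NGG PAM, scans the forward strand only, and uses made-up toy sequences and hypothetical function names.

```python
PROTOSPACER_LEN = 20  # assumed guide length for this illustration


def find_protospacers(seq):
    """Return forward-strand 20-mers immediately followed by an NGG PAM."""
    seq = seq.upper()
    hits = set()
    for i in range(len(seq) - PROTOSPACER_LEN - 2):
        pam = seq[i + PROTOSPACER_LEN : i + PROTOSPACER_LEN + 3]
        if pam[1:] == "GG":
            hits.add(seq[i : i + PROTOSPACER_LEN])
    return hits


def shared_protospacers(genes):
    """Map each protospacer found in more than one gene to those genes."""
    targets = {}
    for name, seq in genes.items():
        for protospacer in find_protospacers(seq):
            targets.setdefault(protospacer, []).append(name)
    return {p: sorted(names) for p, names in targets.items() if len(names) > 1}


# Toy sequences (NOT real UGT sequences): both contain the same 20-mer
# upstream of an NGG PAM, so one guide would match both genes.
GUIDE = "GATTACAGATTACAGATTAC"
TOY_GENES = {
    "geneA": "AAA" + GUIDE + "TGG" + "CCC",
    "geneB": "TTT" + GUIDE + "AGG" + "AAA",
}

print(shared_protospacers(TOY_GENES))
```

A practical design would also scan the reverse strand, restrict candidates to the intended exons and score off-target sites; those steps are omitted here.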

Table 1 Lines of N. benthamiana with Cas9-induced mutations in UGTs.

We also designed and built five TRV-based viral vectors encoding mobile sgRNAs using the methods previously described by Ellison and colleagues38. Four vectors contained mobile sgRNAs targeting UGTs that were either strongly upregulated following infiltration (pEPQDKN0777), strongly upregulated in response to expression of the geraniol pathway (pEPQDKN0778) or the nepetalactol pathway (pEPQDKN0779) or had high normalized counts following expression of the nepetalactol pathway (pEPQDKN0780) (Table 1, Supplementary Fig. 5 and Supplementary Table 2). A final vector encoded three mobile sgRNAs targeting four UGTs known to be active on geraniol37 (pEPQDKN0781) (Table 1, Supplementary Fig. 5 and Supplementary Table 3). These vectors were infiltrated into transgenic plants constitutively expressing Cas9. Progeny seeds (designated as E1) were collected from pods that formed on stems in which mutations were detected at the target sites (see methods). The genotypes of E1 plants were analyzed, again selecting lines with homozygous, heterozygous or biallelic mutations at target locations. Overall, the mobile sgRNA method was less efficient, with eight out of fifteen mobile sgRNAs introducing mutations at their target. This resulted in the selection of five E1 lines, each with mutations in one, two or three target UGTs (Table 1 and Supplementary Fig. 5). As above, there were no changes in growth or development.

Targeted mutagenesis identifies UGTs active on geraniol and prevents accumulation of glycosylated derivatives

To investigate whether introducing mutations in UGTs would reduce the accumulation of geraniol derivatives, mutated plants in which the genotypes of all target loci had been confirmed were infiltrated with strains of Agrobacterium encoding pathway enzymes to produce either geraniol or cis-trans nepetalactol. Samples were analyzed for derivatives by UHPLC/MS as described above and the metabolic profile was compared to parental lines (wild type and Cas9 transgenic lines) and to the progeny of a non-transgenic, non-mutated control line regenerated in tissue culture and grown in identical conditions.

Plants with mutations in the Group A UGT, NbUGT94E7, did not accumulate trihexosyl geranic acid (m/z 699.2713, [M + HCOOH-H]) following expression of pathway enzymes to produce either geraniol or nepetalactol (Fig. 3, Supplementary Data 1, Supplementary Figs. 6 and 7). The presence of acetyl dihexosyl geraniol (m/z 519.2445, [M-H]) was also eliminated in samples in which pathway enzymes to produce nepetalactol were expressed. In addition, three lines with mutations in the Group G NbUGT85A73 in combination with either NbUGT72AY1 (Group E) or NbUGT85A74 and NbUGT85A104 (Group G) accumulated less or no hexosyl hydroxygeraniol (m/z 377.1817, [M + HCOOH-H]), hexosyl hydroxycitronellal (m/z 379.1974, [M + HCOOH-H]), pentosyl hexosyl geraniol (m/z 493.2288, [M + HCOOH-H]) or acetyl dihexosyl geraniol (m/z 519.2445, [M-H]) (Fig. 3, Supplementary Data 1, Supplementary Figs. 6 and 7). Interestingly, these peaks were only absent in samples in which pathway enzymes to produce nepetalactol were expressed.

Fig. 3: Mutation of NbUGT94E7 (Group A) or NbUGT85A73 (Group G) eliminates the accumulation of specific derivative compounds.

The assigned identities of peaks 1-5 are provided in Fig. 1. Asterisks indicate that activity on geraniol has previously been reported. Values and error bars represent the mean and the standard error of n = 3 biological replicates. For the pairwise comparison of all mutated lines to all three control lines (gray bars), means followed by a common Greek letter (α,β,γ,δ) are not significantly different by a one-way ANOVA with post-hoc Tukey HSD at the 5% level of significance. Plots with no Greek letter values have no experimental lines with significant differences from the three control lines.
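The summary statistics named in this legend (mean, standard error, and the one-way ANOVA that precedes the Tukey HSD test) can be sketched with the Python standard library. The replicate values below are hypothetical placeholders rather than data from this study, the function names are illustrative, and the Tukey post-hoc step itself is not implemented.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sem(values):
    """Mean and standard error of the mean for one group of replicates."""
    return mean(values), stdev(values) / sqrt(len(values))


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA across several groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical peak areas for n = 3 biological replicates per line.
example_groups = {
    "control": [5.0, 6.0, 7.0],
    "mutant_1": [2.0, 3.0, 4.0],
    "mutant_2": [1.0, 2.0, 3.0],
}

for line, values in example_groups.items():
    m, sem = mean_sem(values)
    print(f"{line}: {m:.2f} ± {sem:.2f}")
print("F =", one_way_anova_f(list(example_groups.values())))
```

A significant F statistic would then be followed by pairwise comparisons such as Tukey HSD, for which a dedicated statistics package is normally used.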

MLPL1 from Nepeta mussinii is essential for 7-DLA production in N. benthamiana

We then focussed on extending the iridoid pathway with the goal of reconstituting the pathway through to the central strictosidine intermediate. A previously reported attempt encountered a metabolic bottleneck requiring the infiltration of iridotrial in order to obtain strictosidine38. The biosynthetic pathway for production of cis-trans nepetalactone in the genus Nepeta shares the secoiridoid pathway from C. roseus up to the stereoselective reduction of 8-oxogeranial to an enol intermediate by the NADPH-dependent iridoid synthase (ISY)39. It has recently been shown that in Nepeta ISY works in combination with either nepetalactol-related short-chain dehydrogenase-reductases (NEPS) or a major latex protein-like enzyme (MLPL) to control the stereoselectivity of the ring closure40,41. We therefore hypothesized that addition of the MLPL from Nepeta mussinii (a.k.a Nepeta racemosa), which is specific for the stereochemistry found in strictosidine, would enhance flux through the pathway in N. benthamiana. Recent efforts at reconstitution in yeast31 and cell-free42 systems have also indicated that this enzyme substantially enhances yield.

Infiltration of all pathway enzymes to produce 7-deoxyloganic acid (7-DLA) without MLPL did not produce a peak for 7-DLA (expected m/z 359.1349) or acylated 7-DLA (expected m/z 401.1447), consistent with previous reports39 (Fig. 4). However, the inclusion of MLPL produced a clear peak of m/z 359.1342 which matches the retention time of the 7-DLA standard. Exclusion of 7-deoxyloganetic acid glucosyl transferase (7-DLGT) produces a peak with the same m/z, but this does not match the retention time of 7-DLA. It is possible that endogenous UGTs are able to use 7-deoxyloganetic acid to produce a glucose ester. As observed with the early pathway, strong, constitutive expression of pathway genes (in this case, 7-DLGT) may outcompete native enzymes, resulting in the absence of this putative glucose ester peak in the spectra of the full 7-DLA pathway.

Fig. 4: MLPL1 from Nepeta mussinii enables 7-deoxyloganic acid production in N. benthamiana.

Transient expression of all pathway enzymes plus MLPL produces a peak of 359.1342 m/z which matches the mass and retention time of a 7-deoxyloganic acid (7-DLA) standard. The reconstituted pathway without 7-DLGT also produces a 359.1343 m/z peak but at a different retention time, likely due to endogenous glycosyltransferases of N. benthamiana producing the glucose ester of 7-deoxyloganetic acid. DXS, 1-deoxy-D-xylulose 5-phosphate synthase; GPPS, geranyl diphosphate synthase; GES, geraniol synthase; G8H, geraniol 8-oxidase; GOR, 8-hydroxygeraniol oxidoreductase; ISY, iridoid synthase; MLPL, major latex protein–like; IO, iridoid oxidase; 7-DLGT, 7-deoxyloganetic acid glucosyltransferase.

Reconstitution of the complete strictosidine pathway

Building on the experimental conditions that successfully produced 7-DLA, we sequentially added the remaining five pathway enzymes and measured the pathway intermediates after the addition of each gene (Fig. 5a, Supplementary Data 1, Supplementary Figs. 8–12). Co-infiltration of strains expressing enzymes for the entire pathway produces a distinct peak at 531 m/z matching the strictosidine standard (Fig. 5b, c, Supplementary Data 1, Supplementary Fig. 11). This pathway configuration produces 4.29 ± 2.00 µM of strictosidine, which corresponds to 0.23 ± 0.11 mg strictosidine/g dry weight leaf tissue (0.023% DW). The only major biosynthetic intermediate that was observed to accumulate was loganin, suggesting secologanin synthase is a bottleneck step. This contrasts with previous data suggesting that loganic acid O-methyltransferase (LAMT), which has a low substrate affinity (Km = 12.5–14.8 mM)43,44, might be a rate-limiting step of the late stages of the secoiridoid pathway. In addition to strictosidine, transient expression of the full pathway also produces a smaller amount of a compound (Fig. 5c, Supplementary Fig. 11) with a mass shift of 86 Da from strictosidine, suggesting that this may be a malonylated derivative of strictosidine produced by endogenous N. benthamiana acyltransferases. Transient expression of the strictosidine pathway without GPPS and MLPL (Fig. 5b, Supplementary Data 1) confirms the beneficial effect of these enzymes on strictosidine yield. Addition of MLPL increased strictosidine production >60 fold while supplementation of GPPS improved yield ~5 fold. DXS supplementation did not change the amount of strictosidine produced (Fig. 1).
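Two of the figures in this paragraph follow from simple arithmetic: the conversion of mg/g dry weight to percent dry weight, and the 86 Da shift expected for malonylation. The sketch below assumes that malonylation adds a net C3H2O3 to the parent molecule; the function and constant names are illustrative.

```python
# Monoisotopic masses (u).
MASS_C, MASS_H, MASS_O = 12.0, 1.00782503, 15.99491462


def mg_per_g_to_percent_dw(mg_per_g):
    """Convert a yield in mg product per g dry weight to % dry weight."""
    return mg_per_g / 1000.0 * 100.0


# Net addition on malonylation, assumed here to be C3H2O3.
MALONYL_SHIFT = 3 * MASS_C + 2 * MASS_H + 3 * MASS_O

print(f"{mg_per_g_to_percent_dw(0.23):.3f} % DW")
print(f"malonyl mass shift: {MALONYL_SHIFT:.4f} Da")
```

The molar concentration reported alongside the yield cannot be reproduced in the same way without the extraction volume and tissue mass, which are not given in this section.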

Fig. 5: Reconstitution of strictosidine biosynthesis in N. benthamiana.

a Quantification of intermediates and final product strictosidine by UPLC/MS analysis. b Absence of MLPL reduces strictosidine production >60 fold while supplementation of GPPS improves yield ~5 fold. c Total ion chromatogram of leaf tissue infiltrated with the entire pathway to strictosidine (including DXS, GPPS, and MLPL). The peak at 4.09 min retention time in the total ion chromatogram (TIC) and extracted ion chromatogram (EIC) at 531.2336 m/z matches a strictosidine standard. N.D., not detected; 7-DLH, 7-deoxyloganic acid hydroxylase; LAMT, loganic acid O-methyltransferase; SLS, secologanin synthase; TDC, tryptophan decarboxylase; STR, strictosidine synthase. Values and error bars represent the mean and the standard error of n = 3 or n = 6 biological replicates (independent leaf samples).

We also infiltrated the full pathway into the previously identified lines with mutations in NbUGT94E7 (Group A) or NbUGT85A73 (Group G), each of which had shown the accumulation of fewer early iridoid derivatives. However, we did not observe any changes in the yield of strictosidine or strictosidine by-products (Supplementary Fig. 13).

Mimicking C. roseus subcellular compartmentalization maximizes strictosidine production

To compare the effect of chloroplast and cytosolic enzyme localization on yields of 7-DLA and strictosidine, we added (or removed in the case of GPPS/GES) a transit peptide to each enzyme (Fig. 6, Supplementary Data 1). To enhance flux of isoprenoid precursors in the cytosol, we aimed to alleviate the rate-limiting step of the mevalonate pathway by co-infiltrating a truncated 3-hydroxy-3-methylglutaryl-coenzyme A reductase (tHMGR) from oat, previously shown to improve titres of the triterpenoid β-amyrin in N. benthamiana42. When all enzymes were localized to the cytosol, flux through the secoiridoid pathway was minimal (~90-fold reduction for 7-DLA) while localization of all pathway enzymes to the chloroplast resulted in 5-fold less 7-DLA (Fig. 6a, Supplementary Data 1). The best yields of 7-DLA (Fig. 6a, Supplementary Data 1) and strictosidine (Fig. 6b, Supplementary Data 1) were obtained with chloroplast localization of the early pathway and cytosolic localization of subsequent steps, which mimicked the native localization pattern in C. roseus. Production of 7-DLA in the chloroplast is possibly limited by the availability of partner P450 reductases for G8H and iridoid oxidase (IO) or small molecule substrates such as UDP-glucose for 7-DLGT. However, all possible divisions of pathway enzymes between the cytosol and chloroplast still produce 7-DLA, indicating that pathway intermediates can cross the chloroplast membrane.

Fig. 6: Subcellular location of pathway enzymes affects product yields.

Relocation of 7-DLA (a) and strictosidine (b) biosynthesis genes to the cytosol (blue) or the chloroplast (green) decreases product yield. tHMGR, truncated 3-hydroxy-3-methylglutaryl-coenzyme A reductase. Values and error bars represent the mean and the standard error of n ≥ 3 biological replicates (independent leaf samples).

Discussion

In this work, we set out to improve the production of biosynthetic intermediates and products of the iridoid and MIA pathway in N. benthamiana. Previous studies have reported that, when N. benthamiana is used as a host for heterologous expression of terpenoid natural products, a variety of terpenoid derivatives accumulate, potentially limiting the use of this species as a chassis for the production of some molecule types8,9,10,11,12,13,14. As found in previous studies, we observed here that early iridoid biosynthetic intermediates were frequently modified by the addition of pentose and hexose sugars, suggesting the involvement of N. benthamiana Family 1 UGTs (Fig. 1 and Supplementary Table 1). Further, while it is already known that gene expression is affected by agroinfiltration43, comparative transcriptomics revealed that the expression of some endogenous UGTs is further modulated by the expression of monoterpene or iridoid metabolic pathways (Supplementary Table 2). This indicates that plants not only respond to the infiltration of Agrobacterium, but also to the presence of foreign enzymes and/or metabolites.

Although N. benthamiana does not accumulate many monoterpenes, these endogenous UGTs may be involved in the biosynthesis of larger decorated metabolites and are thus able to act promiscuously on the early iridoid intermediates. Alternatively, they may be part of a xenobiotic detoxification mechanism to protect against bioactives produced by pathogens. We reasoned that these enzymes were unlikely to be essential and focused efforts to improve the N. benthamiana chassis on identifying and eliminating the activity of these enzymes. We used Cas9-based molecular tools to introduce loss-of-function mutations into 20 UGT target genes (Table 1 and Supplementary Figs. 2–5). We employed two approaches: the production of transgenic plants constitutively expressing Cas9 together with multiple sgRNAs45 and the recently described use of a viral vector to transiently express mobile sgRNAs in plants carrying a Cas9 transgene38. While we were able to recover lines with mutations in most or all target genes using the former method, the mobile sgRNA method, though less efficient, was less laborious and could more easily be scaled to investigate many target genes.

Relative to control lines, plants with mutations in the Group A UGT, NbUGT94E7, no longer accumulated the peak putatively assigned as trihexosyl geranic acid (m/z 699.2713, [M + HCOOH-H]) (Fig. 3, Supplementary Figs. 6 and 7). This UGT also lacks the GSS motif proposed to be important for defining the loop structure of monoglucosyltransferases37,39,40,41,42,43,44,46. We additionally identified that, relative to control lines, four peaks were diminished or absent in plants with mutations in the Group G UGT, NbUGT85A73 (Fig. 3 and Supplementary Figs. 6 and 7). This UGT was previously reported as being active on geraniol37. Surprisingly, these peaks were only eliminated in plants infiltrated with all pathway enzymes required to produce nepetalactol, and remained when only the pathway genes for production of geraniol were present. It is likely that hexosyl hydroxygeraniol (m/z 377.1817, [M + HCOOH-H]) and hexosyl hydroxycitronellal (m/z 379.1974, [M + HCOOH-H]) are produced by glycosylation of 8-hydroxygeraniol and therefore require the expression of geraniol 8-oxidase. However, it is unclear why the accumulation of pentosyl hexosyl geraniol (m/z 493.2288, [M + HCOOH-H]) and acetyl dihexosyl geraniol (m/z 519.2445, [M-H]) was reduced only when the extended pathway was infiltrated.

Our approach provides proof-of-concept for the application of gene editing approaches to improve N. benthamiana as a bioproduction chassis for small, terpenoid-type molecules by reducing the accumulation of derivatives. However, we observed that mutations of individual UGTs each reduced the accumulation of only minor peaks. Lines with mutations in multiple genes may therefore be required to substantially increase the yield of metabolites of interest. We also found that, as the pathway is extended by the co-expression of additional enzymes, fewer derivative peaks are observed (Figs. 1, 4 and 5). This suggests that the pathway enzymes have high affinity for the substrates and, particularly when expressed from strong promoters, are likely able to outcompete endogenous enzymes for substrates. Strictosidine is a polar, glycosylated molecule that is likely sequestered in the vacuole for storage and is less prone to derivatization than its hydrophobic iridoid precursors. Indeed, when the pathway for strictosidine was expressed in lines that produced fewer derivatives of early pathway intermediates, there was no change in yield compared to expression in wild type N. benthamiana (Supplementary Fig. 13).

We were able to demonstrate the production of high quantities of strictosidine (0.23 ± 0.11 mg/g DW) from central metabolism in a photosynthetic organism. We also enhanced the production of strictosidine by improving the key cyclization step required for formation of nepetalactol. The plant genus Nepeta L., colloquially known as catnip or catmint, uses the early part of the secoiridoid pathway to produce nepetalactones. Recent work revealed that production of cis-trans nepetalactol is assisted by MLPL in Nepeta41. The heterologous expression of MLPL from N. mussinii to assist with cyclization of the reactive enol product of ISY overcame bottlenecks in the secoiridoid pathway (Fig. 4) and was critical for enabling heterologous production of strictosidine in N. benthamiana (Fig. 5). MLPL similarly enhances secoiridoid metabolic engineering in microorganisms such as yeast31. Additionally, a cell-free in vitro one-pot enzyme cascade included MLPL with nine other pathway enzymes, accessory proteins, and cofactor regeneration enzymes to produce ~1 g/L nepetalactone42.

The MIA pathway in C. roseus is highly compartmentalized across subcellular compartments and cell types. The first committed step is geraniol synthesis (GES), which is localized to the chloroplast of internal phloem-associated parenchyma cells47. To increase substrate availability for GES, we co-expressed GPPS from P. abies48 which was also utilized by Miettinen and coworkers14. Expression of chloroplast-targeted PaGPPS improved yield ~5 fold (Fig. 5) consistent with previously reported effects on the production of geraniol13. Of note, PaGPPS is nearly identical (Supplementary Fig. 14) to a GPPS from Picea glauca shown in vitro to produce higher levels of the monoterpene limonene relative to six other GPPS sequences commonly used in terpene metabolic engineering49. We also co-expressed DXS from C. roseus. Interestingly, CrDXS had relatively little effect on the yield of strictosidine in contrast to previously reported effects of DXS on the production of diterpenoids10,50.

Recent efforts to engineer metabolic pathways have found benefits in altering the compartmentalization of pathway enzymes13,51. For example, the pathway to produce the cyanogenic glucoside dhurrin was relocated to the chloroplast of Nicotiana tabacum where ferredoxin, reduced via the photosynthetic electron transport chain, can serve as an efficient electron donor to the two cytochrome P450s (CYPs) within the pathway52. Additionally, the localization of enzymes catalyzing the late steps of the artemisinin pathway to the chloroplast in N. tabacum produced higher levels of artemisinin (800 µg/g DW)53 and artemisinic acid (~1200 µg/g DW)54 compared to localization within the cytosol (6.8 µg/g DW artemisinin)55. This increase is possibly due to the isolation of metabolites from the cytosol where they may both impact viability and be exposed to unwanted derivatization by endogenous glycosyltransferases11,12,56. Production of halogenated indican57 and vanillin58 in N. benthamiana also benefited from chloroplast localization. In contrast, a recent report found that production of diterpenoids (typically synthesized in the chloroplast) was dramatically enhanced by co-opting the cytosolic mevalonate pathway to produce GGPP rather than the chloroplast MEP pathway59.

In this study, we found that the optimal configuration for reconstructing the pathway to strictosidine within N. benthamiana leaves is to match the C. roseus localization pattern, utilizing the chloroplast MEP pathway for isoprenoid precursors to produce geraniol and localizing the remaining pathway enzymes in the cytosol (Fig. 4). We hypothesize that monoterpene production in the cytosol is limited since GPP (10 carbons) produced by GPPS is also the substrate for the N. benthamiana farnesylpyrophosphate synthase (FPPS), which produces farnesylpyrophosphate (FPP) (15 carbons). Supporting this hypothesis are data suggesting that all four copies of NbFPPS are upregulated to produce sesquiterpenoid phytosterols and phytoalexins in response to Phytophthora infestans60. A. tumefaciens also elicits widespread transcriptional remodeling when infiltrated into N. benthamiana35. This competition between GES and FPPS might also explain the higher levels of geraniol and geraniol derivatives in the plastid reported previously13. Future efforts to conditionally inactivate NbFPPS during heterologous production might enable a metabolic engineering strategy that could take advantage of both the plastid and cytosolic routes to geraniol production.

In C. roseus, geraniol diffuses or is transported from the plastid into the cytosol to react with G8H, which is tethered to the exterior of the ER membrane. The next steps (G8H to deoxyloganic acid hydroxylase (7-DLH)) are active in the cytosol of phloem-associated parenchyma cells with two CYPs (G8H and IO) also anchored to the ER membrane14. Loganic acid is then transported by NPF2.4/5/661 to epidermal cells where four more enzymes (LAMT to strictosidine synthase (STR)) produce strictosidine62,63. Tryptamine and secologanin are imported into the vacuole, where strictosidine is synthesized and accumulated, with export of strictosidine mediated by the transporter NPF2.964. Thus, four pathway CYPs (G8H, IO, 7-DLH and secologanin synthase) are likely interfacing with endogenous N. benthamiana CYP reductases for the transfer of electrons from NADPH to the CYPs. It is possible that the necessary CYP reductases are less abundant in the chloroplast and thus limit the accumulation of strictosidine in this compartment. It is also possible that the CYPs are improperly membrane anchored or that the higher stromal pH of the chloroplast (~8.0)65 inhibits enzyme activity compared to the cytosol (pH ~7.0). We also considered whether a lack of S-adenosylmethionine (SAM), an additional cofactor for LAMT, could explain the low levels of strictosidine production (relative to 7-DLA) in the chloroplast. However, this small molecule is known to be actively transported into the plastid66 and is an essential substrate for ChlM (Mg-protoporphyrin IX methyltransferase) involved in chlorophyll biosynthesis within the chloroplast.

The production of strictosidine in planta opens up new avenues to produce a wealth of MIA products using biological synthesis. Although the endogenous metabolism of this species is detrimental to the accumulation of monoterpenes, it enables accumulation of complex molecules. Particularly as new biosynthesis pathways for additional MIAs are discovered (e.g., the anti-addictive compound ibogaine67, the antimalarial quinine68), the possibility of coupling this work in a plug-and-play manner with downstream biosynthesis modules is an exciting prospect for natural product synthesis.

Methods

Construction of expression constructs

Binary vectors for Agrobacterium tumefaciens-mediated transient expression (agroinfiltration) were either assembled by cloning coding sequences amplified from C. roseus cDNA into the pEAQ-HT-DEST1 vector (GenBank GQ497235, Supplementary Table 4)69 or assembled into expression constructs using the plant modular cloning (MoClo) toolkit70. For the latter, coding sequences were either synthesized (Twist Bioscience, San Francisco, CA), removing any native chloroplast transit peptide sequence as well as any instances of BpiI, BsaI, BsmBI and SapI recognition sites by the introduction of synonymous mutations, or amplified from C. roseus cDNA by PCR with overhangs containing BpiI (BbsI) recognition sites. Amplicons were cloned into pUAP1 (Addgene #63674) as previously described71, resulting in level 0 parts (Addgene #177019–177032) flanked by an inverted pair of BsaI recognition sites that produce overhangs compatible with the phytobrick assembly standard72 (Supplementary Table 5). These level 0 parts were assembled into level 1 acceptors in a one-step cloning reaction71 with level 0 parts encoding the CaMV 35S promoter (CaMV35s) and 5′ UTR from tobacco mosaic virus (TMV), and, when required, a synthetic chloroplast transit peptide sequence (Supplementary Table 6) to alter subcellular localization.
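The domestication step described above (removing internal BpiI, BsaI, BsmBI and SapI sites before synthesis) can be checked computationally. The following is a minimal sketch, not the authors' workflow: it scans a coding sequence on both strands for the four recognition sequences. The function names and the toy sequence are hypothetical and used only for illustration.

```python
from __future__ import annotations

# Recognition sequences (5'->3') of the Type IIS enzymes named in the text.
TYPE_IIS_SITES = {
    "BpiI": "GAAGAC",   # isoschizomer of BbsI
    "BsaI": "GGTCTC",
    "BsmBI": "CGTCTC",
    "SapI": "GCTCTTC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_type_iis_sites(cds: str) -> list[tuple[str, int, str]]:
    """Return (enzyme, 0-based start, strand) for every recognition site found."""
    seq = cds.upper()
    hits = []
    for enzyme, site in TYPE_IIS_SITES.items():
        # A site on the bottom strand appears as its reverse complement on top.
        for strand, motif in (("+", site), ("-", reverse_complement(site))):
            start = seq.find(motif)
            while start != -1:
                hits.append((enzyme, start, strand))
                start = seq.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical toy sequence, not a sequence from this study.
example_cds = "ATGGGTCTCAAAGTCTTCTAA"
print(find_type_iis_sites(example_cds))
```

Each reported position marks a codon region where a synonymous substitution would be needed before the part could be used in a Golden Gate assembly with these enzymes.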

Transient expression in N. benthamiana

N. benthamiana plants were grown in a controlled environment room with 16 h light, 8 h dark at 22 °C, 80% humidity, and ~200 µmol/m²/s light intensity. Electrocompetent A. tumefaciens GV3101 (MoClo vectors) or LBA4404 (pEAQ vectors) were transformed with the binary plasmid encoding the gene of interest and an individual colony was used to inoculate liquid LB medium containing 50 μg/mL rifampicin, 20 μg/mL gentamicin, and 100 μg/mL carbenicillin (MoClo vectors) or 50 μg/mL rifampicin, 100 μg/mL streptomycin, and 50 μg/mL kanamycin (pEAQ vectors). Overnight saturated cultures were centrifuged at 3400 × g for 30 min at room temperature and cells were resuspended in infiltration medium (10 mM 2-(N-morpholino)ethanesulfonic acid (MES) pH 5.7, 10 mM MgCl2, 200 µM 3′,5′-dimethoxy-4′-hydroxyacetophenone (acetosyringone)) and incubated at room temperature for 2–3 h with slow shaking. All resuspended cultures were diluted to an OD600 of 0.8 (MoClo) or 0.5 (pEAQ) and mixed in equal ratios as dictated by the experimental condition. For MoClo vectors, a separate A. tumefaciens strain expressing the P19 suppressor of gene silencing from Tomato Bushy Stunt Virus (TBSV), previously shown to increase heterologous expression, was included in every infiltration69. Healthy plants (29–37 days old) with 3–4 fully expanded true leaves were infiltrated on the abaxial side of the leaf using a 1 mL needleless syringe and grown for five days in an MLR-352-PE plant growth chamber (Panasonic Healthcare Co, Oizumi-Machi, Japan) with 16 h light, 8 h dark at 22 °C and 120–180 µmol/m²/s light intensity. All chemical compounds were purchased from Sigma-Aldrich (St. Louis, MO).
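The dilution and equal-ratio mixing of cultures follows the usual C1V1 = C2V2 relationship. The sketch below illustrates that arithmetic only; the function names and the example stock density are assumptions, not values reported in this study.

```python
def dilution_volumes(stock_od: float, target_od: float, final_volume_ml: float):
    """Return (culture_ml, medium_ml) needed to reach target_od (C1*V1 = C2*V2)."""
    if stock_od <= 0 or target_od <= 0:
        raise ValueError("optical densities must be positive")
    if stock_od < target_od:
        raise ValueError("stock culture is more dilute than the target OD600")
    culture_ml = target_od * final_volume_ml / stock_od
    return culture_ml, final_volume_ml - culture_ml


def per_strain_od_after_mixing(target_od: float, n_strains: int) -> float:
    """OD600 contributed by each strain once n equally diluted strains are mixed 1:1."""
    if n_strains < 1:
        raise ValueError("at least one strain is required")
    return target_od / n_strains


# Hypothetical example: a resuspended culture measured at OD600 = 1.6,
# diluted to the MoClo target of 0.8 in a final volume of 10 mL.
culture, medium = dilution_volumes(stock_od=1.6, target_od=0.8, final_volume_ml=10.0)
print(f"{culture:.2f} mL culture + {medium:.2f} mL infiltration medium")
print(f"per-strain OD600 in a 4-strain mix: {per_strain_od_after_mixing(0.8, 4):.2f}")
```

Because every strain is diluted to the same density before mixing, the density of each individual strain in the infiltrated suspension falls as more strains are combined, which is worth bearing in mind when comparing conditions with different numbers of constructs.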

Metabolite extraction

Five days post-infiltration, 100–300 mg of infiltrated N. benthamiana leaf tissue was collected in 1.5 mL microcentrifuge tubes and flash-frozen in liquid nitrogen. Leaf tissue was lyophilized overnight using a VirTis BenchTop SLC freeze dryer (SP Industries, Stone Ridge NY, USA) set to −49 °C and 300 mTorr. Samples were then ground to powder using a 3 mm tungsten carbide bead (Qiagen Cat. No. / ID: 69997) on a TissueLyser II (Qiagen, Hilden, Germany) set to 20 Hz for 20 s. Lyophilized leaf tissue was extracted with 70% methanol + 0.1% formic acid (1:100, w/v). The solvent contained 10 µM of harpagoside (Extrasynthese, Genay, France) as an internal standard. The extractions were performed at room temperature for 1 h, with 10 min sonication and 50 min constant shaking. Samples were centrifuged at 17,000 × g for 10 min to separate the debris and filtered through 0.2 µm PTFE disk filters before ultra-high performance liquid chromatography-mass spectrometry (UPLC/MS) analysis.
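The 1:100 (w/v) extraction ratio fixes both the solvent volume per sample and the factor that relates an extract concentration to a yield per unit dry weight. The sketch below works through that arithmetic under the assumption of complete extraction; the function names, the example mass, concentration and molecular weight are hypothetical.

```python
def solvent_volume_ul(dry_mass_mg: float, ratio_ml_per_g: float = 100.0) -> float:
    """Solvent volume in µL for a given dry mass; 1 g per 100 mL equals 1 mg per 100 µL."""
    return dry_mass_mg * ratio_ml_per_g


def yield_ug_per_g_dw(conc_um: float, mol_weight: float,
                      ratio_ml_per_g: float = 100.0) -> float:
    """Convert an extract concentration (µM) to µg per g dry weight.

    conc_um [µmol/L] * mol_weight [µg/µmol] gives µg/L of extract;
    multiplying by the solvent volume per gram (L/g) gives µg/g DW.
    Assumes all of the analyte was extracted into the solvent.
    """
    litres_per_gram = ratio_ml_per_g / 1000.0
    return conc_um * mol_weight * litres_per_gram


# Hypothetical values for illustration only.
print(solvent_volume_ul(12.5))            # µL of solvent for 12.5 mg dry tissue
print(yield_ug_per_g_dw(2.0, 500.0))      # µg/g DW for a 2 µM extract, MW 500 g/mol
```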

Metabolite analysis

UPLC/MS analysis was performed on an Impact II qTOF mass spectrometer (Bruker) coupled to an Elute UPLC (Bruker) chromatographic system. Chromatographic separation was carried out on a Phenomenex Kinetex column XB-C18 (100 × 2.10 mm, 2.6 μm particle size) kept at 40 °C and the binary solvent system consisted of solvent A (H2O + 0.1% formic acid) and solvent B (acetonitrile). Flow rate was 600 μL/min. The column was equilibrated with 99% A and 1% B. During the first minute of chromatography, solvent B reached 5%. Then a linear gradient from 5% B to 40% B in 5 min allowed the separation of the compounds of interest. The column was then washed at 100% B for 1.5 min and re-equilibrated to 1% B. Injection volume was 2 μL. Mass spectrometry was performed both in positive and negative ion mode with a scan range m/z 100–1000. The mass spectrometer was calibrated using sodium formate adducts. The source settings were the following: capillary voltage 3.5 kV, nebulizer 2.5 Bar, dry gas 11.0 L/min, dry temperature 250 °C. Data analysis was performed using the Bruker Data Analysis software. Quantification of 7-deoxyloganic acid (7-DLA), loganin, loganic acid and strictosidine was based on calibration curves generated using pure compounds. Loganin and loganic acid were purchased from Sigma. 7-deoxyloganic acid and strictosidine were synthesized as previously described73,74. The standards were diluted in 70% methanol + 0.1% formic acid to give nine calibrants with concentrations between 40 nM and 10 µM. A linear response was observed for all compounds in this range of concentrations (R² > 0.993). Putative identification of metabolites was based on the acquisition of high-resolution mass spectrometry data to determine the best fit elemental composition using the Data Analysis software (Bruker).
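Quantification against a linear calibration curve can be illustrated with an ordinary least-squares fit. This is a minimal sketch rather than the analysis performed in the Bruker software: the spacing of the nine calibrants and the peak areas below are invented for illustration, and only the 40 nM to 10 µM range comes from the text.

```python
def fit_calibration(concentrations, responses):
    """Ordinary least-squares line; returns (slope, intercept, r_squared)."""
    n = len(concentrations)
    if n != len(responses) or n < 3:
        raise ValueError("need at least three paired calibration points")
    mean_x = sum(concentrations) / n
    mean_y = sum(responses) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concentrations, responses))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(concentrations, responses))
    ss_tot = sum((y - mean_y) ** 2 for y in responses)
    return slope, intercept, 1.0 - ss_res / ss_tot


def quantify(peak_area, slope, intercept):
    """Invert the calibration line to estimate concentration from a peak area."""
    return (peak_area - intercept) / slope


# Hypothetical calibrant concentrations (µM) spanning 40 nM to 10 µM.
calibrants_um = [0.04, 0.08, 0.16, 0.31, 0.63, 1.25, 2.5, 5.0, 10.0]
# Hypothetical, perfectly linear detector responses (arbitrary units).
areas = [1000.0 * c + 50.0 for c in calibrants_um]

slope, intercept, r2 = fit_calibration(calibrants_um, areas)
print(f"slope={slope:.1f}, intercept={intercept:.1f}, R^2={r2:.4f}")
print(f"sample at area 2050 -> {quantify(2050.0, slope, intercept):.2f} µM")
```

Samples whose estimated concentration falls outside the calibrated range would need dilution or re-extraction, since linearity was only established between the lowest and highest calibrants.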

Expression analysis of infiltrated plants

Agroinfiltration experiments were performed as described above with four sets of pEAQ vectors: (1) low geraniol (GFP, CrGES), (2) high geraniol (GFP, CrDXS, CrGGPPS.LSU, CrGES), (3) nepetalactol (GFP, CrDXS, CrGGPPS.LSU, CrGES, CrG8H, Cr8HGO, CrISY), and (4) an infiltration control (GFP only) (Supplementary Table 7). Leaf tissue from three biological replicates of each condition and a mock infiltrated control were collected five days post-infiltration and flash-frozen in liquid nitrogen. Total RNA was isolated using the RNeasy plant mini kit (Qiagen) with recombinant DNase I (Roche) treatment. Libraries were constructed on a Sciclone® G3 NGSx workstation (PerkinElmer, Waltham, MA) using the TruSeq RNA protocol v2 (Illumina 15026495 Rev.F). RNA quality was assessed using the Quant-iT™ RNA Assay Kit (Life technologies/Invitrogen Q-33140). 1 µg of RNA was purified to extract polyadenylated mRNA using biotin beads, fragmented, and primed with random hexamers to produce first strand cDNA followed by second-strand synthesis to produce ds cDNA. Sample ends were blunted and A-tailed and indexing adapters with corresponding T-overhangs were ligated. The ligated products were size-selected and unligated adapters were removed using XP beads (Beckman Coulter A63880). Samples were amplified with a cocktail of PCR primers to the adapters. The insert size of the libraries was verified using the LabChipGX (PerkinElmer 5067–4626) and DNA High Sensitivity DNA reagents (PerkinElmer CLS760672). Equimolar quantities of each library were pooled, five libraries per pool, and sequenced on one lane of a HiSeq 2500 generating 100 base pair paired-end reads. Data analysis was performed using a web-based Galaxy interface (https://galaxy.earlham.ac.uk)75. The N. benthamiana draft genome assembly v0.5 http://benthgenome.qut.edu.au/76 was expanded to include the genome and plasmids of Agrobacterium C58 (Genbank AE007869.2, AE007870.2, AE007871.2, and AE007872.2) as well as the pEAQ vectors used for infiltration (Supplementary Table 4). All short reads from RNA-seq were mapped to the expanded genome using hisat2 v2.1.0 default parameters. Assembled transcripts were generated using Stringtie v1.3.3.1. Transcripts from n = 15 experimental samples were consolidated using Stringtie merge v1.3.3. All short reads were again aligned to the expanded genome (this time with the merged transcriptome as a reference) using hisat2 v2.1.0. Differential expression tables were generated using DESeq2 v2.11.40.1. Transcripts from the merged transcriptome were translated using transdecoder and the longest coding sequences annotated using phmmer v3.1v2 with the Swiss-Prot protein database (accessed May 2018) as reference.
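Expanding a reference genome with additional replicons and vector sequences amounts to concatenating FASTA records while keeping sequence identifiers unique, so that reads mapping to the plant, to Agrobacterium, or to the vectors can be told apart. The sketch below shows one way to do this; it is not the procedure used in Galaxy, and the record names and sequences are hypothetical placeholders.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA text into an ordered {identifier: sequence} mapping."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]  # identifier is the first token
            if current in records:
                raise ValueError(f"duplicate identifier: {current}")
            records[current] = []
        elif current is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def merge_references(*fasta_texts: str) -> str:
    """Concatenate several FASTA sources, refusing identifier collisions."""
    merged = {}
    for text in fasta_texts:
        for name, seq in parse_fasta(text).items():
            if name in merged:
                raise ValueError(f"identifier present in more than one source: {name}")
            merged[name] = seq
    return "".join(f">{name}\n{seq}\n" for name, seq in merged.items())


# Hypothetical toy records standing in for the plant genome, a bacterial
# replicon and an expression vector.
plant = ">scaffold_1\nACGTACGT\nACGT\n"
bacterium = ">bacterial_replicon\nGGGCCC\n"
vector = ">vector_backbone\nTTTAAA\n"
print(merge_references(plant, bacterium, vector))
```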

Phylogenetic reconstruction of UDP-glycosyltransferases (UGTs)

N. benthamiana transcripts encoding sequences annotated as Family 1 Glycosyltransferases (GT1, Protein family (Pfam) PF00201 or PF03033) were analyzed, resulting in 77 sequences >327 amino acids and including an intact Plant Secondary Product Glycosyltransferase (PSPG) box. Protein sequences of 107 UGTs from Arabidopsis thaliana were obtained from the A. thaliana cytochrome P450, cytochrome b5, P450 reductase, b-glucosidase, and glycosyltransferase site (http://www.p450.kvl.dk) as described33. An additional 9 UGT sequences previously reported to have activity on geraniol or other iridoid substrates from Actinidia deliciosa (kiwifruit)77, Camellia sinensis (tea)78, C. roseus79, Gardenia jasminoides (Cape jasmine)80, Sorghum bicolor81, and Vitis vinifera (grape)82,83 were also included. Sequences and accession numbers are listed in Supplementary Table 8. The 193 sequences were aligned using MUSCLE 3.8.425 (Edgar 2004) and a phylogenetic tree with 100 bootstraps was generated using RAxML version 8.2.1184 within the Geneious program. Phylogenetic trees were visualized using Interactive Tree Of Life (iTOL)85.
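The selection of candidate UGTs by length and by the presence of a PSPG box can be expressed as a simple filter. The sketch below is an illustration only: the regular expression is a deliberately simplified placeholder, not the real PSPG consensus, and the protein names and sequences are hypothetical. The final check reproduces the arithmetic of the alignment input (77 + 107 + 9 sequences).

```python
import re

# Placeholder motif standing in for a real PSPG-box profile (assumption).
PLACEHOLDER_PSPG = re.compile(r"WAPQ")


def filter_ugt_candidates(proteins: dict, motif=PLACEHOLDER_PSPG,
                          min_length: int = 327) -> dict:
    """Keep proteins longer than min_length residues that contain the motif."""
    return {
        name: seq
        for name, seq in proteins.items()
        if len(seq) > min_length and motif.search(seq)
    }


# Hypothetical toy proteins.
toy_proteins = {
    "candidate_full": "M" + "A" * 330 + "WAPQ",   # long enough, motif present
    "candidate_short": "M" + "A" * 100 + "WAPQ",  # motif present, too short
    "candidate_no_box": "M" + "A" * 400,          # long enough, no motif
}
print(sorted(filter_ugt_candidates(toy_proteins)))

# Number of sequences entering the alignment, as described in the text.
total_sequences = 77 + 107 + 9
print(total_sequences)
```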

Cas9-mediated targeted mutagenesis by Agrobacterium-mediated transformation

Binary vectors expressing Cas9 and single guide RNAs (sgRNAs) for the desired targets were assembled using the plant modular cloning toolkit71 as previously described45. In brief, primers encoding the desired sgRNA spacer (Supplementary Table 9) were used to PCR amplify the sgRNA stem extension scaffold reported by Chen and coworkers86. The resulting PCR amplicon was assembled with a level 0 part encoding either an Arabidopsis U6–26 promoter (AtU6–26 Addgene#68261) or a N. benthamiana U6 promoter (NbU6-1 Addgene#185623 and NbU6-2 Addgene#185624) (Supplementary Table 9). U6 promoters from N. benthamiana were identified by sequence homology to Arabidopsis U6–26 and efficacy was confirmed by transient infiltration as previously described45. The resulting Level 1 constructs were assembled with synthetic genes conferring resistance to kanamycin and for constitutive expression of Cas9 (Supplementary Fig. 15). Efficacy of sgRNAs was confirmed by transient infiltration as previously described45. The resulting constructs were transformed into the hypervirulent A. tumefaciens strain AGL1 for plant transformation. An individual colony was used to inoculate 10 mL LB medium with antibiotics (50 μg/mL kanamycin and 50 μg/mL rifampicin). Overnight saturated cultures were centrifuged at 3000 × g for 10 min at room temperature and cells were resuspended in 10 mL MS medium with 100 μM acetosyringone and optical density OD600 adjusted to 0.6-0.8. N. benthamiana was transformed as previously reported87 with slight modifications. Young leaves were harvested from 4-week-old, non-flowering plants and surface sterilized. 1-2 cm squares were inoculated with Agrobacterium for five minutes at room temperature, blotted dry on sterile filter paper and placed abaxial side down on co-cultivation media pH 5.8 containing MS basal salt88, Gamborg’s B5 vitamins89, 3% (w/v) sucrose, 0.59 g/L MES hydrate, 6 g/L agarose, 6-benzylaminopurine (BAP, 1.0 mg/L) and naphthaleneacetic acid (NAA, 0.1 mg/L). 
Explants were co-cultivated for 3 days under white fluorescent light (16 h light/8 h dark) at 22 ± 2 °C then transferred to selection medium containing MS basal salts, Gamborg’s B5 vitamins, 3% (w/v) sucrose, 0.59 g/L MES hydrate, 6 g/L Agargel, 6-benzylaminopurine (BAP, 1.0 mg/L) and naphthaleneacetic acid (NAA, 0.1 mg/L), 100 mg/L kanamycin and 320 mg/L ticarcillin. Explants were sub-cultured onto fresh selection medium at 14-day intervals. Putative transgenic shoots were cultured on rooting media containing MS salts with vitamins (half strength), supplemented with 1 mg/L indole-3-butyric acid, 0.5% sucrose, 0.3% Gelrite, 100 mg/L kanamycin and 320 mg/L ticarcillin. Plantlets were transferred to sterile peat blocks (Jiffy7) in Magenta™ vessels (Sigma), before being transplanted to peat-based compost (90% peat, 10% grit, 4 kg/m³ dolomitic limestone, 0.75 kg/m³ compound fertilizer (PG Mix™), 1.5 kg/m³ controlled-release fertilizer (Osmocote Bloom)) and transferred to a glasshouse.

Cas9-mediated targeted mutagenesis using RNA viruses and mobile sgRNAs

A TRV2 plasmid vector SPDK3888 (Addgene #149276) was gratefully received from Savithramma Dinesh-Kumar and Dan Voytas. This was modified by adding AarI restriction sites and a lacZ cassette for selection producing pEPQD0KN0750 (Addgene#185627). Spacer sequences for selected targets were incorporated into pEPQD0KN0750 by Golden Gate assembly into AarI sites as previously described38 (Supplementary Table 10). Constructs were transformed into A. tumefaciens GV3101 and an individual colony was used to inoculate LB medium containing 50 μg/mL rifampicin, 20 μg/mL gentamicin, and 50 μg/mL kanamycin and grown overnight shaking at 28 °C. Saturated cultures were centrifuged and resuspended in infiltration medium as described above except that cultures were diluted to 0.3 OD600nm and Agrobacterium strains were mixed in equal ratio with a strain containing pTRV1 (Addgene #148968). Seed of transgenic N. benthamiana plants constitutively expressing Cas9 (Cas9 Benthe 193.22 T5 Homozygous) were gratefully received from Dan Voytas and grown for 6 weeks in a greenhouse at 22 ± 2 °C. Plants were infiltrated as described above and allowed to grow for 13 weeks before samples of leaf tissue were taken from two different stems (designated A and B).

Identification of lines with induced mutations

Samples of 40–60 mg leaf tissue were collected from T0 plants generated by Agrobacterium-mediated stable transformation or from leaves sampled from two stems of plants infiltrated with RNA viruses expressing mobile sgRNAs. Genomic DNA was isolated as previously described45. Target loci were amplified using a proof-reading polymerase (Q5® High-Fidelity DNA Polymerase, New England Biolabs, Ipswich, MA) and primers flanking the target sites (Supplementary Table 11). Amplicons were sequenced by Sanger sequencing (Eurofins, Luxembourg). Amplicons with multiple peaks that suggested the presence of genetic chimerism or of heterozygous or biallelic mutations were resolved by cloning amplicons into Zero Blunt™ TOPO™ (Thermo Fisher, Waltham, MA) or pGEM®-T Easy (Promega, Madison, WI) followed by sequencing of plasmids isolated from 10–15 colonies. For plants generated by Agrobacterium-mediated stable transformation, the T-DNA copy number of T0 plants was estimated by ddPCR as previously described90 using primers to nptII and the single-copy reference gene Rdr191. T1 seeds from single copy plants with homozygous, heterozygous or biallelic mutations at target locations were collected and grown for subsequent analyses. For plants generated using RNA viruses and mobile sgRNAs, seeds (designated as E1) were harvested from individual pods at the distal end of stems in which mutations were detected. T1 and E1 plants were grown, and the genotype of each target locus was confirmed as above.
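The interpretation of cloned amplicon sequences can be summarized as a small decision rule. The sketch below is a simplified illustration, not the authors' scoring procedure: it assumes each target locus behaves as a diploid locus, that every sequenced colony has already been assigned an allele label, and that sequencing errors are negligible. The labels and function name are hypothetical.

```python
from collections import Counter


def classify_genotype(clone_alleles, wild_type: str = "WT") -> str:
    """Classify a locus from the allele labels of sequenced colonies.

    More than two distinct alleles cannot come from a single diploid
    genotype, so that case is reported as chimeric.
    """
    counts = Counter(clone_alleles)
    if not counts:
        raise ValueError("no clones were sequenced")
    alleles = set(counts)
    mutant_alleles = alleles - {wild_type}
    if not mutant_alleles:
        return "wild type"
    if len(alleles) > 2:
        return "chimeric"
    if wild_type in alleles:
        return "heterozygous"
    if len(mutant_alleles) == 1:
        return "homozygous"
    return "biallelic"


# Hypothetical colony results; "-1" and "+1" denote a 1 bp deletion and insertion.
print(classify_genotype(["WT"] * 6 + ["-1"] * 6))
print(classify_genotype(["-1"] * 5 + ["+1"] * 7))
print(classify_genotype(["WT", "-1", "+1"] * 4))
```

A rule of this kind ignores allele frequencies, so a rare third allele arising from a PCR or sequencing artifact would be scored as chimerism; in practice a minimum clone count per allele could be required before calling it.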

Statistics and reproducibility

A minimum of n = 3 biological replicates were used for transcriptome analysis, assessment of derivative peaks in mutated plants and quantification of yield. Key experiments including the quantification of 7-deoxyloganic acid (7-DLA) and strictosidine were repeated such that n = 6 or n = 11. Changes in yield were analyzed using one-way ANOVA with post-hoc Tukey HSD. No data were excluded. The arrangement of samples agroinfiltrated into leaves was randomized across leaves and across plants within a given experiment. For identification of derivatives and quantification of metabolite products the experimental design, leaf infiltration and sample collection were performed by one researcher and samples were blindly analyzed by another researcher.
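The first stage of the yield comparison, a one-way ANOVA, reduces to a ratio of between-group to within-group variance. The sketch below computes only the F statistic and its degrees of freedom using the standard library; obtaining the p-value and the post-hoc Tukey HSD comparisons requires the F and studentized range distributions, which would normally come from a statistics package and are not implemented here. The yield values are invented for illustration.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    if k < 2 or any(len(g) < 1 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical yields for three infiltration conditions, n = 3 each.
conditions = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(conditions)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```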

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.