The reconstruction and biochemical characterization of ancestral genes furnish insights into the evolution of terpene synthase function in the Poaceae

Key Message Distinct catalytic features of the Poaceae TPS-a subfamily arose early in grass evolution and the reactions catalyzed have become more complex with time. Abstract The structural diversity of terpenes found in nature is mainly determined by terpene synthases (TPS). TPS enzymes accept ubiquitous prenyl diphosphates as substrates and convert them into the various terpene skeletons by catalyzing a carbocation-driven reaction. Based on their sequence similarity, terpene synthases from land plants can be divided into different subfamilies, TPS-a to TPS-h. In this study, we aimed to understand the evolution and functional diversification of the TPS-a subfamily in the Poaceae (the grass family), a plant family that contains important crops such as maize, wheat, rice, and sorghum. Sequence comparisons showed that aside from one clade shared with other monocot plants, the Poaceae TPS-a subfamily consists of five well-defined clades I–V, the common ancestor of which probably originated very early in the evolution of the grasses. A survey of the TPS literature and the characterization of representative TPS enzymes from clades I–III revealed clade-specific substrate and product specificities. The enzymes in both clade I and II function as sesquiterpene synthases with clade I enzymes catalyzing initial C10-C1 or C11-C1 ring closures and clade II enzymes catalyzing C6-C1 closures. The enzymes of clade III mainly act as monoterpene synthases, forming cyclic and acyclic monoterpenes. The reconstruction and characterization of clade ancestors demonstrated that the differences among clades I–III were already present in their ancestors. However, the ancestors generally catalyzed simpler reactions with less double-bond isomerization and fewer cyclization steps. Overall, our data indicate an early origin of key enzymatic features of TPS-a enzymes in the Poaceae, and the development of more complex reactions over the course of evolution. Electronic supplementary material The online version of this article (10.1007/s11103-020-01037-4) contains supplementary material, which is available to authorized users.


Introduction
Terpenes are a structurally diverse group of natural products that are ubiquitous in plants and many other organisms. They are involved in basic physiological processes, Electronic supplementary material The online version of this article (https ://doi.org/10.1007/s1110 3-020-01037 -4) contains supplementary material, which is available to authorized users. but also play important roles in plant-plant and plant-insect interactions. Many plants, for example, produce and release volatile terpenes in response to insect herbivory to attract natural enemies of their herbivores (Unsicker et al. 2009). Infection with plant pathogens can also induce the formation of terpenes, which in turn act as phytoalexins (Block et al. 2019). Until now, more than 40,000 terpenes have been described (Tholl 2015). This structural diversity can mainly be attributed to terpene synthases (TPS), a class of enzymes that catalyze the conversion of prenyl diphosphates into the different terpene skeletons (reviewed in Degenhardt et al. 2009;Karunanithi and Zerbe 2019). Most monoterpene synthases accept the precursor geranyl diphosphate (GPP) as substrate and produce monoterpenes (C 10 ), while most sesquiterpene synthases convert (E,E)-farnesyl diphosphate (FPP) into sesquiterpenes (C 15 ), and diterpene synthases catalyze the conversion of geranylgeranyl diphosphate (GGPP) into the different diterpenes (C 20 ). Interestingly, a number of terpene synthases have been described to act also on cisprenyl diphosphates such as neryl diphosphate, (Z,Z)-FPP, and nerylneryl diphosphate (Schilmiller et al. 2009;Sallaud et al. 2009;Zi et al. 2014). The terpenes formed may exert a variety of biological functions (Unsicker et al. 2009;Block et al. 2019). They may also act as substrates for modifying enzymes such as cytochrome P450 monooxygenases, O-methyltransferases, and acyltransferases (Dudareva et al. 2004;Degenhardt et al. 2009;Bathe and Tissier 2019).
Although terpene synthases often have broad substrate specificity and accept GPP, FPP, and GGPP in vitro, their in planta function may be narrower due to their subcellular localization (Pazouki and Niinemetz 2016). Monoterpene synthases and diterpene synthases typically contain N-terminal signal peptides and are transported into plastids, the site for GPP and GGPP, but not FPP production. Sesquiterpene synthases, however, are usually found in the cytosol, a site of FPP, but not GPP nor GGPP production ). There is increasing evidence for an exchange of prenyl diphosphates between the different compartments of a plant cell, especially under stress conditions (e.g. Gutensohn et al. 2013;Dong et al. 2016). Thus, the subcellular localization of TPS enzymes as well as the exchange of TPS substrates between compartments can determine the in planta function of terpene synthases with broad substrate specificity (Pazouki and Niinemetz 2016).
Based on sequence similarity and phylogenetic relationships, terpene synthases can be divided into eight subfamilies, TPS-a to TPS-h Chen et al. 2011). The subfamilies TPS-a, TPS-b, and TPS-g comprise predominantly monoterpene synthases and sesquiterpene synthases of angiosperms, while the gymnosperm monoterpene and sesquiterpene synthases belong to the TPS-d subfamily. The subfamilies TPS-c and TPS-e/f mainly comprise diterpene synthases from angiosperms and gymnosperms, and the subfamily TPS-h contains diterpene synthases specific for the Lycopodiophyta (clubmosses) Chen et al. 2011;Karunanithi and Zerbe 2019). Angiosperm plants typically possess 20-60 TPS genes, most of which belong to the TPS-a subfamily (Chen et al. 2011).
The reaction mechanism catalyzed by terpene synthases starts with the formation of a carbocation intermediate that is either derived by a metal ion-dependent ionization of the prenyl diphosphate substrate (class I TPS) or by protonation of a double bond in the prenyl diphosphate tail (class II TPS) ). While most if not all monoterpene synthases and sesquiterpene synthases belong to class I TPS, diterpene synthases can be found in both classes. After its formation, the carbocation intermediate can undergo a complex cascade of different cyclizations, hydride shifts, and skeleton rearrangements until the reaction is terminated by a deprotonation or addition of a nucleophile such as water ). Many terpene synthases are multiproduct enzymes that produce complex mixtures of compounds (Steele et al. 1998;Garms et al. 2010;Irmisch et al. 2014). However, the individual components of the mixture often share the same terpene skeleton or at least belong to structurally related skeleton types, the formation of which is mainly determined by the first cyclization of the reaction cascade (e.g. Köllner et al. 2004;Garms et al. 2010;Irmisch et al. 2012). The maize sesquiterpene synthase TPS4, for example, produces a mixture of more than 22 compounds that are almost all derived by an initial C6-C1 ring closure (numbering as for FPP) of the (Z,E)-farnesyl carbocation (Köllner et al. 2004). In contrast, all products formed by MrTPS2, a terpene synthase found in Matricaria recutita, are derived by an initial C11-C1 closure of the (E,E)farnesyl carbocation (Irmisch et al. 2012;Hong et al. 2013). Although the basic phylogeny of the TPS family, including division into subfamilies, is well accepted, evolutionary relationships among many of the reaction types are still not well understood.
As part of our ongoing effort to investigate the formation of terpenes in the grass family (Poaceae), we aimed to understand the evolution of catalytic function in the TPSa enzymes of this plant family. In the Poaceae, the TPS-a subfamily dominates the TPS gene family in terms of absolute numbers of genes (Chen et al. 2011) and includes a large number of previously characterized TPS enzymes. Ancestral sequences of the Poaceae TPS-a subfamily were reconstructed, synthesized, and heterologously expressed in Escherichia coli. Recombinant enzymes were incubated with the potential substrates GPP and (E,E)-FPP, and reaction products were analyzed using gas chromatography-mass spectrometry (GC-MS). Our data showed that many of the distinct features of specific clades of the Poaceae TPS-a subfamily arose early in grass evolution, but the reactions catalyzed have become more complex with time.

Most of the TPS-a genes in the Poaceae belong to five well-defined clades
In order to identify TPS-a genes from the Poaceae, we constructed a phylogenetic tree of terpene synthase genes that were extracted using BLASTP analysis from six grass species, including Zea mays, Sorghum bicolor, Setaria italica, Panicum virgatum, Oryza sativa, and Brachypodium distachyon, and a number of other monocotyledonous and dicotyledonous species available in the NCBI database and the Phytozome 9.1 database (Supplemental Figure S1). In accordance with the literature (Chen et al. 2011), all obtained TPS genes clustered into one of the established TPS subfamilies TPS-a, TPS-b, TPS-c, TPS-e, TPS-f, and TPS-g. The TPS-a genes formed two groups, one containing genes from dicotyledonous species and the other containing genes of monocotyledonous species (Supplemental Figure S1, Chen et al. 2011). A deeper dendrogram analysis of the identified monocotyledonous TPS-a genes revealed five well-defined monophyletic clades, I-V, each of which possessed exclusively TPS-a genes from the Poaceae (Fig. 1). Two genes from P. virgatum were similar to clade IV and clade V sequences, but clustered separately. Four other Poaceae TPS-a genes, from rice, sorghum, and bamboo, were found to form a very basal clade among the monocotyledonous TPS-a subfamily, while TPS genes from families outside the Poaceae clustered separately from the five Poaceae clades (Fig. 1).

The Poaceae TPS-a genes in clades I-V encode enzymes with distinct catalytic features
A literature survey of the properties of the Poaceae TPSa encoded enzymes characterized to date revealed that clades I-V have distinctive catalytic features (Table 1). All previously characterized clade I enzymes are sesquiterpene synthases that accept (E,E)-FPP as substrate and, with two exceptions, catalyze an initial C10-C1 or C11-C1 closure (Fig. 2). All characterized clade II enzymes are also sesquiterpene synthases, but produce either acyclic terpenes or cyclic terpenes derived from an initial C6-C1 closure. In contrast to the sesquiterpene synthases of clades I and II, characterized clade III enzymes mainly accept GPP as substrate and produce monoterpenes derived from an initial C6-C1 closure (Fig. 2). Moreover, most of the clade III sequences are predicted to contain a transit peptide for plastid localization, indicating that these enzymes likely function as monoterpene synthases in planta. The terpene synthases in clades IV and V, however, are diverse in terms of substrate specificity and initial cyclization reaction. The characterized enzymes in these clades form monoterpenes or sesquiterpenes, and catalyze initial C6-C1, C10-C1, or C11-C1 closures ( Table 1).

Characterization of TPS-a enzymes from maize and Brachypodium distachyon
To provide more information about the differences among these Poaceae TPS-a clades, we characterized three previously unstudied TPS-a genes from maize and two TPS-a genes from B. distachyon, a species from which no TPS has been characterized so far. The selected terpene synthases belong to clade I (maize ZmTPS20-Del and ZmTPS22-Del, B. distachyon Bradi3g14710), clade II (B. distachyon Bradi3g15956), and clade III (maize ZmTPS15-B73). Genes were cloned and heterologously expressed in E. coli, and partially purified proteins were incubated with the potential substrates GPP and (E,E)-FPP in the presence of 10 mM Mg 2+ as cofactor. Enzyme products were analyzed using GC-MS. The three clade I enzymes ZmTPS20-Del, ZmTPS22-Del, and Bradi3g14710 showed exclusively sesquiterpene synthase activity and produced sesquiterpenes derived by either an initial C10-C1 closure or an initial C11-C1 closure, corresponding to the properties of clade I enzymes in the literature. ZmTPS20-Del produced mainly germacrene A (C10-C1 closure), which appeared as the thermal rearrangement product β-elemene in the traces of the GC chromatograms ( Fig. 3; Supplemental Figure S2). ZmTPS22-Del produced a complex mixture of sesquiterpenes with the bicyclic β-copaene (C10-C1 closure) as the major component (Fig. 3). In contrast to ZmTPS20-Del and ZmTPS22-Del, Bradi3g14710 had higher product specificity and formed (E)-β-caryophyllene (C11-C1 closure) with trace amounts of α-humulene and germacrene A (Fig. 3). Monoterpene synthase activity with GPP was not observed for any of the three enzymes. The clade II enzyme Bradi3g15956 converted (E,E)-FPP into β-macrocarpene and γ-macrocarpene, two bicyclic sesquiterpenes both derived by an initial C6-C1-closure (Fig. 3). GPP, however, was not accepted by Bradi3g15956 as substrate. Maize ZmTPS15-B73 showed no activity when provided with (E,E)-FPP. However, it converted GPP into a mixture of monoterpenes dominated by the bicyclic alcohol 1,8-cineole (C6-C1 closure) (Fig. 3). Subcellular localization prediction using three different algorithms revealed the presence of an N-terminal transit peptide for plastid localization (Supplemental Figure S3), indicating that ZmTPS15-B73 likely acts as a monoterpene synthase in maize plastids.

The reconstruction and characterization of TPS-a ancestor enzymes
Most of the enzymes in clades I-III share clade-specific catalytic features, catalyzing either a common initial cyclization or having the same substrate specificity (Table 1; Fig. 2). To test whether these features were already pronounced in the ancestors of the different clades, we reconstructed the respective ancestor sequences, expressed the synthesized genes in E. coli, and tested the recombinant enzymes obtained with GPP and (E,E)-FPP. The putative clade I ancestor showed only sesquiterpene synthase activity and produced mainly germacrene A (C10-C1) and (E)-βcaryophyllene (C11-C1) (Figs. 2 and 4, Supplemental Figure  S2). The putative clade II ancestor accepted both GPP and (E,E)-FPP as substrate and produced mixtures of monoterpenes and sesquiterpenes. While the monoterpene blend was dominated by the acyclic myrcene, the major sesquiterpenes were identified as β-bisabolene and (E)-γ-bisabolene, which are both formed via an initial C6-C1 closure (Figs. 2 and 4). In contrast to the putative ancestors of clade I and clade II, the clade III ancestor, which was expressed without the  N-terminal signal peptide, possessed exclusively monoterpene synthase activity and produced the cyclic limonene (C6-C1) as its major product (Figs. 2 and 4).

Discussion
Many important crops such as maize, wheat, rice, and sorghum belong to the Poaceae, a plant family known to be rich in terpenes. Terpenes play important roles in plant defense and their biosynthesis has been intensively investigated during the last three decades ). In this study, we aimed to understand the functional evolution of the TPS-a subfamily in the grasses by reconstructing and characterizing ancestral TPS enzymes and comparing their catalytic activities to those of modern terpene synthases. The Poaceae family consists of three major lineages, the PACMAD (Panicoideae, Arundinoideae, Chloridoideae, Micrairoideae, Aristidoideae, Danthonioideae) lineage, the BEP (Bambusoideae, Ehrhartoideae, Pooideae) lineage, and the basal Anomochlooideae (Strömberg 2011, Fig. 5). Fossil records indicate that the common ancestor of the grasses existed more than 70 Ma in the Late Cretaceous, and that the PACMAD and BEP lineages evolved between 70 and 50 Ma (Strömberg 2011). A phylogenetic analysis of TPS-a genes from different monocotyledonous species revealed five monophyletic clades I-V that exclusively contain genes from the Poaceae (Fig. 1). Since all of the five TPS-a clades contain both PACMAD and BEP genes (Table 1; Supplemental Figure S4), it is likely that the direct ancestors of the five clades evolved by at least four consecutive gene duplication events from a common TPS-a progenitor prior to the separation of the two lineages (Fig. 5).
A survey of the TPS literature and characterization of representative TPS in this study revealed that most of the enzymes in clades I, II, IV, and V are sesquiterpene synthases, while clade III possesses predominantly monoterpene synthases (Table 1; Fig. 3). It is thus tempting to speculate MT monoterpene synthase activity  that the TPS-a progenitor in the early grasses was a sesquiterpene synthase. In this scenario, the ancestor of clade III had to evolve monoterpene synthase activity, presumably by the accumulation of mutations leading to a decreased active site pocket that no longer allowed binding of the FPP substrate, and gained an N-terminal signal peptide for transport to the plastid, where GPP is produced. Indeed, the reconstruction of this ancestor based on the modern clade III sequences and its functional characterization showed that the enzyme was able to convert GPP into monoterpenes but not (E,E)-FPP into sesquiterpenes (Fig. 4). Interestingly, a few enzymes in clades IV and V have also been described as monoterpene synthases (Table 1), indicating multiple evolution of this enzymatic function within the TPS-a subfamily. OsTPS19 and OsTPS20 from rice, for example, belong to clade V and were found to be localized in plastids. The manipulation of OsTPS19 gene expression in transgenic rice clearly confirmed its function as a monoterpene synthase in planta (Chen et al. 2018). However, the subcellular localization of other putative monoterpene synthases in clades β-copaene (12) γ-macrocarpene (21) Fig. 3 Characterization of selected terpene synthases from the Poaceae TPS-a clades I-III. Genes were heterologously expressed in Escherichia coli and partially purified proteins were incubated with (E,E)-FPP (TPS20-Del, TPS22-Del, Bradi3g14710, Bradi3g15956) or GPP (TPS15-B73). TPS reaction products were collected from the headspace of the enzyme assays using a solid phase microextraction (SPME) fiber and analyzed by gas chromatography-mass spectrometry. Total ion current chromatograms are shown. 1, β-elemene; 2, (E)-β-caryophyllene; 3, α-humulene; 4, unidentified sesquiterpene hydrocarbon1; 5, unidentified sesquiterpene hydrocarbon2; 6, unidentified sesquiterpene alcohol1; 7, δ-elemene; 8, unidentified sesquiterpene hydrocarbon3; 9, α-copaene; 10, β-cubebene; 11, β-ylangene; 12, β-copaene; 13, γ-elemene; 14, germacrene D; 15, α-muurolene; 16, cubebol; 17, δ-cadinene; 18, (E)-β-farnesene; 19, unidentified sesquiterpene hydrocarbon4; 20, β-macrocarpene; 21, γ-macrocarpene; 22, myrcene; 23, limonene; 24, 1,8-cineole; 25, (E)-β-ocimene; 26, linalool. Structures of major TPS products are shown IV and V is rather unclear. Some of them do not appear to contain signal peptides (Muchlinski et al. 2019) and, in contrast to clade III monoterpene synthases that only accept GPP as substrate (Lin et al. 2008;Muchlinski et al. 2019;Fig. 3), they often display both monoterpene and sesquiterpene synthase activity in vitro. Thus, a reliable prediction of their function in vivo is still not possible.
The Poaceae Tps-a clades differ among each other in the type of reactions that they catalyze. Nearly all monoterpene and sesquiterpene synthases are class I TPS enzymes, which catalyze a metal ion-dependent ionization of the substrate by abstracting the pyrophosphate moiety ; Fig. 2). The first reaction intermediate formed from GPP by monoterpene synthases is the geranyl cation. It can be deprotonated or captured by water addition to form acyclic monoterpenes, but due to conformational constraints can only be cyclized after isomerization to the linalyl cation, C1 of which is located near to the C6-C7 double bond (Fig. 2). The ionization of (E,E)-FPP by sesquiterpene synthases results in the formation of the (E,E)-farnesyl cation, which can undergo direct C10-C1 or C11-C1 closures to yield 10-membered or 11-membered rings, respectively. Alternatively, the isomerization of the C2-C3 double bond of the (E,E)-farnesyl cation leads to the tertiary (Z,E)-farnesyl cation, in direct analogy with the isomerization of GPP to the linalyl cation. The (Z,E)-farnesyl cation can then undergo monoterpene products sesquiterpene products clade I ancestor Fig. 4 Proteins encoded by putative ancestors to Poaceae TPS-a clades have terpene synthase activity in vitro. Reconstructed TPS ancestor sequences of clades I, II, and III of the Poaceae TPS-a genes were synthesized, cloned, and heterologously expressed in Escherichia coli. Partially purified proteins were incubated with the potential substrates GPP and (E,E)-FPP. TPS reaction products were collected from the headspace of the enzyme assays using a solid phase microextraction (SPME) fiber and analyzed by gas chromatography-mass spectrometry. Total ion current chromatograms are shown. 1, β-phellandrene; 2, myrcene; 3, limonene; 4, (E)-β-ocimene; 5, terpinolene; 6, linalool; 7, α-terpineol; 8, α-pinene; 9, camphene; 10, terpinen-4-ol; 11, β-elemene; 12; (E)-β-caryophyllene; 13, α-humulene; 14, germacrene D; 15, 7-epi-sesquithujene; 16, sesquithujene; 17, (Z)-α-bergamotene; 18, (E)-α-bergamotene; 19, (E)-βfarnesene; 20, sesquisabinene; 21, zingiberene; 22, β-bisabolene; 23, β-sesquiphellandrene; 24, (E)-γ-bisabolene; 25, (Z)-α-bisabolene cyclization involving either the central or distal double bond forming 1,6-, 1,7-, 1,10-, or 1,11-products ). Although the cyclic mono-and sesquiterpene cations formed in this way can be modified by further ring closures, ring contractions, and other rearrangements, the basic skeleton type is determined by the initial cyclization and isomerization reactions ). Interestingly, the different TPS-a enzymes in clades I, II, and III catalyze distinct and clade-specific initial reaction steps. With one exception, all clade I enzymes characterized to date produce sesquiterpenes derived from an initial C10-C1 or C11-C1 closure that does not require isomerization of the farnesyl cation ( Fig. 2; Table 1). In contrast, clade II and clade III enzymes catalyze the isomerization of the farnesyl cation and the geranyl cation, respectively, prior to a C6-C1 ring closure. The reconstructed ancestors of clades I, II, and III were shown to catalyze the same basic initial reactions as the present day members of the clade (Figs. 2  and 4), indicating that the key enzymatic features of the modern TPS-a enzymes evolved early in the evolution of the grasses. However, in contrast to the modern enzymes, the reconstructed TPS ancestors formed simpler structures that are mainly derived by a deprotonation of the first cyclic cation (Fig. 2). Present day enzymes typically catalyze reaction cascades with additional steps, such as the isomerization of carbon-carbon double bond in the initial cation to allow alternate ring closures or additional cyclizations. The structural diversity of terpenes formed in today's grasses likely evolved later in the grass evolution within single subfamilies, genera, or even species. Successive gene duplications of the respective clade ancestors and the subsequent accumulation of mutations led to the multitude of modern TPS-a enzymes, many of which catalyze more complex reactions than the ancestor. Although it is universally accepted that evolution of natural product biosynthesis has led to the formation of more and more complex structures, this process has rarely been documented at the level of a specific enzyme and plant group.

Plant material
Seeds of the maize (Zea mays L.) inbred line B73 were provided by KWS seeds (Einbeck, Germany), and seeds of the maize hybrid variety Delprim were obtained from Delley Samen und Pflanzen (Delley, Switzerland). Plants were grown in commercially available potting soil in a climate-controlled chamber with a 16 h photoperiod, 1 mmol (m 2 ) −1 s −1 of photosynthetically-active radiation, a temperature cycle of 22 ºC/18 ºC (day/night), and 65% relative humidity. Ten day old seedlings were harvested and immediately frozen in liquid nitrogen. Seeds of the Brachypodium distachyon inbred diploid accession Bd21.3 were obtained from the National Plant Germplasm System, US Department of Agriculture. Seeds were sowed into pots and placed at 4 °C in the dark for 2 days, and then the pots were transferred

Gene identification, sequence analysis, and dendrogram analysis
Using poplar TPS9 (Irmisch et al. 2014) as a query, we conducted a TBLASTN analysis against 6 Poaceae gene sets (Zea mays, Sorghum bicolor, Setaria italica, Panicum virgatum, Oryza sativa, Brachypodium distachyon) and 27 angiosperm gene sets (Aquilegia coerulea, Arabidopsis lyrata, A. thaliana, Brassica rapa, Capsella rubella, Carica papaya, Citrus clementina, Citrus x sinensis, Cucumis sativus, Eucalyptus grandis, Eutrema salsugineum, Fragaria vesca, Glycine max, Gossypium raimondii, Linum usitatissimum, Malus x domestica, Manihot esculenta, Medicago truncatula, Mimulus guttatus, Phaseolus vulgaris, Populus trichocarpa, Prunus persica, Ricinus communis, Solanum lycopersicum, S. tuberosum, Theobroma cacao, Vitis vinifera) available in the Phytozome 9.1 database (phytozome.jgi. doe.gov). In addition, the NCBI database (www.ncbi.nlm. nih.gov) was similarly screened for TPS genes from monocotyledonous species outside the Poaceae (species of Arecales, Asparagales, Liliales, and Zingiberales). Sequences longer than 1499 bp were considered as full-length genes and subjected to phylogenetic analysis. For the estimation of a phylogenetic tree, we used the MUSCLE (codon) algorithm (gap open, − 2.9; gap extend, 0; hydrophobicity multiplier, 1.2; clustering method, UPGMB) implemented in MEGA6 (Tamura et al. 2011) to compute a nucleotide codon alignment with the 932 obtained full-length TPS sequences. Based on the MUSCLE alignment, the tree was reconstructed with MEGA6 using a maximum likelihood algorithm (model/method, General Time Reversible model; substitutions type, nucleotide; rates among sites, gamma distributed with invariant sites (G + I); gamma parameters, 5; gaps/missing data treatment, partial deletion; site coverage cutoff, 50%). Guided by the resulting TPS tree, 163 out of the 932 sequences could be identified as monocotspecific TPS-a2 genes and were subjected to deeper phylogenetic analysis using the same methods and parameters as described above, except with a site coverage cutoff of 80%. A bootstrap resampling analysis with 1000 replicates was performed to evaluate the topology of the tree. FASTA files are available as Supplemental data set 1 (angiosperm TPS) and Supplemental data set 2 (Poaceae TPS).

Ancestor reconstruction
Codon alignments of characterized TPS-a genes from the clades I, II, and III were created using the PRANK algorithm provided by the Guidance web-based interface (https :// guida nce.tau.ac.il/) (MSA scores are given in Supplemental  Table S1; FASTA files are available as Supplemental data sets 3 (clade I), 4 (clade II), and 5 (clade III)). The PRANK alignments were loaded into MEGA6 and model tests were performed to search for the optimal substitution models and parameters. Maximum likelihood (ML) trees (Supplemental Figures S5-S7) were generated with MEGA6 using the models and parameters given in Supplemental Table S1. All sites of the sequences were considered for the generation of the trees. The most likely protein ancestor sequences were extracted from the ML trees using the program ExtAncSeq-MEGA (Hall 2011) (for accuracy scores see Supplemental Table S1).

Cloning and gene synthesis
Total RNA was isolated using the RNeasy Plant Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. Single-stranded cDNA was synthesized from total RNA using SuperScript™ III Reverse Transcriptase (Invitrogen, Carlsbad, CA) and oligo(dT) primers. The complete open reading frames (ORF) of ZmTPS20-Del, ZmTPS22-Del, and ZmTPS15-B73 from maize were amplified from cDNA and cloned into the sequencing vector pCR4-TOPO (Invitrogen). After complete sequencing, ZmTPS20-Del and ZmTPS22-Del were subcloned as BsaI fragments into the expression vector pASK-IBA7 (IBA-GmbH, Göttingen, Germany). Before subcloning, an internal BsaI restriction site in ZmTPS22-Del was mutated by sitedirected mutagenesis as described in Köllner et al. 2006. The ORF of ZmTPS15-B73 lacking the first 68 codons, which encode the N-terminal signal peptide, was subcloned into the expression vector pET100/D-TOPO (Invitrogen). The complete ORFs of Bradi3g14710 and Bradi3g15956 from Brachypodium distachyon were amplified from cDNA and cloned into the expression vector pEXP5-CT/TOPO (Invitrogen). The reconstructed ancestor sequences of clades I-III were synthesized as codon-optimized genes and subsequently cloned into the expression vector pET100/D-TOPO. The N-terminal signal peptide (33 amino acids) of the clade III ancestor was truncated before heterologous expression. All primers used for amplification, subcloning, and mutagenesis are listed in Supplemental Table S2. Sequences were deposited in GenBank (https ://www. ncbi.nlm.nih.gov) with the accession numbers MT294265 (ZmTPS20-Del), MT294266 (ZmTPS22-Del), MT294267 (ZmTPS15-B73), MT294268 (Bradi3g14710), MT294269 (Bradi3g15956), MT294270 (clade I ancestor), MT294271 (clade II ancestor), and MT294272 (clade III ancestor). The codon-optimized sequences of the TPS ancestors are given in Supplemental Figure S8.

Heterologous expression of TPS genes in Escherichia coli
For heterologous expression in E. coli, pASK-IBA7-constructs were introduced into the strain TOP10 (Invitrogen), while pET100/D-TOPO-constructs were expressed in the strain BL21 (DE3). Liquid cultures of the bacteria harboring the expression constructs were grown at 37 °C to an OD 600 of 0.6. Expression of the recombinant proteins from pET100/D-TOPO constructs in BL21 (DE3) was induced by addition of isopropyl-b-thiogalactopyranoside to a final concentration of 1 mM, and the expression of pASK-IBA7 constructs in TOP10 cells was induced with 200 mg/l of anhydrotetracycline (IBA, Göttingen, Germany). Cultures were incubated for 20 h at 18 °C and cells were collected by centrifugation and disrupted by a 4 × 30 s treatment with a sonicator (Bandelin UW2070, Berlin, Germany) in chilled extraction buffer (50 mM Tris-HCl, pH 7.5, with 5 mM dithiothreitol and 10% (v/v) glycerol). The cell fragments were removed by centrifugation at 14,000 g and the supernatant was desalted into assay buffer (10 mM Tris-HCl, pH 7.5, 1 mM dithiothreitol, 10% (v/v) glycerol) by passage through a Econopac 10DG column (BioRad, Hercules, CA, USA).

TPS enzyme assays
To determine the catalytic activity of the different terpene synthases and the reconstructed TPS ancestors, enzyme assays containing 40 μl of the bacterial extract and 60 µl assay buffer with 10 μM (E,E)-FPP or GPP and 10 mM MgCl 2 , in a Teflon-sealed, screw-capped 1 ml GC glass vial were performed. A SPME (solid phase microextraction) fiber consisting of 100 µm polydimethylsiloxane (Supelco, Bellefonte, PA, USA) was placed into the headspace of the vial for 0.5 h incubation at 30 °C. For analysis of the adsorbed reaction products, the SPME fiber was directly inserted into the injector of the gas chromatograph.

Terpene synthase product analysis
Qualitative analysis of TPS enzyme products was conducted using an Agilent 6890 Series gas chromatograph coupled to an Agilent 5973 quadrupole mass selective detector (interface temp.: 270 °C; quadrupole temp.: 150 °C, source temp.: 230 °C, electron energy: 70 eV). The constituents of the product blends were separated with a DB-5MS column (Agilent, Santa Clara, CA, USA, 30 m × 0.25 mm × 0.25 µm) and helium (1 ml min −1 ) as carrier gas. The SPME fiber was injected without split at an initial oven temperature of 60 °C. The temperature was held for 2 min and then increased to 220 °C with a gradient of 7 °C min −1 , followed by a further increase to 300 °C with 60 °C min −1 and a hold for 3 min. Compounds were identified by comparison of retention times and mass spectra to those of authentic standards obtained from Fluka (Seelze, Germany), Roth (Karlsruhe, Germany), Sigma (St, Louis, MO, USA), or Bedoukian (Danbury, CT, USA), or by reference spectra in the Wiley and National Institute of Standards and Technology libraries.