The terpene synthase genes of Melaleuca alternifolia (tea tree) and comparative gene family analysis among Myrtaceae essential oil crops

Terpene synthases (TPS) are responsible for the terminal biosynthetic step of terpenoid production. They are encoded by a highly diverse gene family believed to evolve by tandem duplication in response to adaptive pressures. Taxa in the Myrtaceae family are renowned for their diversity of terpenoid-rich essential oils, and among them, the tribe Eucalypteae has the largest TPS gene family found in any plant (> 100 TPS). In this study, comparative analysis of Melaleuca alternifolia (tea tree), from the related tribe Melaleuceae, revealed some Myrtaceae have smaller TPS families, as a total of 58 putatively functional full-length TPS genes, and 21 pseudogenes were identified by manual annotation of a newly released long-read assembly of the genome. The TPS-a and TPS-b2 subfamilies that synthesise secondary compounds often mediating plant-environment interactions were more diminutive than those in eucalypts, probably reflecting key differences in the evolutionary histories of the two lineages. Of the putatively functional TPS-b1, 13 clustered into a region of around 400 kb on one scaffold. The organisation of these TPS suggested that tandem duplication was instrumental in the evolution and diversity of terpene chemistry in Melaleuca. Four TPS-b1 likely to catalyse the synthesis of the three monoterpenoid components that are used to classify tea tree chemotypes were encoded within a single small region of 87 kb in the larger cluster of TPS-b1, raising the possibility that coregulation and linkage may lead to their behaviour as a single locus, providing an explanation for the categorical inheritance of complex multiple-component chemotypes in the taxon.


Introduction
Terpenes and terpenoids are hydrocarbon-based substances produced by plants, fungi, bacteria, and insects (Chen et al. 2011;Schmidt-Dannert 2015;Yamada et al. 2015;Beran et al. 2016). In plants, all terpenoids are derived from the isomeric 5-carbon precursors isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP), which can be synthesised in two distinct pathways that are both well understood (see Online Resource 1 and Vranová et al. (2013) and Tholl (2015)). They are a structurally diverse group of compounds that can be categorised based on the number of carbon atoms they contain, i.e. hemiterpenoids (C5), monoterpenoids (C10), sesquiterpenoids (C15), and diterpenoids (C20) (Sell 2010). Many terpenoids are primary metabolites such as sterols, which are incorporated into membranes, abscisic acid, or gibberellins, which play essential roles as phytohormones, carotenoids that are pigments for photosynthesis (Yu and Utsumi 2009), and phytol, which is a component of chlorophyll (Gutbrod et al. 2019). Other mono-and sesquiterpenoids are secondary metabolites, which are not vital for survival and are only represented in some groups of plants. Nonetheless, these volatile terpenoids are important as mediators between a plant and its biotic and abiotic environment, facilitating direct and indirect defences against herbivores or pathogens (Block et al. 2019), or attracting 1 3 13 Page 2 of 19 pollinators (Pichersky and Gershenzon 2002). They have been characterised extensively in many plant species and are the largest group of plant secondary metabolites (Dudareva et al. 2004). The structures for more than 40,000 terpenoids are already known (Fürstenberg-Hägg et al. 2013), and large variation in the terpenoid composition is observed among and within plant species (Moore et al. 2014). The main driver for the high diversification of terpenoid substances is most likely herbivore and pathogen specialisation. In 1964, Ehrlich and Raven proposed plant and insect coevolution as the main factor contributing to the development of highly diverse plant secondary substances (Ehrlich and Raven 1964). Hence, the continuous interaction between herbivores and plants led to stepwise adaptations by selection, resulting in the vast array of plant secondary compounds we see today and a similarly wide array of insect mechanisms to deal with these plant defences (Ryan and Byrne 1988;Keane and Ryan 1999;Yu 2008). Particularly long-lived plants, such as trees or woody perennials, are believed to be rich in terpenoids because they have to defend themselves against rapidly evolving pathogens and insect herbivores over a long period of time.
Comparative studies among species within plant families have provided insight into the importance of terpenoids in the evolution and chemical ecology of plants. The final steps of terpenoid biosynthesis are primarily catalysed by a class of enzymes called terpene synthases (TPS). They can produce an extensive variety of terpenoids from a few common substrates, with many enzymes being capable of synthesising different products from a single substrate (Schwab 2003). Within genomic sequences, the enzymes are coded by members of the TPS gene family. Comparative genomics allowed both the identification of conserved regions in genes among species that are suggestive of shared functional importance, as well as regions of divergence that may indicate rapidly evolving sequences subject to adaptative evolution (Strachan and Read 2011). It has been shown, for example that TPS genes can be classified based on their intron-exon organisation on the genomic level (Trapp and Croteau 2001), and that all angiosperm and gymnosperm terpene synthases have highly similar structural features (Cao et al. 2010;Chen et al. 2011). This coincides with the observation that overall amino acid sequences in TPS are well conserved, while their catalytic pockets are highly variable (Bohlmann et al. 1998). Current evidence suggests that despite the high diversification of TPS gene family, they most likely all evolved from one ancestral gene (Trapp and Croteau 2001;Keeling et al. 2010).
One plant group renowned for its terpenoid diversity is the angiosperm family of Myrtaceae, which consists of trees and shrubs that are mainly distributed in forests and woodlands of the southern hemisphere (Wilson 2011). Myrtaceae are a major element of the Australian flora and are renowned for producing high concentrations of volatile terpenoids usually stored in schizogenous secretory cavities in their leaves (Brophy et al. 2013). The oils of these species are mostly dominated by mono-and sesquiterpenoids and display terpenoid diversity at a taxonomic, population, as well as individual level (Keszei et al. 2010a). Within the family, the three tribes Melaleuceae, Eucalypteae, and Myrteae have the highest diversity regarding unique monoterpenoid and sesquiterpenoid compounds in their leaves (Padovan et al. 2014). The Eucalypteae tribe, which consists of the genera Eucalyptus, Corymbia, and Angophora, is one of the bestknown groups within the Myrtaceae, with some species of worldwide economic importance for the timber and essential oil industries (Grattapaglia et al. 2012).
The terpenoid diversity of eucalypt oils is matched by corresponding high numbers of TPS genes. Using the reference genome for the model Myrtaceae Eucalyptus grandis, Külheim et al. (2015) identified 113 TPS genes, which is the highest number of TPS genes known for any plant. Relatively high numbers of TPS have subsequently been found in other members of the eucalypt group, with 106 putatively functional TPS genes in E. globulus  and 102 in Corymbia citriodora . In terpenoid-rich species, such as eucalypts, the TPS genes, as well as TPS pseudogenes, are usually found in genomic clusters with high sequence similarities Butler et al. 2018). Like other gene families involved in plant-environment interactions, TPS genes are thought to expand via lineage-specific tandem duplications in response to adaptive pressures (Tholl 2006;Hanada et al. 2008;Moore et al. 2014). The mono-and sesquiterpenoid producing TPS families, in particular, often undergo large expansions in gene numbers in some higher plants (Martin et al. 2010;Külheim et al. 2015). The large TPS gene families in eucalypts are thought to be a consequence of the need to evolve complex plant defences to allow these long-lived woody species to persist in a diverse range of often adverse habitats. Whether this lineage-specific expansion is a feature of eucalypts, or more broadly of the Myrtaceae is unknown.
The Melaleuceae provide an outgroup for phylogenetic and evolutionary studies of the Eucalypteae, and are arguably a model taxon for terpene biochemistry and genetics in the Myrtaceae. The essential oil distilled from the leaf of Melaleuca alternifolia (medicinal tea tree) is renowned for its therapeutic properties, and used in medicinal and cosmetic products throughout the world (Carson et al. 2006). The Australian tea tree industry produces around 1000 metric tonnes of oil per annum, of which 90% are exported to international markets (Larkman 2020). Australian plantations are based on genetically improved populations from a long-term breeding program (Doran et al. 1996;Baker et al. 2010), and a wealth of studies has investigated the chemistry, biochemistry, and genetics of the leaf oils in tea tree, including the identification and characterisation of key TSP (e.g. Shelton et al. 2002;Keszei et al. 2010b;Bustos-Segura et al. 2017;Padovan et al. 2017a). Genomic resources for the taxon were further enhanced with the first draft genome assembly for tea tree in 2018 and the first attempt to characterise its TPS gene family (Calvert et al 2018). Despite the highly diverse terpenoid compounds (~ 100) produced by this species (Brophy et al. 1989), and in contrast with the extraordinarily high number of TPS genes found in eucalypts, only 37 putatively functional TPS were found in tea tree (Calvert et al. 2018). But this first estimate used an assembly created from short Illumina paired-end reads and therefore may have underestimated the number of TPS, as some genes may have been overlooked due to genome fragmentation and incomplete gene models.
To more reliably characterise the TPS gene family for tea tree, a new draft assembly was generated using long reads as well as short reads to enhance completeness (Voelker et al. 2021a). This 362 Mb assembly was close in size to a flow cytometry estimate of 357 Mb (Calvert et al. 2018(Calvert et al. , 2021, and the scaffold N50 was increased by a factor of 214 in comparison with the earlier draft. Hence, this new draft was expected to permit a more reliable detection of genes and a more detailed analysis of gene organisation due to the larger scaffold size (N50 = 1.9 Mb). Although increased relative to the earlier estimate, our manual annotation of this new assembly revealed that the TPS gene family in M. alternifolia was still around half that of the average eucalypt TPS count, mainly due to reduction in the secondary metabolite producing subfamilies. New insights into the organisation of TPS in M. alternifolia indicated that, like other plants, the TPS in tea tree evolve in clusters via tandem duplication.

Methods
A brief description of methods is given below. A more detailed description of the methodology, including the use of software settings and commands, is available as protocol on protocols.io (Voelker 2023).
All lines of gene evidence, the Fgenesh++ prediction, GenemarkET+ prediction, spliced RNAseq alignments, and Exonerate TPS alignments, were visualised together with the genome assembly in the genome annotation editor Apollo (Dunn et al. 2019, RRID:SCR_001936). Manual adjustments of exon-intron borders were carried out where appropriate evidence for canonical splice sites was found in the alignments (Online Resource 2). Potential assembly errors were identified by examining alignments of the raw sequencing reads to the regions of interest. After finishing the manual annotation of TPS, they were named consecutively (MaltTPS001-MaltTPS73) based on their location on a scaffold, from lowest to highest scaffold number, and their location within the scaffold. Some putative TPS were later removed due to potential scaffold duplications, which were assessed by sequence identity of predicted TPS, and all-vs-all alignment of TPS-containing scaffolds. The final predicted protein sequences were screened for the presence of conserved amino acid motifs (RRx8W, DDxxD, RxR, and NSE/DTE), conserved C-and N-terminal TPS Pfam domains (PF03936, PF01397), as well as potential N-terminal chloroplast transit peptides (cTPs). Visualisations of gene structures and TPS organisation on a scaffold were created with the R-packages ggplot2 (Wickham 2016), gggenomes (Hackl and Ankenbrand 2022), gggenes (Wilkins 2020), and ggrepel (Slowikowski 2021).
Other genes potentially involved in terpenoid synthesis were identified by pathway mapping of the Fgenesh++ predicted protein sequences using Mercator4 v5.0 (Schwacke et al. 2019).

Phylogenetic tree construction
The predicted M. alternifolia TPS protein sequences were manually assessed in Unipro UGENE v39.0 (Okonechnikov et al. 2012, RRID:SCR_005579) and where possible, trimmed to the start of the RRx8W motif. Multiple sequence alignment (MSA) was carried out with the MUS-CLE tool (RRID:SCR_011812) in UGENE, and columns with gaps > 75% were removed from the alignment. The trimmed MSA output was used as input for phylogenetic tree construction using PhyML v3.3 (Guindon et al. 2010, 13 Page 4 of 19 RRID:SCR_014629). The resulting gene family tree was visualised in iTOL v6 (Letunic and Bork 2021) and manually rooted at the node that separates subfamilies c and e/f (class I primary metabolite TPS) from a, b, and g (class III secondary metabolite TPS).
For comparison with other species, the TPS amino acid sequences for E. grandis and E. globulus , C. citriodora , and V. vinifera (Martin et al. 2010) were retrieved from the respective published resources. The data from Külheim et al. (2015) also included sequences for A. thaliana and P. trichocarpa; however, many of these sequences contained gaps and seemed to be incomplete. Hence, it was decided to use the TPS identified by Jiang et al. (2019) for these species. Their methodology was similar to the method chosen for M. alternifolia, with TPS being classified as putatively functional when they contain both TPS Pfam domains. The gene names for these TPS were retrieved and used to extract the TPS amino acid sequences from the proteomes of A. thaliana ARA-PORT11 and P. trichocarpa v3.0 from Phytozome v13. The sequences for all species were loaded into Unipro UGENE and where possible, trimmed to the start of the RRx8W motif, as already done for M. alternifolia. Next, MSA was carried out for all species, using MUSCLE and trimming the resulting alignment to remove all columns with > 75% gap representation. Phylogenetic tree construction was carried out with 100 bootstraps and the same settings as for the M. alternifolia tree. After the whole tree was visualised and annotated, different clades were exported into separated trees for a clearer visualisation of each subfamily.

Orthogroup analysis
In addition to a comparative study of TPS, the overall similarity of protein sequences among tea tree was assessed in comparison with other long-lived members of the rosids clade. Tea tree was expected to share more orthologues with eucalypts than the other rosids, with phylogenetic relationships mirroring the overall ordering of species in the TPS gene analysis. Protein sequences for primary transcripts of E. grandis v2, C. citriodora v2.1, P. trichocarpa v4, Salix purpurea v1 (Zhou et al. 2018), and V. vinifera v2.1 were obtained from Phytozome v13. For M. alternifolia, protein sequences predicted by Fgenesh + + and the manually predicted TPS sequences were used. All sequences were used to identify single-copy orthologues, i.e. orthogroups containing one orthologous gene per species, with all species present in the group. After this analysis in OrthoFinder v2.5.2 (Emms and Kelly 2019, RRID:SCR_017118), all single-copy orthologues were aligned, translated into coding sequences (CDS) using pal2nal v14 (Suyama et al. 2006), and pairwise substitution rate calculation within orthogroups was carried out with PAML codeml v4.10.3 (Yang 2007, RRID:SCR_014932). Pairwise synonymous substitution rates among species were visualised with ggplot2 (Wickham 2016) in R (R Core Team 2022). The synonymous substitution (Ks) mutation rate/site/year (R) between species was calculated with R = Ks/(2*divergence age) following the method by Healey et al. (2021).

Manual TPS annotation
After manual annotation of the TPS in M. alternifolia, 73 gene models were initially declared as putatively functional and full length, plus a further 21 pseudogenes. However, upon further investigation of TPS sequence identities, and all-vs-all alignment of the scaffolds containing TPS, 15 fulllength genes were removed as redundant sequences due to overlapping scaffolds or potential assembly errors. After removal of these sequences, a set of 58 putatively functional TPS sequences and 21 pseudogenes remained, with the following classification: I) 29 complete genes containing both TPS Pfam domains and RNAseq alignment evidence, II) 25 complete genes containing both TPS Pfam domains but no RNAseq evidence, III) four genes with both TPS Pfam domains and one premature stop codon or missing start codon, which might be a point mutation or assembly error, and IV) 21 pseudogenes containing only one TPS Pfam domain or multiple stop codons and frameshifts (see Online Resource 2 for gene annotation examples of all four classes).
A phylogenetic tree was constructed from the aligned amino acid sequences of the final 58 TPS in class I, II, or III, which resulted in high bootstrap support values for clade separations (Fig. 1). The putatively functional TPS were classified into the subfamilies TPS-a (25), TPS-b (24), TPS-c (1), TPS-e/f (1 TPS-e and 3 TPS-f), and TPS-g (4). A second phylogenetic tree including translated sequences for the 21 pseudogenes was also generated (Online Resource 3). The pseudogenes were found to be similar to members of all subfamilies except TPS-c. The largest numbers of pseudogenes were classed into the secondary metabolite producing subfamilies TPS-a and TPS-b (Table 1a). The increased sequence variation due to the inclusion of pseudogenes introduced more uncertainty; hence, the bootstrap values were low for many clades. The grouping of genes into subfamilies, however, remained unchanged (Online Resource 3).
As shown in Table 1a, more putatively functional TPS were identified in comparison with the previous study by Calvert et al. (2018). The number of members in the TPS-b2, TPS-c, TPS-e/f, and TPS-g subfamily remained the same, but the count of TPS-a and TPS-b1 TPS was increased by eleven and eight, respectively. Two TPS amino acid sequences, MaltTPS018 and MaltTPS070, were identical to TPS identified by Calvert et al. (2018) (MelG011320 and MelG016462, respectively). One further TPS, MaltTPS040, had only one amino acid differing from MelG016338 identified in the earlier study.  Table 1 (a) TPS gene numbers identified in the newly assembled genome of M. alternifolia in comparison with the previous numbers by Calvert et al. (2018). Information about newly identified pseudogenes is also included, and members of the TPS-b2 subfamily are shown separate from the remaining TPS-b genes. (b) The total number of TPS genes in other species and their classification into subfamilies as observed by phylogenetic analysis in this study In addition to finding a higher number of TPS genes in the latest genome assembly, the overall completeness of the TPS sequences was also improved with the new predictions. Of the previous protein sequences predicted from the short-read assembly (Calvert et al. 2018), 15 of the 37 TPS were shorter than 400 amino acids (aa), and five of these short protein sequences contained only one of the two TPS Pfam domains. In contrast, all protein sequences from the newly annotated TPS were longer than 400 aa and had the N-and C-terminal Pfam domains. Three of the manually annotated TPS were still relatively short with a length of around 400-420 aa, but since they also contained the conserved amino acid motifs of class III TPS, we considered them as putatively functional.
Besides classifying the TPS based on the presence of Pfam domains, the predicted protein sequences were screened for the conserved N-terminal motif RRx8W and the C-terminal motifs RxR, DDxxD, and NSE/DTE. The highly conserved DDxxD motif and the less conserved NSE/DTE motif are usually located on opposite sides of the catalytic pocket of the enzyme (Christianson 2006;Degenhardt et al. 2009). Based on our annotation, all tea tree TPS-a members contained a conserved RxR and DDxxD motif, separated by exactly 34 amino acids in all but two predicted sequences. The RRx8W motif was found in all but one sequence, with the first two positions showing variable amino acids in 13 TPS-a. In all 24 TPS-b, the DDxxD motif was located exactly 34 amino acids upstream of the RxR motif. The RRx8W motif was encoded in all TPS-b but interrupted by a premature stop codon in one gene (Fig. 2).
Each of the TPS-g genes showed sequence similarities to one of the four shorter and most likely incomplete TPS-g identified by Calvert et al. (2018). Two TPS-g genes contained seven exons and the other two had six. All four TPS possessed both the conserved DDxxD motif, and the RRx8W motif, with the first two arginines being less conserved. In the sequence of one TPS-g protein, MaltTPS067, the distance of the RxR to the DDxxD motif was 55 aa instead of the usual 34 aa, and the motifs were encoded on the second exon of the gene. The gene might have lost some exons after duplication, leading to a shorter protein length of 426 aa. Nevertheless, all required motifs were present, and the encoding gene was still considered putatively functional.
The NSE/DTE motif was identified in 49 of the 53 TPSa, TPS-b, and TPS-g amino acid sequences. The sequence pattern in the tea tree TPS was [LF] [MNCTWGI][ND]  (Christianson 2006), and was similar to variations reported in Vitis vinifera (Martin et al. 2010).
The predicted TPS-f genes contained 12-13 exons and a resulting protein length of 725-845 aa. MaltTPS065 was predicted to be TPS-e and had 14 exons and a protein length of 784 aa. The DDxxD motif was encoded in the TPS-e/f members, but not in the TPS-c gene, which does not require this domain (Martin et al. 2010). The RRx8W motif was not well conserved in the genes of these three TPS clades, as can be seen in the gene structure visualisation in Fig. 2.
The location of TPS genes on scaffolds was also assessed. In 12 cases, a single TPS was predicted on a scaffold with no other TPS present. Some scaffolds, however, were long enough to provide evidence for TPS clustering. For example all TPS-g genes were located on one scaffold, together with two incomplete pseudogenes, and all three TPS-f genes were encoded on a single scaffold, together with two TPS-b genes further downstream. Another scaffold had 13 predicted putatively functional TPS-b1 genes (i.e. > 50% of all TPSb), along with six additional TPS pseudogenes, arranged in tandem in a cluster of around 400 kb on a scaffold with an overall length of 720 kb (see later for further discussion). Overall, ten scaffolds were found to contain combinations of between two and five TPS genes.
A further quality control to test the completeness of the TPS gene models was a screen for chloroplast transit peptides (cTPs) in the predicted protein sequences. In plants, sesquiterpenes are produced in the cytosol by TPS-a proteins, while all remaining TPS syntheses occur in the chloroplast (Chen et al. 2011). Hence, a cTP should be present in most of the predicted TPS sequences, especially the monoTPS. However, cTPs display a lot of variation in their sequence length and structure (Lee et al. 2008), leading to difficulties in cTP predictions. In the tea tree TPS, 20 sequences were found to contain a cTP. However, seven TPS-b proteins were putatively allocated to the cytoplasm, indicating that the program either did not recognise the cTP target signal, or that these protein sequences are still incomplete or contain assembly errors. Calvert et al. (2018) reported only five predicted TPS containing a cTP in M. alternifolia, so in order to compare results based on the same methods, the 37 previously predicted tea tree TPS were also analysed, and only nine TPS could be located to the plastids. This implies that even though advancements in predicting transit peptides have been made, the notable increase in the number of predicted cTP sequences compared to the previous studies can be attributed to the improved gene models reported in our study. A more detailed description of the cTP screening results can be found in the supplementary material (Online Resource 5).

Identification of other genes involved in terpenoid biosynthesis
Genes that might be involved in the cytosolic mevalonate (MVA) pathway and plastidic methylerythritol phosphate (MEP) pathway, which both lead to the production of IPP and DMAPP, were also identified in this study (see Online Resource 1a and b for pathway information). Three genes that mapped to the MVA pathway were located on scaffolds with TPS annotations. Two genes encoding potential isopentenyl pyrophosphate isomerases (IPPI) were found. The encoded protein sequences had the same length as isopentenyl diphosphate isomerase (IDI) 1 (291 aa) and IDI2 (235 aa) that were identified for M. alternifolia from previously isolated cDNA clones by Shelton et al. (2004a). The sequences had high similarities, with 226 aa and 223 aa being identical with IDI1 and IDI2, respectively. Furthermore, the predicted protein with high similarity to IDI1 also contained a chloroplast targeting signal, confirming the findings by Shelton et al. (2004a) that IDI1 is responsible for the reversible isomerisation from IPP to DMAPP in the chloroplasts, while IDI2 has the same function in the cytosol. The IDI1-like gene was located on the same scaffold as the TPS-c gene MaltTPS071.

Comparative genomics
Melaleuca and eucalypts are angiosperms and belong to the family Myrtaceae. Molecular phylogenetic studies date the divergence of Myrtaceae from their sister group Vochysiaceae to 86.7-96.4 Ma, and the genera Eucalyptus and Melaleuca are thought to have a shared ancestry until 68 Ma (Thornhill et al. 2015). Based on characteristics of the least specialised members of the genus, Barlow (1988) inferred that Melaleuca originated as a tree in 'seasonally drowned habitats at the margins of tropical rainforests'. This hypothetical ancestor is believed to have evolved scleromorphous traits in the early to mid-Tertiary (23-65 Ma), which resulted in the radiation of the genus and expansion to other habitats. Nowadays, the majority of the genus encompasses small trees and shrubs that grow in wetland or periodically waterlogged habitats, with most species not being well adapted to dry conditions (Brophy et al. 2013).
Within the eucalypt group, the genus Eucalyptus is estimated to have diverged from the Corymbia and Angophora lineage around 52 Ma (Thornhill et al. 2015). In contrast with melaleucas, eucalypts dominate sclerophyllous forests and are specially adapted to survive in arid environments with frequently occurring fires (Hill et al. 2016). This distinction in their habitat may have evoked unique adaptive responses in eucalypts compared to the melaleucas and led to a divergence in the TPS gene family evolution between these genera. Whole genome sequences and manually annotated TPS gene families are available for Eucalyptus grandis (Myburg et al. 2014;Külheim et al. 2015) and Corymbia citriodora (Butler et al. 2017Healey et al. 2021), which provides a good foundation for comparative genomic studies with M. alternifolia. Based on the estimated divergence times and phylogenies, M. alternifolia should still share a relatively high sequence conservation with the two reference eucalypts when compared to other woody angiosperms.
Prior to conducting phylogenetic analyses of the TPS family in different species, the overall similarity of sequences among tea tree and the eucalypts was assessed in comparison with other long-lived members of the rosids clade. The pairwise synonymous substitution rates (Ks) among single-copy orthologous genes that are shared by E. grandis, C. citriodora, M. alternifolia, Populus trichocarpa, Salix purpurea, and Vitis vinifera were investigated. In total, a set of 2488 orthogroups contained a single-copy orthologue from each species, and the pairwise Ks values among species were calculated for the sequences in each orthogroup before being summarised in a density plot. Figure 3 shows the distribution of synonymous substitution rates among pairwise orthologue comparisons. C. citriodora and E. grandis orthologues have the lowest Ks values with a peak maximum of Ks = 0.1379. The Ks rates for orthologues among M. alternifolia and C. citriodora (peak max = 0.1839), as well as among M. alternifolia and E. grandis (peak max = 0.1891) show a highly similar distribution, which is consistent with expectations, since Melaleuca should share a common ancestor with Eucalyptus and Corymbia from before the eucalypt genera diverged. As expected, the pairwise Ks rates among the Myrtaceae are comparatively low relative to M. alternifolia and V. vinifera, which share increased synonymous substitutions. The highest Ks rates were observed when M. alternifolia was compared to P. trichocarpa and S. purpurea, which both belong to the Salicaceae and displayed corresponding Ks distributions. Overall, these observations corresponded with previous Ks rate investigations for C. citriodora (Healey et al. 2021). On the basis of the peak maxima of Ks distributions and the estimated divergence of melaleucas and eucalypts 68 Ma (Thornhill et al. 2015), the synonymous mutation rate per site per year among orthologous genes of M. alternifolia and E. grandis was predicted to be 1.390 × 10 -9 . Between M. alternifolia and C. citriodora, the same calculation led to an estimate of 1.3522 × 10 -9 synonymous mutations (site/ year). These values were in agreement with estimates for Salicaceae (Dai et al. 2014) and the eucalypts (Healey et al. 2021). The Ks peak maximum of 0.1379 for C. citriodora and E. grandis was slightly lower than the value of 0.1585 reported by Healey et al. (2021) but methodological differences in the selection of candidate genes might have led to this small discrepancy.
In order to confirm the species' relationships that should also be expected when investigating the TPS gene relatedness, a phylogenetic tree was created based on the sequence similarities of orthologous genes, which showed anticipated relationships among species. Melaleuca split from a common ancestor shared with Eucalyptus and Corymbia. The remaining rosids are on a separate clade with Salix and Populus being closely related, and Vitis located on a branch that splits from a common ancestor of the Salicaceae (Online Resource 6).

TPS subfamily representation among species
To assess the expansion or contraction of the M. alternifolia TPS subfamilies in comparison with other species, phylogenetic trees were constructed from the multiple sequence alignment of predicted tea tree TPS sequences and TPS proteins of A. thaliana, C. citriodora, E. globlulus, E. grandis, P. trichocarpa, and V. vinifera. The observed numbers of TPS in each subfamily (Table 1b) were consistent with the numbers reported in the previous studies for V. vinifera (Martin et al. 2010), E. grandis and E. globulus , C. citriodora , and A. thaliana (Jiang et al. 2019). For P. trichocarpa (Jiang et al. 2019), however, the classification into subfamilies deviated from the previous findings. The difference was small, with two previous TPS-a members being classified as TPS-b in Fig. 3 Pairwise synonymous substitution rates among single-copy orthologous genes. CC Corymbia citriodora; EG Eucalyptus grandis; MA Melaleuca alternifolia; PT Populus trichocarpa; SP Salix purpurea; and VV Vitis vinifera our study. Since these two subfamilies are structurally similar, a difference in alignment methodology might have led to these results.
Since the TPS in M. alternifolia were only classified as putatively functional if they contained both TPS Pfam domains (C-and N-terminal domain), it was also examined whether the TPS amino acid sequences published by Butler et al. (2018), Külheim et al. (2015), and Martin et al. (2010) contain these domains. Two TPS proteins in C. citriodora, three in E. globulus, and three in E. grandis were found to contain only one of the two TPS Pfam domains, and one further E. grandis protein contained no TPS Pfam domain at all. In V. vinifera, the number was considerably higher, with 16 proteins only encompassing one of the two TPS Pfam domains. For A. thaliana and P. trichocarpa, the methodology by Jiang et al. (2019) already took into account that a functional TPS should have both the C-and N-terminal Pfam domains, but since their study only annotated TPS from published proteomes for the respective species, some TPS might have been missed.

Phylogenetic tree construction and species comparison
The M. alternifolia TPS were found to be closely related to the other three Myrtaceae, C. citriodora, E. globulus, and E. grandis. No TPS cluster was found to be unique for tea tree, on each clade of the phylogenetic tree, the M. alternifolia TPS were closely related to TPS of the other Myrtaceae. Overall, the ordering of TPS in the phylogenetic tree showed the same relationships as the species tree inferred from Orthofinder (Online Resource 6). Melaleuca alternifolia TPS were more divergent to all eucalypt TPS than any among eucalypt comparisons. Within the eucalypt set, E. grandis and E. globulus were closer related, while Eucalyptus and Corymbia citriodora were more divergent (see Figs. 4, 5, and 6). These findings, although expected, confirmed the high quality of the manual TPS annotations for tea tree. Phylogenetic relationships for M. alternifolia TPS pseudogenes were also assessed and can be viewed in Online Resource 7 a-c. Following the convention of Butler et al. (2018), the term orthologous gene pairs will be used in the subsequent results if the TPS gene of one species is more closely related to a gene of another species than to a gene within its own genome. In the TPS-a subfamily, M. alternifolia had only one orthologous pair between MaltTPS018 and Cor-ciTPS036 (Fig. 4, O1). The two cases of orthologous pairing that were previously reported for C. citriodora and Eucalyptus  were also observed in this analysis, with no direct pairing to M. alternifolia genes (Fig. 4, O2 and O3). The notable TPS cluster containing EglobTPS022 and multiple C. citriodora TPS  was also displayed in the phylogenetic tree. While no orthology to any E. grandis TPS genes existed, three M. alternifolia TPS were found to belong to this clade (Fig. 4a). Some TPS-a clades have seen expansions in specific species. For example one cluster showed considerable expansion in C. citriodora (Fig. 4b). This expansion in comparison with the Eucalyptus species has already been mentioned by Butler et al. (2018). M. alternifolia TPS were also present in this clade, and even though they were not as numerous as in C. citriodora, they had two more members than Eucalyptus spp. However, in many cases, M. alternifolia TPS were underrepresented or missing completely from a clade (e.g. Figure 4c). As previously reported by Butler et al. (2018), the E. globulus and E. grandis TPS genes are often in orthologous pairs (e.g. Figure 5, O4). In contrast, only two orthologous pairs were found between Eucalyptus and C. citriodora in the TPS-b subfamily and none for TPS-g. We observed the same orthologous genes that were already reported for C. citriodora, and M. alternifolia was included in these orthologous clusters (Fig. 5, O1 and O2). An additional orthologous pair was found between MaltTPS042 and CorciTPS080 (Fig. 5, O3), whereas the Eucalyptus TPS were expanded in this clade. A potential gene loss in E. grandis ) was also detected (Fig. 5, a). Only one E. globulus gene was member of this clade, whereas M. alternifolia had two, and C. citriodora had three TPS. The TPS-b2 subfamily seems to be contracted in M. alternifolia compared to the other three Myrtaceae. While only two TPS-b2 members were found in M. alternifolia, which is the same as for P. trichocarpa, we observed nine TPS-b2 genes for E. grandis and ten for E. globulus and C. citriodora, respectively. The TPS-g family has also seen a larger expansion in the other three Myrtaceae than in M. alternifolia, and orthologous pairs were only observed between E. grandis and E. globulus.
In a concordance with previously reported findings, the single C. citriodora gene in the TPS-c subfamily formed an orthologous group with Eucalyptus (Fig. 6, O1)  . The M. alternifolia TPS-c gene was located on the same clade. The Eucalyptus spp. TPS-c subfamily was slightly expanded, with an additional orthologous pair between E. globulus and E. grandis.
In the TPS-e subfamily, the phylogeny was similar to the results reported by Butler et al. (2018). MaltTPS065 occurred in an orthologous group with CorciTPS089, EgranTPS093, and EglobTPS116 (Fig. 6, O2), and the TPS-e has seen a slight expansion in E. globulus and E. grandis relative to C. citriodora and M. alternifolia. The TPS-f subfamily was also more expanded in the Eucalyptus species compared to the other two Myrtaceae. Only one orthologous pair was observed between E. grandis and E. globulus (Fig. 6, O3), which was unexpected based on previous observations for this subfamily . Butler et al. (2018) also reported their TPSf subfamily results to slightly differ from the phylogeny presented by Külheim et al. (2015) for eucalypts. Low bootstrap supports for some of the clades might have contributed to these differences.

Identification of specific terpene synthases
Based on the phylogenetic tree construction, only two M. alternifolia TPS gene products were identified as potential TPS-b2 proteins (Fig. 5). As shown in Table 2, one of these proteins, MaltTPS043, showed high sequence similarity to a tea tree TPS that was initially annotated as potential monoterpene synthase (GenBank accession AY279379.1; Shelton et al. (2004b)) but was later shown to share conserved amino acids with isoprene synthases (Sharkey et al. 2005). The phylogenetic tree also showed MaltTPS043 to be an orthologue of the functionally characterised isoprene synthase of E. globulus (Fig. 5, O2) (Sharkey et al. 2013). To investigate whether the potential isoprene synthase contained known conserved amino acids, the sequences of the two TPS-b2 proteins MaltTPS042 and MaltTPS043 were aligned to the kudzu isoprene synthase (Sharkey et al. 2005). MaltTPS043 contained the conserved residues Phe-343, Gly-453, and Phe-493, but had a Thr at position Cys-496 (numbering is based on the kudzu amino acid positions). These observations coincide with the findings by Sharkey et al. (2005) and indicate that MaltTPS043 could encode an isoprene synthase. MaltTPS042, on the other hand, only shared the Gly-453 residue with the kudzu isoprene synthase. Calvert et al. (2018) also predicted two TPS-b2 genes to be present in the tea tree genome and considered the second TPS-b2 gene product to be a potential ocimene synthase.
The TPS-b1 family, which is responsible for the production of monoterpenoids, is of special interest considering that tea tree oil gets its medicinal properties from terpinen-4-ol and other monoterpenoids that have to meet specified proportions in commercial essential oils (ISO 4730: 2017 Essential oil of Melaleuca, terpinen-4-ol type (Tea Tree oil)). Terpinen-4-ol is not a direct product of a TPS, but is generated from non-enzymatic conversion of the monoterpene sabinene hydrate (Cornwell et al. 1999). Padovan et al. (2017a) have analysed transcriptomes from different M. alternifolia chemotypes and functionally characterised one terpinolene synthase (MaTPS-Tln), one sabinene hydrate synthase (MaTPS-SaH), and two 1,8-cineole synthase genes (MaTPS-CinA and MaTPS-CinB). Furthermore, they observed a strong correlation between the expression of MaltTPS-SaH, MaTPS-Tln, and MaTPS-CinA, and the terpenoids terpinen-4-ol, terpinolene, and 1,8-cineole, respectively. These findings were consistent with the suggestion by Keszei et al. (2010b) that as few as three enzymes might be responsible for the creation of chemotypes in tea tree. However, they also observed that terpenoid profiles correlated with the expression of other, as yet uncharacterised putative TPS, indicating that more than the 3 functionally characterised TPS could be involved in the formation of chemotypes in M. alternifolia.
By creating multiple sequence alignments of the newly predicted TPS to sequences obtained from Padovan et al. (2017a), we were able to identify four TPS with high sequence similarity to conserved amino acids of MaTPS-Tln, MaTPS-SaH, and MaTPS-Cin (Table 2). All four TPS that putatively account for chemotype grouping in tea tree are located in tandem within a region of around 87 kb on one scaffold. This scaffold, scf7180000030749, has a total length of 731,646 bp and contains genes for 13 TPS proteins (MaltTPS019-MaltTPS031), as well as TPS pseudogenes (Online Resource 8). MaltTPS020 had 98% amino acid identity with sequences conserved in MaTPS-Tln, MaltTPS021 had 100% similarity with MaTPS-SaH amino acids, and MaltTPS023 and MaltTPS025 were 99% and 100%, identical with regions characteristic of MaTPS-Cin, respectively. Among the TPS encoded on scf7180000030749, MaltTPS019 and MaltTPS026 also had high similarities to the SaH and Cin synthases (Table 2). This was in concordance with the previous findings supposing that one more TPS might have a similar product profile to MaTPS-Tln, one more TPS was likely to produce 1,8-cineole and related compounds, and another TPS was similar to MaTPS-SaH (Padovan et al. 2017a). Although these enzymes would have to be functionally characterised in order to get a definitive answer to this theory, the phylogenetic results of our analysis give weight to this hypothesis. MaltTPS019 was placed on a sister branch to the sabinene hydrate synthase MaltTPS021,  Table 2 Sequence similarity of predicted TPS proteins after alignment to a potential isoprene synthase (Shelton et al. 2004b;Sharkey et al. 2005) and conserved regions of functionally characterised ter-pinolene, sabinene hydrate, and 1,8-cineole synthases (Padovan et al. 2017a). The sequence identity in per cent was determined by protein alignments using MUSCLE MaltTPS026 was closely related to the two 1,8-cineole synthases, and two TPS (MaltTPS008 and MaltTPS056) were similar to the putative terpinolene synthase (Fig. 1).

TPS occur in tandem arrays in Melaleuca alternifolia
This manual annotation of a scaffold-scale assembly of the tea tree genome revealed that genes and pseudogenes from the same TPS subfamily with high sequence similarities were frequently located in close proximity on the same scaffold. This pronounced clustering of TPS genes into tandem arrays in tea tree paralleled the tandem clusters found in E. grandis and C. citriodora Butler et al. 2018), and supported the conclusion that duplication through unequal crossing over, and subsequent sub-or neo-functionalisation was a key mechanism underpinning the evolution of TPS in M. alternifolia. The mechanism of tandem duplication is believed to contribute to the adaptive diversification of genes involved in stress responses, such as the TPS, as they are more likely to be retained following duplication due to adaptive pressures (Hanada et al. 2008). A large cluster of 13 TPS-b genes on scf7180000030749 served as an exemplar for the investigation of tandem duplication of TPS genes in tea tree with a maximum of two non-TPS genes interspersed among TPS gene pairs of MaltTPS019-MaltTPS028. Moreover, four TPS genes, including the putative terpinolene, sabinene hydrate, and one 1,8-cineole synthase (MaltTPS020-MaltTPS023), occurred in a single contiguous tandem array (Online Resource 8). Taken together with the high sequence similarity between the sabinene hydrate synthase and the two 1,8-cineole synthases, as well as their grouping on one clade of the phylogenetic tree ( Fig. 1), this finding supported a hypothesis that the sabinene hydrate synthase evolved by gene duplication and subsequent neo-functionalisation of a more ancestral 1,8-cineole synthase (Keszei et al. 2010c;Padovan et al. 2017a).
The clustering of pseudo-TPS with putatively functional TPS genes further supported tandem duplication as a mechanism of TPS evolution. Together with the 13 TPS-b genes on scf7180000030749, six additional TPS pseudogenes were located to the same scaffold, suggesting that this region of the genome is a hot spot for non-homologous recombination. While some duplicated genes can obtain new functions through sub-or neo-functionalisation, most functionally redundant genes will be lost if they are not subject to selection (Lynch and Conery 2000). In this process of pseudogenisation, one copy of the gene usually accumulates mutations and becomes functionless, eventually accumulating too many mutations to be identifiable (Zhang 2003).

Divergent adaptive histories may account for TPS subfamily differences in eucalypts and Melaleuca
All angiosperm TPS subfamilies are represented in M. alternifolia but variations in the size of certain subfamilies relative to the other Myrtaceae were observed. The largest differences were evident in subfamilies that produce secondary metabolites, and thus are likely to be subject to adaptive pressures. For example eucalypts had around twice as many TPS-a (sesquiterpenoid) genes (Table 1) compared to M. alternifolia. Even if TPS-a pseudogenes are considered (Online Resource 3, Online Resource 7 a), this subgroup in tea tree is unlikely to have had the same significance historically as it had in extant eucalypts. Many sesquiterpenoids are important signalling molecules mediating plant-insect or plant-pathogen interactions by attracting pollinators (Pichersky and Gershenzon 2002), acting as phytoalexins (Vögeli and Chappell 1988) or defence against insect herbivores (Yuan et al. 2008).
Eucalypts also had a higher number of TPS-b2 than M. alternifolia ( Table 1). Enzymes of this subfamily catalyse the formation of the hemiterpene isoprene and the acyclic monoterpenes β-ocimene and myrcene (Sharkey et al. 2013), molecules shown to have roles in biotic and abiotic stress responses (Fäldt et al. 2003;Velikova and Loreto 2005). Eucalyptus grandis, for example possesses a total of nine TPS-b2, including a large cluster of eight TPS-b2 genes located within 107 kb on pseudo-chromosome 11 of its genome Calvert et al. 2018). While all the TPS-b2 genes in tea tree were also located together on a single scaffold separated by around 21.5 kb, there were only two full-length candidates (MaltTPS042 and MaltTPS043), a putative isoprene and ocimene synthase for this subfamily. This suggested that M. alternifolia may not have experienced the same biotic and abiotic adaptive pressures to expand its TPS-b2 subfamily as the eucalypts.

Biotic factors
Tandem duplication and adaptation to biotic stresses such as herbivory are likely to have been drivers of TPS subfamily expansion in both lineages of these long-lived trees, but the degree of expansion may have been a consequence of differences in the guilds of insects, pathogens, and mammals utilising each lineage. Evidence of coevolution in the hosts and their dependants has been found in both lineages. A number of components of both eucalyptus oil and tea tree oil exhibit antibacterial and antifungal properties (Carson et al. 2006;Hendry et al. 2009). Furthermore, eucalypts and tea tree both support a myriad of insects that feed on or otherwise utilise the foliage for shelter, or pollinate the flowers. Members of the Psyllidae and Chrysomelidae (e.g. Paropsisterna tigrina) utilise tea tree as host (Campbell and Maddox 1999). Some have been shown to have developed mechanisms to metabolise its terpenoids (Southwell et al. 1995), and exhibit selective herbivory corresponding with tea tree chemotypes, suggesting long-term associations (Bustos-Segura et al. 2015). Similar evidence of coevolution has been found among insect herbivores of some eucalypts (Edwards et al. 1993;Stone and Bacon 1994). Mammal browsers of eucalypts, such as the marsupial koalas (Phascolarctos cinereus) and common ringtail possums (Pseudocheirus peregrinus), also show feeding behaviours influenced by eucalypt leaf chemistry (Lawler et al. 2000;Moore et al. 2005), and adaptations to digest terpenoids (McLean and Foley 1997;Pass et al. 2002). Similar evidence of utilisation by mammals has not been shown for tea tree, and this may be a point of difference between the lineages.

Abiotic factors
Other differences in the historic environments in which the two lineages evolved likely also contributed to the divergence in TPS gene family compositions. The Myrtaceae originated during the late Cretaceous (Thornhill et al. 2015), and according to phylogenetic studies by Crisp et al. (2011), the most recent common ancestor between Melaleuceae and Eucalypteae was most likely a rainforest tree occurring in wet environments. The eucalypt progenitor is thought to have diverged from the Melaleucae lineage around 68 Ma and emerged from its rainforest environment to inhabit drier habitats. The ancestral eucalypts, therefore, evolved in drier ecosystems with frequent fires (Hill et al. 1999), with present-day species showing constellations of features associated with adaptation to resource-limiting conditions (Keith 1997).
Plants adapted to low-resource environments are thought to be slower growing than resource-rich species, as they place more emphasis on defending precious photosynthetic organs and toughing out challenging conditions by relying on energy storage (e.g. lignotubers and root tubers) (Coley et al. 1985;Chapin et al. 1993). Resource-rich species, on the other hand, are thought to be adapted to exploit moisture and fertile conditions for rapid growth required to outcompete their neighbours, and tend to invest less energy in defence. A richer terpenoid chemistry that provides protection from predation as well as resilience to abiotic stress for eucalypts therefore is consistent with a greater prominence of aridity and nutrient limitations in their evolutionary history relative to Melaleuca.
Such an emphasis on adaptation to aridity may have been particularly important for TPS-b2 expansions. Eucalyptus spp., for instance, are known to emit exceptionally high amounts of isoprene (Benjamin et al. 1996;He et al. 2000), especially under higher temperatures (Guidolotti et al. 2019), which increases a plant's tolerance to heat and water stress. Even though the exact mechanisms still require further investigation, isoprene emissions have been shown to confer thermotolerance by protecting leaves against high temperatures (Velikova and Loreto 2005), and protect plants against damage from reactive oxygen species (Velikova et al. 2004). Another TPS-b2 product, β-ocimene, could also play a role in mitigating abiotic stresses. For example Faralli et al. (2020) reported that transgenic Arabidopsis plants that constitutively emit ocimene and were subject to soil water stress displayed non-conservative behaviour with late stomata closures. They proposed that this might be a beneficial trait for plants in dry environments with limited resources, enabling them to take up water and nutrients under adverse conditions, thus gaining an advantage over competing species.
In contrast with eucalypts, where the most parsimonious explanation for the significant evolution of terpene diversity has been one of the overall expansions, the history of the Melaleuceae lineage may have been more complex and multi-directional, with the progenitors of Melaleuceae keeping stronger ties to the wetter environments of their ancestral origins, yet nonetheless experiencing periods of aridity (Crisp et al. 2011). Barlow (1988) hypothesised that the ancestral melaleucas and eucalypts evolved in distinct habitats, with Melaleuca spp. growing in wet areas, and eucalypts in drier environments with the regular occurrence of fires. Subsequent evolution and diversification, especially during the cyclic aridity of the Quaternary, led to the ecological coexistence that we observe in some members of these genera today. More recently, Thornhill et al. (2015) concluded that the last common ancestor of Melaleuceae and its sister tribe Osbornieae must have been a wetland plant, based on the fact that Osbornia grows in mangrove forests, and Melaleuca spp. often occur in swampy habitats. A history for the ancestors of M. alternifolia with periods of aridity but overall more benign, resource-rich habitats than eucalypts, is consistent with an overall less expanded TPS gene family relative to the eucalypt lineage, but one nonetheless that may carry signatures of past expansion and contraction in a legacy of pseudogenes.

Outlook
The terpene synthase gene family is of special interest to the tea tree industry because of its role in essential oil production. Six chemotypes have been identified in M. alternifolia based on the composition of three main terpenoids (Homer et al. 2000). Of these, one chemotype is dominated by terpinen-4-ol, one by 1,8-cineole, and one by terpinolene, whereas the other three non-cardinal types have intermediate levels of each of the three major constituents in different combinations. Only one chemotype, however, produces the commercially valuable high terpinen-4-ol oil (Homer et al. 2000). Chemotypes have been shown to be highly heritable (Shelton et al. 2002) and segregate within bi-parental crosses. The capacity to fix populations for a chemotype within three breeding cycles suggests major gene effects underly these categorical phenotypes. Our finding that the TPS genes putatively controlling terpinolene, sabinene hydrate, and 1,8-cineole synthesis are located in close proximity (within 87 kb) on one scaffold provides a possible explanation for the apparent categorical inheritance of these chemical attributes if the region behaves as a single locus. The availability of an improved draft genome sequence for tea tree (Voelker et al. 2021a) and the TPS annotation reported here are valuable resources for the tea tree breeding program.
Taken all together, this new information could be applied in molecular breeding to identify polymorphisms in the TPS genes that relate to variation in terpenoid profiles, and to investigate the genetic foundation of the six different oil chemotypes in tea tree. Similar approaches have already been used to associate genetic variants with traits connected to terpenoid concentration or oil yield in Eucalyptus sp. (Padovan et al. 2017b;Kainer et al. 2019). Finally, the release of a chromosome-scale genome assembly for M. alternifolia (Zheng et al. 2022) during the preparation of this work will provide a further opportunity to explore larger scale syntenic blocks shared by eucalypts and Melaleuca in relation to the TPS gene organisation among Myrtaceae in the future.

Information on Electronic Supplementary Material
Online Resource 1. Biosynthetic pathway information for terpenoid synthesis. Online Resource 2. Examples of manual TPS-b annotation evidence. Online Resource 3. Phylogenetic tree for putatively functional Melaleuca alternifolia TPS sequences. Online Resource 4. Sequence logo representation of the NSE/DTE motif in tea tree TPS. Online Resource 5. Information on the screen for organelle-targeting signal peptides. Online Resource 6. Species tree inferred from Orthofinder results. Online Resource 7. Inter-species phylogenetic trees of TPS subfamilies including tea tree pseudogenes. Online Resource 8. Visualisation of the large TPS-b tandem cluster organisation. Online Resource 9. General information about the annotated putatively functional TPS and pseudogenes, including a summary of conserved motifs encoded in each gene. Online Resource 10. GFF3 file listing the genome coordinates from the manual annotation of putatively functional TPS genes and pseudogenes. Online Resource 11. GFF3 file for the 15 TPS sequences that were excluded from further analysis due to potential redundancy and assembly errors.
Online Resource 12. FASTA file containing the protein sequences of the 58 analysed full-length TPS.

Supplementary Information
The online version contains supplementary material available at https:// doi. org/ 10. 1007/ s00606-023-01847-1. usega laxy. eu/), and the patmatdb search tool for sequence motifs was accessed via Galaxy Australia (https:// usega laxy. org. au/) (The Galaxy Community 2022). The authors would also like to thank the two anonymous reviewers whose valuable feedback helped to further improve this manuscript.
Author contributions JV, RM, and MS helped in conceptualisation; JV worked on methodology and formal analysis; JV undertook writingoriginal draft preparation; JV, RM, and MS helped in writing-review and editing; MS worked in funding acquisition; and RM and MS helped in supervision. All authors read and approved the manuscript.
Funding Open Access funding enabled and organized by CAUL and its Member Institutions. Julia Voelker was supported by a PhD stipend and operating expenses from Southern Cross University and the Australian Tea Tree Industry Association (ATTIA). Ramil Mauleon and Mervyn Shepherd were funded by salaries from Southern Cross University.
Data availability statement A more detailed description of the methods and computational commands has been summarised on protocol.io (Voelker 2023). The M. alternifolia genome sequence and accompanying raw-read data are available under NCBI BioProject PRJNA702189 (GenBank assembly accession: GCA_019926035.1). The Fgenesh++ gene prediction and other related data are deposited in the GigaScience Database (Voelker et al. 2021b). All supplementary text and figures mentioned in the article are available in the supplementary material (Online Resource 1 -8), and other supplementary files contain further information about the annotated genes or proteins, e.g. GFF3 and FASTA files (see Information on Electronic Supplementary Material). DNA sequences for annotated TPS genes can be retrieved with the GFF3 file in Online Resource 10 using the genome assembly in the above-mentioned NCBI BioProject.

Conflict of interest
The authors declare that no competing interests exist.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.