Introduction

Triterpenoids are an important group of bioactive terpenoids (Chandra et al. 2022). They have complex cyclic structures and occur as carboxylic acids, alcohols or aldehydes. Triterpenoids and their derivatives, such as steroids, form a diverse family of compounds found in animals, plants, bacteria and fungi (Muffler et al. 2011). Plant terpenoids play important roles in membrane fluidity, respiration, photosynthesis, and the regulation of growth and development (Tholl 2015). They are present in different parts of plants, such as the aerial parts of Lantana camara L. (Abdjul et al. 2017), the leaves of Campsis grandiflora and the cuticular waxes of apple peel (Abdjul et al. 2017). They are also found in marine algae (Laurencia sp.) (Singh et al. 2020), in an endophytic fungus of Huperzia serrata (Cui et al. 2021) and in Fusarium sp. (Ibrahim et al. 2016). The biosynthesis of isoprenoids/terpenoids is important for living organisms as well as for agriculture and industry. To date, approximately 80,000 naturally occurring terpenoid compounds have been identified; as secondary metabolites, they perform diverse structural and functional roles (Pemberton et al. 2017).

Triterpenoids are biosynthesized in the endoplasmic reticulum and cytoplasm via the mevalonate/acetate pathway, in which two molecules of farnesyl diphosphate combine to form the C30 precursor squalene (Yan et al. 2014). Squalene epoxide is the precursor of 3-hydroxytriterpenes, whereas all 3-deoxytriterpenes are derived directly from the cyclization of squalene; these cyclizations give rise to the triterpenoid skeletons (Sawai and Saito 2011). According to their structural features, triterpenoids are divided into acyclic, monocyclic, bicyclic, tricyclic, tetracyclic and pentacyclic subgroups (Nguyen et al. 2015). The major types, however, are the tetracyclic derivatives of lanostane, dammarane, protostane, cucurbitane, apotirucallane, tirucallane, euphane and cycloartane; the pentacyclic derivatives of the baccharenyl cation type, such as oleanane, lupane, ursane, taraxerane, multiflorane, bauerane, glutinane, friedelane, pachysanane and taraxastene; and the pentacyclic derivatives of the hopane type, such as hopane, neohopane, fernane, adianane, filicane and gammacerane (Ludwiczuk et al. 2017).

The field of triterpenoid research has recently undergone a paradigm shift driven by advanced whole-genome sequencing technologies and other bioinformatics tools. Although analytical and chemical methods were the primary means of accessing this class of compounds for decades, genomics tools are now helping to identify the genes involved in triterpenoid biosynthetic pathways (Ahmed et al. 2021). This change has also created a strong need for computational tools to support scientists in their daily work (Weber and Kim 2016). In parallel, triterpenoids have been produced heterologously by assembling various triterpenoid biosynthetic pathways into the genome of Saccharomyces cerevisiae. By engineering the relevant enzymes and yeast metabolism, the yields of various triterpenoids have increased considerably, from mg/L to g/L. This success shows that engineering key enzymes is a viable way to overcome the major barriers to the commercial use of these valuable secondary metabolites (Guo et al. 2020).

In recent years, triterpenoids have attracted considerable attention because of their pharmacological properties, which include anti-inflammatory, anticancer, antiulcer, antimicrobial, antiviral and analgesic effects (Borella et al. 2019; Yasin et al. 2021). They are used in the treatment of liver diseases such as hepatitis, as well as protozoal and parasitic infections. Their most important use stems from their cytotoxic properties. The triterpenoids extracted from Ganoderma lucidum have anticancer activity owing to the presence of highly oxidized lanostanes (Qu et al. 2017). Ginseng, one of the most important traditional medicinal plants in China, contains important compounds known as ginsenosides, which are triterpene saponins (Kim 2018). Moreover, Torilis radiata has been found to be a hepatoprotective herb owing to its hepatoprotective triterpenoids; it restores the levels of liver enzymes such as aspartate aminotransferase (AST), alanine aminotransferase (ALT) and lactate dehydrogenase (LDH) (Oyebode et al. 2016). Studies have confirmed the anticancer properties of cucurbitacin B in breast cancer (Yadav et al. 2010). Triterpenoids such as celastrol block the transcription and replication of human immunodeficiency virus (HIV). The extract of Ipomoea batatas leaves has antioxidant activity and antimicrobial activity against Staphylococcus aureus, Candida albicans, Streptococcus mutans and Streptococcus mitis; compared with ascorbic acid, the extract exhibits 42.94% antioxidant activity. Oral mucosa cancer can be treated using the extract of Ganoderma lucidum (Cheng and Sliva 2015). Jacaranda cuspidifolia contains tetracyclic triterpenoids (Alghasham 2013) and exhibits some antibacterial activity (Arruda et al. 2011). Cucurbitacin D has a proteasome inhibitory effect and, as a result, can induce apoptosis in human leukemic T cells.
The extract of the wild herb Drymaria cordata has both analgesic and antinociceptive activity, which has been attributed to the presence of tannins, diterpenes, steroids and triterpenoids (Akindele et al. 2011). Given the current high demand for triterpenoids, it is vital to understand their biosynthesis. In this review, we discuss the different routes used for the biosynthesis of triterpenoids and the strategies and methods used to improve their quality and quantity.

Biosynthetic pathways of triterpenoids

In both prokaryotes and eukaryotes, the isoprenoid pathway is the most abundant and diverse route by which triterpenoids are biosynthesized. Two biosynthetic pathways are known for the synthesis of terpenoids in plants: (1) the methylerythritol 4-phosphate/deoxyxylulose 5-phosphate (MEP/DOXP) pathway and (2) the mevalonate (MEV or MVA) pathway (Shi et al. 2010). At the genome level, the MVA pathway has been investigated in fungi such as ascomycetes and basidiomycetes using the genes linked to terpenoid biosynthesis (Liu et al. 2012). Numerous primary and secondary metabolites are synthesized via the isoprenoid route from the common C5 building blocks isopentenyl diphosphate (IPP) and its isomer dimethylallyl diphosphate (DMAPP), from which all isoprenoids are derived. IPP and DMAPP are synthesized via the MVA and MEP pathways (Pu et al. 2021). Whereas the MEP pathway is localized exclusively in plastids, the MVA pathway is distributed among the cytoplasm, endoplasmic reticulum and peroxisomes.

Mevalonate (MEV/MVA) pathway

The mevalonate pathway is an important metabolic pathway that supplies isoprenoids for multiple cellular processes (Yeh et al. 2018). The MVA pathway is complex and has attracted considerable interest in recent years. Its enzymes are distributed among different subcellular compartments. The main rate-determining enzyme of the pathway, 3-hydroxy-3-methylglutaryl-CoA reductase (HMGR), is anchored to the endoplasmic reticulum with its catalytic domain exposed to the cytosol, whereas the other enzymes of the pathway have been found in the cytosol and peroxisomes of the plant cell (Simkin et al. 2011). In the MVA route to FPP, acetoacetyl-CoA thiolase (AACT) condenses two molecules of acetyl-CoA to acetoacetyl-CoA, and 3-hydroxy-3-methylglutaryl-CoA synthase (HMGS) then adds a third acetyl-CoA to form 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA). The conversion of HMG-CoA to mevalonic acid (MVA) is catalyzed by HMGR, one of the most highly regulated enzymes known. Mevalonate and certain downstream derivatives, such as dioxidolanosterol (a shunt-pathway intermediate) and geraniol (GOH), suppress the translation of HMGR mRNA and thereby reduce its rate of synthesis (Peffley and Gayen 2003). HMGR is also regulated post-translationally via phosphorylation and ubiquitin/proteasome-mediated degradation. Short-term regulation of HMGR is mediated by phosphorylation by AMP-activated protein kinase (AMPK) and dephosphorylation by protein phosphatase 2A (PP2A); HMGR therefore exists in the cell in both unphosphorylated (active) and phosphorylated (inactive) states (Ching et al. 1996) (Fig. 1 A).
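The short-term phosphorylation cycle of HMGR can be pictured as a two-state switch. The Python sketch below is a hypothetical toy model rather than a result from the cited studies: it assumes first-order interconversion between the active and phosphorylated forms, with placeholder rate constants `k_kinase` (lumping AMPK activity) and `k_phosphatase` (lumping PP2A activity), and shows that the steady-state active fraction is `k_phosphatase / (k_kinase + k_phosphatase)`.

```python
"""Toy two-state model of HMGR phosphorylation (hypothetical, illustrative only).

Assumptions: active (A) and phosphorylated/inactive (1 - A) HMGR interconvert by
first-order kinetics; k_kinase lumps AMPK activity and k_phosphatase lumps PP2A
activity. Both rate constants are placeholders, not measured values.
"""


def steady_state_active_fraction(k_kinase: float, k_phosphatase: float) -> float:
    """Analytic steady state of dA/dt = k_phosphatase * (1 - A) - k_kinase * A."""
    return k_phosphatase / (k_kinase + k_phosphatase)


def simulate_active_fraction(k_kinase: float, k_phosphatase: float,
                             active0: float = 1.0, dt: float = 0.01,
                             steps: int = 5000) -> float:
    """Integrate the same equation with forward Euler and return the final A."""
    active = active0
    for _ in range(steps):
        d_active = k_phosphatase * (1.0 - active) - k_kinase * active
        active += d_active * dt
    return active


if __name__ == "__main__":
    k_k, k_p = 2.0, 1.0  # hypothetical: kinase twice as fast as phosphatase
    print(f"analytic  A* = {steady_state_active_fraction(k_k, k_p):.4f}")
    print(f"simulated A  = {simulate_active_fraction(k_k, k_p):.4f}")
```

With these placeholder rates both printed values are 0.3333: raising kinase activity relative to phosphatase activity lowers the active fraction. The model deliberately ignores transcriptional, translational and degradation-based control.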

Plant HMGR activity responds in vivo to a variety of developmental and environmental signals, such as light, cell division and infection. Plants regulate HMGR activity at the mRNA level through the differential induction of HMGR gene family members, and post-translationally through enzyme modification. Calcium, calmodulin and proteolytic degradation may also play a role in the regulation of plant HMGR (Stermer et al. 1994) (Fig. 1 A). HMGR is considered to be the rate-limiting enzyme of the mevalonate pathway (Tian et al. 2015), and its activity is also affected by other regulatory mechanisms. The translation rate of HMGR mRNA is determined by the cell's demand for non-sterol isoprenoids, whereas the degradation rate of HMGR is regulated by the cell's demand for both sterol and non-sterol isoprenoids. After HMGR, another key enzyme, mevalonate kinase (MVK), catalyzes the phosphorylation of mevalonic acid to mevalonate 5-phosphate (MVP). MVK is known to be subject to feedback inhibition by farnesyl diphosphate and geranyl diphosphate, the latter being an intermediate of the isoprenoid pathway (McClory et al. 2019). The next step is catalyzed by phosphomevalonate kinase (PMK), which converts ATP and mevalonate 5-phosphate into ADP and mevalonate 5-diphosphate (MVPP). It has been suggested that the MVA pathway is also regulated by as yet unknown mechanisms that govern flux through the pathway and, consequently, metabolite yield. It has also been discovered that, in addition to the classical MVA and MEP pathway enzymes, plant genomes encode another IPP-generating protein, isopentenyl phosphate kinase (IPK) (Dellas et al. 2013). In plants, IPK localizes to the cytoplasm, where it converts isopentenyl phosphate (IP) and possibly dimethylallyl phosphate (DMAP) to IPP and DMAPP via ATP-dependent phosphorylation (Henry et al. 2018).
In the final steps, MVA-5-diphosphate decarboxylase (MVD) converts mevalonate 5-diphosphate to isopentenyl diphosphate (IPP), and isopentenyl diphosphate isomerase (IDI) interconverts IPP and dimethylallyl diphosphate (DMAPP) (Wang et al. 2014). IPP is the precursor of all terpenoids (Fig. 1 A). Moreover, the MVA/MEV pathway is involved in the synthesis of triterpenes, sesquiterpenes, dolichol and brassinosteroids (Böttger et al. 2018).
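The MVA steps can be encoded as an ordered list of reactions. This Python sketch is illustrative only: it counts carbon atoms in each metabolite's acyl or isoprenoid skeleton (CoA, ATP, NADPH and phosphate groups are deliberately ignored), checks that each step consumes a product of the previous one, and shows that carbon is lost only at the MVD decarboxylation (C6 to C5 plus CO2). The abbreviations MVP (mevalonate 5-phosphate) and MVPP (mevalonate 5-diphosphate) are those of Fig. 1.

```python
"""Ordered MVA-pathway steps with a carbon balance (illustrative sketch)."""
from typing import NamedTuple


class Step(NamedTuple):
    enzyme: str
    substrates: tuple
    products: tuple


# Carbons in each metabolite's skeleton; for CoA thioesters only the acyl group
# is counted, because CoA itself is released unchanged.
CARBONS = {
    "acetyl-CoA": 2, "acetoacetyl-CoA": 4, "HMG-CoA": 6, "MVA": 6,
    "MVP": 6, "MVPP": 6, "IPP": 5, "DMAPP": 5, "CO2": 1,
}

MVA_PATHWAY = [
    Step("AACT", ("acetyl-CoA", "acetyl-CoA"), ("acetoacetyl-CoA",)),
    Step("HMGS", ("acetoacetyl-CoA", "acetyl-CoA"), ("HMG-CoA",)),
    Step("HMGR", ("HMG-CoA",), ("MVA",)),
    Step("MVK", ("MVA",), ("MVP",)),
    Step("PMK", ("MVP",), ("MVPP",)),
    Step("MVD", ("MVPP",), ("IPP", "CO2")),
    Step("IDI", ("IPP",), ("DMAPP",)),
]


def carbon_balanced(step: Step) -> bool:
    """True if substrate carbons equal product carbons for this step."""
    return (sum(CARBONS[m] for m in step.substrates)
            == sum(CARBONS[m] for m in step.products))


def check_chain(steps) -> None:
    """Raise if a step does not consume anything made by the previous step."""
    for prev, cur in zip(steps, steps[1:]):
        if not set(prev.products) & set(cur.substrates):
            raise ValueError(f"{cur.enzyme} does not follow {prev.enzyme}")


if __name__ == "__main__":
    check_chain(MVA_PATHWAY)
    for s in MVA_PATHWAY:
        print(f"{s.enzyme:5s} {' + '.join(s.substrates):28s} -> "
              f"{' + '.join(s.products):12s} C-balanced={carbon_balanced(s)}")
```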

Fig. 1

A schematic representation of the MEV/MVA and MEP biosynthetic pathways of triterpenoids and their types. (A) The MVA pathway starts from acetyl-CoA in the cytosol of the cell and moves into the peroxisome at the MVP-to-MVPP step. The enzymes proposed to have the highest degree of control over metabolic flux through the MVA pathway (HMGR) are underlined. Negative regulators of the MVA pathway at the post-transcriptional level are colored pink. (B) The MEP pathway occurs in the plastids. It starts from pyruvate and G3P and proceeds through a series of steps to the formation of GPP. The enzymes proposed to have the highest degree of control over metabolic flux through the MEP pathway (DXS) are underlined. Positive and negative regulators of the MEP pathway at the post-transcriptional level are colored pink and red, respectively. (C) In the endoplasmic reticulum, squalene is synthesized from the products of both pathways. GPP leads to the synthesis of 2,3-oxidosqualene, the common precursor of both pentacyclic and tetracyclic products. The enzymes that catalyze each step are shown alongside the arrows. (D) The biosynthetic pathway of pentacyclic triterpenoid scaffolds. Abbreviations: oxidosqualene cyclase (OSC), chair-chair-chair (C-C-C), multifunctional oxidosqualene cyclase (MOSC), lupeol synthase (LUP) and β-amyrin synthase (BAS). (E) Possible biosynthetic routes to tetracyclic triterpenoids. Abbreviations: (C-C-C) chair-chair-chair conformation, (C-B-C) chair-boat-chair conformation, (OSC) oxidosqualene cyclase, (LAS) lanosterol synthase, (LAS1) lanosterol synthase 1, (LAS2) lanosterol synthase 2, (FMO) flavin-containing monooxygenase and (GST) glutathione S-transferase

Methylerythritol 4-phosphate/deoxyxylulose 5-phosphate (MEP) pathway

The MEP pathway was originally discovered in bacteria by Rohmer et al. (1993), but further evidence has shown that it is widespread in phototrophic eukaryotes (Cordoba et al. 2009). Numerous homologous genes have been isolated and cloned independently from many plant species, such as periwinkle (Catharanthus roseus) (Chahed et al. 2000), Arabidopsis (Arabidopsis thaliana) (Carretero-Paulet et al. 2002) and peppermint (Mentha piperita) (Lange and Croteau 1999). In plants and certain protozoa, this route to IPP/DMAPP operates in plastids (Böttger et al. 2018). By compartmentalizing the MVA pathway in the cytoplasm and the MEP pathway in the plastids, plants can optimize isoprenoid biosynthesis and its regulation according to the availability of fixed carbon and ATP (Vranová et al. 2013). The MEP pathway uses D-glyceraldehyde 3-phosphate and pyruvate as substrates for the biosynthesis of both IPP and DMAPP (Cordoba et al. 2009) (Fig. 1B).

Many MEP pathway enzymes have been crystallized and their structures determined, and recent advances in understanding their structure and function have been reported (Xu et al. 2019; Lim et al. 2020). Several enzymes are needed to produce the universal terpenoid building blocks, IPP and DMAPP, from D-glyceraldehyde 3-phosphate and pyruvate. The first step of the MEP pathway is catalyzed by 1-deoxy-D-xylulose 5-phosphate synthase (DXS), which condenses pyruvate and glyceraldehyde 3-phosphate to 1-deoxy-D-xylulose 5-phosphate (DXP) with the release of CO₂; this irreversible reaction commits carbon to the MEP pathway. The key enzymes of the MEP pathway are DXS and 1-deoxy-D-xylulose 5-phosphate reductoisomerase (DXR) (Tian et al. 2022). DXS enzymes fall into several classes, which perform different functions in different species and tissues (Zhang et al. 2018a). In the second step, DXR catalyzes a reductive isomerization to produce MEP; the DXR gene is transcribed in both roots and leaves (Majdi et al. 2014). Next, 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase (MCT) attaches a cytidyl moiety through a diphosphate bridge to produce 4-(cytidine-5′-diphospho)-2-C-methyl-D-erythritol (CDP-ME) (Fig. 1B). Phylogenetic analysis shows that plant MCTs differ from those of bacteria and fall into three groups: (1) monocotyledons, (2) dicotyledons and (3) gymnosperms (Lan 2013).
CDP-ME is then phosphorylated by 4-(cytidine-5′-diphospho)-2-C-methyl-D-erythritol kinase (CMK) to give CDP-ME 2-phosphate (CDP-MEP), which is cyclized with the loss of CMP to 2-C-methyl-D-erythritol 2,4-cyclodiphosphate (MEcPP) in a reaction catalyzed by 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (MDS) (Fig. 1B). CMK is known to be encoded by a single-copy gene in both bacteria and plants (Kim et al. 2008). In plants, only the tomato (Rohdich et al. 2000) and Nicotiana (Ahn and Pai 2008) CMKs have been characterized. The homotrimeric MDS protein has three active sites shared between adjacent subunits and requires Zn²⁺ and Mn²⁺ for activity (Richard et al. 2002). In the next step, (E)-4-hydroxy-3-methylbut-2-enyl diphosphate synthase (HDS) produces (E)-4-hydroxy-3-methylbut-2-enyl diphosphate (HMB-PP). A previous study demonstrated that HDS (also named GCPE) is essential for the MEP pathway in E. coli (Hsieh and Hsieh 2015). Finally, IPP and DMAPP are produced by (E)-4-hydroxy-3-methylbut-2-enyl diphosphate reductase (HDR), as shown in Fig. 1 (Uchida et al. 2018). It is now generally accepted that HDR and DXS, and to some extent DXR, are the key enzymes controlling metabolic flux through the MEP pathway (Uchida et al. 2018). In mutants in which specific MEP-pathway reactions are blocked, reduced transcript levels generally correlate with reduced protein levels for all MEP-pathway enzymes except DXS and HDR (Cordoba et al. 2009), supporting the view that post-transcriptional mechanisms control MEP-pathway activity. The MEP pathway has been proposed to supply the precursors for the biosynthesis of gibberellins, diterpenoids, carotenoids and chlorophylls (Shi et al. 2016).
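The MEP steps can also be represented as a small reaction graph and searched from either substrate to IPP or DMAPP. This Python sketch is a simplification under stated assumptions: each edge links one carbon-bearing substrate to one product and is labelled with the enzyme abbreviation used in the text, cofactors (CTP, ATP, NADPH) are omitted, `CDP-MEP` denotes CDP-ME 2-phosphate and `MEcPP` the 2,4-cyclodiphosphate. A breadth-first search then returns the enzyme sequence between any two intermediates.

```python
"""Breadth-first search over a simplified MEP reaction graph (illustrative)."""
from collections import deque

# (substrate, product, enzyme); cofactors such as CTP, ATP and NADPH are omitted.
MEP_EDGES = [
    ("pyruvate", "DXP", "DXS"),
    ("G3P", "DXP", "DXS"),
    ("DXP", "MEP", "DXR"),
    ("MEP", "CDP-ME", "MCT"),
    ("CDP-ME", "CDP-MEP", "CMK"),
    ("CDP-MEP", "MEcPP", "MDS"),
    ("MEcPP", "HMB-PP", "HDS"),
    ("HMB-PP", "IPP", "HDR"),
    ("HMB-PP", "DMAPP", "HDR"),
]


def build_graph(edges):
    """Adjacency list: substrate -> list of (product, enzyme)."""
    graph = {}
    for substrate, product, enzyme in edges:
        graph.setdefault(substrate, []).append((product, enzyme))
    return graph


def enzyme_route(graph, start, target):
    """Return the enzymes on a shortest route from start to target, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, enzymes = queue.popleft()
        if node == target:
            return enzymes
        for product, enzyme in graph.get(node, []):
            if product not in seen:
                seen.add(product)
                queue.append((product, enzymes + [enzyme]))
    return None


if __name__ == "__main__":
    graph = build_graph(MEP_EDGES)
    print(" -> ".join(enzyme_route(graph, "G3P", "DMAPP")))
```

Run as a script, this prints `DXS -> DXR -> MCT -> CMK -> MDS -> HDS -> HDR`. Because the graph is directed, a search in the reverse direction (for example from IPP back to G3P) returns `None`.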

The biosynthesis route for squalene and squalene epoxide

IPP and DMAPP are biosynthesized via the MEP and MVA pathways, but the remainder of the route to squalene, the precursor of the triterpenoids, is the same in fungi, plants and bacteria. The condensation of DMAPP and IPP yields geranyl diphosphate (GPP) in a reaction catalyzed by GPP synthase (GPPS) (Chen et al. 2015). The addition of a further IPP unit to GPP then yields farnesyl diphosphate (FPP) in a reaction catalyzed by a key enzyme, FPP synthase (FPPS) (Lee et al. 2017).

The conversion of FPP to squalene involves two reactions, both catalyzed by squalene synthase (SS). In the first, two FPP molecules are condensed to form presqualene diphosphate (PSPP); in the second, PSPP is converted to squalene in an NADPH-dependent reaction (Liu and Fu 2018). Squalene monooxygenase (SQE), also known as squalene epoxidase, is a flavin adenine dinucleotide (FAD)-dependent enzyme that uses NADPH and molecular oxygen to oxidize squalene to 2,3-oxidosqualene (squalene epoxide). This is the first oxygenation step in sterol and triterpenoid biosynthesis, and SQE is thought to be one of the rate-limiting enzymes of the pathway (Yoshioka et al. 2020) (Fig. 1 C). A lack of understanding of its structure and function has hampered the development of SQE inhibitors (Padyana et al. 2019a), and many researchers are working to elucidate the structural basis for the specificity of SQE-catalyzed epoxidation so that next-generation inhibitors can be designed rationally. 2,3-Oxidosqualene is a common precursor of steroidal saponins, triterpenoid saponins and phytosterols (Xue et al. 2018; Padyana et al. 2019b; Sagatova 2021).
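The prenyl condensations and the squalene-forming steps can be checked by simple atom bookkeeping. This Python sketch assumes neutral, fully protonated formulas (IPP and DMAPP C5H12O7P2, GPP C10H20O7P2, FPP C15H28O7P2, pyrophosphate H4P2O7, squalene C30H50, 2,3-oxidosqualene C30H50O) and writes the hydride from NADPH plus a proton as `H2`; it is a formula check only, not a model of the ionized species present in cells.

```python
"""Atom-balance check for prenyl condensations, squalene synthase and SQE."""
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Parse a simple formula such as 'C15H28O7P2' into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_total(terms) -> dict:
    """Sum coefficient * formula over one side of a reaction (zeros dropped)."""
    total = Counter()
    for coefficient, formula in terms:
        for element, n in parse_formula(formula).items():
            total[element] += coefficient * n
    return {element: n for element, n in total.items() if n}


def balanced(left, right) -> bool:
    """True if both sides contain the same number of each element."""
    return side_total(left) == side_total(right)


IPP = DMAPP = "C5H12O7P2"
GPP, FPP = "C10H20O7P2", "C15H28O7P2"
PPI = "H4P2O7"      # inorganic pyrophosphate, free-acid form
SQUALENE, OXIDOSQUALENE = "C30H50", "C30H50O"
H2 = "H2"           # NADPH hydride plus a proton (bookkeeping assumption)

REACTIONS = {
    "GPPS: DMAPP + IPP -> GPP + PPi": ([(1, DMAPP), (1, IPP)], [(1, GPP), (1, PPI)]),
    "FPPS: GPP + IPP -> FPP + PPi": ([(1, GPP), (1, IPP)], [(1, FPP), (1, PPI)]),
    "SS: 2 FPP + [2H] -> squalene + 2 PPi": (
        [(2, FPP), (1, H2)], [(1, SQUALENE), (2, PPI)]),
    "SQE: squalene + O2 + [2H] -> 2,3-oxidosqualene + H2O": (
        [(1, SQUALENE), (1, "O2"), (1, H2)], [(1, OXIDOSQUALENE), (1, "H2O")]),
}

if __name__ == "__main__":
    for name, (left, right) in REACTIONS.items():
        print(f"{name}: balanced={balanced(left, right)}")
```

All four reactions balance. Dropping the `H2` term from the squalene synthase line leaves the equation two hydrogens short, which is why the reductive step needs NADPH.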

Biosynthesis of pentacyclic triterpenoids

To date, more than 23,000 triterpene structures have been discovered from natural sources. These comprise more than 100 structural scaffolds, ranging from acyclic to hexacyclic structures. Triterpene scaffolds are decorated with a diverse range of functional groups, such as carbonyl, hydroxyl, carboxyl, epoxy, acyl, malonyl, alkyl and glycosyl groups, which give rise to a wide variety of structures (Thimmappa et al. 2014a). Pentacyclic and tetracyclic triterpenoids are particularly important because of their varied pharmacological properties (Nguyen et al. 2015). Pentacyclic triterpenoids can be further subdivided into two types: baccharane derivatives and hopane derivatives. Oleanane, ursane and lupane, which are derived from β-amyrin, α-amyrin and lupeol, respectively, represent the major pentacyclic triterpenoid scaffolds (Ghosh 2016) (Fig. 1D). The initial diversifying step of the triterpene/sterol pathway is catalyzed by a family of enzymes known as oxidosqualene cyclases (OSCs), which convert 2,3-oxidosqualene into a variety of cyclic triterpenes. A. thaliana cycloartenol synthase (AtCAS1) was the first OSC to be cloned from a plant species (Corey et al. 1993). For more than 50 years, OSC-mediated cyclization reactions have attracted the attention of organic chemists and biochemists, and the conversion of 2,3-oxidosqualene into cyclic triterpenes is considered one of the most complex enzymatic reactions in triterpene metabolism (Thimmappa et al. 2014a). Because squalene is the precursor of all triterpenoids, it must first be oxidized to 2,3-oxidosqualene before pentacyclic triterpenoids can be produced. The OSC-catalyzed cyclization of 2,3-oxidosqualene into various cyclic triterpenes is the first diversification stage of the triterpene biosynthetic pathway and also the branch point for the biosynthesis of sterols and steroid hormones (Ghosh 2016).
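Because OSC-catalyzed cyclization rearranges 2,3-oxidosqualene without adding or removing atoms, the main products named here are all isomers of C30H50O. The Python lookup table below is an illustrative sketch: the molecular formulas are standard values, the product-to-scaffold mapping follows the text and Fig. 1D, and lanosterol synthase from the tetracyclic branch is included only as a contrast.

```python
"""OSC products as a lookup table (illustrative sketch)."""

OXIDOSQUALENE_FORMULA = "C30H50O"  # 2,3-oxidosqualene, the common OSC substrate

# product: (cyclase, scaffold, number of carbocyclic rings, molecular formula)
OSC_PRODUCTS = {
    "beta-amyrin": ("beta-amyrin synthase (BAS)", "oleanane", 5, "C30H50O"),
    "alpha-amyrin": ("alpha-amyrin synthase", "ursane", 5, "C30H50O"),
    "lupeol": ("lupeol synthase (LUP)", "lupane", 5, "C30H50O"),
    "lanosterol": ("lanosterol synthase (LAS)", "lanostane", 4, "C30H50O"),
}


def is_isomer_of_substrate(product: str) -> bool:
    """True if the product has the same molecular formula as 2,3-oxidosqualene."""
    return OSC_PRODUCTS[product][3] == OXIDOSQUALENE_FORMULA


def scaffolds_with_rings(n_rings: int) -> list:
    """Sorted scaffolds whose parent OSC product has n_rings carbocyclic rings."""
    return sorted(scaffold for _, scaffold, rings, _ in OSC_PRODUCTS.values()
                  if rings == n_rings)


if __name__ == "__main__":
    for product, (cyclase, scaffold, rings, _) in OSC_PRODUCTS.items():
        print(f"{product:12s} {cyclase:28s} {scaffold:10s} rings={rings} "
              f"isomer={is_isomer_of_substrate(product)}")
```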

Biosynthesis of tetracyclic triterpenoids

The tetracyclic group is a major type of triterpenoid whose structure contains four carbon rings. A few compounds in this group occur in plants, whereas the others are present in fungi. The most important members are the tetracyclic derivatives of lanostane (Yang et al. 2020). Figure 1E outlines possible biosynthetic routes to tetracyclic triterpenoids. The important tetracyclic triterpenoids are discussed below.

Mogrosides (Cucurbitane-type)

Cucurbitanes have been isolated from several plants of the Cucurbitaceae. Mogrosides are a family of cucurbitane-type tetracyclic triterpenoid saponins that are used worldwide as highly potent sweeteners and possess a variety of notable pharmacological activities. Squalene is oxidized to 2,3-oxidosqualene by squalene epoxidase, and the mogrosides are then synthesized from 2,3-oxidosqualene via a series of steps catalyzed by cucurbitadienol synthase, cytochrome P450s (P450s) and UDP-glycosyltransferases (UGTs). However, the relevant genes have not yet been fully characterized (Fig. 1E) (Dai et al. 2015; Wang et al. 2020).

Lanostane

Lanostane is an important tetracyclic triterpenoid skeleton, and lanosterol-derived metabolites are used in the treatment of a variety of diseases, such as hepatitis, hypertension, neurasthenia, chronic bronchitis, leukopenia and hyperlipidemia, and in the adjuvant treatment of cancer (Wu et al. 2013). Squalene is first oxidized to 2,3-oxidosqualene by squalene monooxygenase, a reaction that requires nicotinamide adenine dinucleotide phosphate (NADPH) as the electron donor. 2,3-Oxidosqualene is then converted into lanosterol by lanosterol synthase, and lanosterol is converted into lanostane-type triterpenoids by cytochrome P450 enzymes.

Shionone

Shionone is the major tetracyclic triterpenoid of Aster tataricus and has a unique tetracyclic skeleton composed entirely of six-membered rings, with a 3-oxo-4-monomethyl structure (Sawai et al. 2011a). Shionone enhances sputum secretion and reduces xylene-induced ear edema. In the biosynthesis of shionone-type triterpenoids, 2,3-oxidosqualene is enzymatically converted into the dammarenyl cation and then into the baccharenyl cation. Shionone (in its enol form) is produced when enzymes act on the baccharenyl cation (Sawai et al. 2011b).

Dammarane-type triterpenoids

Dammarane is a tetracyclic triterpene found in the sapogenins of triterpenoid saponins (Mills and Werner 1955). Dammarane-type triterpenoids can be used to treat and prevent various diseases, such as liver and kidney disease, hyperlipidemia, cardiovascular, neurodegenerative, urinary bladder and bone diseases, diabetes mellitus, metabolic syndrome and cancer. For instance, ginsenosides are important dammarane-type triterpenoids used in the therapy of different diseases, with anticancer, anti-inflammatory, antioxidant, anti-aging and antifatigue effects, among other physiological functions (Cui et al. 2017). In the biosynthesis of dammarane-type triterpenoids, squalene is converted into 2,3-oxidosqualene, which dammarenediol synthase converts into dammarenediol; dammarenediol is transformed into protopanaxadiol and then, through further enzymatic reactions, into dammarane-type triterpenoids (Wei et al. 2009).

Withasteroids

The withasteroids are a group of structurally diverse steroidal compounds with a C28 steroidal lactone skeleton, characterized by an α,β-unsaturated δ-lactone ring in the side chain. They occur primarily in the Solanaceae family, including the genera Acnistus, Datura, Dunalia, Physalis, Withania and Jaborosa. The isolation and synthesis of the withanolides have received considerable attention because of their significant biological activities. More than 170 new natural withanolides have been isolated and identified over the last 5 years (Xu and Wang 2020). Withasteroids have great potential owing to their antitumor, cytotoxic, immunosuppressive, anti-inflammatory and chemopreventive properties (Li et al. 2019).

Glycyrrhizin

Glycyrrhizin, also known as glycyrrhizinic acid, is a triterpene glycoside (saponin). It normally occurs as potassium and calcium salts in liquorice plants such as Glycyrrhiza glabra, G. glandulifera and G. typica (Eisenbrand 2006). The main active metabolite of glycyrrhizin is glycyrrhetinic acid (GA). Glycyrrhizin has been used for the treatment and prevention of various diseases, including gastric and duodenal ulcers and the common cold. Several studies have revealed the antiviral activity of glycyrrhizin (GZ) against viruses of the Flaviviridae family, HIV, HBV, HCV, HPV and influenza (Wang et al. 2015). Of particular interest, however, is its activity against severe acute respiratory syndrome coronavirus (SARS-CoV). Since 2019, the world has been fighting a common enemy, SARS-CoV-2, the virus that causes COVID-19 (Chrzanowski et al. 2021).

Strategies used to improve the biosynthesis of triterpenoids

Bioinformatics tools for the discovery of triterpenoid biosynthetic pathways

Bioinformatics and computational biology assist in the discovery of secondary metabolites and their pathways. With multiple bioinformatic tools and omics technologies, biosynthetic pathways can be predicted computationally, paving the way for further wet-laboratory experimentation on secondary metabolites (Smanski et al. 2016). Other approaches that help to determine secondary metabolites include product isolation, bioactivity assays and purification techniques.

Recent developments in bioinformatics, especially in sequencing technologies, have tremendous potential for identifying multiple secondary metabolites and predicting their unique structural and biological characteristics. However, the discovery of novel bioactive secondary metabolites remains a challenge for these genome mining techniques. In addition, despite their good bioactivity, the low productivity of many secondary metabolites, such as triterpenoids, significantly limits their practical applications (Ren et al. 2020). Importantly, genomics techniques combined with synthetic biology can help to resolve some of the inherent limitations of conventional approaches. For instance, conventional approaches face multiple limitations in cultivating the target species and in manipulating their native biosynthetic pathways; genome mining technologies can efficiently circumvent these limitations using purpose-built algorithms and software (Rutledge and Challis 2015) (Fig. 2).

Tools used to classify biosynthetic gene clusters of secondary metabolites (triterpenoids)

Specialized metabolites (also called natural products or secondary metabolites) derived from bacteria, fungi (Noushahi et al. 2021), marine organisms and plants are an important source of pharmaceuticals. Many specialized metabolites are biosynthesized via metabolic pathways whose enzymes are encoded by genes clustered on a chromosome. A metabolic gene cluster is a group of physically co-localized genes that together encode the enzymes for the biosynthesis of a specific metabolite. Although metabolic gene clusters were long thought to be largely restricted to microbes, several plant metabolic gene clusters have been discovered in recent years (Chavali and Rhee 2018). For example, in an analysis of 17 plant genomes, Boutanaev et al. investigated the prevalence of terpenoid clusters by identifying terpenoid synthase and cytochrome P450 (CYP) gene pairs located within 30, 50, 100, 150 and 200 kbp of each other in each genome. Their study found evidence for different mechanisms of pathway assembly in eudicots and monocots (Boutanaev et al. 2015).
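The window-based pair search described above can be sketched in a few lines. This Python example is hypothetical: the gene coordinates are invented, genes are assumed to be pre-labelled as terpene synthase (`TPS`) or `CYP`, and the sketch only counts same-chromosome pairs whose intergenic gap falls within each window; it does not reproduce the statistical comparisons of the original study.

```python
"""Count TPS-CYP gene pairs within distance windows (hypothetical sketch)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int   # bp
    end: int     # bp
    family: str  # "TPS" (terpene synthase) or "CYP" (cytochrome P450)


def gene_gap(a: Gene, b: Gene) -> int:
    """Intergenic distance in bp between two genes (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def count_pairs(genes, windows_kbp=(30, 50, 100, 150, 200)):
    """For each window size, count same-chromosome TPS-CYP pairs within it."""
    tps = [g for g in genes if g.family == "TPS"]
    cyp = [g for g in genes if g.family == "CYP"]
    counts = {}
    for window in windows_kbp:
        limit = window * 1000
        counts[window] = sum(
            1 for t in tps for c in cyp
            if t.chrom == c.chrom and gene_gap(t, c) <= limit
        )
    return counts


# Invented coordinates for illustration only.
GENES = [
    Gene("TPS1", "chr1", 10_000, 12_000, "TPS"),
    Gene("CYP1", "chr1", 30_000, 32_000, "CYP"),    # 18 kbp from TPS1
    Gene("CYP2", "chr1", 150_000, 152_000, "CYP"),  # 138 kbp from TPS1
    Gene("CYP3", "chr1", 500_000, 502_000, "CYP"),  # 488 kbp from TPS1
    Gene("TPS2", "chr2", 5_000, 7_000, "TPS"),
    Gene("CYP4", "chr2", 60_000, 61_500, "CYP"),    # 53 kbp from TPS2
]

if __name__ == "__main__":
    print(count_pairs(GENES))
```

For these invented genes the script prints `{30: 1, 50: 1, 100: 2, 150: 3, 200: 3}`, showing how the pair count grows as the window widens.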

With recent progress in sequencing techniques, several bioinformatics tools have been built for mining sequenced microbial genomes. Most of them are signature-based genome mining tools that use the basic local alignment search tool (BLAST) or hidden Markov models (HMMs) to recognize the signature genes responsible for the biosynthesis of specific secondary metabolites. PhytoClust is a tool designed specifically to identify metabolic gene clusters in plants. Its algorithm translates a collection of enzyme families related to plant specialized metabolism into hidden Markov models and uses them to mine genome sequences for physically co-localized metabolic enzymes (Töpfer et al. 2017). Other examples of bioinformatics tools include ClustScan (Chavali and Rhee 2018), Natural Product searcher (NP searcher) (Fedorova et al. 2012), Secondary Metabolite Unknown Regions Finder (SMURF) (Khaldi et al. 2010) and Antibiotics & Secondary Metabolite Analysis Shell (antiSMASH) (Blin et al. 2019). Databases of identified biosynthetic gene clusters (BGCs), such as Minimum Information about a Biosynthetic Gene cluster (MIBiG) 2.0 and the antiSMASH database, can also be used for genome mining (Blin et al. 2019). The Environmental Surveyor of Natural Product Diversity (eSNaPD), a web-based bioinformatics tool, has also been developed to survey secondary metabolites, BGCs and a variety of metagenomic DNA sequences. PCR-amplified sequence tags from metagenomic DNA are compared against a reference database of gene clusters to estimate the biosynthetic diversity hidden within a metagenomic DNA pool and to discover BGCs that encode new secondary metabolites, such as triterpenoids (Reddy et al. 2014).
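Signature-based tools first label genes with enzyme families (for example, through HMM or BLAST searches) and then look for physical co-localization. The Python sketch below covers only the second step and is not PhytoClust's actual algorithm: it assumes the family labels are already assigned, chains neighbouring signature genes whose start positions lie within a hypothetical `max_gap`, and reports chains containing at least `min_families` distinct enzyme families. Gene names, positions and thresholds are invented.

```python
"""Group co-localized, pre-annotated signature genes (hypothetical sketch)."""
from typing import NamedTuple


class Hit(NamedTuple):
    gene: str
    chrom: str
    start: int   # bp
    family: str  # enzyme family assigned upstream, e.g. by an HMM search


def candidate_clusters(hits, max_gap=50_000, min_families=3):
    """Chain neighbouring hits per chromosome; keep chains with enough families."""
    by_chrom = {}
    for hit in hits:
        by_chrom.setdefault(hit.chrom, []).append(hit)
    groups = []
    for chrom_hits in by_chrom.values():
        chrom_hits.sort(key=lambda h: h.start)
        current = [chrom_hits[0]]
        for hit in chrom_hits[1:]:
            # Distance is measured start-to-start between consecutive hits.
            if hit.start - current[-1].start <= max_gap:
                current.append(hit)
            else:
                groups.append(current)
                current = [hit]
        groups.append(current)
    return [[h.gene for h in group] for group in groups
            if len({h.family for h in group}) >= min_families]


HITS = [
    Hit("OSC1", "chr1", 100_000, "OSC"),
    Hit("CYP1", "chr1", 120_000, "CYP"),
    Hit("UGT1", "chr1", 140_000, "UGT"),
    Hit("CYP2", "chr1", 400_000, "CYP"),
    Hit("TPS1", "chr2", 10_000, "TPS"),
    Hit("CYP3", "chr2", 30_000, "CYP"),
]

if __name__ == "__main__":
    print(candidate_clusters(HITS))
```

With the default thresholds only the OSC-CYP-UGT chain on chr1 is reported; lowering `min_families` to 2 would also return the TPS-CYP pair on chr2.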

Methods to identify genes involved in secondary metabolism that are not clustered

Analytical techniques have also been designed to determine the biosynthetic pathways of secondary metabolites (triterpenoids) in plants even when the genes involved are not clustered. Two important methods, co-expression analysis and evolutionary genomics, have been used to analyze complicated metabolic routes (Kliebenstein and Osbourn 2012). In a standard co-expression study, genes encoding the enzymes for the biosynthesis of a desired compound can be identified by using genes for known enzymes as bait and ranking all other genes by their correlation coefficient with the bait. For instance, a cytochrome P450 gene has been used as bait to find highly co-expressed genes and to uncover the previously unknown 4-hydroxyindole-3-carbonyl nitrile pathway in Arabidopsis thaliana (Rajniak et al. 2015). Patterns of gene co-expression can be measured using weighted gene co-expression network analysis (WGCNA); extracting modules of highly correlated genes from the correlation network can lead to the recognition of candidate pathways (Lu et al. 2019). Another effective way to detect functional connections between biosynthetic genes is evolutionary genomic analysis, which uses phylogenetic profiles to identify genes that co-occur across genomes. For instance, clustering by inferred models of evolution (CLIME) is an algorithm that clusters genes into evolutionarily conserved modules on a species tree and can also predict new members of a pathway based on shared evolutionary history (Li et al. 2014).
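The bait-based ranking and the WGCNA-style adjacency mentioned above can be sketched as follows. This Python example is illustrative: the expression profiles are invented, Pearson correlation is assumed as the co-expression measure, and only the unsigned soft-threshold adjacency |r|^β is shown (β = 6 is used purely as an example). The WGCNA R package additionally builds a topological overlap matrix and detects modules by hierarchical clustering, which this sketch does not attempt.

```python
"""Bait-based co-expression ranking with a WGCNA-style adjacency (illustrative)."""
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def rank_by_bait(expression, bait):
    """Rank all non-bait genes by their correlation with the bait, highest first."""
    bait_profile = expression[bait]
    scores = {gene: pearson(bait_profile, profile)
              for gene, profile in expression.items() if gene != bait}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def soft_threshold_adjacency(r, beta=6):
    """Unsigned WGCNA-style adjacency: |r| raised to the soft-threshold power."""
    return abs(r) ** beta


# Hypothetical expression values for one bait and three candidates, five samples.
EXPRESSION = {
    "CYP_bait": [1, 2, 4, 8, 16],
    "geneA": [2, 4, 8, 16, 32],
    "geneB": [16, 8, 4, 2, 1],
    "geneC": [5, 1, 4, 2, 3],
}

if __name__ == "__main__":
    for gene, r in rank_by_bait(EXPRESSION, "CYP_bait"):
        print(f"{gene}: r = {r:+.3f}, adjacency = {soft_threshold_adjacency(r):.3f}")
```

Here geneA, which tracks the bait exactly, ranks first; geneB is strongly anti-correlated and ranks last, and the soft threshold shrinks weak correlations such as geneC's towards zero.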

Genome mining techniques

The completion of the first plant genome sequences has given considerable insight into the capacity of plant species to produce triterpenes. Large datasets obtained from the natural host are interpreted using various, often novel, approaches, such as comparative transcriptome profiling, co-expression analysis and functional genomics. In non-natural (heterologous) hosts, the substrate-specific enzyme approach is most commonly used, and directed protein evolution, semi-rational design and rational design are carried out to discover new triterpenoids (Furubayashi et al. 2015). The Arabidopsis thaliana genome contains 13 oxidosqualene cyclase (OSC) genes, which indicates that plant genomes can carry multiple OSC genes, and these 13 OSCs have subsequently been shown to produce various triterpenes (Lodeiro et al. 2007). Many of the OSC genes described so far were cloned using expression-based methods, namely polymerase chain reaction (PCR) with degenerate primers combined with rapid amplification of cDNA ends (RACE), or cDNA library screening (Augustin et al. 2012). Current advances in high-throughput transcriptomics are enabling gene discovery strategies that can be extended to many plant species.

Treating cell suspension cultures with methyl jasmonate (MeJA) is an important approach for inducing and examining triterpene biosynthesis (Thimmappa et al. 2014b). Real-time RT-PCR has also shown that MeJA up-regulates expression of the Gl-LS gene in Ganoderma lucidum mycelia, and Gl-LS promoter activity was likewise up-regulated in response to MeJA (Shang et al. 2010). Genome sequences of many microorganisms, plants and fungi indicate that plant genomes harbor a broad, as yet unrealized capacity to synthesize diverse triterpenoid scaffolds. Extending new genomic technologies to more plant species and accessions should help realize this potential (Thimmappa et al. 2014c).
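Up-regulation measured by real-time RT-PCR is commonly expressed as a relative fold change using the 2^(-ΔΔCt) method. The sketch below shows that calculation; it is not necessarily the analysis used by Shang et al. (2010), it assumes roughly 100% amplification efficiency for both genes, and the Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_treated: float, ct_ref_treated: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression by the 2^(-ddCt) method.

    Assumes ~100% amplification efficiency for both the target and the
    reference (housekeeping) gene, so each PCR cycle doubles the product.
    """
    dct_treated = ct_target_treated - ct_ref_treated    # normalize to reference gene
    dct_control = ct_target_control - ct_ref_control
    ddct = dct_treated - dct_control                    # treated relative to control
    return 2.0 ** (-ddct)


# Hypothetical Ct values: the target reaches threshold two cycles earlier after
# MeJA treatment while the reference gene is unchanged.
print(fold_change_ddct(24.0, 18.0, 26.0, 18.0))  # 4.0
```

When amplification efficiencies differ noticeably from 100%, efficiency-corrected models are preferred over this simple formula.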

Factors used to improve enzyme activity in a triterpenoid biosynthetic pathway

In order to enhance the biosynthesis of triterpenoids, many factors can be improved. One of the most important is the activity of the enzymes involved in triterpenoid biosynthesis. In engineered microbial strains, the activities of plant-derived enzymes can be increased and improved (Guo et al. 2020). Overall catalytic capacity depends on two main factors: the amount of enzyme and its specific activity (Bisswanger 2014). To increase the enzyme amount, transcription can be targeted via promoter optimization and translation via codon optimization. Enzyme properties such as stability, activity and selectivity can be improved by various methods, such as tailor-made enzyme immobilization (Mateo et al. 2007), which in turn increases the rate of the biosynthetic reactions in the pathway (Wei et al. 2009). Recently, triterpenoids have been successfully produced heterologously in Saccharomyces cerevisiae by introducing various triterpenoid biosynthetic pathways. By engineering the relevant enzymes as well as yeast metabolism, the yields of various triterpenoids have been raised from milligrams per liter to grams per liter. This achievement indicates that engineering the critical enzymes is a promising strategy for overcoming the main hurdles to the industrial production of these potent natural products. Guo et al. (2020b) reviewed enzyme-engineering strategies for improving triterpenoid yields in S. cerevisiae, namely increasing the supply of the 2,3-oxidosqualene precursor, optimizing the triterpenoid-specific reactions and reducing competition from the native sterol pathway.
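The two factors can be made explicit with the standard single-substrate Michaelis-Menten rate law, used here only as an idealized illustration, where $[E]_T$ is the total enzyme concentration and $k_{\mathrm{cat}}$ and $K_M$ are intrinsic properties of the enzyme:

```latex
\begin{aligned}
v &= \frac{k_{\mathrm{cat}}\,[E]_T\,[S]}{K_M + [S]}, \qquad V_{\max} = k_{\mathrm{cat}}\,[E]_T \\
v &\approx \frac{k_{\mathrm{cat}}}{K_M}\,[E]_T\,[S] \quad \text{for } [S] \ll K_M
\end{aligned}
```

Promoter and codon optimization act on $[E]_T$, so doubling the enzyme amount doubles $v$ at any substrate concentration. Protein engineering or immobilization acts on $k_{\mathrm{cat}}$ and $K_M$: a twofold gain in $k_{\mathrm{cat}}/K_M$ is guaranteed to double $v$ only when $[S] \ll K_M$, because at saturation only $k_{\mathrm{cat}}$ matters. The model ignores inhibition, enzyme instability and interactions between pathway steps.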

Codon optimization of biosynthetic genes

Because codon usage differs between organisms, codon optimization is generally used to improve the success of heterologous gene expression (Elena et al. 2014). Moreover, computational models have been created for cases such as the reconstitution of an entire secondary metabolite pathway, in which multiple genes require simultaneous optimization. Early codon optimization tools, such as JCat, UpGene and OPTIMIZER, rely on species-specific codon usage data and optimize the gene of interest by using the most abundant synonymous codon at every amino acid position, an approach generally referred to as the “CAI = 1” principle (Puigbò et al. 2007). However, natural genes do not use only their most frequent synonymous codons, and these methods therefore neglect many other determinants of synonymous codon usage. For example, local GC content can alter mRNA stability (Zhao et al. 2017). Properties of the DNA produced by codon optimization, including GC content and repeated sequences, also strongly influence protein expression (Boël et al. 2016). Fath and co-workers established a multiparameter RNA and codon optimization technique to analyze and optimize autologous mammalian gene expression using a sliding-window approach, in which nine sequence-based parameters are considered in sequence design, including increased GC content, removal of destabilizing RNA elements and avoidance of RNA secondary structures (Fath et al. 2011). Tian et al. (2015) developed Presyncodon, which analyzes codon usage patterns across different sequence fragments and predicts synonymous codon selection in E. coli (Frank and Christiansen 2018). Using Presyncodon, the researchers designed mApple and eGFP variants with 2.3- and 1.7-fold higher fluorescence, respectively, than versions optimized with high-frequency codons in E. coli.
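The “CAI = 1” principle and the GC-content checks mentioned above can be illustrated with a short Python sketch. The codon adaptation index (CAI) is the geometric mean, over the codons of a gene, of each codon's relative adaptiveness w (its count divided by the count of the most used synonymous codon in the host). The host codon table is a hypothetical three-amino-acid excerpt, and the sliding-window GC function only illustrates a window-based check; it is not the nine-parameter method of Fath et al. or the implementation of any named tool.

```python
import math
from typing import Dict, List

# Hypothetical host codon-usage counts, restricted to three amino acids for brevity.
HOST_USAGE: Dict[str, Dict[str, int]] = {
    "K": {"AAA": 70, "AAG": 30},
    "E": {"GAA": 60, "GAG": 40},
    "L": {"CTG": 50, "TTA": 10, "TTG": 10, "CTT": 10, "CTC": 10, "CTA": 10},
}


def relative_adaptiveness(usage: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """w(codon) = count(codon) / count(most used synonymous codon)."""
    w: Dict[str, float] = {}
    for codons in usage.values():
        best = max(codons.values())
        for codon, count in codons.items():
            w[codon] = count / best
    return w


def cai(seq: str, usage: Dict[str, Dict[str, int]]) -> float:
    """Codon adaptation index: geometric mean of w over scored codons.
    Codons absent from the table (here, anything outside K/E/L) are skipped."""
    w = relative_adaptiveness(usage)
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    scored = [w[c] for c in codons if c in w]
    return math.exp(sum(math.log(x) for x in scored) / len(scored))


def back_translate_cai1(protein: str, usage: Dict[str, Dict[str, int]]) -> str:
    """'CAI = 1' design: always pick the most frequent synonymous codon."""
    return "".join(max(usage[aa], key=usage[aa].get) for aa in protein)


def gc_content(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq: str, window: int) -> List[float]:
    """GC content in each sliding window, to spot local GC extremes."""
    return [gc_content(seq[i:i + window]) for i in range(len(seq) - window + 1)]


native = "AAGTTAGAG"                               # K-L-E using rare host codons
designed = back_translate_cai1("KLE", HOST_USAGE)  # "AAACTGGAA"
print(round(cai(native, HOST_USAGE), 3), cai(designed, HOST_USAGE))
print(windowed_gc(designed, 3))
```

By construction the back-translated sequence reaches CAI = 1, which is exactly why it can ignore other properties, such as local GC content or repeats, that the later multiparameter tools try to control.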
Presyncodon was later extended to Bacillus subtilis and S. cerevisiae by the same research group (Zhang et al. 2018b). Each of these codon optimization techniques addresses only some of the relevant aspects, so none is necessarily appropriate for every gene of interest. For instance, Claassens et al. compared the expression in E. coli of genes encoding membrane-integrated proteins with that of their codon-harmonized and codon-optimized counterparts, and found that no single approach consistently performed best (Rehbein et al. 2019).
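The difference between codon harmonization and “CAI = 1” codon optimization can be sketched as follows. Harmonization tries to preserve the native usage pattern (rare codons stay rare), whereas optimization always uses the host's most frequent synonym. The rule shown here (pick the host synonym whose frequency is closest to the native codon's frequency) is one simple variant chosen for illustration; the frequency tables are hypothetical, and this is not the procedure used by Claassens et al.

```python
from typing import Dict, List

# Hypothetical relative codon frequencies (per amino acid) in the native organism
# and in the expression host, restricted to Lys (K) and Glu (E) for brevity.
NATIVE_FREQ: Dict[str, Dict[str, float]] = {
    "K": {"AAA": 0.2, "AAG": 0.8},
    "E": {"GAA": 0.3, "GAG": 0.7},
}
HOST_FREQ: Dict[str, Dict[str, float]] = {
    "K": {"AAA": 0.75, "AAG": 0.25},
    "E": {"GAA": 0.7, "GAG": 0.3},
}
CODON_TO_AA = {c: aa for aa, codons in HOST_FREQ.items() for c in codons}


def split_codons(seq: str) -> List[str]:
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def harmonize(seq: str) -> str:
    """Replace each codon by the host synonym whose frequency is closest to
    the codon's frequency in the native organism (keeps rare codons rare)."""
    out = []
    for codon in split_codons(seq):
        aa = CODON_TO_AA[codon]
        target = NATIVE_FREQ[aa][codon]
        out.append(min(HOST_FREQ[aa], key=lambda c: abs(HOST_FREQ[aa][c] - target)))
    return "".join(out)


def optimize(seq: str) -> str:
    """'CAI = 1'-style optimization: always use the host's most frequent synonym."""
    return "".join(max(HOST_FREQ[CODON_TO_AA[c]], key=HOST_FREQ[CODON_TO_AA[c]].get)
                   for c in split_codons(seq))


native_gene = "AAGGAAAAA"      # K (common natively), E (rare natively), K (rare natively)
print(harmonize(native_gene))  # AAAGAGAAG
print(optimize(native_gene))   # AAAGAAAAA
```

Both outputs encode the same protein; they differ only in where rare codons fall, which is one reason why neither strategy consistently wins for every gene.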

Protein modification of biosynthetic enzymes

Whereas codon optimization algorithms can be used to raise the expression of biosynthetic genes in a heterologous host, computational protein engineering can be used to create biosynthetic enzymes with desirable properties for natural product synthesis. Numerous computational tools have been developed for de novo protein design, template-based sequence design and predictive modeling (Kuhlman and Bradley 2019).

Eng et al. (2018) recently created ClusterCAD, a computational tool for the rational design of recombinant type I modular polyketide synthases (PKSs) for the synthesis of polyketides, a large group of structurally diverse secondary metabolites (Markov et al. 2019). A type I modular PKS consists of catalytic domains arranged in a defined order that work together like an assembly line to build a polyketide step by step, which enables researchers to generate designer polyketides by rearranging the catalytic domains of the PKS (Alanjary et al. 2019). By comparing the domain architectures of PKS modules and the structures of their cognate polyketide intermediates, ClusterCAD helps users identify the best starting PKS, namely the one whose product most closely matches the desired structure, and select donor modules for module and catalytic domain exchange. Researchers have successfully used ClusterCAD to design a recombinant PKS for adipic acid production. Although few of these examples relate directly to triterpenoid synthesis, we envisage the use of computational models to develop biosynthetic enzymes with enhanced or modified activity for triterpenoid biosynthesis (Clomburg et al. 2015).
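The idea of choosing the closest starting PKS can be illustrated with a deliberately simplified stand-in: represent each PKS as an ordered list of modules (each written as its domain string) and rank candidates by the Levenshtein edit distance to the desired module list, where one insertion, deletion or module swap costs 1. ClusterCAD itself compares domain architectures together with the chemical structures of the intermediates, so this sketch is not its algorithm; the PKS names and module strings are hypothetical.

```python
from typing import Dict, List, Tuple


def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance between two module lists (insert/delete/substitute = 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete module x
                           cur[j - 1] + 1,            # insert module y
                           prev[j - 1] + (x != y)))   # keep or swap module
        prev = cur
    return prev[-1]


def rank_starting_pks(target: List[str],
                      candidates: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Order candidate PKSs by how few module edits separate them from the target."""
    scored = [(name, edit_distance(target, modules)) for name, modules in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical designs: each module is written as its domain string.
target = ["KS-AT-KR-ACP", "KS-AT-DH-KR-ACP", "KS-AT-ACP"]
candidates = {
    "pksA": ["KS-AT-KR-ACP", "KS-AT-KR-ACP", "KS-AT-ACP"],
    "pksB": ["KS-AT-ACP"],
    "pksC": ["KS-AT-DH-ER-KR-ACP"] * 4,
}
for name, dist in rank_starting_pks(target, candidates):
    print(name, dist)
```

Here pksA needs a single module swap (adding a DH domain to module 2), so it would be the natural starting scaffold; a real design would also weigh domain compatibility, which a plain edit distance ignores.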

Identification of triterpenoid scaffolds

As mentioned above, early approaches to finding new OSCs depended on expression-based strategies and, increasingly, on genome mining where gene sequences are available. Once OSC genes are cloned and sequenced, the enzymes must be functionally characterized. This is usually accomplished by expression in yeast strains specially adapted for this purpose, either sterol auxotrophs or strains with engineered precursor supply that produce high levels of 2,3-oxidosqualene (Pollier et al. 2013). Cowpea mosaic virus-based HyperTrans (CPMV-HT) has also proven to be a highly efficient technology for the rapid transient expression in plants of a range of proteins, including vaccines, antibodies and empty virus particles (Sainsbury et al. 2009).

Tailoring enzymes

In addition to oxygenation and glycosylation, numerous other reactions can modify triterpene scaffolds. In oat, the antimicrobial triterpene glycosides synthesized in the roots are acylated at C21 with either N-methyl anthranilate or benzoate, a reaction catalyzed by the serine carboxypeptidase-like (SCPL) acyltransferase AsSCPL1 (Sun et al. 2019). Unlike BAHD acyltransferases, which use coenzyme A (CoA) thioesters as acyl donors, SCPL-like acyltransferases use O-glucose esters. AsSCPL1 uses two glucose-ester acyl donors, N-methyl anthranilate-O-glucose and benzoyl-O-glucose. N-Methyl anthranilate is synthesized by the anthranilate N-methyltransferase AsMT1 (Mitsuguchi et al. 2009). The family 1 glycosyltransferase AsUGT74H5 then glucosylates N-methyl anthranilate to N-methyl anthranilate-O-glucose, and the related enzyme AsUGT74H6 converts benzoate into benzoyl-O-glucose. Remarkably, AsUGT74H5, AsSCPL1 and AsMT1 lie directly next to one another and form part of a large biosynthetic gene cluster for avenacin synthesis, which also contains the β-amyrin synthase gene AsBAS1 and the β-amyrin-modifying cytochrome P450 gene CYP51H10. C21 acylation is a typical feature of most cytotoxic triterpene glycosides, and acylation is likewise important for the biological activity of avenacins. The enzymes involved in producing acyl donors and transferring the acyl group to the triterpene scaffold are therefore also likely to be valuable tools for modifying triterpenoids (Zhan et al. 2016).

Strain improvement and medium optimization

Microbiologists have used many efficient methods to develop overproducing bacterial and fungal strains, and some of these approaches can also be used to enhance triterpenoid production in plant cell cultures. The crucial step in strain improvement is selecting the parent plant with the highest content of the desired product for callus induction, in order to obtain high-producing cell lines. Repeated selection and screening are often needed throughout cultivation, because genetic and epigenetic instability are significant inherent problems of plant cells (Lu et al. 2011). This variability often reduces productivity during subculture and is attributed either to genetic changes caused by culture conditions or to epigenetic changes stemming from physiological conditions. Such changes can be reversed by altering the culture environment and by screening for the target cell population within the heterogeneous populations characteristic of plant cell cultures. Understanding the factors that regulate secondary metabolism is just as critical as selecting high-producing cell lines (Fang et al. 2020).

Genetic engineering techniques in biosynthesis of triterpenoids

Genetic engineering involves isolating, characterizing and rearranging genetic material and transferring it to other organisms. It has two primary goals: combining traits found in different plants or cells within one organism, and introducing specific, active regulatory mechanisms. Squalene and botryococcene, which are linear hydrocarbon triterpenes, have been produced in transgenic Arabidopsis thaliana plants using a variety of engineering strategies (subcellular targeting and gene stacking) to assess the production potential of these two compounds (Alanjary et al. 2019). In plants, various attempts to increase triterpenoid production have targeted farnesyl diphosphate synthase (FPPS), 3-hydroxy-3-methylglutaryl-CoA reductase (HMGR) and squalene synthase (Richter et al. 2015; Fu et al. 2019), but transgenic plants may exhibit growth inhibition rather than increased production (Shim et al. 2010). Notably, co-overexpression of FPPS and the HMGR catalytic domain alleviates the growth inhibition caused by overexpressing either FPPS or the HMGR catalytic domain alone (Manzano et al. 2004). Further elucidation of the isoprenoid biosynthetic mechanisms is required to improve triterpenoid productivity in plants. Artemisinic acid, the precursor of artemisinin, a sesquiterpenoid used in combination therapy for malaria, has been produced in genetically modified yeast (Lenihan et al. 2008) by boosting production of farnesyl pyrophosphate (FPP), its starting substrate. Similarly, Escherichia coli has been engineered to produce artemisinic acid by introducing and manipulating the yeast mevalonate (MVA) pathway for FPP production (Martin et al. 2003). Taxol, a diterpenoid with strong anticancer properties, has been produced in Taxus plant cell cultures, and its precursor taxadiene has been produced in Escherichia coli by introducing and manipulating the taxadiene-producing pathway (Ajikumar et al. 2010).
Likewise, the biosynthesis of carotenoids, which are tetraterpenoid pigments, has been genetically engineered in a variety of hosts ranging from E. coli to plants (Giuliano 2014).

Bio-transformation techniques

Biotransformation using cultured plant cells has great potential and is an essential approach for modifying natural and synthetic compounds. The enzymatic capacity of cultured plant cells can be exploited for biotransformation, because plant enzymes catalyze stereospecific reactions that can be used to manufacture specific compounds. Plant cell cultures have also been used as model systems for studying such biosynthetic routes (Muffler et al. 2011). Glycosylation can be considered a detoxification mechanism by which plant cells deal with phytotoxic agents, leading to the accumulation of glycosylated products. Accordingly, toxic monoterpenes added exogenously are degraded and metabolized by cell cultures without accumulating in any particular compartment (Haralampidis et al. 2002). The biotransformation and degradation products of exogenous terpenes can easily be obtained through their rapid metabolism in cell suspension cultures. Moreover, because the product can be trapped on an adsorbent resin or extracted into a non-polar organic phase, a biotechnological process for obtaining the desired product from plant cell cultures is conceivable (Parra et al. 2009).

Fig. 2 Strategies or methods used to discover and improve the biosynthesis of triterpenoids at the bioinformatics and molecular levels

Conclusions and future prospects

Triterpenoid biosynthesis occurs in different organs of an organism through two different pathways: the mevalonate (MVA) pathway and the methylerythritol 4-phosphate/deoxyxylulose 5-phosphate (MEP) pathway. In this review, we summarized these two major biosynthetic pathways. Furthermore, we outlined the methods for screening and identifying the genes involved in these pathways and highlighted strategies for enhancing triterpenoid production, in order to facilitate the commercial production of triterpenoids through synthetic biology. We also described strategies ranging from conventional to molecular and bioinformatics approaches for triterpenoid biosynthesis. In addition to designing improved models and algorithms to make these procedures more reliable and efficient, it will be beneficial to identify the rate-limiting steps when optimizing a specific pathway. Because of their benefits, terpenoids have been studied extensively; however, many terpenoid derivatives and their regulatory mechanisms remain to be discovered.