Background

As the executors of life activities, functional proteins are important in all aspects of human life. For example, medicinal proteins, food proteins, and industrial enzymes have had important impacts on modern society [1, 2]. Medicinal proteins, the fastest-expanding area of the global health care business, are expected to reach a global market of approximately $400 billion by 2025 [3]. The market for industrial enzymes is approximately $7 billion and is growing at a 4% annual rate [4]. Food proteins must meet the needs of nearly 7.7 billion people worldwide, a number that is increasing at approximately 1.07% each year and is predicted to reach 10 billion by 2050 [5]. As the world population and the protein market grow, traditional plant and animal proteins are unable to meet the increasing demand. New protein products and sustainable manufacturing techniques are therefore needed. Microbial production of proteins is an important strategy for alleviating this dilemma because of its flexibility and efficiency [5]. The microorganisms used for the production of functional proteins range from bacteria to fungi, further increasing the availability of proteins. For example, yeast has been employed to produce approximately one-sixth of all pharmaceuticals licensed for human use and is especially important in the manufacture of insulin analogues and hepatitis vaccines [6]. Significant achievements have also been made in food production, where microbial fermentation serves as an alternative manufacturing route. Fungal single-cell proteins can serve as a direct source for producing meat alternatives, and the recombinant proteins produced by fungi can be employed as technical additives in the production of meat substitutes [7, 8]. For example, mycoprotein from Fusarium venenatum was used instead of chicken breast tissue to make chicken nuggets [9], the filamentous fungus Aspergillus oryzae was employed for the production of hamburger patties [10], and S. cerevisiae-derived exogenous cytokines were used to promote the growth of porcine muscle satellite cells (MuSCs) for cultured meat production [7]. In summary, microbial proteins represent a dominant paradigm for future protein manufacture.

Yeast is a common protein-producing host. Among microorganisms, S. cerevisiae is generally recognized as safe (GRAS). It has a clear genetic background and an abundance of molecular biology tools that facilitate strain design. It is also well adapted to industrial processing and has excellent resistance to chemicals and secondary metabolites [3]. In addition, with the enrichment and refinement of metabolic engineering techniques, the “Design-Build-Test-Learn” cycle of S. cerevisiae has been substantially shortened, and the organism is increasingly used for the manufacture of heterologous proteins (Fig. 1). In particular, advances in artificial intelligence techniques have significantly improved the ability to rationally construct genetic elements, modules, and metabolic pathway networks. For example, machine learning has been used to construct promoters [11] and genome-scale metabolic models [12]. The development of metabolic models has improved the capacity to precisely control gene expression and helped forecast S. cerevisiae behaviour in a variety of situations [13, 14]. The efficient gene editing tool CRISPR/Cas9 has also been applied as a versatile strategy for genome editing in S. cerevisiae [15]. In addition, the S. cerevisiae Genome Synthesis Project (Sc2.0) aims to develop a completely synthetic yeast genome [16]. The genome resynthesis of S. cerevisiae gives it new functional and evolutionary potential and has been employed to produce valuable metabolites (alkaloids, terpenoids, flavonoids, etc.) at high levels, laying the framework for efficient protein production [17]. As research has progressed, several other yeasts, such as Pichia pastoris, Yarrowia lipolytica, and Kluyveromyces lactis, have also been developed for protein production, and they differ from S. cerevisiae in several respects. P. pastoris, Y. lipolytica, and K. lactis are Crabtree negative, while S. cerevisiae is Crabtree positive. P. pastoris has a shorter mannan chain than S. cerevisiae [18]. In terms of substrate utilization, P. pastoris can utilize pentoses, glycerol and methanol as carbon sources; Y. lipolytica can utilize lipids; and K. lactis can utilize lactose [19]. S. cerevisiae, in turn, has been modified to grow on different substrates such as glycerol [20, 21], pentose [22, 23], and methanol [24]. Although many alternative yeasts have emerged more recently, research on them has been relatively limited, and their metabolic engineering tools are not as rich or complete as those for S. cerevisiae. Therefore, S. cerevisiae remains a major workhorse for recombinant protein production.

Fig. 1 Construction of S. cerevisiae cell factories. The advent of new technologies has paved the way for designing S. cerevisiae to become a perfect production platform, significantly reducing strain construction time and accelerating the entire design, build, test, and learning cycle

Advances in synthetic biology and systems biology have led to the development of molecular techniques and modification strategies for efficient protein synthesis in S. cerevisiae [25]. This review systematically summarizes the engineering strategies used to enhance protein production by S. cerevisiae. It first examines the benefits of S. cerevisiae as a host for protein synthesis and categorizes its main heterologous protein products. The strategies for constructing efficient protein-producing yeast strains are then summarized and discussed, including the construction of protein hyperexpression systems, protein secretion engineering, glycosylation pathway engineering, and systems metabolic engineering (Fig. 2). Finally, potential strategies for achieving high protein yields and ensuring sustainable production by S. cerevisiae are proposed.

Fig. 2 A review of engineering strategies for improved protein production by S. cerevisiae, including the construction of a hyperexpression system, secretion engineering, glycosylation pathway engineering, and systems metabolic engineering

Advantages of producing heterologous proteins by S. cerevisiae

S. cerevisiae has many advantages for protein production. First, as a domesticated microorganism with a long history of safe use, it has been frequently employed to produce a wide range of recombinant proteins. Its recombinant protein products have been authorized by the Food and Drug Administration (FDA) and the European Medicines Agency (EMEA) [26], and the requirement for viral detection has even been eliminated for medicinal products based on risk assessment and procedure validation [3]. Second, S. cerevisiae possesses sophisticated eukaryotic structures that enable appropriate protein folding and post-translational modifications (PTMs), including acylation, glycosylation, disulfide bond formation, and cleavage of signal peptides, during protein production [27]. Third, it can release proteins into the extracellular environment, facilitating further purification. Finally, the high degree of mannose-type N-glycosylation in S. cerevisiae can be reduced by genetic engineering, which results in recombinant proteins with humanized glycosylation patterns, such as active antibodies [6]. Based on these biological properties, S. cerevisiae is a promising host for heterologous protein production.

Moreover, S. cerevisiae can express heterologous proteins at up to 49.3% (w/w) of its own protein [5]. The production of heterologous proteins in S. cerevisiae has been widely documented, and the current status is summarized in Table 1. It has been widely used for producing medicinal proteins, food proteins, and industrial enzymes. For example, several medicinal proteins produced by S. cerevisiae, such as monoclonal antibodies, hormones, and growth factors, are already on the market [28]. S. cerevisiae is also a popular choice for the biosynthesis of eukaryotic membrane proteins because its translation and post-translational processing are rapid, easy, and inexpensive [29]. In the foreseeable future, microbial proteins will maintain their position as a prominent modality in industry, food, and medicine. However, wild-type S. cerevisiae still suffers from protein yields well below the theoretical values, inefficient secretory transport, and other issues. Current engineering strategies for increasing S. cerevisiae protein output include hyperexpression systems, protein secretion engineering, glycosylation pathway engineering, and systems metabolic engineering.

Table 1 Examples of recombinant proteins produced in S. cerevisiae

Construction of protein hyperexpression systems

Exogenous gene expression at a high level is a key step for protein production. Certain strategies have been developed to achieve protein hyperexpression in S. cerevisiae. For instance, codon optimization, increasing gene copy numbers, and transcriptional regulation (including promoter and terminator engineering) have been employed to increase protein expression levels (Fig. 3).

Fig. 3 Strategies for protein hyperexpression systems, including codon optimization, increasing gene copy numbers, and transcriptional regulation

Codon optimization

Codon usage bias in organisms is mostly the result of genetic drift, mutation, and natural selection [52]. A high occurrence of rare codons can reduce the efficiency of translation. Therefore, codon optimization is a common method for overproducing proteins and is described as “the in silico design of an optimal coding sequence for a given protein using a unique arrangement of alternative codons” [53]. The most common codon optimization strategy for matching host-specific codon usage bias is to replace uncommon codons with more commonly occurring synonymous codons [54]. In addition, codon optimization involves modifying the GC content, avoiding base repeats, eliminating restriction enzyme recognition sites, removing Chi-site extended recombination hotspots and Shine-Dalgarno (SD) ribosome binding site sequences, balancing CpG content (affecting transcription initiation), and other factors [55]. This strategy has significantly enhanced the yield of heterologous proteins in S. cerevisiae [45, 53, 54, 56]. For example, the codon-optimized Talaromyces emersonii α-amylase variant (temA-Opt) yielded 1.6-fold more extracellular activity than native temA, and the codon-optimized T. emersonii glucoamylase variant (temG-Opt) yielded 3.3-fold more extracellular activity than native temG [53].
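The rare-codon replacement strategy described above can be illustrated with a minimal Python sketch. It is not the procedure used in the cited studies: it implements the simplest "one amino acid, one codon" variant, in which every codon is swapped for the synonymous codon that occurs most often in a user-supplied set of reference coding sequences (for example, highly expressed host genes). The function names (preferred_codons, optimize_cds, gc_content) and the toy sequences are hypothetical, and no measured codon usage data are included. Real optimization tools additionally handle the other constraints listed above, such as repeats and restriction sites.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def split_codons(cds):
    """Split an in-frame DNA coding sequence into codons."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds):
    """Translate a coding sequence; stop codons are written as '*'."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(cds))


def preferred_codons(reference_cds):
    """Pick, per amino acid, the codon seen most often in the reference set.

    reference_cds is an assumption of this sketch: any iterable of in-frame
    coding sequences taken to represent the host's codon preference.
    """
    counts = defaultdict(Counter)
    for cds in reference_cds:
        for codon in split_codons(cds):
            counts[CODON_TABLE[codon]][codon] += 1
    return {aa: usage.most_common(1)[0][0] for aa, usage in counts.items()}


def optimize_cds(cds, preferred):
    """Replace each codon with the preferred synonymous codon, if known."""
    optimized = []
    for codon in split_codons(cds):
        amino_acid = CODON_TABLE[codon]
        # Keep the original codon when the reference set never used this amino acid.
        optimized.append(preferred.get(amino_acid, codon))
    return "".join(optimized)


def gc_content(seq):
    """Fraction of G and C bases, one of the secondary design criteria."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    reference = ["ATGGCTGCTGCCAAATAA"]          # toy reference, not real data
    original = "ATGGCCGCAAAGTGA"                # toy gene, not real data
    table = preferred_codons(reference)
    optimized = optimize_cds(original, table)
    print(optimized, translate(optimized) == translate(original))
    print(round(gc_content(original), 2), round(gc_content(optimized), 2))
```

Because only synonymous substitutions are made, the encoded protein is unchanged. This is also why such a scheme cannot capture codon-context or translation-speed effects.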

However, conventional codon optimization does not always improve protein expression. For example, the yields of both α-amylase and glycosylase were not improved with a codon optimization strategy [57]. Codon optimization techniques cannot guarantee optimum gene expression. There is certainly more information “hidden” in synonymous codons that is needed for protein synthesis and structure folding. Recent research has demonstrated that even changing a single codon has an impact on many processes, including the speed and accuracy of translation, the folding of proteins during cotranslation, and the protein secretion pathway [58,59,60]. In addition, translation efficiency can be affected by interactions among nearby codons and unsteady base pairing. To control ribosome speed and aid in protein folding, particular codons and combinations of codons are also thought to be essential [61]. Therefore, the effect of codon optimization on translational and post-translational levels should also be considered in subsequent protein production by S. cerevisiae [55].

Increased gene copy number

The protein expression plasmids frequently used in S. cerevisiae include integrative plasmids (YIp), centromeric plasmids (YCp), and episomal plasmids (YEp) [62]. YIp lacks a yeast origin of replication and is stable once integrated into the yeast chromosome, although only one copy of the target gene can be added. YCp carries a yeast centromere (CEN) and an autonomously replicating sequence (ARS) and has high mitotic stability but a low copy number. YEp contains the 2 μm plasmid origin of replication and partitioning locus (STB or REP3) and has a high copy number but low stability [63]. The copy number is an essential factor in ensuring the required level of transcription. YEp is commonly used to obtain a high copy number, which can be maintained at 10–40 copies [64]. For instance, the yields of recombinant human albumin and albumin fusion proteins can reach 5 g/L in S. cerevisiae with 2µ-based vector expression systems [65]. In addition, the plasmid copy number may be increased by decreasing the expression of particular marker genes and reducing the stability of marker proteins. Chen found that combining the ubiquitin/N-degron tag (ubi-tag) with promoter modification of a marker gene can more than triple the copy number of 2µ-based plasmids [65]. However, the genetic instability of plasmids, including segregational instability and structural instability, has a substantial impact on the target product yield, particularly during lengthy and intensive industrial fermentation processes [66].

In contrast, expression from chromosomally integrated genes is more stable than expression from free plasmids [67]. However, the low copy number obtained by chromosomal integration reduces the expression level, so the number of integrated copies needs to be increased. Human alpha-fetoprotein was successfully secreted into the culture medium by S. cerevisiae when 5–7 copies of its gene were integrated into the chromosomes [68]. In addition, the expression of heterologous genes is affected by epigenetic effects associated with the chromosomal integration site, such as altered gene expression caused by interference from regulatory elements, or gene disruption after integration into a protein-coding region of the genome [67]. It is therefore preferable to choose integration sites that do not compromise yeast growth [66]. For example, the endo-1,4-β-glucanase ENG1 from Aspergillus niger was efficiently and stably secreted after its gene was integrated into the HO site of the S. cerevisiae chromosome, the deletion of which did not affect yeast growth [69]. Moreover, a more practical approach is to integrate the recombinant protein-encoding gene constructs into noncoding genomic regions of yeast. The main multicopy sites commonly used for heterologous gene integration in S. cerevisiae are ribosomal DNA (rDNA) and the δ sites. Cassettes targeting the nontranscribed spacer (NTS) of rDNA were used to create yeast strains that produce the capsid protein of red-spotted grouper nervous necrosis virus (RG-NNVCP) in a copy number-dependent manner. Oral administration of recombinant S. cerevisiae containing 30 copies of the integrated RG-NNVCP cassette elicited effective immune responses in mice [70]. In addition, the highest level of β-galactosidase was achieved with approximately 8 gene copies integrated into the δ sequences of the retrotransposon Ty1 in S. cerevisiae [71].

Transcriptional regulation

Promoter engineering

The transcription unit (TU) of a gene circuit comprises three biological elements: the promoter, the coding sequence (CDS), and the terminator. The function of a TU is carried out by the product encoded in its CDS, and the expression of the CDS is initiated and regulated at the promoter. Therefore, the promoter should be carefully designed and selected to ensure that the circuit operates as intended for synthesizing proteins [72]. The promoters most commonly employed in S. cerevisiae are strong glycolytic promoters and conditionally inducible promoters. Strong glycolytic promoters, such as pTDH3, pPGK1, pTPI1, and pADH1, drive high levels of transcription. Conditionally inducible promoters, such as pGAL1, pGAL7, pGAL10, pPHO5, and pMET25, are suitable for regulated protein expression [6]. Heterologous protein synthesis is usually induced in the late stages of S. cerevisiae fermentation because this helps prevent the unintentional selection of faster-growing cells that do not produce the protein and avoids the premature production of damaging proteins [73]. The pMET25 promoter has been utilized in S. cerevisiae to generate high amounts of human serum albumin (HSA), human interleukin-2, human growth hormone, HSA-fused human glucagon, and human interferon-R in medium lacking methionine [73]. Furthermore, the nitrogen catabolite repressible GAP1 promoter has been employed to provide a high level of recombinant protein while allowing substantial biomass production in S. cerevisiae. This promoter has been used in yeast to produce both human membrane and soluble proteins [74].

However, the limited availability, poor dynamic range, and insufficient orthogonality of natural promoters restrict their applications. Therefore, promoter engineering has been proposed for designing synthetic promoters with improved properties [75]. Synthetic promoters are mainly developed by altering the sequence of natural yeast promoters through random mutagenesis, minimization, and hybridization [75] (Fig. 4A-C). In addition, the total transcript level of intron-containing genes is much greater than that of non-intron-containing genes in S. cerevisiae [76]. S. cerevisiae introns act as regulators with a 100-fold expression range, broadening the toolbox for synthetic gene expression systems and offering a foundation for accurate and stable gene expression regulation [77]. Cui et al. [78] systematically investigated protein expression by fusing introns and promoters in S. cerevisiae and successfully expanded the dynamic range of promoter subsets. A model for predicting the strength of intron–promoter combinations was further trained to improve protein production [78]. Together, these methods allow promoter engineering to provide a broader range of gene expression levels to facilitate protein production.

Fig. 4 Promoter engineering for protein production in S. cerevisiae. (A) Random mutation and screening of promoter libraries. (B) Construction of the minimal promoter construct. (C) Combination of each element for hybrid promoters. (D) Machine learning procedures for promoter design

However, designing promoters with customized strengths remains challenging because (i) the size of a mutagenesis library depends on the transformation efficiency of the strain and (ii) screening a library at high throughput is difficult [75]. As a result, several predictive models have been developed to simplify promoter design and fine-tuning. Such models can predict protein production from promoter sequences, allowing for the quick development of suitable promoters to aid synthetic biology studies in this model organism (Fig. 4D). Kotopka et al. [11] measured the gene expression activity of more than 327,000 sequences in an inducible promoter library and more than 675,000 sequences in a constitutive promoter library. An ensemble of convolutional neural networks was then trained on these two datasets, achieving robust predictive accuracy (R² > 0.79) across various sequence-activity prediction tasks. The model-guided design approach yielded extensive collections of promoters with significant sequence diversity and greater activity than those in the training data [11]. In addition, understanding the connection between promoter sequences and expression phenotypes can help predict promoter expression strength. To create deep neural network models that are broadly applicable and have high prediction performance, Vaishnav et al. [79] measured the expression levels of millions of random promoter sequences in S. cerevisiae. In addition to providing a basic framework for designing regulatory sequences and answering fundamental questions about regulatory evolution, the study proposed a method for identifying signatures of selection on expression from naturally occurring variation [79]. The rational design of these promoters offers additional tools for expressing heterologous proteins.
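To make the sequence-to-activity modelling above more concrete, the following Python sketch shows two generic building blocks of such workflows: one-hot encoding of a promoter sequence, which is the usual input representation for convolutional networks, and the coefficient of determination used to score predictions against measurements. It is a sketch under stated assumptions and not the pipeline of the cited studies. The function names are hypothetical, no neural network is trained, and the reported accuracy is assumed here to refer to the standard coefficient of determination, which the cited work may define differently.

```python
# Column order of the one-hot encoding; an arbitrary but fixed convention.
ALPHABET = "ACGT"


def one_hot(sequence):
    """Encode a DNA sequence as a list of 4-element rows (A, C, G, T).

    Ambiguous bases such as N are encoded as an all-zero row.
    """
    rows = []
    for base in sequence.upper():
        rows.append([1 if base == letter else 0 for letter in ALPHABET])
    return rows


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("measured values have zero variance")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy values for illustration only; these are not experimental data.
    print(one_hot("TATA"))
    print(r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```

A model that always predicts the mean measured activity scores 0 on this metric and a perfect model scores 1, which gives a scale for reading the reported threshold.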

Terminator engineering

Eukaryotic terminators play a significant role in controlling the transcription process by affecting the stability, effectiveness, and localization of mRNAs [80]. Only a few native terminators are commonly employed in S. cerevisiae, such as CYC1t, TDH3t, and PGK1t [81]. Green fluorescence was used as an indicator of terminator activity to quantify the activities of 5302 terminators derived from nearly 90% of the genes in S. cerevisiae. The activity of the top five terminators was approximately 2.5-fold greater than that of PGK1t, while the activity of the weakest terminator, GIC1t, was only 0.04-fold that of PGK1t. This wide range of gene expression regulation suggests that terminators are important elements for protein expression. In comparison to native terminators, synthetic terminators have a number of advantages, including short sequences that are easy to synthesize, a low degree of sequence homology, and equal or greater functional performance. Curran et al. [82] presented a set of short (35–70 bp) synthetic terminators that can be used to modulate gene expression in yeast. Compared to the native CYC1t terminator, the best of these synthetic terminators resulted in a 3.7-fold increase in protein production and a 4.4-fold increase in transcript levels [82]. Furthermore, the expression of EGFP with a short synthetic terminator (33–66 bp) was increased by 5.57-fold compared with that obtained with the native CYC1t terminator [83]. Combining a strong terminator with a weak promoter can yield expression levels similar to those of a strong promoter. Curran et al. [84] characterized more than 30 terminators in S. cerevisiae and indicated that a change in the mRNA half-life is the major cause of the variation in protein and transcript expression levels. They demonstrated that when coupled with a low-expression promoter, the relative difference in output between terminators is magnified, with a maximum difference of 35-fold compared to a construct lacking a terminator and a maximum difference of 11-fold between an expression-enhancing terminator and the parent plasmid terminator [84]. Therefore, terminator engineering will be an important strategy for heterologous protein production in the future [83].

Protein secretion engineering

The main steps in protein secretion are protein transport from the endoplasmic reticulum (ER) to the Golgi and further transport to the extracellular space. Increasing protein secretion can significantly enhance protein production. Moreover, the secretion of recombinant proteins by S. cerevisiae benefits downstream purification and large-scale industrial production because it avoids costly cell rupture, denaturation, and refolding steps [66]. However, the intrinsic secretory system has certain limitations, such as hyperglycosylation, incorrect folding, inefficient secretion, and abnormal proteolytic processing [66]. Many recombinant proteins in S. cerevisiae are produced at only 1% or even less of their theoretical yield, which means that the host cannot reach its full potential [85]. Therefore, protein yield and quality can be significantly improved by designing and engineering protein secretion pathways. Secretion signal engineering, ER folding engineering [86], and vesicle trafficking engineering are the major strategies used to modify the protein secretion pathway of S. cerevisiae [87] (Fig. 5).

Fig. 5 Protein secretion engineering in S. cerevisiae, including secretion signal engineering, ER folding engineering, and vesicle trafficking engineering

Secretion signal engineering

The early stages of the secretory process are influenced by the mechanism of protein transport to the ER. One of the most effective approaches for promoting recombinant protein secretion is the use of secretory signal peptides [66]. The signal peptide sequence determines the secretory pathway of a protein: whether cotranslational or post-translational translocation into the ER occurs, and whether the trans-Golgi network is involved [88]. However, even for recombinant proteins with minor sequence or structural differences, the secretory efficiency may vary significantly depending on the specific signal peptide sequence. Native, exogenous, and synthetic signal peptides have all been shown to guide protein secretion in S. cerevisiae [88]. The directed evolution of signal peptides has been successfully employed to improve protein secretion. A mutant α-factor signal peptide, the α9H2 leader, was attached to laccase, and its production was increased two-fold in S. cerevisiae [89]. In addition, the directed evolution of the signal peptide (MFa1pp) combined with strain engineering increased human IgG1 secretion by 180-fold [90]. Additionally, a modified version of the α-factor leader (αOPT leader) was created by using a dual (top-down and bottom-up) design strategy to optimize signal peptides. This modified leader can significantly increase the secretion of ascomycete hydrolases (a sterol esterase and two β-glucosidases) and basidiomycete oxidoreductases (aryl-alcohol oxidase, two additional laccases, and versatile peroxidase). Generally, the αOPT leader increased enzyme expression levels by 2- to 20-fold relative to the native α-factor leader (αnat) [91]. Compared to the wild-type α-factor, synthetic signal peptides generated through an iterative process of rational design and empirical optimization increased the secretion of the insulin precursor by 2.5-fold [92].

Along with the signal peptide, the fusion partner is another powerful tool for promoting protein secretion because it increases the solubility of fusion proteins in the ER and facilitates their transport to the Golgi [93]. Protein secretion has been improved by the use of several fusion partners. These partners mainly include the cell wall proteins Scw4p and Pir4p, the cellulose-binding domain (CBD), the mitochondrial inner membrane protein UTH1, and the ER protein Voa1p [94]. Scw4p has been designed as a universal fusion partner for heterologous protein secretion in S. cerevisiae. Three target proteins (hGH, exendin-4, and hPTH) were fused with the C-terminally shortened Scw4p to boost their secretion, notably yielding approximately 5 g/L of exendin-4 fusion protein [93]. Lipase was secreted with close to 90% efficiency by employing the cell wall protein Pir4p as a fusion partner, resulting in approximately 400 IU of lipase activity per millilitre of cell supernatant [95]. Furthermore, a CBD from Trichoderma harzianum endoglucanase II (THEG) was used to promote the synthesis of Bacillus stearothermophilus L1 lipase, and the secretion of CBD-linker-L1 lipase increased by 7-fold, reaching approximately 1.3 g/L [96]. An N-terminal 98-amino acid domain of the mitochondrial inner membrane protein UTH1 was also employed to secrete Rahnella aquatilis levansucrase (RaLsrA) into the culture medium with a 63% secretion efficiency [46]. In addition, the C-terminally shortened Voa1p, an ER protein that participates in the assembly of the V0 sector of the V-ATPase, was developed as a fusion partner for secreting small proteins, and it increased the production of human parathyroid hormone by 5-fold [94].

ER folding engineering

Protein folding is the subsequent stage in the secretory process after translocation to the ER. The ability of the ER to fold proteins is one of the key factors restricting the secretion of recombinant proteins [66]. Incorrectly folded peptides or an overabundance of secretory proteins can result in luminal burden and ER stress, which activate the unfolded protein response (UPR). The UPR can induce multiple protective cellular processes, such as ER-associated degradation (ERAD) of misfolded proteins and the regulation of protein folding [66]. Several approaches to manipulating the ER environment, including enhancing the ER folding capacity and activating the UPR, have been employed to improve protein folding ability in S. cerevisiae [97].

Protein folding in the ER is commonly thought to be a regulatory step in the secretion process. The overexpression of folding chaperones is a basic technique. The yields of human erythropoietin and bovine prochymosin were increased by 5-fold and 26-fold, respectively, when the Hsp70 family ATPase chaperone BiP was overexpressed to promote protein secretion in S. cerevisiae [98]. Folding partners cooperate in a diverse set of interactive networks. For example, by binding exposed hydrophobic sequences, Kar2p serves as a folding chaperone and an ER cleaner throughout the ERAD process. Pdi1p can catalyse the formation and isomerization of disulfide bonds and participate in the folding or degradation of non-disulfide proteins. The coexpression of the ER chaperone Kar2p and the disulfide isomerase Pdi1p synergistically enhanced the secretion of β-glucosidase by 3-fold [99].

The UPR is a widespread, coordinated response that eliminates misfolded proteins and enhances the oxidative environment in the ER, increasing the capacity for protein secretion. Hac1p is the main transcription factor regulating this pathway, and its overexpression can activate the entire UPR pathway and boost ER chaperone expression, thus enhancing the efficiency of heterologous protein secretion. For example, the secretion of α-amylase was increased by 2.4-fold by overexpressing Trichoderma reesei-derived HAC1 in S. cerevisiae. The overexpression of native S. cerevisiae HAC1 also enhanced the secretion of endogenous invertase (2-fold) and recombinant α-amylase (0.7-fold) [100]. In addition, Ire1p, a transmembrane protein that controls Hac1p synthesis by regulating the splicing of HAC1 mRNA, is also a key component of the UPR pathway and is essential for sensing and responding to ER stress. Recently, the overexpression of Ire1p in mutant S. cerevisiae increased hepatitis B small antigen (HBsAg) production by 2.12-fold compared to that in the wild-type strain [101]. Moreover, expanding the ER appears to be a sensible way to prevent the detrimental consequences of protein overexpression stress and the associated induction of the unfolded protein response. The deletion of the lipid-regulating gene OPI1 resulted in an expansion of the ER in S. cerevisiae as well as a 4-fold increase in full-length antibody production [102].

Vesicle trafficking engineering

The protein secretory route involves vesicle-mediated transport processes, including protein trafficking through the ER, Golgi, trans-Golgi network, endosome, and cell membrane. The transport vesicle membrane fuses with the target membrane at each phase of trafficking, allowing the delivery of cargo proteins [103]. The secretion of heterologous proteins has been successfully enhanced via vesicle trafficking engineering. Coat protein complex II (COPII)-coated vesicles transport recombinant proteins from the ER to the Golgi. The expression of Sec16, a peripheral protein of the ER, can increase ER-Golgi flux and direct additional ER membrane proteins into anterograde vesicles bound for the Golgi, resulting in a decrease in the amount of ER membrane [104]. For instance, the secretion of human insulin precursor and α-amylase was increased by 34% and 16%, respectively, after Sec16 overexpression [103]. These fundamental components are returned from the Golgi to the ER by vesicles coated with coat protein complex I (COPI), which sustains continual anterograde transport. In a background strain expressing Sec16, the GTPase-activating proteins (GAPs) Gcs1 and Glo3 promoted retrograde transport from the Golgi to the ER. The increased protein secretion resulted from the recovery and reintroduction of these components into the ER [104]. These findings suggest that the secretory route depends on a proper balance between anterograde and retrograde transport. Another secretory engineering strategy for enhancing heterologous protein synthesis in S. cerevisiae is to improve protein transport from the Golgi to the plasma membrane [103]. For instance, Sso1 and Sso2 are yeast syntaxin homologues that are involved in the fusion of Golgi-derived vesicles with the plasma membrane. The overexpression of Sso1 and Sso2 resulted in 4-fold higher α-amylase secretion [105]. Additionally, lowering intracellular retention can boost protein production. Huang et al. [106] decreased intracellular heterologous protein retention and boosted the protein production capacity of yeast by 5-fold by combinatorially altering known gene targets involved in the secretory and trafficking pathways, as well as the histone deacetylase complex. The engineered S. cerevisiae produced 2.5 g/L fungal α-amylase, with less than 10% of the recombinant protein retained within the cells. Several studies have shown that vps mutants defective in vacuolar protein sorting can produce various recombinant proteins more effectively, indicating that preventing missorting to the vacuole can improve secretion [107]. The deletion of VPS4, VPS8, VPS13, VPS35, or VPS36, which encode vacuolar protein sorting proteins, resulted in increased production of insulin-containing fusion proteins [108]. Similarly, the deletion of VPS10 (encoding the sorting receptor for vacuolar hydrolases) and PEP4 (encoding vacuolar protease A) reduced the targeting of haemoglobin to vacuoles and its degradation. When combined with other gene mutations, haemoglobin production was increased and accounted for approximately 18% of the total cellular protein content [109]. In addition, some proteins may undergo proteasome-based protein degradation [110]. For example, the deletion of the extracellular protease Ski5p increased the secretion of killer toxin by approximately 10-fold [111]. The Yap3p protease is an important factor in the degradation of secreted heterologous proteins, and its disruption significantly improved the quality of products including recombinant human albumin (rHA) and the rHA-human growth hormone fusion protein (rHA-hGH) [112].

Glycosylation pathway engineering

Glycosylation occurs mainly in the ER and Golgi and can affect protein activity, stability, and secretion [113]. Introducing or eliminating glycosylation sites at specific locations has become an important strategy for improving the production or catalytic performance of recombinant proteins [114]. Aza et al. [89] introduced N-glycosylation into a laccase derived from Pleurotus eryngii, which improved its expression and activity in S. cerevisiae. Glycosylation sites can also be added to increase protein secretion [115]. For instance, the secretion of keratinase was increased 5-fold and 1.8-fold by introducing glycosylation sites in the N-terminal and C-terminal regions, respectively [116]. Additionally, controlling N-glycan production and trimming may boost protein output. For example, the overexpression of the glucosidase gene CWH41, which is crucial for the precise regulation of protein folding, led to a 40% increase in the amylase titre [117].
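As a minimal sketch of how candidate N-glycosylation sites can be located before adding or removing them, the snippet below scans a protein sequence for the canonical N-X-S/T sequon (Asn, then any residue except Pro, then Ser or Thr). The function names and the toy sequence are hypothetical and are not taken from the cited studies. The Asn-to-Gln substitution is one commonly used way to remove a site and is assumed here for illustration only. A sequon marks a potential site; it does not guarantee that the site is occupied in vivo.

```python
import re

# N-X-S/T sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead keeps matches one residue wide, so overlapping
# sequons (for example in "NNSS") are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(protein)]


def remove_sequon(protein, position):
    """Return a copy of the sequence with the Asn at `position` (1-based) changed to Gln."""
    index = position - 1
    if protein[index] != "N":
        raise ValueError(f"residue {position} is {protein[index]}, not Asn")
    return protein[:index] + "Q" + protein[index + 1:]


# Hypothetical toy sequence: N2-G-S and N9-K-T are sequons, N6-P-T is not.
toy_protein = "MNGSANPTNKT"
sites = find_sequons(toy_protein)
print(sites)  # [2, 9]

# Removing the first site leaves only the second one.
edited = remove_sequon(toy_protein, sites[0])
print(edited, find_sequons(edited))  # MQGSANPTNKT [9]
```

Whether adding or removing a given site helps secretion or activity still has to be tested experimentally, as the keratinase and laccase examples above show.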

Although S. cerevisiae can glycosylate proteins, the degree of glycosylation is relatively high, which can impair protein activity or lead to high allergenicity, especially for humanized proteins [114]. The extension of α-1,6-mannose can lead to hypermannosylation in S. cerevisiae [89]. Therefore, blocking the addition of the first mannose that initiates the outer chain is considered a key step in preventing hypermannosylation. OCH1 is the key gene responsible for the initial transfer of α-1,6-mannose to the outer chain [118]. Tang et al. [119] demonstrated that the deletion of OCH1 significantly enhanced the secretion of β-glucosidase, endoglucanase, and cellobiohydrolase. The elimination of mannosylphosphates from glycans is also important for the production of humanized proteins in S. cerevisiae. The MNN1, MNN4 and MNN14 genes have been identified as being involved in mannosyl phosphorylation [120]. Kim’s study showed that all mannosyl phosphorylation was abolished in the S. cerevisiae och1Δ mnn1Δ mnn4Δ mnn14Δ strain, which can therefore be used to produce humanized proteins [121]. In addition, glycoengineering of S. cerevisiae can be achieved by disrupting genes encoding specific mannosyltransferases, such as ALG3 and OCH1 [122]. However, ALG3 and OCH1 mutations cause underoccupancy of N-glycosylation sites. The overexpression of RHO1, which encodes the small GTPase Rho1p, was confirmed to partially reverse a growth defect in glycoengineered S. cerevisiae. Therefore, RHO1 overexpression can support the production of humanized proteins [123].

Systems metabolic engineering

Reprogramming cellular activity is essential for reducing metabolic burden and ensuring recombinant protein production (Fig. 6) [124, 125, 126]. In addition to well-regulated protein expression and transport systems, efficient protein production requires sufficient energy and precursors, so metabolic pathways must be analysed and engineered. For example, the human interferon-α2a concentration was raised to 276 mg/L when the key precursor adenine was fed at a constant rate of 2 µg per mL of medium per hour into the basal medium for 10–20 h of fermentation [127]. Additionally, Payne et al. [128] identified an S. cerevisiae mutant with high rHA production that showed high expression of LHS1, SCJ1, SIL1, and JEM1, genes involved in regulating the ATPase cycle of the ER chaperone Kar2p. When these target genes were overexpressed individually or jointly, S. cerevisiae displayed clear advantages in the production of granulocyte–macrophage colony-stimulating factor, recombinant human transferrin, and rHA [128].

Fig. 6. The construction of a high-protein-producing yeast assisted by systems metabolic engineering, including improving substance and energy metabolism for protein synthesis, reducing oxidative stress, and rationally engineering metabolic pathways guided by multiomics data and constrained metabolic network models

Furthermore, the overexpression of heterologous genes may result in a redox imbalance. For example, some available carbohydrates may be diverted from the intended protein synthesis to unwanted byproducts [129]. As a result, growth kinetic parameters, including biomass yield, growth rate, and the specific substrate consumption rate, may be significantly affected. S. cerevisiae is a desirable workhorse for the manufacture of recombinant proteins because it secretes few endogenous proteins, which makes purification of the secreted target product simple. However, high rates of recombinant protein synthesis place a tremendous metabolic burden on yeast cells, which leads to oxidative stress and ultimately reduces their ability to produce protein. Increasing the metabolic rate of S. cerevisiae by overexpressing the endogenous transcription factor HAP1 can reduce the detrimental effects of the reactive oxygen species buildup associated with protein folding and consequently boost protein output [130]. Additionally, the formation of misfolded proteins or protein aggregates, which can trigger cellular oxidative stress responses and hence restrict large-scale production, is frequently a barrier to the high-level synthesis of recombinant proteins in industrial microbes. Therefore, reducing oxidative stress can boost recombinant protein synthesis. The yield of α-amylase was increased 18.7-fold by reducing oxidative stress through enhanced membrane lipid biosynthesis and inhibition of methionine and arginine biosynthesis [131].

Moreover, the metabolic burden is related to the additional energy cost of recombinant protein synthesis and to competition for the limited transcriptional and translational resources needed to produce and secrete proteins [129]. Metabolic burden regulation is a cell-wide endeavour, and modifying a single pathway has limited effects. Therefore, metabolic network models combined with large-scale (omics) datasets have been employed to identify and engineer bottleneck nodes for improving protein production. For example, Ishchuk et al. [132] identified 84 genetic targets for regulating biomass and enhancing haemoglobin production based on the genome-scale metabolic model (GEM) Yeast8 of S. cerevisiae. In experimental tests, 76 genes were individually deleted or overexpressed, and 40 of them enhanced haemoglobin synthesis. The enzyme-constrained Yeast8 model (ecYeast8) was subsequently used to improve the model simulations and assess the combinatorial impact of the gene targets. Compared to the control strain, the engineered S. cerevisiae with 11 genetic changes generated 70-fold more intracellular haemoglobin [132]. Li et al. generated the proteome-constrained genome-scale protein secretory model of S. cerevisiae (pcSecYeast), which allowed them to simulate and explain the phenotypes caused by a restricted secretion capacity. This approach was also used to predict overexpression targets for the production of several recombinant proteins. Many predicted targets for high α-amylase production were validated, demonstrating the application of pcSecYeast as a computational tool to guide the efficient production of recombinant proteins [133]. Furthermore, the combination of high-throughput and omics techniques can further explain the relationship between high protein production and cellular metabolism. For example, Wang et al. [134] used RNAi combined with high-throughput microfluidic single-cell screening to obtain strains with improved protein secretion. The results showed that recombinant protein production can be affected by genes involved in cell metabolism (YDC1, AAD4, ADE8, and SDH1), protein modification and degradation (VPS73, KTR2, CNL1, and SSA1), and the cell cycle (CDC39). Huang et al. [135] used RNA-seq to study the genome-wide transcriptional response of mutant yeast strains to protein secretion. The results indicated that changes in energy metabolism could cause a decrease in respiration and an increase in fermentation, as well as a change in the balance between amino acid biosynthesis and thiamine biosynthesis. Huang et al. [136] also used high-throughput microfluidics to screen yeast libraries produced by UV mutagenesis. Microfluidic screening combined with whole-genome sequencing was further used to map mutations associated with increased protein secretion, identifying new engineering targets and promoting the design of new cell factories.
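The idea of using a constrained model to locate a bottleneck node can be illustrated with a deliberately tiny example. The sketch below is not Yeast8, ecYeast8, or pcSecYeast: it is a hypothetical two-flux toy in which growth and recombinant protein synthesis draw on shared, capped resources, and every resource name and number is an assumption chosen for illustration. Because all costs are positive, the maximum protein flux at a required growth rate is set by whichever resource runs out first, and that resource is reported as the limiting node.

```python
def max_protein_flux(capacity, biomass_cost, protein_cost, min_growth):
    """Maximise protein flux under linear resource limits.

    For every resource r the toy model requires
        biomass_cost[r] * growth + protein_cost[r] * protein <= capacity[r].
    With positive costs, the optimum holds growth at `min_growth` and raises
    the protein flux until the first resource is exhausted.
    Returns (protein_flux, limiting_resource).
    """
    best_flux, limiting = None, None
    for resource, limit in capacity.items():
        remaining = limit - biomass_cost[resource] * min_growth
        if remaining < 0:
            raise ValueError(f"{resource} cannot support growth {min_growth}")
        flux = remaining / protein_cost[resource]
        if best_flux is None or flux < best_flux:
            best_flux, limiting = flux, resource
    return best_flux, limiting


# Hypothetical capacities and costs in arbitrary units (not measured values).
capacity = {"carbon": 10.0, "atp": 30.0}
biomass_cost = {"carbon": 1.0, "atp": 2.0}
protein_cost = {"carbon": 2.0, "atp": 10.0}

flux, node = max_protein_flux(capacity, biomass_cost, protein_cost, min_growth=2.0)
print(f"baseline: protein flux {flux:.2f}, limited by {node}")

# Relieving the limiting node (here: doubling ATP capacity) shifts the
# bottleneck to the next constraint instead of removing it altogether.
relieved = dict(capacity, atp=60.0)
flux2, node2 = max_protein_flux(relieved, biomass_cost, protein_cost, min_growth=2.0)
print(f"relieved: protein flux {flux2:.2f}, limited by {node2}")
```

In this toy, relieving one constraint only moves the bottleneck to the next one, which is consistent with the observation above that modifying a single pathway has limited effects. Genome-scale models apply the same logic across thousands of reactions and require a proper linear programming solver.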

The emergence and advancement of artificial intelligence technologies have significantly enhanced the capacity for the rational construction of gene expression elements and metabolic pathways, providing valuable tools for improving protein production. The N-terminal coding sequence (NCS) affects a rate-limiting step in translation and is an important element in gene regulation. Wang et al. [137] applied a multiview learning strategy to NCS synthesis in S. cerevisiae. Two models were trained, one to upregulate and one to downregulate gene expression. The synthetic NCSs achieved greater than 65% accuracy, and their application proved effective in upregulating the expression of protein-coding genes. Despite the lack of a comprehensive mechanistic understanding, the combination of big data and machine learning can facilitate the modelling of regulatory networks for gene expression and the cellular metabolome. Machine learning has recently been used to map enzyme expression patterns and use them to predict metabolite concentrations [138]. Changes in enzyme expression can influence metabolite levels through synergistic interactions. Zelezniak et al. [139] demonstrated the feasibility of employing machine learning to chart regulatory enzyme expression patterns and to predict the metabolome of kinase-deficient cells from the enzyme expression proteome. Their research quantified the impact of enzyme abundance on metabolic regulation, unveiling the potential of machine learning for understanding intricate metabolic regulatory processes.
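To give a sense of the design space that an NCS model has to search, the sketch below enumerates synonymous variants of the first few codons of a coding sequence using the standard genetic code. This is an assumption made for illustration: it is not the multiview learning method of Wang et al., the source does not state whether their NCS designs were restricted to synonymous changes, and the function names and the toy sequence are hypothetical. A learned model would then score or rank such candidates, and that step is not shown.

```python
from itertools import product

# Standard genetic code in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Reverse map: amino acid -> list of codons that encode it.
SYNONYMS = {}
for codon, amino_acid in CODON_TO_AA.items():
    SYNONYMS.setdefault(amino_acid, []).append(codon)


def translate(cds):
    """Translate a DNA coding sequence codon by codon (no stop handling)."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, usable, 3))


def synonymous_ncs_variants(cds, n_codons):
    """Yield every sequence whose first `n_codons` codons are recoded synonymously.

    The encoded protein is unchanged; only the N-terminal nucleotide
    sequence varies. The rest of the coding sequence is kept as given.
    """
    if len(cds) < 3 * n_codons:
        raise ValueError("coding sequence is shorter than the requested NCS window")
    head = [cds[i:i + 3] for i in range(0, 3 * n_codons, 3)]
    tail = cds[3 * n_codons:]
    choices = [SYNONYMS[CODON_TO_AA[codon]] for codon in head]
    for combination in product(*choices):
        yield "".join(combination) + tail


# Hypothetical toy gene encoding M-K-F-G; recode the first three codons.
toy_cds = "ATGAAATTTGGT"
variants = list(synonymous_ncs_variants(toy_cds, 3))
print(len(variants))  # 4 (Met has 1 codon, Lys has 2, Phe has 2)
print(all(translate(v) == translate(toy_cds) for v in variants))  # True
```

The number of variants grows multiplicatively with the window length, which is one reason data-driven models are used to prioritise candidates instead of testing them exhaustively.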

Conclusions

S. cerevisiae has been widely employed to produce heterologous proteins due to its biological advantages. However, the yield of proteins in wild-type yeast is much lower than the theoretical value. To achieve efficient biosynthesis of target proteins, the coordination between heterologous protein genes and the S. cerevisiae chassis is particularly critical. Currently, research on high protein production by S. cerevisiae is being conducted in the areas of expression systems, secretion engineering, glycosylation engineering and systems metabolic engineering. Nevertheless, most S. cerevisiae protein products are still produced at low yields, and a large gap remains between theoretical yields and practical application. With the rapid progress in synthetic biology and the growing emphasis on carbon neutrality, S. cerevisiae strains with high protein yields might be generated in the future through advances in the following respects.

The development of novel gene editing tools has greatly improved the speed and efficiency of S. cerevisiae genome engineering. CRISPR-based systems have exhibited advantages in gene editing and heterologous metabolic pathway assembly: they allow simultaneous editing of multiple genes without selection markers, greatly reduce the cycle time for introducing heterologous metabolic pathways and gene-targeted mutations, and enable the optimization of individual genes or combinations of genes in metabolic networks. With the development of the CRISPR system, the exploitation of additional Cas proteins has made this technology more promising for protein synthesis in S. cerevisiae. In addition, the genome design and reconfiguration of S. cerevisiae can enable the cell factory to obtain new functions and potentially evolve, suggesting the possibility of rapidly constructing efficient cell factories. For instance, synthetic chromosome rearrangement and modification by loxP-mediated evolution (SCRaMbLE) can introduce genome rearrangement events in S. cerevisiae. This technology offers an effective route to high protein production because it enables researchers to investigate the interactions of various rearrangements, the contribution of gene position throughout the genome, and gene copy number.

In addition, because metabolism and other biological processes are complicated, the construction of highly accurate dynamic cellular models remains a major challenge. With the development of high-throughput systems biology, the generation of high-quality yeast experimental datasets will further promote our understanding of S. cerevisiae behaviour at the quantitative and dynamic levels. The combination of genomics, transcriptomics, proteomics, and metabolomics can provide comprehensive biological information to reveal the cell state under different conditions, determine important nodes limiting protein production, and allow rational tailoring of the gene network. Furthermore, deep learning (DL) is essential for the systematic analysis of heterogeneous data, as it can reveal patterns that cannot be discovered by omics techniques alone. This approach enables a better understanding of the underlying biological processes. Moreover, an increasing number of DL-based computing strategies are being developed via specialized platforms. Such accurate, data-driven models may be used to create efficient systems for heterologous protein synthesis in S. cerevisiae.

In terms of feedstock, S. cerevisiae can utilize not only sugars as first-generation (1G) feedstocks, but also industrial and agricultural waste as second-generation (2G) feedstocks, and single-carbon compounds as third-generation (3G) feedstocks. 1G feedstocks comprise glucose, arabinose [140], and xylose [20, 21], among others. In addition to sugar-containing waste, 2G feedstocks include waste glycerol [22, 23], cellulose [141], etc. Importantly, 3G feedstocks are more abundant, less expensive, and carbon neutral, contributing to alleviating the energy crisis and reducing greenhouse gas emissions. S. cerevisiae has been engineered to utilize methanol (CH3OH) [24] for cell growth. Therefore, engineering S. cerevisiae to utilize 3G feedstocks for protein production is an important direction that will not only help reduce protein costs but also support carbon-neutrality targets. Taken together, these strategies make it possible to engineer S. cerevisiae as a robust biological chassis for efficient protein synthesis.