Introduction

Synthetic biology, also called engineering biology, is a field of biology focused on creating synthetic organisms that perform specific functions [1]. It involves assembling and integrating standardized biological parts or modules to construct these organisms, which can serve as microbial cell factories, live therapeutics, or diagnostics [2,3,4]. Before the 2000s, synthetic biology often relied on techniques such as random mutagenesis and simple gene overexpression or deletion. Since then, advances in DNA sequencing and synthesis technologies have enabled rapid and inexpensive construction of complex gene clusters that can be introduced into target cells. Additionally, the development of genome engineering tools has enabled precise genome editing as well as the incorporation of programmable genetic circuits (e.g. toggle switches and biological logic circuits), enhancing the capability to design and control biological systems with unprecedented precision [5, 6].

Among diverse applications of synthetic biology, the construction of microbial cell factories capable of producing value-added chemicals and materials from renewable resources is of great interest. To construct efficient cell factories, the metabolic and regulatory pathways of the host strain should be optimized to balance cell growth with maximum production of the desired product. Feedback inhibitions and toxic intermediate accumulation should be resolved as well. Advances in gene expression modulation tools have enabled precise tuning of enzyme activity and the high-throughput identification of optimal gene expression level for enhanced product biosynthesis. In addition, the emergence of genome-scale metabolic models for model microorganisms and the automated design of synthetic genetic circuits has revolutionized the construction of programmable cell factories. These advanced tools offer precise control over gene expression, enabling the creation of functionalities that could not have been achieved by the conventional knockout and overexpression methods.

This review covers recent advances in gene expression regulation tools and strategies, focusing on their applications in bacteria. First, recent tools and strategies for controlling the different stages of the central dogma are reviewed with accompanying examples. Subsequently, the review discusses the use of these tools in constructing large libraries for high-throughput screening and their combined use with artificial intelligence (AI) in designing predictable cell factories. Finally, the review concludes with future perspectives. Throughout the article, readers are directed to other excellent reviews for more details on the tools and strategies for genome engineering and/or gene expression control [7].

Transcriptional control of gene expression

The central dogma of molecular biology, which describes the flow of genetic information within a cell, is a fundamental principle that applies across all organisms. Therefore, it is important to review the tools that can control each level of the central dogma: transcription, translation, and post-translational processes. A primary approach for transcriptional regulation uses promoters of varying strengths to modulate the rate of transcription initiation. In a recent study, a computational model that can accurately predict the strength of any σ70 promoter sequence was developed using biophysics and machine learning along with massively parallel assays (Fig. 1a) [8]. This model was particularly useful for designing promoters with specific transcription initiation rates [8]. Transcription factors (TFs) also modulate target gene expression by either repressing or activating the recruitment of sigma factors and RNA polymerase to the promoter region. Interaction with a chemical ligand can alter the three-dimensional structure of a TF, affecting its promoter binding and enabling the modulation of gene expression in response to various environmental stimuli. Exploring TFs sensitive to specific chemicals or using mutated TF libraries enhances the ability to regulate gene expression dynamically, leading to the versatile design of synthetic genetic circuits or biosensors (Fig. 1a) [9, 10]. A recent report found that the eukaryotic transcriptional activator QF can also be used for target gene activation in Escherichia coli BL21(DE3) together with the Q-system upstream activating sequence (QUAS), showcasing the potential of broadly applicable genetic devices [11].

Fig. 1

Gene expression modulation tools classified according to their mechanisms. a Tools for transcriptional modulation. Strategies for promoter design, transcription factor design, dCas9-based modulation, and anti-termination using sRNA are shown. b Tools for translational modulation. Strategies for RBS design, riboswitch-based modulation, dCas13-based gene repression or activation, and synthetic sRNA-based gene repression are shown. c Other tools for gene expression modulation. Strategies for DNA methylation-based modulation and protein degradation-based modulation are shown. Abbreviations are: 5’UTR, 5’ untranslated region; crRNA, CRISPR RNA; CRISPRa, CRISPR activation; CRISPRi, CRISPR interference; IF3, translation initiation factor 3; MTase, methyltransferase; ORF, open reading frame; PDT, protein degradation tag; RBS, ribosome-binding site; RNAP, RNA polymerase; sRNA, small RNA; TF, transcription factor

Quorum sensing regulates gene expression at the transcriptional level in response to changes in cell-population density. This mechanism relies on TFs such as LuxR, which are activated by autoinducers produced by the host cell once their concentrations exceed a certain threshold [12]. Quorum sensing is therefore useful for redirecting metabolic pathways according to the cell growth phase. For example, achieving high-level production of poly-γ-glutamic acid (γ-PGA) in Bacillus subtilis was challenging because the high viscosity of γ-PGA impedes cell growth. By employing a quorum sensing system to activate biosynthetic genes once cell growth surpasses a specific threshold, γ-PGA production in B. subtilis was significantly increased to 6.73 g/L [13]. In another example, a bacteriophage lysis gene (φX174 E) was expressed under the luxI promoter in E. coli, enabling periodic cell lysis upon reaching the quorum threshold. This strategy proved particularly effective for the timed release of drugs to specific tissues [14].
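The density-dependent switching described above can be illustrated with a minimal simulation. The sketch below is not taken from the cited studies: it assumes logistic cell growth, an autoinducer produced in proportion to cell density and removed at a constant rate, and a Hill-type promoter response. All parameter values and the function names hill and simulate are hypothetical and chosen only for illustration.

```python
def hill(inducer, k, n):
    """Fraction of maximal promoter activity at a given autoinducer level."""
    return inducer**n / (k**n + inducer**n)


def simulate(hours=24.0, dt=0.01, growth=0.8, capacity=1.0,
             production=1.0, decay=0.5, k=0.5, n=4):
    """Euler integration of cell density, autoinducer level, and promoter activity.

    All parameters are illustrative assumptions, in arbitrary units.
    Returns a list of (time, density, inducer, activity) tuples.
    """
    density, inducer = 0.001, 0.0
    trace = []
    steps = round(hours / dt)
    for i in range(steps + 1):
        trace.append((i * dt, density, inducer, hill(inducer, k, n)))
        # Logistic growth of the population.
        d_density = growth * density * (1.0 - density / capacity)
        # Autoinducer is made by cells and lost by first-order decay.
        d_inducer = production * density - decay * inducer
        density += d_density * dt
        inducer += d_inducer * dt
    return trace


if __name__ == "__main__":
    for t, density, inducer, activity in simulate()[::300]:
        print(f"t={t:5.1f} h  density={density:.3f}  activity={activity:.3f}")
```

In this toy model the promoter stays essentially off while the culture is dilute and switches on only after the autoinducer passes the assumed threshold k, which is the behavior exploited to separate growth from production.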

Trans-acting transcription modulating tools are useful for fine-tuning target gene expression. The clustered regularly interspaced short palindromic repeats (CRISPR)/Cas system has been the most widely used tool of this kind. The use of catalytically inactive Cas9 (dCas9), which binds to DNA sequences under the guidance of a single-guide RNA (sgRNA), enables both repression and activation of gene expression (Fig. 1a). CRISPR interference (CRISPRi) involves the binding of dCas9 to the transcription initiation region, preventing the binding of RNA polymerase and thereby repressing transcription [15]. A combined CRISPR/CRISPRi system utilizing both Cas9 and dCas9 has facilitated multiplex control of gene expression, allowing for targeted gene knockout, knockdown, and knockin. This approach was applied to increase succinate production while reducing byproduct formation in E. coli [16]. To mitigate dCas9-associated toxicity in bacteria, a variant of dCas9 was engineered without the protospacer adjacent motif (PAM) binding sequence and combined with the PhlF repressor. This modified dCas9 exhibited decreased nonspecific binding and reduced toxicity [17]. In another study, an sgRNA mutant library was constructed and tested for fine-tuning gene expression levels, achieving a >45-fold dynamic range in gene expression [18]. This system was used for the enhanced production of violacein [18]. CRISPR activation (CRISPRa), in contrast, involves dCas9 fused with transcription activators to enhance gene transcription [19]. Since eukaryotic activators like VP64 are ineffective in bacteria, bacterial proteins, bacteriophage/transposon-derived effectors, or RNA polymerase subunits have been utilized instead. Among these, the E. coli regulator SoxS has shown significant efficacy in activating target genes in E. coli [20]. Such CRISPR-based tools offer adaptable multiplex transcriptional modulation, achieving desired levels of gene expression. For further information on CRISPR tools for gene expression regulation, readers are encouraged to refer to additional reviews [21, 22].
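As a rough illustration of how sgRNA target sites for dCas9 are located, the sketch below scans both strands of a DNA sequence for 20-nt protospacers followed by the canonical 5'-NGG-3' PAM of Streptococcus pyogenes Cas9. It is a simplified example rather than the design procedure of any cited study: the function names are hypothetical, and practical design tools additionally score off-target binding and position relative to the promoter.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_protospacers(seq, length=20):
    """List candidate target sites that are immediately followed by an NGG PAM.

    Returns (strand, start, protospacer, pam) tuples. For the "-" strand,
    start is an index into the reverse-complemented sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # The last valid start leaves room for the protospacer plus a 3-nt PAM.
        for i in range(len(s) - length - 2):
            pam = s[i + length:i + length + 3]
            if pam[1:] == "GG":
                hits.append((strand, i, s[i:i + length], pam))
    return hits


if __name__ == "__main__":
    example = "TTGACAGCTAGCTCAGTCCTAGGTATAATGCTAGCATGGCTAGC"  # arbitrary sequence
    for hit in find_protospacers(example):
        print(hit)
```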

Small RNAs (sRNAs) are another tool for modulating gene expression, binding complementarily to the target mRNA and thereby regulating its translation. Interestingly, sRNAs also play a role in transcriptional regulation by inhibiting transcription termination. While Rho factors are generally known to bind to the 3’-ends of mRNAs for transcription termination, they can also initiate premature transcription termination by binding to the 5’ untranslated region (5’UTR). Some sRNAs (e.g. DsrA, RprA, ArcZ) in E. coli have been found to interact with or near Rho utilization sites, facilitating anti-termination and thus activating target genes (Fig. 1a) [23]. However, the practical application of these tools for microbial cell factory development has not yet been reported. Small transcription-activating RNA (STAR) can also be used to activate the transcription of target genes [24]. STAR regulators consist of an RNA with a terminator hairpin placed upstream of the target gene, along with a complementary STAR. Typically, the terminator hairpin halts transcription of the target gene, but in the presence of STAR, which binds to the hairpin, transcription is turned on. Recently, STARs tailored for E. coli were effectively adapted for various Gram-negative bacteria, with multiple STARs being fused together (resembling the CRISPR array) for adjustable control of gene expression [25].

Synthetic genetic circuits, empowered by transcriptional control tools, offer dynamic regulation of gene expression [26, 27]. For example, to enable the automated design of complex genetic circuits on demand, a computational tool was developed by connecting various transcriptional gates that are regulated by diverse transcriptional controlling tools such as TFs responsive to different ligands, recombinases, and the CRISPR/Cas system [28]. In another example, a TF responsive to a specific target product was employed to trigger the expression of an essential gene. This boosted the growth of the host strain, particularly in environments with high concentrations of the target product. The two essential genes folP (encoding dihydropteroate synthase) and glmM (encoding phosphoglucosamine mutase) were expressed under the PBAD promoter, enabling transcriptional control in response to the mevalonate-responsive TF AraCmev, a variant of AraC originally responsive to arabinose [29]. Implementing this approach led to stable production of mevalonate over 95 generations, demonstrating robust industrial applicability. Employing CRISPRi and/or CRISPRa also allowed the construction of complex multi-layer cascades and feedforward loops in E. coli, proving the potential of using such transcriptional modulation tools for the development of programmable synthetic bacteria [30].

Translational control of gene expression

The advancement of translational modulation tools has introduced an additional dimension for controlling gene expression in bacteria. A conventional method involves the development and utilization of ribosome-binding sites (RBSs) with varying strengths to regulate translation efficiency. Recently, computational tools were developed to predict RBS strengths and design RBSs with desired translation efficiencies, allowing for more refined control over translation (Fig. 1b) [31]. Another approach involves regulating RNA stability and degradation. For instance, hairpin-forming RNA sequences (i.e. degradation-tuning RNAs) were inserted at the 5’-end of the target genes for predictable adjustment of RNA degradation rates, achieving a 40-fold range in transcript stability modulation in E. coli [32]. These methods offer more secure and adjustable control of target gene expression.

Translation efficiency largely depends on the 3D conformation of the mRNA 5’-ends. Therefore, cis-regulatory RNA sequences, known as riboswitches, can be inserted at the 5’-end of target genes to fine-tune translation efficiency and modulate gene expression in response to specific nucleic acids or chemicals. One of the most representative examples is the toehold switch, where the RBS is flanked by hairpin structures (Fig. 1b). These structures unfold upon binding with short, partially complementary RNA sequences, activating gene expression. Toehold switches have been refined to detect single nucleotide variations in target mRNA through precise thermodynamic design [33]. Furthermore, toehold switches can enhance the production of full-length proteins or enzymes. Inserting a trigger RNA sequence at the 3’-end of the target mRNA leads to circularization upon activation, enhancing transcript stability and promoting the production of full-length enzymes [34]. Introducing this system in E. coli resulted in enhanced production of 3-hydroxypropionic acid (2.1 g/L), violacein (2.19 mg/gCDW), and lycopene (1.52 mg/gCDW) [34]. Despite these advances, riboswitches often result in low gene expression levels [35] and exhibit high context sensitivity, posing challenges for predictable gene expression [36]. These limitations highlight areas for further development in this field.

Trans-acting translation modulation tools provide versatile regulation of target gene expression and facilitate high-throughput screening of target chemical overproducers. While the CRISPR/Cas9 system is responsible for transcriptional modulation by binding to target DNAs, the CRISPR/Cas13 system is responsible for translational modulation by binding to target mRNAs (Fig. 1b). A key advantage of Cas13 over Cas9 is its independence from PAM sequences for target recognition, offering greater flexibility in designing target-binding sequences [37]. Following the development of CRISPRi, a catalytically inactive version of Cas13 (dCas13) was engineered to inhibit the translation of specific genes. The construction and introduction of 102 guide RNAs into the E. coli strain harboring lycopene biosynthetic gene clusters has enabled the screening of lycopene overproducers [38]. In another study, a translation initiation factor IF3 was fused to dCas13 to enhance the translation rate of target genes in bacteria [39], presenting a novel approach for screening overexpressed target genes that would lead to enhanced production of target chemicals.

Synthetic sRNAs are another tool that allows translational knockdown of target genes [40]. sRNA binding with the target mRNA is facilitated by the sRNA scaffold (a 3’ hairpin structure providing stability and functionality of sRNA) and the Hfq protein (an RNA chaperone that facilitates sRNA-mRNA interaction) (Fig. 1b). For fine-tuning knockdown efficiency, sRNAs were expressed under promoters of varying strengths, leading to enhanced production of L-proline (33.8 g/L) and putrescine (42.3 g/L) in E. coli [41]. Also, simultaneous knockdown of multiple target genes was achieved using different plasmids with compatible antibiotic markers and origins of replication, resulting in increased production of L-proline (54.1 g/L) and L-threonine (22.9 g/L) in E. coli [42]. For fine-tuning gene expression in a predictable manner, a protocol for designing sRNA sequences was developed, utilizing changes in the free energy of mRNA-sRNA complex formation (ΔGCF) and the mismatch percentage in the target binding region. This approach has been successfully validated across both Gram-positive and Gram-negative bacteria, and applies to target genes located on both plasmids and chromosomes [43].
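The mismatch percentage used in such design protocols can be computed directly from the sequences. The following sketch derives the perfectly complementary binding sequence for a target mRNA region and reports how far a candidate sRNA binding sequence deviates from it. The function names and the example target are hypothetical, and the free energy term is not computed here because it requires a dedicated RNA folding program.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense(mrna_region):
    """Perfectly complementary RNA binding sequence, written 5'->3'."""
    rna = mrna_region.upper().replace("T", "U")
    return rna.translate(COMPLEMENT)[::-1]


def mismatch_percentage(binding_seq, mrna_region):
    """Percentage of positions at which binding_seq differs from the perfect antisense."""
    perfect = antisense(mrna_region)
    candidate = binding_seq.upper().replace("T", "U")
    if len(candidate) != len(perfect):
        raise ValueError("binding sequence and target region must have equal length")
    mismatches = sum(a != b for a, b in zip(candidate, perfect))
    return 100.0 * mismatches / len(perfect)


if __name__ == "__main__":
    target = "AUGGCUAGCAAAGGAGAAGAACUU"  # hypothetical 24-nt region of a target mRNA
    perfect = antisense(target)
    weakened = "C" + perfect[1:]        # introduce a single mismatch at the 5' end
    print(perfect, mismatch_percentage(perfect, target))
    print(weakened, round(mismatch_percentage(weakened, target), 2))
```

Introducing mismatches in this way is one simple route to a graded series of binding sequences whose knockdown strength can then be tested experimentally.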

sRNAs have also been applied in diverse bacterial species, including Pseudomonas putida [44], B. subtilis [45], and Synechococcus elongatus [46]. However, since the conventional sRNA platform originated from E. coli, it showed limited effectiveness in Gram-positive strains such as Corynebacterium glutamicum [47]. To overcome this, a new sRNA platform combining the B. subtilis RoxS scaffold and Hfq was developed, demonstrating efficient gene knockdown across 15 different bacterial strains and indicating its wide-ranging applicability. Additionally, the use of circularized sRNA, generated from a self-splicing RNA strand, has resulted in more durable and stable modulation of target gene expression [48].

Bacterial gene expression often occurs through operons, enabling the simultaneous expression of multiple genes from a single transcript. While this characteristic implies that transcriptional modulation affects all genes within an operon simultaneously, modulation at the level of translation permits fine-tuning of expression levels for individual genes. Thus, employing both transcription and translation modulation tools enhances the flexibility and precision in engineering microbial strains. For instance, the combined use of dCas9 and dCas13 facilitates multiplex modulation of complex biosynthetic gene clusters [49]. Such dual application can selectively target genes within an operon for repression or activation, or adjust the expression of the entire operon, offering a more precise approach to control gene expression.

Other types of gene expression control

Gene expression can also be regulated through methods beyond transcription and translation modulations. Epigenetic engineering offers a novel approach to modulating gene expression profiles. In bacteria, which lack histones and nucleosomes present in eukaryotes, DNA methylation stands out as a key mechanism of epigenetic engineering. For instance, the methylation of the first cytosine in TCTTC motifs to 4-methylcytosine by a DNA methyltransferase from Helicobacter pylori has been shown to change the expression of 102 genes. Such gene expression modulation resulted in altered phenotypes such as decreased adherence to human gastric adenocarcinoma cells and decreased natural transformation efficiency [50]. The reversible nature of DNA methylation enables the design of synthetic genetic circuits for environmental monitoring. A notable example involves a circuit with an operon containing the ccrM gene (encoding a methyltransferase) and a reporter gene (e.g. egfp), initially repressed by a zinc finger repressor targeting the promoter [51]. Here, CcrM methylates the promoter, blocking the repressor and thus activating gene expression (Fig. 1c). The circuit also includes the mf-Lon protease, which degrades CcrM under specific conditions, leading to promoter inactivation and decreased fluorescence, effectively recording the environmental change.

Post-translational control of gene expression is also useful for the construction of dynamic metabolic pathways. Degrons, peptides responsible for eliciting protein degradation, can be used to regulate protein degradation for fine-tuning target protein levels [52]. For instance, variants of the SsrA degron, each with a different rate of proteolysis, allow for the precise tuning of gene expression [53]. An advantage of using this system is that it can be applied to diverse bacterial strains including B. subtilis [53], E. coli, and Lactococcus lactis (Fig. 1c) [54]. Furthermore, by attaching two degradation tags, NIa and SsrA, to the C-terminus of a protein, intricate control over gene expression can be achieved [55]. The SsrA tag initiates continuous degradation by ClpXP and ClpAP proteases. Activation of the NIa gene, however, cleaves off the SsrA tag, stopping the degradation process and thus facilitating the accumulation of the protein. This conditional degradation strategy has been utilized to increase poly-3-hydroxybutyrate production in E. coli by separating the production phase from the growth phase [55]. The same approach was used in P. putida to lower the basal gene expression from the strong Pm promoter by degrading the encoded protein when the promoter is inactive [56]. Although post-translational control of gene expression is resource-intensive due to the necessity of initial protein synthesis, it enables precise and rapid regulation of gene expression. This is achieved by rapidly modifying protein activity and levels, bypassing the delays inherent in transcriptional or translational regulation.
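The effect of a degradation tag on protein level and response speed can be sketched with a standard first-order model, dP/dt = s - (d + μ)P, where s is the synthesis rate, d the tag-dependent degradation rate, and μ the dilution rate due to growth. This model and the numbers below are illustrative assumptions and are not taken from the cited studies.

```python
import math


def steady_state_level(synthesis, degradation, dilution):
    """Steady-state protein level P* = s / (d + mu) of the first-order model."""
    return synthesis / (degradation + dilution)


def response_half_time(degradation, dilution):
    """Time needed to cover half the distance to a new steady state."""
    return math.log(2) / (degradation + dilution)


if __name__ == "__main__":
    synthesis, dilution = 10.0, 0.5  # arbitrary units; rates per hour
    for label, degradation in (("untagged", 0.0), ("weak tag", 0.5), ("strong tag", 1.5)):
        level = steady_state_level(synthesis, degradation, dilution)
        half_time = response_half_time(degradation, dilution)
        print(f"{label:10s} level={level:5.1f}  half-time={half_time:.2f} h")
```

Under these assumptions, a stronger tag lowers the steady-state protein level but shortens the response time, which reflects the trade-off between resource cost and rapid regulation noted above.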

Employing high-throughput screening and artificial intelligence for gene expression modulation

High-throughput genetic engineering methodologies have facilitated the creation of far more diverse genomic modifications and individual gene variants than conventional single-gene engineering. This approach can be leveraged to build efficient microbial cell factories capable of producing value-added products with high titer, yield, and productivity (Fig. 2a). Notably, the development of a new synthetic sRNA platform harboring the RoxS scaffold and the Hfq chaperone from B. subtilis has shown wide applicability in a broad host range of bacteria [47]. By introducing an sRNA library targeting ~3,000 genes in C. glutamicum, a strain producing indigoidine at a high level (54.9 g/L) was identified. Another notable advancement involves modular loop engineering within the crRNA, which enables precise single-gene repression with exceptional efficiency (92% knockdown) and specificity for single base-pair mismatches [38]. This system was successfully applied for identifying new sRNA targets aimed at enhanced lycopene production (6.21-fold increase compared to the control) in E. coli, proving its potential in metabolic engineering applications.

Fig. 2

Applications of gene expression modulation tools. Various tools described in this paper can be further applied to (a) the production of valuable chemicals, (b) therapeutics development, and (c) basic research for fundamental understanding of genetic regulatory networks. Abbreviations are: CRISPR, clustered regularly interspaced short palindromic repeats; DL, deep-learning; HTS, high-throughput screening; ncRNA, non-coding RNA; RBS, ribosome-binding site; sgRNA, single-guide RNA; uASPIre, ultradeep Acquisition of Sequence-Phenotype Interrelations

Furthermore, the integration of gene expression modulation tools with high-throughput platforms has opened new avenues in genome-scale functional genomics research. For example, researchers have utilized a genome-scale CRISPRi sgRNA library comprising 55,671 sgRNAs to identify essential genes in E. coli (Fig. 2b) [57]. By mapping the relationships between phenotypes and non-coding RNAs, they have not only identified genes for toxic chemical tolerance, but also gained valuable insights into the intricate metabolic networks governing bacterial physiology. Additionally, the utilization of random sgRNA libraries has facilitated the discovery of phage-resistance-related genes, paving the way for the development of industrially robust microbes and novel phage therapies [58].

Advancements in AI have further enhanced our capabilities in genetic engineering and functional genomics. AI-powered predictive models, such as convolutional neural networks (CNNs), have revolutionized the prediction of on- and off-target activity of CRISPR tools targeting RNA. These algorithms leverage a dataset including millions of guide RNAs and outperform existing models in predicting on-target and off-target activities [59]. An improved prediction of on-target activity of the prokaryotic Cas9/sgRNA system in E. coli was recently achieved using a CNN with five convolution layers (CNN_5layers), which showed superior predictive performance [60]. Another Cas9/sgRNA on-target activity prediction model, trained on existing small datasets, was also developed [61]. This model demonstrated enhanced accuracy in predicting sgRNA target site sequence-associated activities of TevSpCas9 and SpCas9 in both Salmonella enterica and Citrobacter rodentium, showcasing its broad applicability across different bacterial strains. Translational control of the target gene can also be deciphered with the aid of a deep learning (DL) model called DeepTESR [62]. The researchers developed this framework for predicting translational efficiency for 139,954 unique translational elongation short ramp (TESR) sequences, and the 4,307 TESRs of E. coli K-12 MG1655 were analyzed, providing a list of scores for heterologous gene expression.

Moreover, recent AI strategies have enabled an in-depth understanding of the DNA regulatory mechanisms underlying gene expression and the rapid design of synthetic regulatory systems. In one study, researchers utilized a DL approach for the creation of synthetic promoters, resulting in promoters that closely mimic their natural counterparts (Fig. 2c) [63]. Remarkably, over 70.8% of these synthetic promoters were confirmed as functional, with some exhibiting greater activity than the most potent mutants of natural promoters. Another notable method involved the utilization of DNA-based phenotypic recording to evaluate sequence-function pairs for 300,000 RBSs in E. coli [64]. This approach, coupled with DL techniques, resulted in the generation of a dataset corresponding to more than 2.7 million sequence-function pairs. Such big data allowed significant improvements in predictive accuracy, offering unprecedented insights into RBS activities.

Conclusions and future perspectives

In this paper, we review the recent advancements made in gene expression modulation tools for bacteria and their applications in synthetic biology. We delve into the latest advancements in controlling transcription and translation levels of the central dogma, offering illustrative examples. With the emergence of in silico design tools and high-throughput screening methods, achieving dynamic and precise control over biomolecular systems has become feasible. The tools and examples discussed herein hold promise for diverse biological applications, spanning synthetic biology, biomedical research, and metabolic engineering.

Despite this progress, the translation of these technologies into commercial or industrial-scale applications remains limited. For example, optimizing regulatory mechanisms, accurately capturing target gene responses, and timing gene expression precisely within complex metabolic networks remain challenging, and these difficulties are often exacerbated by the limitations of natural regulatory systems. Moreover, the availability of gene expression modulation tools for non-model bacteria is still highly limited, underscoring the need for a generally applicable engineering toolkit. For enhanced precision in gene expression control, multi-omics analyses will play increasingly important roles in bacterial synthetic biology. By combining genomics, transcriptomics, proteomics, and metabolomics data, researchers can gain a comprehensive understanding of cellular responses to genetic manipulations, enabling the development of more precise and efficient control strategies.

Moreover, the integration of machine learning and AI techniques holds promise for the design and optimization of gene expression modulation tools. By combining large-scale omics data and computational modeling, AI-driven approaches can effectively guide the rational design of genetic circuits and predict their performance in silico. This synergistic combination of experimental and computational methods will facilitate the development of more predictive and efficient control of gene expression.

Beyond conventional industrial applications, the collaborative work of synthetic biologists, metabolic engineers, and computer scientists is poised to significantly advance the development of biological parts and modules, paving the way for novel applications in the biomedical and environmental sectors. More specifically, these tools can be leveraged for the development of next-generation therapeutics, diagnostics, bioremediation, and the production of biofuels and bioplastics. In summary, a deeper understanding of the fundamental aspects of gene regulation, combined with cutting-edge technologies, will open new avenues for the innovative applications of synthetic biology, contributing to the bio-based future and the advancement of science.