Introduction

A recombinant protein is a protein encoded by recombinant DNA and expressed in a heterologous host (Young et al. 2012). In 1973, Herbert Boyer and Stanley Cohen pioneered the use of recombinant DNA for cloning and gene expression. The use of recombinant DNA to produce recombinant proteins grew through the 1970s. Production has since increased dramatically, and the underlying techniques are now available worldwide. Some COVID-19 vaccines are based on recombinant DNA technology (Francis et al. 2022; Ghiasi et al. 2021), and the technology is a promising approach for fighting cancers and currently incurable diseases.

Choosing a suitable host for the expression of heterologous genes is an important step. Researchers have used various prokaryotic and eukaryotic organisms as heterologous hosts, including Escherichia coli (Rosano et al. 2014), Bacillus subtilis (Schumann 2007), Pseudomonas fluorescens (Zhang et al. 2018), filamentous fungi (Ward 2012), yeasts (Çelik and Çalık 2012), insect cells (Ikonomou et al. 2003), mammalian cells (Lai et al. 2013), and plant cells (Hellwig et al. 2004). Each of these hosts has its own advantages and disadvantages (Shanmugaraj et al. 2020). The choice of expression system depends on many criteria. These include laboratory facilities and local expertise (Ma et al. 2013), final production cost (Gifre et al. 2017), the characteristics of the recombinant protein (Demain et al. 2009), the intrinsic features of the expression host (Palomares et al. 2004), the source of the heterologous gene (Deng et al. 2017), the ease of optimizing heterologous expression (Kaur et al. 2018), regulatory considerations (Hellwig et al. 2004), and the purpose of recombinant protein production (Puetz and Wurm 2019). Therefore, before a recombinant protein is produced, the host best suited to the gene of interest should be selected according to the purpose of the experiment and the characteristics of the host (Aune 2008; Khan 2013).

According to a MarketsandMarkets report (https://www.marketsandmarkets.com/) (Karyolaimos et al. 2019), production of recombinant proteins in mammalian cells generated the highest revenue of all expression systems in 2021, accounting for 41.70% of total recombinant protein revenue. Bacterial systems ranked second (Fig. 1). However, drawbacks such as high cost and relatively low protein yield are more common in mammalian expression systems than in prokaryotic ones (Portolano et al. 2014). Prokaryotes are suitable hosts for recombinant protein expression and are regarded as the most affordable and simplest hosts. The E. coli expression system is the most widely used bacterial expression system and the first choice for laboratory research (Koehn and Hunt 2009). Given the importance of recombinant protein production today, this review examines the optimal expression conditions for recombinant proteins in E. coli and the effects of various factors on their expression. We also review the bottlenecks that researchers have reported for this host and the workarounds they have developed.

Fig. 1

Comparison of heterologous hosts by income: Total revenue from recombinant proteins produced in all heterologous hosts was substantial, amounting to 358.4 and 556.6 million dollars in 2020 and 2021, respectively. The bacteria-based expression system is the second most profitable after mammalian cells (Karyolaimos et al. 2019).

Recombinant protein expression in E. coli

E. coli is one of the most important hosts for the production of recombinant proteins. It was first isolated by Theodor Escherich in the late nineteenth century. It is a Gram-negative, rod-shaped proteobacterium that commonly inhabits the mammalian intestine. Most strains are harmless commensals, although some are pathogenic (Overman et al. 2014). The genetics of E. coli has been studied more than that of any other Gram-negative bacterium. Recent advances in understanding transcription, translation, and protein folding, together with improved genetic tools, have further increased its use for recombinant protein expression (Demain and Vaishnav 2009).

Bacteria are commonly used to produce recombinant proteins smaller than 30 kDa (Demain and Vaishnav 2009). Production of larger proteins is more problematic, although far larger proteins have also been produced in this host. The chloroplast is a promising heterologous host for future recombinant protein production, and comparing it with E. coli is instructive. In the chloroplast, heterologous genes are integrated into the genome by homologous recombination (Burnett et al. 2020), and the organelle produces polycistronic mRNA. The plastid genome is highly polyploid, so a transgene integrated into it can be present at up to 10,000 copies in a young leaf cell (Verma and Daniell 2007). This level is unmatched by any other expression system. Some plant-derived recombinant therapeutic proteins can also be administered orally (Kurup and Thomas 2020) without purification, which sharply decreases the cost of the final product. In addition, plants need only soil, water, and light to grow, which is cheap compared with the growth media required by other expression systems (Burnett and Burnett 2020). This expression system also has drawbacks. One of the main problems is integrating heterologous genes into the chloroplast genome, which is mostly done with an expensive gene gun (Burnett and Burnett 2020). Optimizing chloroplast transformation is also time-consuming, and homoplasmic transplastomic lines must ultimately be derived from the primary heteroplasmic lines. Moreover, metabolic burden is another reported disadvantage of this host. It can retard plant growth and lead to reduced biomass and low recombinant protein yield (Oey et al. 2009).

The main advantage of unicellular hosts such as bacteria over multicellular organisms is that their genomes are easy to manipulate. In addition, unlike multicellular organisms, bacteria can be transformed simply, efficiently, and cheaply (Baneyx 1999). Several factors can be major bottlenecks for recombinant protein expression in various hosts (Bock 2015):

- production cost
- procedural complexity
- differing patterns of post-translational modification
- high host protease activity
- the inability to secrete recombinant proteins
- contamination with pathogens and toxins
- poor solubility of recombinant proteins
- mismatched codon usage

Several strategies can improve recombinant protein expression in E. coli. These include using different promoters to regulate gene overexpression, using different E. coli strains, lowering the growth temperature, secreting the protein into the extracellular space, changing the culture medium, using tags, and co-expressing the recombinant protein with chaperones or foldases to assist correct folding (Baeshen et al. 2015; Wang et al. 2007).

Essential components of E. coli-based expression systems

Several factors affect the expression of recombinant proteins in E. coli, and many of them can be adjusted to optimize and increase expression. Each of these factors is described below.

Vector

To qualify as a vector, a DNA molecule must have specific characteristics. A vector must be able to replicate independently inside the host cell, so that multiple copies of the recombinant DNA are produced and passed on to daughter cells. A cloning vector should also be relatively small (less than 10 kb), because larger molecules are prone to breakage during extraction and are more difficult to manipulate. Two types of DNA molecules with these characteristics occur in bacterial cells: plasmids and bacteriophage chromosomes (Brown 2020). Various expression plasmids and strains have been introduced for expressing recombinant proteins in E. coli. In recent years, many plasmids have been designed to bypass specific obstacles in the expression pathway. The most common expression plasmids contain several essential elements: an origin of replication, a promoter, an affinity tag, a sequence encoding a tag-removal (protease cleavage) site, a multiple cloning site (MCS), a selectable marker gene (to screen for recombinant bacteria), and a terminator (Georgiou and Segatori 2005) (Fig. 2). Many expression plasmids are available today, and the choice depends on the purpose of the experiment (Rosano and Ceccarelli 2014). In most cases, a derivative of plasmid pBR322 is used for expression in E. coli (Klumpp 2011). The elements of an expression plasmid strongly influence many aspects of the recombinant proteins expressed in E. coli. A complete and accurate understanding of these elements is therefore essential, and the most important of them are described below.
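A small data structure can make the element list above concrete. The Python sketch below is a hypothetical helper, not a standard tool. It records the elements annotated on a construct, flags any elements from the Georgiou and Segatori list that are absent, and flags constructs that exceed the roughly 10 kb guideline. The element labels and the example construct are our own illustrative names.

```python
from dataclasses import dataclass, field

# Elements commonly listed for an E. coli expression plasmid (Georgiou and Segatori 2005).
# The string labels are illustrative, not a standard annotation vocabulary.
REQUIRED_ELEMENTS = {
    "origin_of_replication",
    "promoter",
    "affinity_tag",
    "tag_cleavage_site",
    "multiple_cloning_site",
    "selectable_marker",
    "terminator",
}

MAX_SIZE_BP = 10_000  # the "less than 10 kb" guideline for cloning vectors


@dataclass
class Construct:
    name: str
    size_bp: int
    elements: set[str] = field(default_factory=set)


def audit(construct: Construct) -> list[str]:
    """Return warnings for missing elements and for oversized constructs."""
    warnings = []
    for element in sorted(REQUIRED_ELEMENTS - construct.elements):
        warnings.append(f"missing element: {element}")
    if construct.size_bp >= MAX_SIZE_BP:
        warnings.append(f"size {construct.size_bp} bp is not below {MAX_SIZE_BP} bp")
    return warnings


if __name__ == "__main__":
    # Hypothetical 5.4 kb construct that lacks a tag-removal site.
    demo = Construct("demo-vector", 5_400, REQUIRED_ELEMENTS - {"tag_cleavage_site"})
    for line in audit(demo):
        print(line)
```

Not every expression plasmid needs every element. An untagged construct, for example, has no tag-removal site. A flagged element is therefore a prompt to review the design rather than an error.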

Fig. 2

Important elements of expression plasmid in bacteria. The distance between different elements is not to scale

Replicon

A replicon is a segment of DNA that replicates as a unit from a single origin of replication. It contains the origin together with its associated cis-acting control elements (Ekundayo and Bleichert 2019). One criterion for choosing an expression vector is plasmid copy number, which is controlled by the replicon. The copy numbers of plasmids commonly used to construct expression vectors in E. coli are shown in Table 1. A higher copy number does not always increase recombinant protein production (Lozano Terol et al. 2021). The relationship between the two is a delicate balance. Metabolic burden has been reported to increase by 0.063% for each additional plasmid, so host growth must be balanced against recombinant protein yield (Rouches et al. 2022). Besides copy number, plasmid compatibility must also be considered. Compatibility depends on the replication mechanism, and closely related plasmids are more likely to be incompatible (Thomas 2021). Two different plasmids that share the same replication control mechanism cannot be stably maintained in the same cell. Plasmids of the same incompatibility group compete for the same replication factors rather than replicating independently (Velappan et al. 2007). For example, pMB1 and ColE1 plasmids belong to the same group and are incompatible (Gao 1986; Minton 1984; O’brien et al. 2006).
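Once each plasmid's replicon is known, incompatibility can be checked mechanically. The replicon assignments in the Python sketch below are taken from commonly cited vector annotations: pBR322, pUC, and pET vectors carry pMB1-type origins, and pACYC184 carries p15A. These are assumptions and should be checked against the actual plasmid map. `incompatible_pairs` simply groups plasmids by replicon. `linear_burden_percent` applies the reported 0.063% per-plasmid figure linearly; this is our simplifying assumption, not a validated model.

```python
from collections import defaultdict

# Assumed replicon assignments for a few common vectors (verify on your own plasmid map).
REPLICON = {
    "pBR322": "pMB1/ColE1",
    "pUC19": "pMB1/ColE1",
    "pET-28a(+)": "pMB1/ColE1",
    "ColE1": "pMB1/ColE1",
    "pACYC184": "p15A",
    "pSC101": "pSC101",
}

# Reported increase in metabolic burden per additional plasmid (Rouches et al. 2022).
BURDEN_PER_PLASMID_PERCENT = 0.063


def incompatible_pairs(plasmids):
    """Return {replicon: [plasmids]} for every replicon shared by more than one plasmid.

    Plasmids sharing a replicon compete for the same replication control
    and cannot be stably co-maintained in one cell.
    """
    groups = defaultdict(list)
    for name in plasmids:
        groups[REPLICON[name]].append(name)
    return {rep: names for rep, names in groups.items() if len(names) > 1}


def linear_burden_percent(copy_number):
    """Naive linear extrapolation of the per-plasmid burden figure (an assumption)."""
    return BURDEN_PER_PLASMID_PERCENT * copy_number


if __name__ == "__main__":
    print(incompatible_pairs(["pET-28a(+)", "pUC19", "pACYC184"]))
    for copies in (20, 100, 500):
        print(copies, "copies ->", round(linear_burden_percent(copies), 2), "% (linear estimate)")
```

For example, a pET-28a(+)/pACYC184 pair returns no conflict. Pairing a pET vector with a pUC plasmid, by contrast, flags the shared pMB1/ColE1 group.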

Table 1 Copy number of plasmids that are generally utilized for constructing expression vectors in E. coli

Promoter

Promoters and enhancers are regulatory sequences. Enhancers are short nucleotide sequences that increase the rate of transcription (Palstra and Grosveld 2012). Promoters control the binding of RNA polymerase to DNA and therefore determine when and where a gene is expressed in an organism (Blazeck and Alper 2013). An efficient expression vector requires a strong promoter with a high affinity for RNA polymerase to increase transcription of the heterologous gene [16]. A strong, tunable promoter allows transcription of the target gene to be switched on at a chosen time. Such promoters are regulated by repressors or activators. For ideal expression of recombinant proteins, a promoter used in an expression vector should meet three criteria (Makrides 1996). It should be strong, it should drive recombinant protein accumulation to 10–30% or more of total cellular protein, and it should be inducible in a simple and affordable way. The characteristics of some promoters used in E. coli are listed in Table 2. In general, promoter strength and plasmid copy number are important determinants of recombinant protein yield. Tuning these two factors can relieve the metabolic burden that reduces yield in E. coli (Lozano Terol et al. 2021).
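One common way to check the 10–30% benchmark is densitometry of a whole-cell SDS-PAGE lane. The Python sketch below assumes that background-subtracted band intensities are available and that stain intensity is roughly proportional to protein mass. Under those assumptions, the calculation is a simple ratio. The function and argument names are hypothetical.

```python
def percent_of_total_protein(poi_band, lane_total):
    """Percentage of total cellular protein represented by the POI band.

    poi_band and lane_total are background-subtracted densitometry values
    (arbitrary units) from the same whole-cell lane; assumes a linear stain response.
    """
    if lane_total <= 0:
        raise ValueError("lane_total must be positive")
    if not 0 <= poi_band <= lane_total:
        raise ValueError("poi_band must lie between 0 and lane_total")
    return 100.0 * poi_band / lane_total


def reaches_benchmark(percent, minimum=10.0):
    """True if the POI reaches the lower end of the 10-30% benchmark (Makrides 1996)."""
    return percent >= minimum


if __name__ == "__main__":
    # Hypothetical densitometry values (arbitrary units).
    pct = percent_of_total_protein(poi_band=1800, lane_total=12000)
    print(f"{pct:.1f}% of total protein; benchmark met: {reaches_benchmark(pct)}")
```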

Table 2 Common promoters used to drive the recombinant protein expression in E. coli

Fusion tags

Tags are peptide or protein sequences that can have multiple effects on the target proteins to which they are fused (Lichty et al. 2005). Because of their critical role in recombinant protein production, we discuss them in detail. Affinity tags are proteins or peptides added to the N-terminus, the C-terminus, or both ends of a recombinant protein, primarily to aid purification. N-terminal placement is usually preferable, because it eases purification and promotes efficient translation initiation (Costa et al. 2014). Several factors should be considered when designing a tag and choosing its terminus. For example, the tag should not disturb the function of the target protein, and it must remain accessible during purification (Booth et al. 2018; Wingfield 2015). Tags serve many important functions in heterologous protein expression. These include optimizing expression, increasing protein solubility and stability, aiding purification (Rosano and Ceccarelli 2014), and reducing the toxicity of recombinant proteins to host cells (Arnau et al. 2006; Costa et al. 2014). Tags can also be used to confirm and quantify the expression of recombinant genes in heterologous hosts.

Affinity tags come in two forms. The first form is peptide tags (FLAG, poly-His, poly-Arg, Strep-tag, c-Myc, and S-tag). Because of their small size, these interfere less with the protein of interest (POI) (Bozarth et al. 2009), although they can sometimes disturb its tertiary structure (Khlebnikov and Keasling 2002). For example, a poly-His tag has been found to reduce the solubility of the POI (Woestenenk et al. 2004). The second form is larger proteins which, in addition to aiding purification, may also increase the solubility of the POI (Tochio et al. 2015). Solubility-enhancing tags include maltose-binding protein (MBP), N-utilization substance protein A (NusA), thioredoxin, glutathione S-transferase (GST), ubiquitin, Fh8, and SUMO (Costa et al. 2014; Rosano and Ceccarelli 2014). Fh8 has been reported to be among the most effective solubility enhancers of these tags (Costa et al. 2014). In addition, researchers have successfully applied several non-commercial and less common tags (Han et al. 2018; Morris et al. 2016; Nguyen et al. 2019; Ojima-Kato et al. 2017).
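The Python sketch below puts the size contrast into numbers. It computes average masses of some of the peptide tags above from standard average amino acid residue masses. At roughly 1 kDa, these tags are tiny next to a solubility partner such as MBP (about 42 kDa). The tag sequences are the usual consensus forms, and the dictionary of tags is our own selection.

```python
# Average amino acid residue masses in daltons (monoisotopic values differ slightly).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free N- and C-termini of a standalone peptide

PEPTIDE_TAGS = {
    "His6": "HHHHHH",
    "FLAG": "DYKDDDDK",
    "Strep-tag II": "WSHPQFEK",
    "c-Myc": "EQKLISEEDL",
}


def peptide_mass(sequence):
    """Average mass (Da) of a free linear peptide given in one-letter code."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


def added_mass_when_fused(sequence):
    """Mass a tag adds to a fusion protein: residue masses only, no extra water."""
    return peptide_mass(sequence) - WATER


if __name__ == "__main__":
    for name, seq in PEPTIDE_TAGS.items():
        print(f"{name:13s} {seq:12s} {peptide_mass(seq):8.2f} Da")
```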

Removing fusion tags is usually necessary to obtain intact, active POIs, and it can be done enzymatically or chemically. For enzymatic removal, a sequence encoding a protease cleavage site is inserted into the expression vector between the tag and the POI. Several factors matter when choosing a protease: the number of unwanted amino acid residues left on the POI after cleavage, how easily the protease can be removed afterwards, and the cost and characteristics of the protease. In addition, the buffer required for protease activity should not affect the properties of the POI. Commonly used proteases include tobacco etch virus (TEV) protease, thrombin, enterokinase, and factor Xa (Blommel and Fox 2007; Jenny et al. 2003). Chemical cleavage uses chemical reagents to remove fusion tags. It is inexpensive, but the reaction conditions are harsh. One such reagent is cyanogen bromide (CNBr), which cleaves peptide bonds on the C-terminal side of methionine residues. Methionine is relatively infrequent in proteins, so CNBr can release a target protein from its tag when a methionine is present immediately before the N-terminus of the target protein (Andreev et al. 2010). Methionines within the tag do not matter in this respect, but the target protein itself must not contain methionine apart from the one at the cleavage junction. Larger proteins are more likely to contain internal methionines, so this method is more feasible for small targets such as antimicrobial peptides. More recently, metal-assisted cleavage of peptide bonds, especially with nickel or palladium, has been established and could be used to remove tags (Andreev et al. 2010; Hwang et al. 2014).
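The Python sketch below illustrates the trade-offs described above: residual residues and internal cleavage sites. It applies simplified recognition rules to a fusion sequence: TEV (ENLYFQ/), thrombin (LVPR/), enterokinase (DDDDK/), factor Xa (IEGR/), and CNBr (M/). These consensus patterns are assumptions for illustration, because real specificity depends on flanking residues. `release_target` cuts at the last site upstream of the target and reports how many extra residues stay on the target's N-terminus. It also lists any sites that would cut inside the target, which is the failure mode that limits CNBr to methionine-free targets.

```python
import re

# Simplified recognition rules: (pattern, residues of the match kept upstream of the cut).
# These consensus sites are assumptions for illustration; real specificity is broader.
CLEAVAGE_RULES = {
    "TEV": ("ENLYFQ", 6),          # ENLYFQ/ (cut after Q)
    "thrombin": ("LVPR", 4),       # LVPR/GS
    "enterokinase": ("DDDDK", 5),  # DDDDK/
    "factor Xa": ("IEGR", 4),      # IEGR/
    "CNBr": ("M", 1),              # chemical cleavage C-terminal to Met
}


def cleavage_sites(sequence, agent):
    """Indices i such that the agent cuts between sequence[i-1] and sequence[i]."""
    pattern, keep = CLEAVAGE_RULES[agent]
    # A zero-width lookahead also finds overlapping matches.
    return [m.start() + keep for m in re.finditer(f"(?={pattern})", sequence)]


def release_target(fusion, target_start, agent):
    """Cleave a tag-target fusion at the last site upstream of the target.

    target_start is the 0-based index of the first residue of the mature target.
    Returns (released_sequence, extra_n_terminal_residues, internal_cut_sites);
    released_sequence ignores internal cuts, which are reported separately.
    """
    sites = cleavage_sites(fusion, agent)
    upstream = [s for s in sites if s <= target_start]
    internal = [s for s in sites if target_start < s < len(fusion)]
    if not upstream:
        raise ValueError(f"no {agent} site upstream of the target")
    cut = max(upstream)
    return fusion[cut:], target_start - cut, internal


if __name__ == "__main__":
    leader = "MGSSHHHHHHSSGLVPRGS"  # His-tag plus thrombin site, pET-28a style
    target = "AKTWYQ"               # hypothetical target
    print(release_target(leader + target, len(leader), "thrombin"))
    print(release_target("MHHHHHHM" + "AKMWYQ", 8, "CNBr"))
```

With the pET-28a-style leader, thrombin leaves the two residues GS on the target's N-terminus. With CNBr, the Met at the junction stays with the upstream fragment (as homoserine lactone), and any internal Met in the target is reported as an additional cut.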

The affinity tags described above require specific instruments and solutions to separate fusion proteins from the total host proteins. In contrast, purification based on cost-effective physico-chemical methods does not require special tools such as chromatography to separate recombinant proteins from other cellular components (Wang et al. 2020). In these methods, purification relies on changes in temperature, salt concentration, or pH, or a combination of these. The elastin-like polypeptide (ELP) (Gong et al. 2022; Pereira et al. 2021), the four-helix bundle protein DAMP4 (Wang et al. 2017), and the cell surface protein B (CspB) (Nonaka et al. 2019) tags fall into this category. New purification methods based on physico-chemical principles are clearly needed. Overall, combining non-chromatographic purification with non-enzymatic tag removal would reduce costs twice over in large-scale production of recombinant proteins. In one experiment, scientists used a non-chromatographic, self-cleaving ELP-intein tag to purify specific antimicrobial peptides (AMPs) (Camps) heterologously expressed in E. coli BL21. In this tag, the ELP promotes aggregation of the fusion protein under defined conditions, allowing rapid and cost-effective isolation of the AMP, and the intein releases the tag from the AMP (Sousa et al. 2016). In another study, researchers used a system called TSGIT to avoid the problems associated with tag cleavage (unwanted residual amino acids) and to purify intact recombinant proteins free of truncated products. In this system, three types of tags (purification, solubilization, and cleavage tags) were added to both termini of the POI.
The purification tags were two consecutive hexa-histidine tags at the N-terminus and two consecutive Strep tags at the C-terminus of the POI. The solubilization tag at both termini was thioredoxin. A SUMO cleavage tag was used at the N-terminus and an intein cleavage tag at the C-terminus. These tags released the POI intact, without any extra N- or C-terminal amino acids. A limitation of this system was the relatively high concentration of thiol-containing reductant required for cleavage of the C-terminal intein tag (Raducanu et al. 2021).

Fusion tags sometimes alter protein structure and function. When a recombinant protein is intended for biopharmaceutical use, the fusion tag must therefore be removed without leaving any residual amino acids on the target protein, to ensure the safety and efficacy of the therapy. In 2020, researchers introduced a caspase-2 cleavage site (the pentapeptide VDVAD) as a specific site for cleavage by the cpCasp2 enzyme, yielding an authentic N-terminus on the recombinant protein (Cserjan-Puschmann et al. 2020). New tags with attractive features that circumvent obstacles in heterologous protein synthesis in E. coli continue to be introduced. It is therefore advisable to stay up to date in order to select appropriate tags for each project.

Codon preference

Many factors affect recombinant protein expression in heterologous hosts. The relative abundance of the different tRNAs for a particular amino acid can vary greatly between species, resulting in species-specific codon preferences (Terpe 2006). The expression of heterologous proteins in E. coli is therefore affected by codon usage bias (Rosano and Ceccarelli 2009). Codon preference is common among organisms. It arises from differences in the usage of synonymous codons for a given amino acid or in tRNA abundance. Two strategies are used to avoid this problem. In the first, researchers replace the codons of the gene of interest with those preferred by the host organism and have the modified gene chemically synthesized. In the second, researchers use a genetically modified E. coli strain that can efficiently translate codons that are rare in E. coli. Modifying codon usage can have a marked effect on protein expression levels. Some studies have reported that codon optimization increases protein expression more than 1000-fold (Mauro 2018). Codon optimization can increase translation efficiency by overcoming limitations related to species-specific differences in codon usage and tRNA abundance (Lipinszki et al. 2018; Portolano et al. 2014). In one study, codon optimization enabled high-level production of human growth hormone in E. coli. It was found to be an important way to avoid problems caused by rare codons or unwanted mRNA secondary structure (Ghavim et al. 2017).
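Both strategies above start from the same diagnosis: how many codons in the gene are rare in E. coli. The Python sketch below counts codons from a commonly cited rare-codon list, several of which Rosetta-type strains supplement with extra tRNAs. It also performs the crudest form of optimization, replacing each amino acid with a single frequent E. coli codon. The codon choices approximate published E. coli K-12 usage and are assumptions. Practical optimization tools also consider mRNA secondary structure, GC content, and codon context, and using one codon per amino acid can itself cause problems.

```python
# Codons commonly cited as rare in E. coli (assumed list for illustration).
RARE_CODONS = {"AGA", "AGG", "CGA", "CGG", "ATA", "CTA", "CCC", "GGA"}

# One frequent codon per amino acid, approximating E. coli K-12 usage ("*" = stop).
PREFERRED = {
    "A": "GCG", "R": "CGC", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}


def codons(cds):
    """Split a DNA (or RNA) coding sequence into uppercase DNA codons."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def rare_codon_fraction(cds):
    """Fraction of codons in the sequence that belong to RARE_CODONS."""
    cs = codons(cds)
    return sum(c in RARE_CODONS for c in cs) / len(cs)


def naive_back_translate(protein):
    """'One amino acid, one codon' back-translation, a crude optimization."""
    return "".join(PREFERRED[aa] for aa in protein.upper())


if __name__ == "__main__":
    native = "ATGAGAAGGCTG"  # hypothetical fragment: Met-Arg-Arg-Leu
    print("rare fraction before:", rare_codon_fraction(native))
    optimized = naive_back_translate("MRRL")
    print(optimized, "rare fraction after:", rare_codon_fraction(optimized))
```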

E. coli strains have a fundamental role in heterologous protein expression

Several strains of E. coli are used for recombinant protein production, depending on the intrinsic properties of the POI. E. coli K12 and BL21(DE3) and many of their derivatives are the most commonly used hosts for heterologous expression (Sahdev et al. 2008; Tegel et al. 2010). The genome sequence of E. coli K12 was published in 1997 (Blattner et al. 1997), and reducing its genome by removing unnecessary regions improved its properties (Posfai et al. 2006). E. coli strains differ in specific characteristics acquired through genetic engineering of their genomes. Engineered features include:

- higher mRNA stability
- the ability to form disulfide bonds in the cytoplasm
- the absence of certain proteases
- suitability for expressing eukaryotic or toxic genes
- stable plasmid maintenance
- stringent control of heterologous overexpression

Characteristics of E. coli strains are summarized in Table 3 (Baeshen et al. 2015; Hayat et al. 2018; Terpe 2006). One problem of the E. coli expression system is low recombinant protein yield caused by metabolic burden, which slows growth and reduces cell density. Studies have shown that this imbalance can be controlled by regulating heterologous gene expression through plasmid copy number and promoter strength (Lozano Terol et al. 2021). pET-based expression plasmids are widely used for research and commercial production (Zhang et al. 2022). These plasmids carry a T7 promoter that drives heterologous expression of recombinant proteins. The T7 promoter derives from bacteriophage T7. The gene for T7 RNA polymerase was integrated into the chromosome of E. coli BL21(DE3) on the λDE3 prophage (Studier and Moffatt 1986). The T7 RNA polymerase gene is also present in the genomes of other E. coli strains (Robichon et al. 2011; Vikström et al. xxxx). T7 RNA polymerase transcribes about eight times faster than the native E. coli RNA polymerase (Iost et al. 1992). Its activity can, however, reduce recombinant protein levels through deleterious effects such as metabolic burden, toxicity, and low host cell density. To overcome these issues, researchers developed E. coli strains in which the T7 RNA polymerase gene is under an adjustable promoter, allowing tight control of its expression. Scientists have also increased the efficiency of recombinant protein production by redesigning parts of the pET28a vector (Shilling et al. 2020). In another study, researchers blocked stress responses in E. coli by knocking down the genes involved. The resulting mutants expressed recombinant proteins at higher levels than their non-mutated counterparts (Sharma et al. 2020).
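Table 3 is not reproduced here, but the logic of strain choice can be sketched as a lookup that matches required traits to engineered genotypes. The trait assignments in this Python example are a simplified, non-exhaustive summary of well-known strains. For example, BL21 derivatives lack the Lon and OmpT proteases, and Origami and SHuffle strains carry trxB/gor mutations that permit cytoplasmic disulfide bonds. The trait labels are our own, and genotypes should be confirmed in supplier documentation.

```python
# Simplified, non-exhaustive trait summary (labels are our own; confirm genotypes
# with supplier documentation before relying on them).
STRAIN_TRAITS = {
    "BL21(DE3)": {"t7_expression", "lon_ompT_deficient"},
    "BL21(DE3)pLysS": {"t7_expression", "lon_ompT_deficient", "reduced_basal_t7"},
    "BL21-AI": {"t7_expression", "lon_ompT_deficient", "reduced_basal_t7"},
    "C41(DE3)": {"t7_expression", "lon_ompT_deficient", "toxic_protein_tolerance"},
    "Rosetta(DE3)": {"t7_expression", "lon_ompT_deficient", "rare_trna_supplement"},
    "Origami(DE3)": {"t7_expression", "cytoplasmic_disulfides"},
    "SHuffle T7": {"t7_expression", "cytoplasmic_disulfides"},
}


def candidate_strains(required):
    """Strains whose trait set contains every required trait, sorted by name."""
    required = set(required)
    return sorted(name for name, traits in STRAIN_TRAITS.items() if required <= traits)


if __name__ == "__main__":
    print(candidate_strains({"cytoplasmic_disulfides"}))
    print(candidate_strains({"reduced_basal_t7", "lon_ompT_deficient"}))
```

Encoding traits as sets keeps the query a simple subset test. A real selection would also weigh factors that do not reduce to yes/no traits, such as growth behavior and licensing.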

Table 3 E. coli strains used for recombinant protein production

Culture conditions

Culture conditions are another factor affecting recombinant protein expression. Researchers use shake flasks to optimize growth conditions because they do not require any special instruments (Ladner et al. 2019). However, unlike bioreactor cultures, standard E. coli cultivation in shake flasks is done in batch mode. Batch growth has several limitations, such as lack of pH control, poor aeration, and nutrient depletion (Weuster-Botz et al. 2001). In aerobic cultures, oxygen supply is vital. Under oxygen limitation, microbial metabolism changes substantially, and unwanted by-products such as organic acids (e.g., acetate) and ethanol may form (Losen et al. 2004). In high-cell-density E. coli cultures, accumulated acetate becomes toxic during cultivation, and controlling oxygen levels can greatly limit this toxicity (Johnson et al. 2010). Acetate production and toxicity can also be avoided by continuously feeding the culture with glucose, so that the growth rate is held below the point at which acetate production begins (Millard et al. 2021).
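The glucose-feeding strategy described above is commonly implemented as an exponential feed. This holds the specific growth rate at a set value below the threshold for acetate overflow. The Python sketch below uses the standard substrate mass-balance expression for fed-batch culture and assumes that residual glucose in the reactor is negligible compared with the feed concentration. All parameter values in the example are hypothetical placeholders. The threshold growth rate must be determined for the strain and medium in question.

```python
import math


def exponential_feed_rate(t_h, mu_set, x0, v0, yield_xs, s_feed, maintenance=0.0):
    """Glucose feed rate (L/h) for exponential fed-batch at a set growth rate.

    F(t) = (mu_set / Y_X/S + m) * X0 * V0 * exp(mu_set * t) / S_F
    t_h: hours since feeding began; mu_set: target specific growth rate (1/h);
    x0: biomass (g/L) and v0: volume (L) at t = 0; yield_xs: biomass yield on
    glucose (g/g); s_feed: glucose in the feed (g/L); maintenance: m (g/g/h).
    Assumes residual glucose in the reactor is negligible relative to s_feed.
    """
    if min(mu_set, x0, v0, yield_xs, s_feed) <= 0:
        raise ValueError("growth rate, biomass, volume, yield and feed must be positive")
    glucose_demand = (mu_set / yield_xs + maintenance) * x0 * v0 * math.exp(mu_set * t_h)  # g/h
    return glucose_demand / s_feed


if __name__ == "__main__":
    # Hypothetical parameters: mu_set 0.15 1/h, 10 g/L biomass in 1 L, Y 0.5 g/g, 500 g/L feed.
    for t in (0, 2, 4, 6, 8):
        rate_ml_h = 1000 * exponential_feed_rate(t, 0.15, 10.0, 1.0, 0.5, 500.0)
        print(f"t = {t} h: feed {rate_ml_h:.1f} mL/h")
```

With these placeholder values, the initial feed is 0.006 L/h (6 mL/h). Because the feed tracks exponentially growing biomass, it doubles roughly every 4.6 h (ln 2 / 0.15).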

Different types of bioreactor systems have been introduced for large-scale production of recombinant proteins in E. coli (Blunt et al. 2018; Lopes et al. 2014; Muhamad Ali et al. 2015). In one experiment, researchers used a new device, a gas-vortex bioreactor, for large-scale production of recombinant proteins in E. coli and reported that it outperformed a stirred bioreactor (Savelyeva et al. 2017). Culture conditions such as temperature, oxygen supply, and pH have been confirmed to have a significant effect on large-scale recombinant protein production in E. coli (Zare et al. 2019). In one study, nerve growth factor (β-NGF) was produced heterologously in an E. coli DE3 strain. The best conditions for producing this protein in the bioreactor were 30% dissolved oxygen and a post-induction temperature of 28.5 °C (Hajihassan et al. 2019). In another study, E. coli growth performance was improved by manipulating the expression of small non-coding RNAs. These RNAs help regulate responses to stresses such as fluctuations in temperature, pH, dissolved oxygen, glucose, and salt concentration, which are especially likely in large-volume bioreactors (Negrete and Shiloach 2017). Widely varying amounts of recombinant proteins have been produced in E. coli. In a batch bioreactor culture, Cas9 protein expressed heterologously in E. coli reached approximately 420.1 mg/L 6 h after induction (Carmignotto and Azzoni 2019). In another study, bioreactor cultivation yielded 17.5 g/L of recombinant TGase enzyme (Duarte et al. 2021). One problem in high-volume recombinant protein production is the risk of bioreactor contamination.
Heat-inducible expression systems are widely used at the industrial level to synthesize various recombinant proteins. They do not require the addition of chemical inducers and therefore minimize the risk of contamination (Restrepo-Pineda et al. 2021).

The introduction of auto-induction methods was a major advance in heterologous gene expression. Auto-induction systems based on nutrient manipulation (Lu et al. 2020; Tahara et al. 2020) and on genetic modification of expression vectors (Shang et al. 2021) have been reported. These systems have several advantages over classic induction. They do not require inducers such as isopropyl-β-D-thiogalactoside (IPTG), which is toxic to E. coli (Dvorak et al. 2015; Kosinski et al. 1992; Malakar and Venkatesh 2012) and more expensive than its natural analog lactose (Khani and Bagheri 2020). They also do not require periodic OD600 measurements to add the inducer at the proper time. Auto-induction methods are cost-effective and convenient. Researchers have reported that auto-induction increases both the amount of recombinant protein and E. coli biomass compared with inducer-based overexpression (Gao et al. 2010; Lu et al. 2020; Sarduy et al. 2012). The strategy has also been proposed as suitable for producing soluble recombinant proteins (Fathi-Roudsari et al. 2018). In nutrient-based auto-induction, researchers use media such as ZYM-5052 (Sarduy et al. 2012), in which lactose acts as the expression inducer. These media contain a balanced mixture of glucose, lactose, and glycerol as carbon sources (Li et al. 2011). E. coli consumes glucose first, and glucose prevents induction of the lac promoter through catabolite repression and inducer exclusion. Around the mid-to-late log phase, which is the best time to induce expression, glucose is depleted, and the cells switch to lactose as a carbon source (Greicius et al. 2022). Lactose, after conversion to allolactose, then induces the lac promoter and thereby drives recombinant protein expression.
Glycerol is consumed along with lactose to provide additional carbon for growth (Ukkonen et al. 2013). Adjusting the proportions of these carbon sources in auto-induction media can be a critical step in producing the desired amount of recombinant protein.
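A deliberately crude toy model can illustrate the diauxic logic of auto-induction. The Python sketch below uses the carbon-source composition of the 5052 mix in ZYM-5052 (0.05% glucose, 0.2% α-lactose, 0.5% glycerol, w/v). In the model, glucose is consumed first, and the lac promoter is reported as induced only once glucose falls below a threshold while lactose remains. The consumption rates and threshold are arbitrary placeholders, not measured values. Glycerol is listed but not modeled.

```python
# Carbon sources of the "5052" mix used in ZYM-5052, % (w/v).
ZYM_5052 = {"glucose": 0.05, "lactose": 0.2, "glycerol": 0.5}


def lac_state(glucose, lactose, glucose_threshold=0.001):
    """Qualitative state of the lac promoter given residual sugars (% w/v)."""
    if glucose > glucose_threshold:
        return "repressed"  # catabolite repression and inducer exclusion
    if lactose > 0:
        return "induced"    # lactose -> allolactose, which releases LacI from the operator
    return "uninduced"      # no inducer left


def time_course(medium, glucose_rate=0.005, lactose_rate=0.01, hours=30):
    """Toy hourly time course; rates (% w/v per hour) are arbitrary placeholders.

    Glucose is consumed first and lactose only after glucose is gone. Glycerol,
    which is co-consumed with lactose in real cultures, is not modeled.
    """
    glucose, lactose = medium["glucose"], medium["lactose"]
    states = []
    for _ in range(hours + 1):
        states.append(lac_state(glucose, lactose))
        if glucose > 0:
            glucose = max(0.0, glucose - glucose_rate)
        else:
            lactose = max(0.0, lactose - lactose_rate)
    return states


if __name__ == "__main__":
    states = time_course(ZYM_5052)
    print("first induced hour:", states.index("induced"))
```

In this toy model, induction starts automatically once glucose runs out, with no inducer added and no OD600 monitoring. This is the convenience that auto-induction media are designed to provide.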

Localization of recombinant proteins

Recombinant proteins synthesized in E. coli can be classified by their location: the cytoplasm, the periplasm, or the extracellular space (secretion into the culture medium) (Fig. 3) (Kleiner‐Grote et al. 2018). Accumulation in the cytoplasm is the most common strategy for recombinant protein production in E. coli. It requires the simplest expression system and gives the highest yield (Makrides 1996). However, the formation of inclusion bodies (IBs) is a bottleneck when overexpressed recombinant proteins accumulate in the cytoplasm (Carrió and Villaverde 2001; Ventura and Villaverde 2006). The aggregated proteins must then be solubilized and refolded, which reduces the yield and bioactivity of the recombinant proteins (Bhatwa et al. 2021). The E. coli cytosol is a reducing environment and is therefore unsuitable for recombinant proteins that contain disulfide bonds. The periplasm, by contrast, is suitable for such proteins, whose disulfide bonds are essential for folding and activity (Paraskevopoulou et al. 2022b). Secretion into the periplasm or the culture medium has many advantages over intracellular production as IBs. It supports native folding and stability and helps produce active, soluble proteins at reduced cost (Mergulhão et al. 2005). Various signal peptides have been proposed and used to translocate recombinant proteins from the cytosol into the periplasm (Karyolaimos et al. 2019; Maleki and Hajihassan 2021; Santos et al. 2019). The periplasm contains less protease and fewer host proteins than the cytoplasm, which makes purification of recombinant proteins easier.
One of the most important advantages of directing recombinant proteins to the periplasm is that they can be released by osmotic shock and purified without full cell lysis (Ramanan et al. 2010). In general, accumulation in the periplasmic space has many advantages over accumulation in the cytoplasm. The main advantage is simpler purification, but translocation to the periplasm requires a more complex expression system, which limits recombinant protein production (Fig. 3). Moreover, the periplasm is more oxidizing, which favors disulfide bond formation, and its lower protease activity increases the likelihood that the protein remains stable (Makrides 1996; Mergulhão et al. 2005).

Fig. 3

Translocation of recombinant proteins in E. coli. a Recombinant proteins in the cytoplasm can be soluble or insoluble. b Translocation of recombinant proteins into the periplasmic space through signal peptides. c Secretion of recombinant proteins into the extracellular space.

Inclusion bodies are like a double-edged sword

Recombinant proteins may be soluble or insoluble after expression under normal growth conditions. Native E. coli proteins are generally stable and soluble, but many heterologous proteins, especially those of eukaryotic origin, often cannot be expressed in soluble form in E. coli. In addition, some recombinant proteins are toxic to E. coli cells when expressed in an active form and are therefore produced as IBs (Saïda et al. 2006; Singh et al. 2005). Recombinant proteins produced as IBs in E. coli are inactive and insoluble, usually contain non-native intramolecular and intermolecular disulfide bonds, and rarely have free cysteine residues (Fischer et al. 1993). Moreover, these submicron proteinaceous particles (typically 50 to 800 nm) are non-toxic to E. coli (Rinas et al. 2017).

IBs are protein aggregates that form in the cytoplasm (Palmer et al. 2012). They are resistant to proteases and easy to separate from other cellular components. In addition, recombinant proteins that are toxic to the heterologous host can be produced as IBs, which are non-toxic to the host cells. The main problem with IBs is recovering the active target protein from them, which requires denaturing the IBs and then refolding and recovering the active protein. These steps are cumbersome and expensive (Singh and Panda 2005) and do not work for all IB types. To obtain active proteins, IBs are first recovered from the cells and then dissolved, a step called disaggregation or solubilization, which disrupts the secondary and tertiary structures of the aggregated proteins. High concentrations of chemical denaturants such as urea and guanidine hydrochloride are used to dissolve IBs (Chew et al. 2020). These denaturants completely unfold secondary structures, and the unfolded proteins often aggregate again during refolding (Singh et al. 2015). Because of these bottlenecks, researchers prefer to produce recombinant proteins in soluble, active forms rather than as IBs. In this section, we describe the negative and positive features of these bodies.
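Preparing a denaturant solution for IB solubilization is simple molar arithmetic. The sketch below assumes commonly used working concentrations of 8 M urea and 6 M guanidine hydrochloride (the text only says "high concentrations") and standard molar masses; the function name `denaturant_mass_g` is hypothetical, and other buffer components used in practice are not modeled.

```python
# Molar masses in g/mol for the two denaturants named in the text.
MOLAR_MASS_G_PER_MOL = {
    "urea": 60.06,
    "guanidine hydrochloride": 95.53,
}


def denaturant_mass_g(denaturant: str, molarity: float, volume_l: float) -> float:
    """Grams of solid denaturant for `volume_l` litres of a `molarity` mol/L solution.

    mass = molar mass (g/mol) * molarity (mol/L) * volume (L).
    Dissolve the solid first and then make the solution up to the final volume,
    because concentrated denaturant occupies much of that volume.
    """
    if molarity <= 0 or volume_l <= 0:
        raise ValueError("molarity and volume must be positive")
    return MOLAR_MASS_G_PER_MOL[denaturant] * molarity * volume_l


if __name__ == "__main__":
    # Assumed working concentrations, for illustration only.
    for name, conc in [("urea", 8.0), ("guanidine hydrochloride", 6.0)]:
        grams = denaturant_mass_g(name, conc, 0.5)
        print(f"{conc:.0f} M {name}, 0.5 L: {grams:.1f} g")
```

For 0.5 L this gives about 240 g of urea or about 287 g of guanidine hydrochloride, which illustrates why these solubilization steps consume large amounts of reagent.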

IBs are insoluble, aggregated forms of recombinant proteins, and their formation is common when heterologous proteins are expressed in E. coli. Studies have shown that six factors are usually involved in IB formation: (1) charge average, (2) turn-forming residues, (3) proline fraction, (4) cysteine fraction, (5) hydrophilicity index, and (6) total number of residues. The first two factors are the most important (Diaz et al. 2010; Hirose and Noguchi 2013). The rate of recombinant protein synthesis is another key factor and correlates directly with IB formation.
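Because charge average and turn-forming residues dominate, they are the two inputs of the revised Wilkinson-Harrison solubility model. The sketch below is a minimal implementation of that two-parameter model; the coefficients are the values commonly reported for it and should be treated as assumptions to check against the original publication, and the function names are hypothetical.

```python
from __future__ import annotations

# Coefficients of the revised Wilkinson-Harrison model as commonly reported.
# Treat them as assumptions and check them against the original publication.
LAMBDA_TURN = 15.43
LAMBDA_CHARGE = -29.56
CV_THRESHOLD = 1.71


def canonical_variable(sequence: str) -> float:
    """CV from a one-letter protein sequence, using only the two main factors."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    # Fraction of turn-forming residues: Asn, Gly, Pro, Ser.
    turn_fraction = sum(seq.count(aa) for aa in "NGPS") / n
    # Charge average: (Arg + Lys - Asp - Glu) per residue.
    charge_average = (seq.count("R") + seq.count("K")
                      - seq.count("D") - seq.count("E")) / n
    return LAMBDA_TURN * turn_fraction + LAMBDA_CHARGE * abs(charge_average - 0.03)


def predict_solubility(sequence: str) -> tuple[str, float]:
    """Return the predicted class and the model's empirical probability.

    The probability formula is an empirical fit for typical proteins and is
    not meaningful for extreme, artificial sequences.
    """
    delta = canonical_variable(sequence) - CV_THRESHOLD
    label = "insoluble" if delta > 0 else "soluble"
    probability = 0.4934 + 0.276 * abs(delta) - 0.0392 * delta ** 2
    return label, probability
```

A positive `CV - CV'` predicts insoluble expression in E. coli. The other four factors listed above (proline and cysteine fractions, hydrophilicity, length) are not used in this two-parameter version, and synthesis rate, which the text also names, is not a sequence property at all.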

There are strategies to reduce IB formation and promote proper folding of recombinant proteins. These include growing the bacteria at lower temperatures, selecting suitable bacterial strains, co-expressing chaperones, substituting amino acids, growing and/or inducing under osmotic stress in the presence of sorbitol and glycine betaine, growing in media containing non-metabolizable sugars, adjusting the pH, and using strains lacking thioredoxin reductase (Gatti‐Lafranconi et al. 2011; Malakar and Venkatesh 2012). Here, we discuss some of these factors.

Molecular chaperones

Several factors, such as the lack of post-translational modifications (PTMs), misfolding, low solubility, and host metabolic burden, contribute to IB formation (Zhang et al. 2022). One strategy to limit IB formation during overexpression is to co-express the target protein with molecular chaperones, which assist proper folding during or after translation (Jo 2022). Chaperones are proteins that bind folding intermediates to assist protein folding, and they also help maintain the quality and stability of the proteome (Bhatwa et al. 2021). Under stress conditions such as heat shock, chaperone expression increases. According to their function, chaperones can be divided into three groups: folding, holding, and disaggregating chaperones (Sahdev et al. 2008). In the E. coli cytoplasm, trigger factor, GroEL, GroES, DnaK, DnaJ, and GrpE participate in protein folding and refolding; GroEL and DnaK are ATP-dependent folding chaperones (Mokry et al. 2015; Wang et al. 2018; Xu et al. 2018). Heat shock proteins are named according to their molecular weight (Li and Srivastava 2004); for example, DnaK and GroEL belong to the Hsp70 and Hsp60 families, respectively. ClpB is a disaggregating chaperone that helps resolve protein aggregates formed under stress conditions (Deville et al. 2017).

Fusion tags

To increase the solubility of recombinant proteins in E. coli, they can be expressed as fusions with highly soluble partners (fusion tags); however, not every highly soluble tag increases the solubility of a given target protein (Song et al. 2011). For example, NusA has been used as a tag to increase the solubility and proper folding of recombinant proteins (Li et al. 2013). Antigen-binding fragments (Fabs) produced in E. coli must be secreted into the periplasmic space, where oxidizing conditions allow proper disulfide bond formation. Various studies have sought to increase Fab expression in the E. coli periplasm (Gundinger et al. 2022). In 2019, Luo et al. used the alkaline phosphatase (phoA) promoter in the pAT system and concluded that this system outperforms the commonly used T7-based pET system for overexpressing Fabs in E. coli (Luo et al. 2019). The use of peptide tags to increase the solubility of recombinant proteins has been reported in numerous studies (Paraskevopoulou and Falcone 2018). In one study, adding a hexalysine tag (6K) to the C-terminus of a periplasm-targeted protein increased its yield, solubility, and integrity (Paraskevopoulou et al. 2022a). In another study, the hexalysine tag did not have these effects but did improve the thermal stability of the recombinant protein (Paraskevopoulou et al. 2019). These results show that the effect of a peptide tag can depend on the target protein. We therefore suggest analyzing the effects of a tag on the target protein in silico before constructing the recombinant DNA.
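As one small example of the in-silico analysis suggested above, the sketch below estimates how fusing a C-terminal hexalysine (6K) tag shifts a protein's net charge at a chosen pH, using Henderson-Hasselbalch terms. The pKa values are one common textbook set and are an assumption (other scales differ slightly); the target sequence and function names are hypothetical. Net charge is only one input to solubility and does not by itself predict the yield or stability effects reported in the studies above.

```python
# pKa values: one common textbook set (assumption; other scales differ slightly).
PKA_BASIC = {"N_TERM": 9.69, "K": 10.5, "R": 12.4, "H": 6.0}
PKA_ACIDIC = {"C_TERM": 2.34, "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}


def net_charge(sequence: str, ph: float = 7.0) -> float:
    """Approximate net charge of a polypeptide from Henderson-Hasselbalch terms."""
    seq = sequence.upper()
    # Fractional positive charge: free N-terminus plus Lys, Arg, His side chains.
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_BASIC["N_TERM"]))
    for aa in "KRH":
        positive += seq.count(aa) / (1.0 + 10 ** (ph - PKA_BASIC[aa]))
    # Fractional negative charge: free C-terminus plus Asp, Glu, Cys, Tyr side chains.
    negative = 1.0 / (1.0 + 10 ** (PKA_ACIDIC["C_TERM"] - ph))
    for aa in "DECY":
        negative += seq.count(aa) / (1.0 + 10 ** (PKA_ACIDIC[aa] - ph))
    return positive - negative


def tag_charge_shift(target: str, tag: str = "KKKKKK", ph: float = 7.0) -> float:
    """Change in net charge when `tag` is fused to the C-terminus of `target`."""
    return net_charge(target + tag, ph) - net_charge(target, ph)


if __name__ == "__main__":
    target = "MSDEQKLAGHYC"  # hypothetical target sequence, for illustration only
    print(f"untagged: {net_charge(target):+.2f}")
    print(f"6K-tagged: {net_charge(target + 'KKKKKK'):+.2f}")
    print(f"shift: {tag_charge_shift(target):+.2f}")
```

At pH 7 each lysine side chain is almost fully protonated, so the 6K tag shifts the net charge by about +6 regardless of the target; a fuller in-silico check would combine several such properties.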

Decreasing the growth temperature

During recombinant protein production, lowering the temperature slows protein synthesis and accumulation and is one way to reduce IB formation. Studies have shown that growth temperatures between 15 and 25 °C can be suitable when the host forms IBs (San-Miguel et al. 2013). Very low temperatures, however, may slow the growth rate and thus decrease protein yield; in addition, chaperones may not function properly, and protein folding may be affected (Strocchi et al. 2006). Studies have shown that low temperature is effective in increasing the solubility of some complex proteins. Hydrophobic interactions, which drive IB formation, are strongly temperature dependent, so expression at low temperature favors protein stability and proper folding (Vera et al. 2007).

Disulfide bond formation in the cytoplasm

Proteins in IBs are mostly inactive and insoluble and have to be refolded. Proteins with many disulfide bonds are difficult to produce, and refolding them from aggregates is difficult (Berkmen 2012; Palmer and Wingfield 2012). The reducing environment of the E. coli cytosol promotes misfolding of proteins that require intramolecular disulfide bonds and thereby stimulates IB formation (Singh and Panda 2005). The factors limiting disulfide bond formation in the cytoplasm include the presence of reductases that prevent disulfide bond formation and the absence of the periplasmic enzymes that catalyze it, such as the oxidase DsbA and the isomerase DsbC (Hatahet et al. 2014). Engineered strains such as SHuffle and Origami carry mutations in the genes for thioredoxin reductase (trxB) and glutathione reductase (gor) (Gordon et al. 2008). These mutations make the cytoplasm permissive for disulfide bond formation (Hayat et al. 2018).

The good face of inclusion bodies

Sometimes producing recombinant proteins as IBs is the best option. IBs are easy to purify, and IB formation is an ideal strategy for producing recombinant proteins that are toxic to the heterologous host. Moreover, production as IBs is often much more efficient in terms of yield. Adding the ssTorA tag (the signal sequence of TorA) has been shown to induce IB formation even in highly soluble proteins such as maltose-binding protein (MBP) and thioredoxin (Jong et al. 2017), so this tag can be used to deliberately produce recombinant proteins as IBs.

Effect of post-translational modification on recombinant protein expression

The lack of post-translational modification machinery and the formation of IBs are significant challenges in recombinant protein production in E. coli. Engineering of the glycosylation pathway and other post-translational modifications in E. coli has progressed greatly in recent years. E. coli cannot natively glycosylate expressed proteins, but the discovery of N-glycosylation in the Gram-negative bacterium Campylobacter jejuni and the transfer of this system into E. coli have made it possible to use E. coli to express glycoproteins (Feldman et al. 2005). Another limitation of the E. coli system is its inability to acetylate eukaryotic proteins. Studies have shown that in eukaryotic cells nearly 98% of proteins are acetylated at the amino terminus (Johnson et al. 2010). Acetylation affects various cellular processes, such as DNA replication, repair, and recombination, maintenance of genome stability, cytoskeleton dynamics, metabolism, signal transduction, and protein folding. Correct acetylation can therefore be necessary to produce active recombinant eukaryotic proteins (Neumann et al. 2008). Moreover, studies have shown that E. coli can phosphorylate proteins using intracellular kinases with adenosine triphosphate (ATP) as the phosphoryl group donor (Mijakovic et al. 2006).

Applications of recombinant proteins produced in E. coli

Recombinant proteins have many applications. They are used in vaccine production, drug delivery systems, production of antibodies and enzymes for the treatment of diseases, diagnostics, assessment of protein–protein interactions, structural biology, and other areas. E. coli is the most prevalent host for recombinant protein production (Gupta et al. 2017; Werkmeister and Ramshaw 2012). Numerous heterologous proteins, such as insulin, interferon, Preotact (human parathyroid hormone), Nivestim (filgrastim, rhG-CSF), and Preos (parathyroid hormone), have been produced in this organism (Baeshen et al. 2015). Novel therapies based on recombinant DNA technology and recombinant proteins are being investigated for many chronic diseases, such as cancer, and for rare diseases. The worldwide heterologous protein market was estimated at USD 1.4 billion in 2022 and is forecast to reach USD 2.4 billion by 2027, a compound annual growth rate (CAGR) of 11.4% (https://www.marketsandmarkets.com, 26 Jan 2023). With the emergence of antibiotic-resistant microbes, the production of recombinant antimicrobial peptides has become a promising strategy to address this problem. Many studies have expressed these peptides in E. coli, but production is restricted by their toxicity to host cells and their susceptibility to proteolytic degradation. One strategy to circumvent this limitation in E. coli is to co-express these peptides with other proteins (Li 2011; Zorko and Jerala 2010). For antimicrobial peptides rich in disulfide bonds, the E. coli Origami or Rosetta-gami strains are suitable because they carry mutations in genes for reductases whose activity hinders disulfide bridge formation (Wibowo and Zhao 2019).
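The market figures quoted above can be checked with the standard compound-growth formula, CAGR = (end / start)^(1 / years) - 1. This minimal sketch uses only the numbers in the text (USD 1.4 billion in 2022, USD 2.4 billion in 2027, a 5-year period); the function names are illustrative.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and period must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: float) -> float:
    """Value after `years` of compound growth at `rate` per year."""
    return start_value * (1.0 + rate) ** years


if __name__ == "__main__":
    # Figures quoted in the text: USD 1.4 billion (2022) -> USD 2.4 billion (2027).
    print(f"Implied CAGR: {cagr(1.4, 2.4, 5):.1%}")
    print(f"1.4 bn at 11.4%/yr for 5 yr: {project(1.4, 0.114, 5):.2f} bn")
```

The implied rate is about 11.4%, and compounding USD 1.4 billion at 11.4% for five years gives about USD 2.40 billion, so the quoted figures are internally consistent.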
The number of marketed recombinant proteins produced in E. coli is expected to increase as researchers learn more about the factors that affect recombinant protein expression and as new E. coli strains and expression vectors are introduced.

Conclusion and future perspective

The production of many kinds of proteins with different applications in heterologous hosts has been one of the most important achievements of biotechnology and genetic engineering in recent years. Recombinant proteins are used in industries such as pharmaceuticals, cosmetics, food, and agriculture. Today, there is great demand for recombinant proteins to prevent, diagnose, and treat human diseases. To express a recombinant protein, a DNA molecule containing the open reading frame of the protein is identified, isolated, cloned into a vector, and finally expressed in a suitable host. Because a heterologous gene can be expressed in different types of hosts, choosing a suitable host is a critical step. Mammalian cells, bacteria, yeasts, plants, and insects have all been used as hosts for heterologous expression of genes of interest. Bacteria are the most widely used expression hosts because of their simple expression methods and low cost, and E. coli strains are the most widely used bacterial hosts. Both known and unknown factors influence recombinant protein expression in E. coli. The known factors include the choice of a suitable expression vector, a regulatable promoter, and an appropriate E. coli strain; codon optimization; inducer concentration; the timing and duration of induction; translocation of heterologous proteins into the periplasm or extracellular space; the use of proper affinity tags; and the type and conditions of the culture medium. E. coli as a heterologous host has strengths and weaknesses, and understanding every aspect of it can help boost its efficiency in recombinant protein production. Here, we discussed the important elements involved in heterologous gene expression in E. coli, one of which is IB formation.
Overexpressed recombinant proteins in E. coli very often form IBs. These bodies are resistant to proteases and can be harvested from lysed cells by moderate-speed centrifugation, and for proteins that are toxic to the host, IBs are often the best solution. These characteristics are a strong hint that IB properties can be exploited. Therefore, one or more general, efficient, and cost-effective methods for solubilizing and refolding any type of aggregated recombinant protein would be a major breakthrough in this field. Today, with rapid progress in omics and the availability of engineered endonucleases, it is possible to identify gene functions by heterologous expression or by knockdown or knockout in the native host. This information will help us better understand how the E. coli host behaves in different situations and use these data to adapt it. One of the limitations of recombinant protein expression in E. coli is the lack of an efficient extracellular secretion system. A promising way to overcome this limitation would be to introduce genes for non-native extracellular secretion pathways into E. coli. In general, progress is limitless, and as scientists we should accelerate it.