Recombinant membrane protein biosynthesis

Membrane proteins (MP) constitute 20–30% of all proteins encoded by the genomes of various organisms (Lantez et al. 2015) and perform a wide range of essential biological functions, thus representing the largest class of protein drug targets (Bernaudat et al. 2011). However, despite their biological relevance, most of these targets still do not have any assigned function (Bernaudat et al. 2011), as reflected by the relatively low number of MP structures recorded in Stephen White’s laboratory database (http://blanco.biomol.uci.edu/mpstruc/), which listed 876 unique MP structures in March 2019. Indeed, determining the structure of an MP is quite complex, mostly due to problems arising from the low natural abundance of MP, their toxicity when overexpressed in heterologous systems, and difficulties in purifying stable functional proteins and obtaining well-diffracting crystals (Gul et al. 2014; Lantez et al. 2015). To cope with the low natural abundance of MP, which limits subsequent structural and functional studies, four different approaches have been proposed (Popot 2018), namely (1) overexpression in vivo and in situ; (2) overexpression in vivo in inclusion bodies; (3) cell-free expression (CFE) in vitro; and (4) chemical synthesis for short MP or MP fragments. Here, we will generally address the first two approaches based on the following host cells: Escherichia coli (E. coli), Pichia pastoris (P. pastoris), also known as Komagataella phaffii, and mammalian cell lines. The process to obtain a recombinant protein involves the synergy of three key elements (a gene, a vector, and an expression host) (Bernaudat et al. 2011) and, at least at the theoretical level, is straightforward (Rosano and Ceccarelli 2014). In practice, many things can go wrong, and distinct problems can be found, including poor growth of the host, inclusion body formation, or lack of protein biological activity (Rosano and Ceccarelli 2014).
Indeed, targeting an overexpressed MP to a membrane in such a way that it can insert and achieve its native structure is far from an easy task, since overexpressed MP tend to be toxic, leading to low expression yields of often misfolded and aggregated MP (Popot 2018; Rajesh et al. 2011). Moreover, the high diversity of structures and physico-chemical properties displayed by MP makes it unfeasible to accurately predict whether a protein of interest will express well, be easy to purify, and be biologically active or crystallize in any given experimental protocol (Bernaudat et al. 2011). For these reasons, the development of improved strategies in the recombinant MP production pipeline that increase expression yields of correctly folded protein is crucial in MP research. The evaluation of purified protein quality is equally important in any protein production process and should be accurately performed to avoid irreproducible and misleading observations in subsequent studies (Raynal et al. 2014). After production, MP need to be efficiently solubilized (recently reviewed by Hardy et al. 2018 and Popot 2018) and purified (Pandey et al. 2016), after which their quality in terms of purity, homogeneity, activity, and structural conformity should be assessed (Oliveira and Domingues 2018; Raynal et al. 2014). In this review, we first give an overview of generic guidelines and host characteristics that support the choice of the host expression system best suited to particular needs, and then discuss important advances reported at the upstream stage of recombinant MP production processes using E. coli, P. pastoris, and mammalian cell lines, which are representative of the major expression systems used for protein expression. Subsequently, general techniques for quality control of the target protein are presented and, at the end, insights and directions for a successful MP production pipeline are given.

Economics vs complexity: guidelines to choose the right host

The most common systems for MP overexpression are microbial (bacteria or yeasts) or higher eukaryotic (insect or mammalian cells) [reviewed in (Bernaudat et al. 2011; Fernández and Vega 2016; He et al. 2014; Midgett and Madden 2007; Wagner et al. 2006)]. No single host suits all MP expression projects, since every host has advantages and limitations, as highlighted in Table 1. Moreover, the reasons why some MP are overexpressed but others are expressed at low levels are not fully known, although this may be related to how difficult it is to fold a given MP into a functional state (Andréll and Tate 2013).

Table 1 Major advantages, limitations, and general characteristics of recombinant membrane protein expression systems

In terms of increasing complexity, the expression systems can be grouped as follows: bacteria < yeasts < insect cells < mammalian cell lines. With increasing complexity, there is generally an increase in the ability of the host cell to perform native post-translational modifications (PTM). As such, heavily glycosylated proteins are expected to be produced in a more native and folded form in mammalian cell lines, whereas those obtained from yeasts may not present the native glycosylation profile. On the other hand, simpler hosts such as bacteria allow high productivities and combine speed with ease of operation at a lower cost.

Requirements in terms of specific PTM or a near-native environment for some mammalian MP are usually the factors dictating the choice of mammalian cell lines, most commonly human embryonic kidney (HEK) and Chinese hamster ovary (CHO) cell lines, both of which can be applied in stable and transient transfections (He et al. 2014; Lyons et al. 2016). The process of recombinant protein production by transient expression involves the generation of plasmid, transfection in log phase, optional feeds from 24 h onwards, and then harvest from 48 h to 14 days, depending on the target protein, cell line, and culture conditions applied (McKenzie and Abbott 2018). In contrast to transient expression, generating stably transfected cell lines takes more time (months) and usually requires the stable integration of the recombinant DNA into the host cell genome. Since the expression vector carries a gene conferring resistance to an antibiotic, stable integrants can be identified by antibiotic selection; moreover, the integration of the gene into the host genome may be random, or the host cell can be engineered to contain a specific sequence recognized by a recombinase that allows targeted integration. Selection of clonal cells is additionally required to identify highly expressing cell lines that are stable under prolonged culture (Andréll and Tate 2013). Transient transfection is quick, but after scaling-up, batch-to-batch variability in the amount of protein expressed is often observed; on the other hand, although stable gene expression is initially slower and more technically challenging, once a clonal cell line is generated, long-term overexpression can be much more consistent, and the purification of large quantities of supercoiled plasmid DNA for transient expression is not required (Chaudhary et al. 2012; Andréll and Tate 2013).
Despite the slow growth rate and usually higher cost, the number of MP structures generated with such systems has considerably increased. With the increasing use of cryo-electron microscopy for structure determination, which requires less sample than, e.g., crystallographic studies, mammalian systems are expected to be used more frequently (Lyons et al. 2016).

Other features to be considered when selecting a host include: (1) native intracellular localization of the target protein; proteins that function in specific eukaryotic organelles such as mitochondria, chloroplasts, and peroxisomes will generally benefit from expression hosts that possess such organelles (Fernández and Vega 2016); (2) types of lipids in host membranes; hydrophobic mismatch may occur due to differences in lipid bilayer composition and thickness between hosts, as highlighted for the overexpression of eukaryotic MP in bacteria, where the absence of sterols, sphingolipids, and poly-unsaturated fatty acids in E. coli bilayers poses additional challenges to their proper folding (Snijder and Hakulinen 2016); (3) construct size; proteins larger than 120 kDa are difficult to express efficiently in E. coli and are typically obtained in very low yields, as inclusion bodies, or proteolytically degraded (Fernández and Vega 2016).

To overcome the limitations displayed by these in vivo expression systems (toxicity, limited membrane space for MP functional folding, and inefficient transport and membrane insertion mechanisms), CFE systems have been reported, which rely on the use of prokaryotic and eukaryotic protein synthesis machinery and related elements to direct protein synthesis from added DNA or mRNA templates (He et al. 2011; Henrich et al. 2015; Zheng et al. 2014). Alternatively, highly hydrophobic peptides representing functional parts of MP can be prepared for structural and functional studies via chemical synthesis (Baumruck et al. 2018). Previously, Fernández and Vega (2016) reported some recommendations on which expression host to use for a particular protein.

Upstream strategies to improve membrane protein expression levels and/or folding

Membrane protein research strongly relies on recombinant production, which is vital for obtaining high quantities of properly folded proteins for further biophysical and functional testing. While it is difficult to define a set of guidelines generally applicable to all MP, here we review distinct strategies (according to Fig. 1) that have been used to increase MP expression and/or folding using E. coli, P. pastoris, and mammalian cell-based systems (summarized in Tables S1, S2, and S3 in Electronic Supplementary Information).

Fig. 1 Overview of the topics included in this review amenable to optimization and, thus, relevant for obtaining a successful strategy for recombinant MP biosynthesis

Escherichia coli

Escherichia coli expression systems have been largely investigated for recombinant protein production processes, although with a lower success rate for membrane proteins than for soluble proteins. Aiming to reverse this trend, researchers have driven their efforts to develop enhanced upstream stages encompassing optimizations at the genetic-level, strain engineering, or culture conditions, which are reviewed in Table S1 (Electronic Supplementary Information).

Genetic-level strategies

The expression of proteins outside their original context can pose additional constraints, since their genes might contain codons that are rarely used in the desired host, come from organisms that use a non-canonical genetic code, or contain expression-limiting regulatory elements within their coding sequence. The genetic code contains 61 nucleotide triplets (codons) to encode 20 amino acids and 3 codons to terminate translation, and such degeneracy enables many alternative nucleic acid sequences to encode the same protein. Moreover, the frequencies with which different codons are used vary significantly between different organisms, between proteins expressed at high or low levels within the same organism, and sometimes even within the same operon (Gustafsson et al. 2004; Welch et al. 2011). Indeed, each organism seems to prefer a different set of codons over others, a phenomenon termed codon bias (Quax et al. 2015). Based on these observations, metrics for the frequency of optimal codons were proposed, such as the commonly used codon adaptation index (CAI). The CAI for a certain organism is based on the codon usage frequency in a reference set of highly expressed genes, such as the ones encoding ribosomal proteins, and the CAI for a specific gene can be determined by comparing its codon usage frequency with this reference set (Sharp and Li 1987; Quax et al. 2015).
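As an illustration of how a CAI-type score can be computed, the following minimal Python sketch derives a relative adaptiveness weight for each codon from a reference set and then takes the geometric mean of these weights over a gene. The two-amino-acid codon table, the reference sequence, the function names, and the pseudo-count of 0.5 for unobserved codons are assumptions made for this example only; a real calculation would use the full genetic code and a curated set of highly expressed genes from the host.

```python
import math
from collections import Counter

# Toy synonymous-codon families (illustration only, not the full genetic code).
SYNONYMS = {
    "K": ["AAA", "AAG"],
    "E": ["GAA", "GAG"],
}


def split_codons(seq):
    """Split an in-frame coding sequence into codons."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def relative_adaptiveness(reference_genes):
    """w(codon) = count / count of the most used synonymous codon in the reference set."""
    counts = Counter(c for gene in reference_genes for c in split_codons(gene))
    weights = {}
    for codons in SYNONYMS.values():
        best = max(counts[c] for c in codons)
        if best == 0:
            continue  # amino acid absent from the reference set
        for c in codons:
            # Assumption: unobserved codons get a pseudo-count of 0.5 so log(w) is defined.
            weights[c] = max(counts[c], 0.5) / best
    return weights


def cai(gene, weights):
    """Geometric mean of w over the codons of `gene` that have a weight."""
    logs = [math.log(weights[c]) for c in split_codons(gene) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in gene")
    return math.exp(sum(logs) / len(logs))


reference = ["AAAAAAAAGGAA"]  # hypothetical "highly expressed" gene: AAA AAA AAG GAA
w = relative_adaptiveness(reference)
print(cai("AAAGAA", w))  # gene using only the preferred codons of the toy reference
print(cai("AAGGAG", w))  # gene using only the less preferred codons
```

In this toy setting, a gene built only from the most frequent reference codons scores higher than one built from the less frequent synonymous codons, which is the behavior the index is meant to capture.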

Different codon biases are also correlated with the amount of the corresponding tRNAs, which varies between organisms; for example, eukaryotes commonly use the AGG codon for arginine, although it is rarely used in E. coli (Gustafsson et al. 2004). If this exerts a negative effect on heterologous gene expression, then the use of E. coli strains overexpressing rare tRNAs (which are commercially available) can improve the yields of target proteins, as previously shown for different constructs of connexin carboxyl-terminal domains attached to their 4th transmembrane domain (Kopanic et al. 2013).

Moreover, the more codons a gene contains that are rarely used in the expression host, the less likely it is that the heterologous protein will be expressed at reasonable levels, and low expression is exacerbated if the rare codons appear in clusters or in the N-terminal region. A strategy to overcome this problem involves sequence re-design by changing the rare codons to codons that more closely reflect the codon usage of the host without modifying the amino acid sequence of the encoded protein (Gustafsson et al. 2004). Automated codon optimization algorithms have been developed to design coding sequences optimized for increased expression in certain hosts, and codon optimization services are currently offered by DNA synthesis companies, which often rely on confidential algorithms. These algorithms optimize codon usage by maximizing a gene’s CAI to match that of the expression host, along with optimizing some sequence features such as GC content and avoiding repeats and motifs such as ribonuclease recognition sites, transcriptional terminator sites, Shine-Dalgarno-like sequences, and sequences that lead to strong mRNA secondary structures (Quax et al. 2015). Online tools for gene design such as OPTIMIZER (http://genomes.urv.es/OPTIMIZER/) (Puigbò et al. 2007) or for codon usage analysis such as CAIcal (http://genomes.urv.cat/CAIcal/) (Puigbò et al. 2008) are currently available, among many others that make use of distinct optimization parameters (reviewed by Angov 2011; Gould et al. 2014; Parret et al. 2016).
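The simplest form of the re-design strategy described above can be sketched in a few lines of Python: rare codons are flagged against a host usage table, and each codon is then replaced by the most frequent synonymous codon of the host while the encoded amino acid sequence is left unchanged. The usage values below are toy numbers chosen for the example and are not measured E. coli frequencies; the function names and the rarity threshold are likewise hypothetical. Commercial algorithms consider many further parameters (GC content, repeats, mRNA structure) that this sketch only touches on through a GC content helper.

```python
# Hypothetical codon usage of an expression host (per-thousand frequencies; toy values).
HOST_USAGE = {
    "M": {"ATG": 25.0},
    "R": {"CGT": 20.0, "CGC": 22.0, "AGA": 2.0, "AGG": 1.0},
    "L": {"CTG": 50.0, "CTA": 4.0},
}
CODON_TO_AA = {codon: aa for aa, family in HOST_USAGE.items() for codon in family}


def split_codons(seq):
    """Split an in-frame coding sequence into codons."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate using the toy table (raises KeyError for codons outside it)."""
    return "".join(CODON_TO_AA[c] for c in split_codons(seq))


def rare_codon_positions(seq, threshold=5.0):
    """Return 0-based codon indices whose host frequency is below `threshold`."""
    return [
        i for i, c in enumerate(split_codons(seq))
        if HOST_USAGE[CODON_TO_AA[c]][c] < threshold
    ]


def optimize(seq):
    """Replace every codon by the host's most frequent synonymous codon."""
    optimized = []
    for c in split_codons(seq):
        family = HOST_USAGE[CODON_TO_AA[c]]
        optimized.append(max(family, key=family.get))
    return "".join(optimized)


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    return sum(base in "GC" for base in seq.upper()) / len(seq)


gene = "ATGAGGCTAAGA"  # hypothetical gene fragment: ATG AGG CTA AGA
redesigned = optimize(gene)
print(rare_codon_positions(gene), redesigned, gc_content(redesigned))
```

Note that this "one amino acid, one codon" replacement raises the GC content of the toy fragment considerably, which illustrates why practical algorithms balance codon frequency against other sequence features.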

Based on the rationale that changes in protein structure and function can occur after synonymous codon replacement and that protein structure is DNA sequence-dependent, alternative approaches for synonymous codon design such as the “codon harmonization algorithm” have been proposed, which adapts the codons in such a way that the codon landscape of the gene in the original host is maintained in the expression host (Angov et al. 2008; Quax et al. 2015). The authors considered that protein synthesis and folding in E. coli are co-translational and that nucleotide sequence-dependent modulation of translational kinetics might influence nascent polypeptide folding. Therefore, in this approach, synonymous codons from E. coli were selected that match as closely as possible the codon usage frequency used in the native gene, unless empirical structure calculations show that the codons are associated with putative link/end segments, which therefore should be translated slowly (Angov et al. 2008). Claassens et al. (2017) studied the performance of this codon harmonization algorithm and compared it with wild-type and optimized gene variants (the latter obtained with the proprietary GeneOptimizer algorithm from GeneArt) using different proton-pumping rhodopsins and enzymes from archaea, bacteria, and eukarya. Codon harmonization was performed using a codon harmonizer tool (http://codonharmonizer.systemsbiology.nl) based on the harmonization algorithm initially proposed by Angov et al. (2008), which uses as inputs the codon usage frequency tables for the native and expression hosts, based on all codons in the protein-coding genes annotated in NCBI genome assemblies.
The “codon frequency landscapes” were generated and evaluated quantitatively based upon a proposed Codon Harmonization Index (CHI), in which a value close to 0 indicates a well-harmonized gene; all harmonized variants have a CHI < 0.1, while all codon-optimized and wild-type variants deviate further from the native codon landscape and consequently present a CHI higher than that of harmonized variants (> 0.183). It was additionally observed that transcriptional tuning (in this case by changing the concentration of L-rhamnose) generally improves heterologous production of the distinct variants, although the optimal concentration of rhamnose frequently differs among different codon usage variants of the same protein. In general, harmonization is beneficial for increasing membrane-embedded production compared to wild-type variants for some proteins, for which, in this study, the wild-type CHI score is also highest (as in the case of leptosphaeria rhodopsin, CHI = 0.279). Moreover, when the codon landscape of the wild-type gene in E. coli largely deviates from the landscape in the native host, harmonization seems to be a promising approach for increasing MP production (Claassens et al. 2017). Recent developments indicate that, irrespective of the algorithm used, a bicistronic design (in comparison with a monocistronic design) improves protein production in E. coli, as it may eliminate translation initiation as the rate-limiting step of the translation process (Nieuwkoop et al. 2019). The importance of using updated codon usage tables should also be noted. In this regard, Athey et al. (2017) reported a database (available at hive.biochemistry.gwu.edu/review/codon) that presents and analyzes codon usage tables for every organism with publicly available sequencing data and that is routinely updated to keep up with the continuous flow of new data.
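The core idea of harmonization described above, choosing for each position the expression-host codon whose usage frequency is closest to that of the native codon in the native host, can be sketched as follows. The relative usage fractions are toy values, and the deviation score is only a CHI-like quantity defined for this example as the mean absolute difference between native and host usage along the gene; it should not be read as the exact published definition of the CHI, and the sketch omits the link/end segment handling of Angov et al. (2008).

```python
# Hypothetical relative codon usage (fractions within each synonymous family; toy values).
NATIVE_USAGE = {
    "R": {"AGG": 0.60, "CGT": 0.40},
    "L": {"CTA": 0.10, "CTG": 0.90},
}
HOST_USAGE = {
    "R": {"AGG": 0.05, "CGT": 0.55, "CGC": 0.40},
    "L": {"CTA": 0.15, "CTG": 0.85},
}
CODON_TO_AA = {
    codon: aa
    for table in (NATIVE_USAGE, HOST_USAGE)
    for aa, family in table.items()
    for codon in family
}


def split_codons(seq):
    """Split an in-frame coding sequence into codons."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def harmonize(native_gene):
    """Pick, per codon, the host codon whose usage is closest to the native usage."""
    result = []
    for codon in split_codons(native_gene):
        aa = CODON_TO_AA[codon]
        target = NATIVE_USAGE[aa][codon]
        host_family = HOST_USAGE[aa]
        result.append(min(host_family, key=lambda c: abs(host_family[c] - target)))
    return "".join(result)


def landscape_deviation(native_gene, variant_gene):
    """CHI-like score (assumed definition): mean |native usage - host usage| per position."""
    native_codons = split_codons(native_gene)
    variant_codons = split_codons(variant_gene)
    if len(native_codons) != len(variant_codons):
        raise ValueError("genes must have the same number of codons")
    diffs = [
        abs(NATIVE_USAGE[CODON_TO_AA[n]][n] - HOST_USAGE[CODON_TO_AA[v]][v])
        for n, v in zip(native_codons, variant_codons)
    ]
    return sum(diffs) / len(diffs)


wild_type = "AGGCTA"  # hypothetical native gene fragment
harmonized = harmonize(wild_type)
print(harmonized)
print(landscape_deviation(wild_type, wild_type))   # wild-type codons read in the new host
print(landscape_deviation(wild_type, harmonized))  # harmonized codons read in the new host
```

In this toy case the harmonized variant scores closer to 0 than the unmodified wild-type sequence, mirroring the qualitative trend reported for harmonized versus wild-type variants.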

Instead of whole sequence optimization, synonymous codon substitutions in the region adjacent to the AUG start codon may lead to significant improvements in expression, thus circumventing the need for whole sequence optimization (Nørholm et al. 2013). Indeed, codon usage optimization of the N-terminal region ensures an efficient translation start, which has been shown to enhance the expression of the human tetraspan vesicle protein (TVP) synaptogyrin 1 in E. coli (Löw et al. 2012). Recently, Saladi et al. (2018) developed a data-driven statistical predictor named “IMProve,” which combines a set of sequence-derived features into an IMProve score. As this value increases, so does the probability of success, i.e., of selecting an MP that expresses in E. coli. Currently, the characterization of an integral MP involves the identification and testing of multiple homologs or variants for expression, and the predictive power of “IMProve” makes it possible to enrich for positive outcomes by 2-fold while providing a low barrier to entry (Saladi et al. 2018).

Over the years, codon optimization has been performed as a first screening step aiming to increase the yields of properly folded MP, with much success and without noticeable changes in protein structure and function. However, the increasing understanding of the principles of codon bias and of the mechanisms of translation has been unveiling previously unknown features. In fact, synonymous codons are known to potentially affect protein expression at various levels, and increasing evidence shows that translation is affected, leading to dramatic alterations in the conformation and processing of some proteins (Mauro 2018). Overall, codon optimization seems appropriate for some applications, e.g., protein evolution and increasing the expression and/or activity of industrial enzymes; however, for recombinant expression of proteins for therapeutics, we should also aim to maintain the conformation and processing of the natural protein sequences (Mauro 2018).

Owing to the higher copy number of the target gene usually achieved with plasmid-based systems, recombinant proteins are typically expressed in E. coli from medium to high plasmid copy number (PCN) vectors based on a ColE1-derived origin of replication (Baneyx 1999). The PCN is correlated with the recombinant gene dosage and can be accurately determined by quantitative polymerase chain reaction (qPCR) procedures (Lee et al. 2006; Martins et al. 2015). A recent study by Jensen et al. (2017) provided a systematic approach to identify gene disruptions that increase MP expression in E. coli, which can be used to improve expression of any protein that poses a cellular burden.
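To make the qPCR-based PCN determination mentioned above more concrete, the sketch below applies one commonly used relative quantification formula, in which the PCN is estimated from the threshold cycles (Ct) of a plasmid amplicon and of a single-copy chromosomal amplicon measured in the same total DNA sample. The formula, the function name, the default amplification efficiencies (2.0 corresponds to perfect doubling per cycle), and the Ct values are assumptions for illustration and are not taken from the protocols of Lee et al. (2006) or Martins et al. (2015), which should be consulted for validated procedures.

```python
def plasmid_copy_number(ct_plasmid, ct_chromosome,
                        eff_plasmid=2.0, eff_chromosome=2.0):
    """Estimate plasmid copies per chromosome from qPCR threshold cycles.

    Assumptions: both amplicons are measured in the same total DNA sample
    at the same fluorescence threshold, and the chromosomal reference gene
    is present in a single copy per genome.
    """
    if eff_plasmid <= 1.0 or eff_chromosome <= 1.0:
        raise ValueError("amplification efficiencies must be greater than 1")
    # A template that is more abundant crosses the threshold earlier (lower Ct),
    # so the ratio E_c**Ct_c / E_p**Ct_p grows with the plasmid excess.
    return (eff_chromosome ** ct_chromosome) / (eff_plasmid ** ct_plasmid)


# Hypothetical measurement: the plasmid amplicon appears 5 cycles earlier.
print(plasmid_copy_number(ct_plasmid=15.0, ct_chromosome=20.0))
```

With equal, ideal efficiencies the estimate reduces to 2 raised to the Ct difference, so a 5-cycle lead of the plasmid amplicon corresponds to roughly 32 plasmid copies per chromosome in this hypothetical example.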

Based on the combination of some of the above-mentioned strategies, namely, “codon harmonization,” use of low copy number vectors with moderate promoter strength, suitable leader sequences, and optimization of cell culture conditions, increased targeting of the Chlamydia trachomatis major outer membrane protein to the E. coli outer membrane was observed and the formation of inclusion bodies was avoided (Wen et al. 2016). On the other hand, prokaryotic expression vectors using the rhaB promoter, which is almost completely repressed until induced, can be suitable for the expression of toxic proteins (Giacalone et al. 2006).

Strain engineering

Remarkable enhancements in MP expression from E. coli-based systems have been achieved with engineered strains due to their improved ability to cope with MP-induced toxicity, more efficient chaperone pathways, different substrate uptake rates, or reinforced integrity of intracellular structures, e.g., the periplasmic space. Earlier observations have shown that protein (including but not limited to MP) overexpression driven by the T7 RNA polymerase in E. coli BL21 (DE3) cells can be limited or prevented by cell death (Miroux and Walker 1996). In this regard, by plating E. coli BL21 (DE3) cells expressing toxic proteins (oxoglutarate-malate carrier protein from mitochondrial membranes and subunit b of bacterial F-ATPase) on agar plates containing IPTG (for a review of these methods, please refer to Schlegel et al. 2017), Miroux and Walker (1996) were able to isolate two survivors, the mutant host strains C41 (DE3) and C43 (DE3), which have become known as the “Walker strains” and are widely used for MP overexpression. Later studies showed that mutations in the lacUV5 promoter governing expression of T7 RNA polymerase are the key to the improved MP overexpression characteristics of the C41 (DE3) and C43 (DE3) strains (Wagner et al. 2008). The rationale behind the application of BL21 (DE3) for protein production was that T7 RNA polymerase transcribes faster than E. coli RNA polymerase and that more mRNA results in more overexpressed protein. However, for most MP, strong overexpression leads to the production of more protein than the Sec translocon can process, thus impairing their insertion into the membrane, which highlights the need to tune MP expression to avoid Sec saturation (Wagner et al. 2008).
Based on these observations, Wagner et al. (2008) engineered a new BL21 (DE3) derivative strain designated Lemo21 (DE3), wherein the activity of the T7 RNA polymerase can be precisely controlled by its natural inhibitor T7 lysozyme, whose gene is carried on a plasmid under the control of the well-titratable rhamnose promoter (Wagner et al. 2008; Schlegel et al. 2012). The expression of the insertase YidC fused to GFP in the cytoplasmic membrane of the Lemo21 (DE3) strain was maximal at 1000 μM rhamnose, and it was additionally demonstrated that this strain is compatible with auto-induction media (Schlegel et al. 2012). More recently, Baumgarten et al. (2017) isolated the mutant56 (DE3) [Mt56 (DE3)] strain from BL21 (DE3) expressing YidC C-terminally fused to GFP, a fusion that makes it possible to evaluate whether the produced proteins are being targeted to the cytoplasmic membrane. The authors found that this strain produced several MP at higher levels than C41 (DE3), C43 (DE3), or BL21 (DE3), and its improved performance was attributed to a mutation in the gene encoding T7 RNA polymerase at position 305 (C:G–A:T transversion), leading to a single amino acid exchange in T7 RNA polymerase (A102D). Rather than lowering T7 RNA polymerase levels [as in C41 (DE3) and C43 (DE3)], the A102D mutation weakens the binding of the T7 RNA polymerase to the T7 promoter governing target gene expression (Baumgarten et al. 2017).

Aiming to increase the amount of membrane-embedded and correctly folded mammalian G protein-coupled receptors (GPCRs), Skretas et al. (2012) screened libraries of genomic fragments using two different flow cytometric assays, namely, by monitoring the binding of a fluorescently labeled ligand to active GPCR and the fluorescence of GPCR-GFP fusions. These screens allowed the isolation of the genes nagD (encoding the ribonucleotide phosphatase NagD), nlpDΔ (encoding a C-terminal truncation of the putative outer membrane lipoprotein NlpD), and the three-gene cluster ptsN-yhbJ-npr (encoding three proteins of the nitrogen phosphotransferase system), and it was additionally shown that their co-expression leads to a marked increase of membrane-integrated and well-folded GPCR and also of a prokaryotic MP (Skretas et al. 2012). In general, it seems that the enhancing effect is not due to a direct interaction of the products of these genes with the target proteins, but instead to indirect effects, namely, induction of stress responses or changes in the composition of the bacterial periplasm (Skretas et al. 2012). To identify genes whose co-expression can suppress MP-induced toxicity, a genome-wide screen was performed, which identified two potent suppressors, namely, djlA (encoding the membrane-bound DnaK cochaperone DjlA) and rraA (encoding RraA, an inhibitor of the mRNA-degrading activity of the E. coli RNase E) (Gialama et al. 2017). E. coli strains co-expressing djlA or rraA, referred to as SuptoxD and SuptoxR, respectively, were found to consistently enhance the production of distinct MP of mammalian and bacterial origin and with different topologies, and to perform better than other commercially available strains (Gialama et al. 2017).

Another method to mitigate the toxic effect of overexpression is “restrained expression,” in which the production of T7 RNA polymerase and of the target gene are controlled by distinct promoters, respectively, the arabinose promoter and the T7lac promoter (Narayanan et al. 2011). Under “restrained expression” conditions, namely, addition of minimal quantities of arabinose (0.01%) to produce low levels of T7 RNA polymerase and omission of IPTG, aiming to exploit the occasional derepression occurring at the lac operator site of the T7lac promoter, an increase of 5- to 25-fold in the expression of homologs of the cardiac Na+/Ca2+ exchanger was obtained, in comparison with IPTG induction. Moreover, improvements were also found per unit of OD600 nm of cells, indicating that “restrained expression” is associated with decreased cellular toxicity. In general, by reducing the frequency of transcription initiation, protein production is slower and therefore unlikely to saturate the biogenesis machinery, which explains the decreased cytoplasmic aggregation and the attendant cytotoxicity when comparing “restrained” and “rapid” (induction with arabinose and IPTG) expression (Narayanan et al. 2011). Nannenga and Baneyx (2011) reported the expression of MP in Δtig strains [trigger factor (TF) deficient], in which, due to TF inactivation, the signal recognition particle (SRP) has unimpeded access to the nascent transmembrane segment, thus resulting in targeting of MP to the inner membrane, while YidC overproduction promotes MP insertion and folding in the lipid bilayer.

A distinct approach aiming to enhance the production of soluble integral membrane-spanning proteins relied on engineering E. coli wild-type AF1000 to reduce the growth rate/substrate uptake rate, accomplished by deletions in the phosphoenolpyruvate:carbohydrate phosphotransferase system (PTS), which is responsible for the uptake of various sugars in E. coli (Backlund et al. 2011). Distinct mutant strains unable to take up glucose were obtained and characterized as follows: a strain with a defective enzyme IIABMan (ptsM), which unspecifically controls the uptake of mannose but also allows glucose passage; a strain with a defective enzyme IIBCGlc (ptsG), which is specific for glucose uptake; and the double mutant (ptsG, ptsM). As a result of the removal of ptsG, these mutants display a reduced growth rate at high glucose concentrations, but they can grow to high cell densities [although more slowly than BL21 (DE3)] since they produce no acetic acid. In general, these strains were able to produce some of the MP under study in relatively larger quantities than BL21 (DE3), but whether this enhanced ability is due to the low growth rate or to the lack of acetic acid production was not fully clarified (Backlund et al. 2011).

Finally, based on previously published protocols used for MP structure determination, the research group of Bruno Miroux (Hattab et al. 2015) revealed the preferred E. coli strain-vector combinations for an optimal use of this expression system and successful production of MP. At that time (June 2014), they found that for the determination of 141 unique non-E. coli MP structures, 163 expression vector/bacterial host combinations were applied, among which the T7 promoter was dominant (63%), followed by the arabinose, tac, and T5 promoter-based expression systems (17%, 9%, and 7%, respectively). Moreover, within T7-based expression systems, the host BL21 (DE3) was the most employed, followed by the mutants C43 (DE3) and C41 (DE3), accounting for 40, 18, and 16 MP structures, respectively. Overall, this study shows that the C41 (DE3) and C43 (DE3) mutants, together with the parental host BL21 (DE3), have contributed significantly to the success of bacterial expression systems in the structural biology of MP, with the mutants being preferably applied for the production of difficult-to-express MP. Additional remarks show that IPTG concentration and growth temperature are important parameters complementary to the choice of a bacterial host, and that a high copy number vector should be used with C41 (DE3) to take advantage of the strength of the T7-based expression system, whereas for more difficult MP, the mutant C43 (DE3), especially with low copy number plasmids, makes it possible to attenuate the transcription of the target gene (Hattab et al. 2015).

Protein fusion methodologies

Aiming to increase MP solubility and folding or to easily track their expression levels, MP have been expressed with distinct fusion partners (tags) such as SUMO (small ubiquitin-related modifier), MBP (maltose-binding protein), or GFP (green fluorescent protein), synthesized either as translational (Zuo et al. 2005; Liu et al. 2012) or transcriptional fusions (Marino et al. 2015). In translational fusions, the N-terminal fusion partners are part of the same protein chain as the membrane protein and can be cleaved off after protein production if a proteolytic cleavage site is introduced. On the other hand, transcriptional fusions exploit the presence of an additional RNA sequence upstream of the mRNA sequence of the target MP, leading to a bicistronic mRNA (Marino et al. 2017). As a result, the ribosome produces two distinct protein products during translation, thereby eliminating the need to enzymatically remove the fusion protein during purification (Marino et al. 2015). As opposed to translational fusions, transcriptional fusions do not lead to a physical linkage of the fusion protein and MP, which eliminates potential interference of the fusion partner with proper folding and functionality of the target protein (Marino et al. 2015; Marino et al. 2017). Distinct solubility enhancer tags such as SUMO, MBP, TrxA (thioredoxin), or GST (glutathione-S-transferase), with sizes ranging from 7 to 495 amino acids, have been reported (Costa et al. 2014). Based on the knowledge that ubiquitin exerts chaperoning properties on fused proteins, translational fusions with the ubiquitin-like protein SUMO were successfully explored to enhance the solubility and biological activity of the severe acute respiratory syndrome coronavirus (SARS-CoV) MP and 5-lipoxygenase-activating protein (FLAP) (Zuo et al. 2005).
An additional advantage is that the SUMO fusion can be cleaved with high specificity by SUMO protease 1, generating a protein with the native N-terminus (Zuo et al. 2005). On the other hand, Liu et al. (2012) evaluated different constructs based on translational fusions of selenoprotein K for its overexpression in E. coli, and better results were achieved with cytoplasmic MBP than with periplasmic MBP or SUMO (Liu et al. 2012). In addition to the chaperoning properties displayed by MBP and SUMO, these fusion partners also protect the target proteins from degradation by promoting their translocation from the cytosol to the cell membrane (MBP) and nucleus (SUMO), where less protease content exists (Costa et al. 2014). Noteworthy, beyond increasing the target protein solubility (as a solubility enhancer), the natural affinity of MBP toward immobilized amylose resins can also be explored as a purification tool; however, this binding is highly dependent on the nature of the target protein, as it can block or reduce the amylose interaction (Costa et al. 2014). Translational fusions encompassing a solubility enhancer tag (MBP) and an affinity tag (His-tag), which serve the dual purpose of increasing the solubility of MP and exploiting their high affinity for specific affinity chromatographic matrices during purification, are feasible, as previously reported for selenoprotein K (Liu et al. 2012). A distinct strategy for targeting proteins to the E. coli inner membrane, reported by Luo et al. (2009), is based on the fusion of a novel partner (P8CBD) to prokaryotic and eukaryotic MP. P8CBD was carefully designed to include the DNA encoding 58 amino acid residues of E. coli signal peptidase, which provides a second transmembrane segment aiming to extend the protein fusion junction into the periplasmic space and which was selected based on its ability to efficiently establish the desired orientation within the inner membrane (Luo et al. 2009). 
A chitin binding domain was also engineered to act as an optional affinity tag or detection epitope while at the fusion junction, an enterokinase cleavage site and corresponding FLAG epitope were also incorporated. Overall, by making use of the Signal Recognition Particle (SRP) membrane targeting pathway, the expression and membrane translocation of P8CBD fusion proteins is enhanced (Luo et al. 2009). The location of translational fusions is an important factor since they can promote different effects when placed at the N-terminus or C-terminus (Costa et al. 2014). This is better exemplified by the attachment of affinity oligohistidine tags to the periplasmic terminus of E. coli transporters, which is detrimental for their expression (Rahman et al. 2007). A possible explanation for this relies on a possible interference of oligohistidine sequences with the proper translocation of the adjacent segments of the protein across the membrane during biosynthesis once the charge distribution across transmembrane segments is known to have a profound effect on their orientation (Rahman et al. 2007). The optimum location of the tag is also influenced by the topology of MP. Although Nin-Cin topologies dominate the membrane proteomes of most organisms, one or both termini of a substantial fraction of MP are located on the extracellular or periplasmic side of the membrane, for which tandem Strep-tag II sequences or oligohistidine tags fused to MBP and a signal sequence should be applied (Ma et al. 2015).

Unlike translational fusions, transcriptional tags do not require enzymatic removal, since there is no physical linkage between the target MP and the fusion tag (Marino et al. 2017). Marino et al. (2015) compared the expression of different proteins using translational and transcriptional fusions of the genes mstX, sumo, and ybeL, which code for the fusion proteins Mistic (membrane-integrating sequence for translation of inner membrane proteins from Bacillus subtilis), SUMO, and a shorter version of YBeL, respectively. They created bicistronic mRNA cassettes where the stop codon of the preceding gene (mstX, sumo, or ybeL) overlaps with the start codon of the target protein, thereby mimicking a common genetic organization observed for bacterial operons (Marino et al. 2015). They observed an enhanced expression of MP via transcriptional fusions with mstX and ybeL; this effect cannot be attributed to re-initiation of ribosomes and is instead most likely due to enhanced translation initiation promoted by a more favorable secondary structure in the transcript (Marino et al. 2015).

Another major breakthrough within this field, applicable to many expression systems, was the fusion of fluorescent reporters such as GFP to the target MP (Drew et al. 2001; Goehring et al. 2014; Gul et al. 2014); the reporter behaves as a folding indicator of the target MP and allows its expression levels to be inferred. This process usually relies on fusing GFP to the C-terminus of proteins; since GFP only becomes fluorescent if the MP integrates in the cytoplasmic membrane, it allows MP overexpression in the cytoplasmic membrane to be distinguished from overexpression in inclusion bodies at any stage during overexpression, solubilization, and purification (Drew et al. 2001; Drew et al. 2006). In addition, GFP will only become fluorescent if the MP has a Cin topology, i.e., the C-terminus is cytoplasmic (Drew et al. 2006). Noteworthy, fluorescence in whole cells can be detected with a detection limit as low as 10 μg of GFP per liter of culture, and can also be determined in standard SDS polyacrylamide gels with a detection limit of less than 5 ng of GFP per protein band (Drew et al. 2006). Also based on the use of GFP as a fusion partner, Nji et al. (2018) recently reported a fluorescence detection size exclusion chromatography-based thermostability assay (FSEC-TS) that allows measuring apparent melting temperatures (Tm) of MP in the absence and presence of distinct lipids, which can help to identify the lipids that have a stabilizing effect on a particular target.

In addition to GFP, Gul et al. (2014) reported the translational fusion of the erythromycin resistance protein (23S ribosomal RNA adenine N-6 methyltransferase, ErmC), in tandem with GFP, to the C-terminus of different bacterial MP, wherein GFP fluorescence was applied to report the folding state of the target protein and ErmC to select for increased expression. Evolved strains, termed NG, were selected in increasing concentrations of erythromycin; they carry a mutation in the hns gene, and the degree of MP expression correlates with the severity of the hns mutation, although its deletion resulted in an intermediate expression. Overall, in each NG strain, the amount of fluorescent (folded) protein and the ratio of folded over misfolded protein increased up to 10-fold relative to the parental strain BW25113B (Gul et al. 2014). Another approach to easily detect the expression levels of MP, reported by Hsu et al. (2013), is based on the use of mutated bacteriorhodopsin from Haloarcula marismortui as a fusion partner; unlike with GFP, MP overexpression can be detected by the naked eye or by directly monitoring optical absorption.

Aiming to select mutants of E. coli that improve MP expression, Massey-Gendel et al. (2009) reported an approach that relies on fusing the targeted MP to a C-terminal selectable marker that confers a drug resistance phenotype. The rationale behind this strategy is that the production of the selectable marker, and hence survival on selective media, is linked to expression of the targeted MP, namely, when the C-terminus is in the cytoplasm. After selection, the isolated mutants are cured by in vivo digestion with the homing endonuclease I-CreI (Massey-Gendel et al. 2009).

Recently, Mizrachi et al. (2015) developed a technique called SIMPLEx (Solubilization of Integral MP with high Levels of Expression), which allows the direct expression of soluble products in living cells by fusing the target MP with the carboxyl terminal of apolipoprotein A-1 (ApoAI*). In addition, a highly soluble “decoy” protein from Borrelia burgdorferi, namely, the outer surface protein A (MBP lacking its N-terminal signal peptide can also be used), was fused to the N-terminus to prevent the E. coli secretory pathway from inserting the protein into the inner membrane. Acting as an amphipathic proteic “shield” that sequesters MP from water, ApoAI* promotes the solubilization of structurally diverse MP (bitopic α-helical, polytopic α-helical, and polytopic β-barrel), and yields of solubilized EmrE (E. coli ethidium multidrug resistance protein E) dimers and tetramers (the EmrE basic functional units) ranged between 8 and 10 mg/L of culture after nickel affinity chromatography. ApoAI*-solubilized EmrE was amenable to structural characterization including negative staining electron microscopy, dynamic light scattering, and SAXS (small angle X-ray scattering) data collection (Mizrachi et al. 2015).

Pichia pastoris

Genetic-level strategies

Yeasts, and particularly P. pastoris, are highly attractive alternatives for MP expression as they represent low-cost cultivation and high-quantity production platforms, meeting safety criteria and the demand for authentically processed proteins (Emmerstorfer-Augustin et al. 2019). Pichia pastoris systems usually rely on the use of integrative plasmids containing the gene of interest, which are integrated into the yeast genome, generating stable production strains (Dilworth et al. 2018). Moreover, protein production is usually driven by the alcohol oxidase (AOX) promoter, which is inducible by methanol. Depending on the functionality of one or both aox genes, recombinant strains may present a MutS or Mut+ phenotype, exhibiting different growth behaviors in methanol and different methanol requirements for induction. Another commonly used promoter is the constitutive glyceraldehyde-3-phosphate (GAP) dehydrogenase promoter (Gonçalves et al. 2013; Ramón and Marín 2011).

In recent years, studies have shown that distinct recombinant gene dosages and codon usage optimizations greatly influence MP expression levels in P. pastoris. As mentioned above, P. pastoris expression systems usually rely on expression plasmids that are integrated into the yeast genome, and multi-copy clones (the so-called “jackpot clones”) can be selected experimentally by screening several colonies in increasing concentrations of antibiotic (Dilworth et al. 2018). Nordén et al. (2011) performed a two-step antibiotic selection, initially with 100 μg/mL zeocin and then with higher concentrations, from which they isolated multi-copy clones and observed that the expression of different aquaporins responds strongly to an increase in recombinant gene dosage, independently of the amount of protein expressed from a single gene copy clone. However, although higher recombinant gene dosages can lead to higher titers of recombinant proteins, this correlation is not always linear and strains with low copy number may be preferred (Aw and Polizzi 2013; Dilworth et al. 2018). To exclude possible false positives and establish accurate correlations, the recombinant DNA levels must be evaluated along with the levels of the target protein, for which qPCR protocols have been reported using pPICZ vectors (Nordén et al. 2011) and SYBR Green or TaqMan chemistries (Abad et al. 2010). Another way to improve the expression of human aquaporins in P. pastoris is based on the optimization of the nucleotide sequence around the initial ATG according to the mammalian Kozak consensus sequence (Oberg et al. 2009). A guanine prevails at the first position of the second codon after ATG, which then encodes small amino acids such as alanine (GCN) or, to a lesser extent, glycine (GGN), which are crucial to ensure an efficient cleavage of the initiator methionine (Oberg et al. 2009). 
In most cases, this has a positive impact on aquaporins expression, while the opposite seems to be observed when a thymine is at position + 6 (Oberg et al. 2009).
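The gene dosage evaluation by qPCR mentioned above can be illustrated with a short calculation. The following is a minimal sketch, assuming the widely used comparative Ct approach, in which the target gene is normalized to a reference gene and compared with a calibrator clone known to carry a single copy; it is not the specific protocol of Nordén et al. (2011) or Abad et al. (2010), and the function name, argument names, and Ct values are hypothetical.

```python
def relative_copy_number(ct_target_sample, ct_ref_sample,
                         ct_target_calibrator, ct_ref_calibrator,
                         efficiency=2.0):
    """Estimate gene copies relative to a single-copy calibrator clone.

    Assumption: comparative Ct method with equal amplification efficiency
    for target and reference amplicons (2.0 means perfect doubling per cycle).
    """
    # Normalize the target Ct to the reference gene within each strain
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    # Compare the clone under test with the single-copy calibrator
    delta_delta = delta_sample - delta_calibrator
    return efficiency ** (-delta_delta)


# Hypothetical Ct values: the target amplifies two cycles earlier in the
# tested clone than in the single-copy calibrator
copies = relative_copy_number(20.0, 22.0, 22.0, 22.0)
print(f"Estimated copy number: {copies:.1f}")
```

With perfect doubling per cycle, a two-cycle shift corresponds to roughly four copies; in practice the efficiency of each primer pair would need to be determined experimentally.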

The codon bias problem in MP production from P. pastoris has also been addressed. Considering that the translation efficiency of more highly expressed genes may be especially sensitive to codon usage, Bai et al. (2011) generated a codon usage table specific for highly expressed genes in P. pastoris and adjusted the sequence of the P-glycoprotein-encoding mdr3 gene, taking into account relative codon frequencies for each amino acid, as well as optimizing GC content and controlling for mRNA instabilities. Using the optimized gene construct, the authors obtained a three-fold increase in expression yields in comparison with the wild-type gene of P-glycoprotein, and similar secondary and tertiary structures between the proteins from the different constructs, emphasizing the effectiveness of the gene optimization approach developed (Bai et al. 2011).
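The idea of recoding a gene from a codon usage table can be sketched in a few lines. The example below is a simplified illustration that picks the most frequent codon for each amino acid and reports the resulting GC content; it is not the procedure of Bai et al. (2011), which also weighted relative codon frequencies and controlled for mRNA instabilities. The table values are hypothetical placeholders, not real P. pastoris frequencies.

```python
# Hypothetical codon frequencies per amino acid (placeholders only,
# NOT measured P. pastoris values)
CODON_TABLE = {
    "M": {"ATG": 1.0},
    "K": {"AAG": 0.7, "AAA": 0.3},
    "L": {"TTG": 0.5, "TTA": 0.3, "CTG": 0.2},
    "*": {"TAA": 0.6, "TGA": 0.4},
}


def back_translate(protein, table):
    """Recode a protein using the most frequent codon for each residue."""
    return "".join(max(table[aa], key=table[aa].get) for aa in protein)


def gc_content(dna):
    """Fraction of G and C nucleotides in a DNA sequence."""
    return (dna.count("G") + dna.count("C")) / len(dna)


optimized = back_translate("MKL*", CODON_TABLE)
print(optimized, round(gc_content(optimized), 2))
```

A real optimization would use a complete 64-codon table and would typically sample codons according to their frequencies rather than always choosing the top one, which avoids depleting a single tRNA species.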

Expression with fusion partners has been applied since the earliest reports of MP expression in P. pastoris. Talmont et al. (1996) expressed the μ-opioid receptor fused with the S. cerevisiae α-mating factor, aiming to facilitate the translocation of the receptor to the membrane. In contrast, it was shown that the presence of the α-mating factor can be detrimental for the expression of the human histamine H1 receptor in P. pastoris (Shiroishi et al. 2011), which can be due to incomplete processing by the endogenous Kex2 protease, leading to a heterogeneous population. A way to overcome this problem is to introduce a proteolytic cleavage site upstream of the gene (Byrne 2015).

GFP has been extensively used as a fusion partner to screen for high-yield expressing clones across the most popular hosts for MP production, including P. pastoris. Brooks et al. (2013) reported a fluorescence-based induction plate assay for the simultaneous screening of P. pastoris clones expressing aquaporin 4 and homologs of the ER-associated MP phosphatidylethanolamine N-methyltransferase, in which 50 and 48 clones were screened, respectively. The plates were imaged under blue light and the colony fluorescence was quantified using Mean Gray Values, revealing a distribution of fluorescence related to protein expression that ranged from background to high; it was additionally demonstrated that there is a good correlation between plate expression and liquid culture expression (Brooks et al. 2013).
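The colony quantification step described above amounts to averaging pixel intensities over a region of interest. The following minimal sketch computes a Mean Gray Value over a circular region of a grayscale image stored as nested lists; the function name, the circular region, and the toy image are assumptions made for illustration and do not reproduce the image analysis workflow of Brooks et al. (2013).

```python
def mean_gray_value(image, center, radius):
    """Mean pixel intensity inside a circular region of interest.

    image  : 2D list of gray values (rows of pixels)
    center : (row, column) of the colony center
    radius : radius of the region in pixels
    """
    cy, cx = center
    pixels = [
        value
        for y, row in enumerate(image)
        for x, value in enumerate(row)
        if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2
    ]
    return sum(pixels) / len(pixels)


# Toy 3x3 image: one bright pixel surrounded by dim background
toy_image = [
    [10, 10, 10],
    [10, 90, 10],
    [10, 10, 10],
]
print(mean_gray_value(toy_image, center=(1, 1), radius=1))
```

Ranking clones would then consist of computing this value for each colony position and, ideally, subtracting the value measured for a non-expressing control colony on the same plate.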

Like secreted proteins, MP enter the secretory pathway but, unlike them, remain in the ER, Golgi, or plasma membrane (Vogl et al. 2014). Due to MP overexpression, unfolded and misfolded proteins can accumulate in the ER, thereby triggering the unfolded protein response (UPR). The UPR signaling pathway involves the kinase/RNase Ire1 which, when activated, initiates an unconventional splicing reaction of the HAC1 mRNA that ends with the removal of the intron and subsequent translocation of Hac1p to the nucleus (Guerfal et al. 2010). Guerfal et al. (2010) showed for the first time the beneficial effect of co-expressing Hac1p with the adenosine A2A receptor, namely, a better processing of the alpha-mating factor, thus improving the homogeneity of the obtained MP fractions. Later, Vogl et al. (2014) performed a transcriptomic analysis of P. pastoris CBS 7435 overexpressing different classes of MP (mitochondrial, ER/Golgi, and plasma membrane localized) and found that proteins targeted to the mitochondrial membrane mainly alter the energy metabolism, while the gene coding for Hac1p was upregulated in strains expressing the CMP-sialic acid transporter, which localizes to the ER and Golgi. Interestingly, they found that the overexpression of the spliced variant of Hac1 led to an increase of 1.5-fold to 2.1-fold in the expression of the ER-resident MP tested (Vogl et al. 2014).

Strain engineering and improved processing conditions

Pichia pastoris expression strains are derivatives of NRRL-Y 11430 (Northern Regional Research Laboratories, Peoria, IL, USA) (Cregg et al. 2000) encompassing distinct genotypes/phenotypes, and generally, most of them have been applied for MP production, namely, X33 (wild-type/Mut+) (Oberg et al. 2009), KM71H (arg4aox1::ARG4/MutS Arg+) (Bai et al. 2011), GS115 (his4/Mut+ His) (Guerfal et al. 2010), and also protease deficient strains such as SMD1163 (pep4 prb1 his4/Mut+ His) (André et al. 2006).

The requirement of association with cellular membranes and the type of membrane lipids can be critical for successfully producing a recombinant MP in a functionally active form, given their close spatial interactions (Emmerstorfer-Augustin et al. 2019). Plasma membranes are generally constituted by a mixture of lipids including phosphatidylcholine, phosphatidylethanolamine, phosphatidylinositol, phosphatidylserine, phosphatidic acid, sphingolipids, and sterols (Van der Rest et al. 1995). As the composition and molecular properties of the lipids differ from lower to higher eukaryotes, the distinct types of sterols in yeasts and mammals (ergosterol and cholesterol, respectively) can represent a bottleneck for the heterologous expression of mammalian proteins in yeasts (Emmerstorfer-Augustin et al. 2019; Hirz et al. 2013). Therefore, aiming to improve the functional expression, stability, and translocation of the Na+/K+ ATPase α3β1 isoform, Hirz et al. (2013) reprogrammed P. pastoris (strain CBS7435 Δhis4 Δku70) to mainly produce cholesterol instead of ergosterol. This was accomplished by replacing ERG6 (which encodes the sterol C-24 methyl transferase) and ERG5 (which encodes the sterol C-22 desaturase) with constitutive DHCR7 and DHCR24 (dehydrocholesterol reductases) overexpression cassettes, envisaging an efficient conversion of cholesta-5,7,24(25)-trienol to cholesterol (Hirz et al. 2013; Emmerstorfer-Augustin et al. 2019). The authors found that the expression levels of the target ATPase significantly increased with induction time in the cholesterol-forming strain compared to the wild-type strain, indicating a positive influence of the altered sterol composition on the stability of the synthesized MP (Hirz et al. 2013). Another example of “humanizing” P. pastoris for the expression of human proteins consists of the disruption of an endogenous glycosyltransferase gene (OCH1) and the stepwise introduction of heterologous glycosylation enzymes, in order to largely eliminate the fungal-type N-glycosylation, thereby avoiding considerable heterogeneity in the produced protein and its rapid clearance if therapeutic use is the main goal (Jacobs et al. 2009; Laukens et al. 2015). This strategy is generally referred to as GlycoSwitch® and can be applied in wild-type strains (e.g., GS115) or in the GlycoSwitch® Man 5 strain, in which the first glyco-engineering step was already introduced; it encompasses distinct glyco-engineering steps based on the transformation of P. pastoris with GlycoSwitch® vectors under previously reported protocols (Jacobs et al. 2009; Laukens et al. 2015). Currently, these vectors are commercially available from BioGrammatics (Carlsbad, USA) under license from Research Corporation Technologies (RCT).

To prevent a possible inhibition of the AOX promoter by glycerol, Pichia pastoris AOX-based bioprocesses usually encompass an initial stage of growth in glycerol followed by methanol induction, which is often cumbersome, especially when glycerol consumption cannot be monitored (Lee et al. 2017). Earlier observations with KM71H strains demonstrated that leaky expression is not a critical factor, since the target expression per cell mass is mostly dependent on the starting glycerol concentration of the media and, to a lesser degree, on yeast nitrogen base (YNB) and biotin concentrations. Moreover, as no target expression was detected until about 24 h of incubation even in the presence of a methanol concentration higher than the glycerol concentration, Lee et al. (2017) developed the Buffered extra-YNB Glycerol Methanol (BYGM) auto-induction media (100 mM potassium phosphate pH 6.0, 2.68% w/v YNB, 0.4% v/v glycerol, 0.5% v/v methanol and 8 × 10−5% w/v biotin). This auto-induction method avoids the traditional media-swapping step, and it is additionally claimed that it can be applied to MutS and Mut+ strains and distinct MP without compromising their expression yields (Lee et al. 2017). The use of additives in culture media has also been reported to increase MP expression levels. André et al. (2006) reported increased expression levels of functional GPCR through the optimization of growth temperature and the supplementation of culture media with specific GPCR ligands, histidine, and dimethylsulfoxide (DMSO). As DMSO can modify the physical properties of membranes and upregulates genes involved in lipid synthesis (Murata et al. 2003), it can have a positive effect on MP in yeast; it is additionally pointed out that, by permeabilizing membranes, it can have an indirect effect by facilitating the entry of other ligands to intracellular compartments where they reach the receptor populations (André et al. 2006). 
The beneficial effect of DMSO is not restricted to GPCR as Pedro et al. (2015) reported an increase of 1.8-fold in the enzymatic activity of human membrane-bound catechol-O-methyltransferase (MBCOMT), achieved by adding 5% v/v DMSO. Subsequently, the artificial neural network modelling of the methanol induction phase, accomplished by tailoring the temperature, DMSO concentration, and methanol constant flow-rate allowed an improvement of 1.53-fold in the enzyme activity over the best conditions performed in the DoE step (Pedro et al. 2015). In addition, the direct solubilization of MP whole cells (yeasts protoplasts) may help to decrease the amount of misfolded and/or aggregated proteins that are co-extracted with the properly folded protein (Hartmann et al. 2017).
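As a practical aside, the BYGM composition quoted above from Lee et al. (2017) can be converted into amounts per batch with simple unit arithmetic (1% w/v corresponds to 10 g/L and 1% v/v to 10 mL/L). The sketch below only performs this conversion; the function and key names are hypothetical, and it says nothing about stock solutions, sterilization, or the order of addition.

```python
# BYGM composition as quoted in the text (Lee et al. 2017)
PHOSPHATE_MM = 100.0     # potassium phosphate pH 6.0, mM
YNB_PCT_WV = 2.68        # % w/v
GLYCEROL_PCT_VV = 0.4    # % v/v
METHANOL_PCT_VV = 0.5    # % v/v
BIOTIN_PCT_WV = 8e-5     # % w/v


def bygm_amounts(volume_l):
    """Amount of each BYGM component needed for volume_l liters of medium."""
    # 1% w/v = 1 g per 100 mL = 10 g/L; 1% v/v = 1 mL per 100 mL = 10 mL/L
    return {
        "potassium_phosphate_mmol": PHOSPHATE_MM * volume_l,
        "ynb_g": YNB_PCT_WV * 10 * volume_l,
        "glycerol_ml": GLYCEROL_PCT_VV * 10 * volume_l,
        "methanol_ml": METHANOL_PCT_VV * 10 * volume_l,
        "biotin_mg": BIOTIN_PCT_WV * 10 * 1000 * volume_l,  # g/L -> mg/L
    }


for component, amount in bygm_amounts(0.5).items():
    print(f"{component}: {amount:.2f}")
```

For 1 L of medium this corresponds to 100 mmol of potassium phosphate, 26.8 g of YNB, 4 mL of glycerol, 5 mL of methanol, and 0.8 mg of biotin.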

Mammalian cell lines

General approaches and factors for successful optimization of mammalian-based systems for recombinant protein production have been reviewed elsewhere (Andréll and Tate 2013; Almo and Love 2014; Hacker and Balasubramanian 2016; McKenzie and Abbott 2018). In this sub-section, we will focus on strategies that have proven particularly useful for MP, either by improving expression and/or folding or by enabling biochemical and functional studies of these relevant drug targets (summarized in Table S3).

Distinct mammalian cell lines have been applied for MP production such as HEK293, baby hamster kidney cells (BHK-21), monkey kidney fibroblast cells (COS-7), and CHO (Andréll and Tate 2013), but HEK293 and CHO are more commonly applied, either in transient or stable transfection (Lyons et al. 2016).

The levels of expression of MP in transiently transfected mammalian cell lines are affected by the plasmid size, the amount of plasmid used per transfection, the strength of the promoter, the cell type, the efficiency of the transfection, and potentially, the toxicity of the transfection reagent (Andréll and Tate 2013). Using design of experiments, Bollin et al. (2011) optimized the yields of an antibody produced by transient gene expression and found that the DNA concentration can be maintained at relatively low levels (1 mg/L range). Indeed, for functional expression of a MP in the plasma membrane, the amount of plasmid DNA added per reaction can be a crucial factor (particularly if a strong promoter is used), since too much plasmid can lead to intracellular accumulation of potentially misfolded protein (Andréll and Tate 2013). Both CHO and HEK cell lines have been extensively used in transient transfection, and advances in serum-free media formulations allow their growth to high cell densities, which can greatly facilitate the purification of target proteins (Almo and Love 2014; McKenzie and Abbott 2018). An alternative gene delivery approach increasingly applied for protein production is based on the use of lentiviruses, owing to their ability to transduce a broad range of cell types (Bandaranayake and Almo 2014). Aiming to combine the ease and speed of transient transfection with the robust expression of stable cell lines, Elegheert et al. (2018) constructed a lentiviral plasmid suite around the transfer plasmid pHR-CMV-TetO2, which is designed for large-scale protein expression from HEK293 cell lines and allows subcloning of cDNA from the plasmid pHLsec usually applied for transient transfection. 
This approach was tested in both soluble and MP, and in general, the typical lead time for protein production using this strategy is of 3–4 weeks and approximately three to tenfold improvement in protein production yield per cell was obtained, in comparison with transient transfection (Elegheert et al. 2018).

Unlike transient transfection, stable gene expression requires the screening of clonal cell lines, which is typically achieved through limiting dilution, involving serial dilution of recently transfected cells and seeding on tissue culture plates with antibiotic-containing selection media. Subsequently, different colonies are individually transferred to 24-well plates and scaled up (Andréll and Tate 2013). For a review of selection methodologies, please refer to Browne and Al-Rubeai (2007).

Over the years, methodologies based on GFP fusions have been reported to easily ascertain the quality and level of expression of target MP. In particular, GFP fused to the termini of MP has been applied to directly monitor their subcellular location in whole cells by fluorescence microscopy (Goehring et al. 2014). A slightly different approach was reported by Mancia et al. (2004), in which the target MP and GFP are produced from a bicistronic mRNA, thus yielding two separate proteins, and the high-yielding clones are selected by fluorescence-activated cell sorting.

Given the relevance of MP as drug targets for a variety of human diseases, advances in mammalian cell-based systems have allowed performing functional studies that otherwise could be highly hampered. Baculovirus-mediated gene transduction of mammalian cells (BacMam) has been widely used due to its compatibility with a variety of mammalian cell lines and the possibility of co-infecting with multiple BacMam viruses to express protein complexes (Lyons et al. 2016). Shukla et al. (2012) exploited this strategy toward the development of a transient expression system for co-expression of two drug transporters (ABCB1—P-glycoprotein—and ABCG2) in mammalian cells, which is useful to determine their contribution to the transport of a common anticancer drug substrate. Moreover, both transporters were functionally active when co-expressed (Shukla et al. 2012). A distinct approach involves the codon optimization of the sequence of the human sodium/iodide symporter (NIS) based on the highest usage frequencies in humans, while RNA instability motifs, very high (> 80%) or very low (< 30%) GC content regions and cis-acting motifs were also removed (Kim et al. 2015). As a result, the CAI was highly improved (0.79 vs 0.97 for wild-type and optimized sequences) and from transfected cancer cells, it was found that the levels of NIS were enhanced as well as the radioiodine uptake. These results show the importance of codon usage optimizations in the development of more efficient reporters and efficient therapeutic genes, distinct goals than improving MP heterologous expression (Kim et al. 2015).
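The codon adaptation index (CAI) mentioned above can be illustrated with its standard definition: each codon receives a relative adaptiveness equal to its usage frequency divided by that of the most used synonymous codon, and the CAI of a gene is the geometric mean of these values. The sketch below is a minimal illustration under that definition; the usage table is a hypothetical placeholder rather than real human codon usage, and it is not the software used by Kim et al. (2015).

```python
import math

# Hypothetical codon usage of a reference gene set (placeholders only)
USAGE = {
    "K": {"AAG": 0.6, "AAA": 0.4},
    "L": {"CTG": 0.5, "CTC": 0.3, "TTA": 0.2},
}


def relative_adaptiveness(usage):
    """Weight of each codon: its frequency over the best synonymous codon."""
    weights = {}
    for codons in usage.values():
        top = max(codons.values())
        for codon, freq in codons.items():
            weights[codon] = freq / top
    return weights


def cai(dna, weights):
    """Geometric mean of codon weights; codons absent from the table are skipped."""
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    scored = [weights[c] for c in codons if c in weights]
    return math.exp(sum(math.log(w) for w in scored) / len(scored))


WEIGHTS = relative_adaptiveness(USAGE)
print(round(cai("AAGCTG", WEIGHTS), 3))  # only preferred codons
print(round(cai("AAACTG", WEIGHTS), 3))  # one non-preferred codon
```

A sequence built only from the preferred codons scores 1.0, and every substitution by a less used synonymous codon lowers the index, which is the direction of change reported for the optimized NIS sequence.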

One approach to facilitate MP production for structural analysis relies on the use of HEK293S GnTI- cells (lacking the N-acetylglucosaminyl transferase I gene, GnTI) and a tetracycline-inducible promoter (Chaudhary et al. 2012). On the one hand, the lack of GnTI restricts N-linked glycans to a homogeneous Man5-GlcNAc2, which is relevant because N-linked glycosylation is often regarded as a barrier toward structure determination via X-ray crystallography due to the heterogeneity and conformational flexibility of these glycans; on the other hand, the inducible promoter allows the establishment of high-density cell cultures, which are not always achieved if the target protein tends to be cytotoxic (Chaudhary et al. 2012). Alternative approaches have been suggested to overcome toxicity issues associated with MP overexpression. Ohsfeldt et al. (2012) designed an anti-apoptosis strategy involving co-expression of the Bcl-xL gene (which encodes an anti-apoptotic protein), aiming to prevent cell death caused by bioreactor stresses, nutrient depletion, toxin accumulation, and stresses due to the folding and processing requirements of complex proteins such as MP. The authors observed that cell death was diminished by the co-expression of the anti-apoptotic gene and that the transient production of two different receptors was improved (Ohsfeldt et al. 2012).

Protein quality control

The purity and integrity of purified protein samples are usually evaluated by electrophoresis (native or denaturant) coupled with detection methods with varying sensitivities (Oliveira and Domingues 2018; Raynal et al. 2014). On the other hand, isoelectric focusing and capillary electrophoresis have also been used to distinguish the protein of interest from closely related undesired subproducts or contaminants (Raynal et al. 2014), while UV-Visible spectroscopy is useful to detect nucleic acid contamination (Oliveira and Domingues 2018).

Mass spectrometry (MS) has been widely applied to measure molecular weights of proteins while allowing protein identification by peptide mass fingerprinting (PMF) and based on MS/MS spectra (Zhang et al. 2010). By detecting mass changes introduced by post-translational modifications, MS can also be used to analyze these modifications (Zhang et al. 2010). MS-compatible detection methods enable MS analysis after electrophoresis (Raynal et al. 2014). Although such analyses are usually performed after purification, Gan et al. (2017) reported a native MS approach that allows the characterization of overexpressed recombinant proteins directly in crude E. coli lysates, providing information on their identity, solubility, oligomeric state, overall structure, and stability without purification. Cells were lysed in a buffer supplemented with 1 M ammonium acetate to ensure compatibility with MS. Spectra were acquired for distinct proteins with molecular weights ranging from 19 to 47 kDa, and revealed highly resolved peaks, narrow charge state distributions, and the anticipated stoichiometry, thereby confirming that, at least for these proteins, purification is not a prerequisite (Gan et al. 2017).

In addition to the integrity and purity of the protein sample, homogeneity is also crucial to infer on the correct oligomeric structure of the protein. Dynamic light scattering (DLS) and more accurately analytical size exclusion chromatography (SEC) are useful to these determinations (Oliveira and Domingues 2018; Raynal et al. 2014). In quality control methodologies, studying the secondary and tertiary structure of proteins is important to infer about their folding and monitor protein conformational changes. A range of spectroscopic techniques has been developed for such task, being circular dichroism particularly useful to determine the secondary structures and folding properties of recombinant proteins (Oliveira and Domingues 2018). Based on several generic or protein-specific functional assays which depend upon catalytic and binding properties of the protein of interest, it is also important to determine the activity of the target protein samples (Raynal et al. 2014). Additional details of distinct analytical methods used for the characterization of therapeutic proteins including advantages and drawbacks as well as the type of information delivered from each technique can be found in the recent review by Fuh et al. (2016).

Insights for better decision-making processes in the upstream stage of membrane proteins

In this review, we addressed the first stage of the MP structure determination pipeline, more specifically MP (bio)synthesis by recombinant production processes. E. coli, P. pastoris, and mammalian cell lines were selected given their wide applicability and to cover hosts with different inherent complexities. Based on the information reviewed here, general insights to help decide which host may better fit a specific project are presented in the next paragraphs and summarized in Table 2 and Fig. 2. E. coli is probably the best characterized host, with many genetic tools available. It is more suitable for low molecular weight MP and grows easily to high cell densities at a relatively low cost. Unlike E. coli, P. pastoris and mammalian cell lines allow the production of larger MP and protein complexes with proper PTM, including glycosylation patterns, although in this regard the performance of mammalian cell lines is best. However, obtaining recombinant proteins that better resemble their native counterparts comes at a cost: these systems are more technically challenging and the process can be lengthy. The methylotrophic P. pastoris combines characteristics of both prokaryotic and higher eukaryotic hosts. In particular, direct and indirect evidence points to the importance of P. pastoris host membranes: the type of lipids can influence the expression yields and overall folding of heterologous human MP, and membrane proliferation can be induced (by HAC1 overexpression and possibly by the use of DMSO as an additive in culture media). The identification of genes limiting MP overexpression through systems biology and -omics approaches may further contribute to improving recombinant MP production processes in P. pastoris.

Table 2 Critical assessment of major parameters affecting the upstream stage of recombinant MP structural biology projects for a good decision-making process.
Fig. 2 Schematic diagram of the MP structure determination pipeline, focusing on relevant parameters to optimize the upstream stage and on techniques used for protein quality control

To overcome the cellular burden caused by MP overexpression, researchers have focused their efforts on the isolation and/or engineering of host cells, which has proven efficient in many cases. In addition, codon usage optimization has been shown to be an effective strategy to improve MP expression, but researchers should be aware that synonymous mutations can affect protein function. Fusion partners are helpful to increase MP solubility or to easily detect MP expression levels. The advent of transcriptional fusions suggests that, particularly for solubility-enhancing tags, a physical linkage between the target MP and the fusion partner may not be necessary for the desired effect, thus simplifying the overall process.
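The idea of a synonymous change, central to codon usage optimization, can be made concrete with a minimal Python sketch. It recodes a coding sequence with host-preferred codons and checks that the encoded amino acid sequence is unchanged. The standard genetic code is used; the `PREFERRED` table and the example sequence are hypothetical and purely illustrative, not measured codon usage data for any host, and the function names are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical host-preferred codons (illustrative assumption, not real usage data).
PREFERRED = {"L": "CTG", "R": "CGT", "G": "GGC", "P": "CCG"}

# Every preferred codon must encode the amino acid it is listed under.
assert all(CODON_TABLE[codon] == aa for aa, codon in PREFERRED.items())


def split_codons(dna):
    """Split a coding sequence into codons, rejecting incomplete reading frames."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna):
    """Translate a coding sequence into one-letter amino acids ('*' marks a stop)."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(dna))


def recode(dna):
    """Replace each codon by the preferred synonymous codon, when one is listed."""
    return "".join(PREFERRED.get(CODON_TABLE[c], c) for c in split_codons(dna))


original = "ATGTTACGAGGACCATAA"   # hypothetical short open reading frame
optimized = recode(original)
print(original, "->", optimized)
print("same protein:", translate(original) == translate(optimized))
```

The check only confirms that the amino acid sequence is preserved. As noted above, synonymous mutations can still affect protein function, so an identical translation does not by itself guarantee an equivalent product.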

Overall, the increasing understanding of MP biogenesis and of the host physiological response to recombinant MP production has allowed important advances in this field. Although it remains difficult to set general rules for a successful MP production process, the information gathered in this review can help researchers with their own MP targets.