The recombinant expression systems for structure determination of eukaryotic membrane proteins

Eukaryotic membrane proteins, many of which are key players in various biological processes, constitute more than half of the drug targets and represent important candidates for structural studies. In contrast to their physiological significance, only a very limited number of eukaryotic membrane protein structures have been obtained, owing to the technical challenges in generating recombinant proteins. In this review, we examine the major recombinant expression systems for eukaryotic membrane proteins and compare their relative advantages and disadvantages. We also summarize recent technical strategies that have advanced eukaryotic membrane protein purification and crystallization.


INTRODUCTION
It is estimated that approximately 30% of the protein-coding genes in humans encode integral membrane proteins (IMPs) (Overington et al., 2006; Murray et al., 2012). IMPs are critical players in many important physiological processes, including metabolism, signal transduction, and energy conversion and utilization (Krogh et al., 2001). Aberrant expression and activity of IMPs are associated with a variety of diseases such as cancer, Alzheimer's disease, and metabolic diseases (Ishikawa et al., 2004; Sanders and Myers, 2004; Overington et al., 2006; Aisenbrey et al., 2008; Bkaily and Al-Khoury, 2014). IMPs constitute more than 50% of the US Food and Drug Administration (FDA)-approved drug targets (Russell and Eggleston, 2000; Yildirim et al., 2007). Structures of eukaryotic membrane proteins are therefore actively pursued for structure-based drug development.
In contrast to their physiological and pathophysiological significance, progress in the structural biology of IMPs, particularly eukaryotic IMPs, has been relatively slow. By the end of March 2014, a total of 466 unique membrane protein structures had been reported (Snider and Stephen, 2014), the majority of which are of prokaryotic origin. For eukaryotic IMPs, more than half of the determined structures are of proteins obtained from endogenous sources (Bill et al., 2011). These proteins, exemplified by the electron transport chain complexes (Tsukihara et al., 1996; Xia et al., 1997; Sun et al., 2005), ATP synthases (Abrahams et al., 1994; Liu et al., 2004; Amunts et al., 2007), and photosystems (Kurisu et al., 2003; Liu et al., 2004; Amunts et al., 2007), are usually abundant and biochemically stable, and hence are ideal candidates for structural analysis. However, the number of endogenously abundant eukaryotic IMPs is limited, and the majority of IMPs exist in low copy numbers in the host species. Structure determination of most eukaryotic IMPs therefore requires recombinant expression of the target proteins. The first atomic-resolution structure of a eukaryotic IMP obtained through recombinant expression, Kv1.2, was reported in 2005 (Long et al., 2005). Since then, fewer than seventy structures have been obtained for eukaryotic IMPs generated through recombinant expression systems (Fig. 1).
Of the many challenges facing structural studies of eukaryotic IMPs, production of sufficient quantities of well-behaved recombinant protein is the real technical bottleneck. Because IMPs are embedded in lipid bilayers, their structural integrity and proper function rely on interactions with the surrounding lipids (Phillips et al., 2009), which stabilize membrane proteins, provide lattice contacts, and on some occasions function as indispensable co-factors (van Meer et al., 2008). Recombinant expression of membrane proteins therefore requires a proper membrane environment. Whereas Escherichia coli has proved to be the best host for most prokaryotic IMPs of known structure, eukaryotic IMPs, with very few exceptions, require eukaryotic expression systems, including yeast, baculovirus-infected insect cells, and mammalian cells (Bill et al., 2011; Snider and Stephen, 2014).
In this review, in the hope of extracting some general principles for the expression and crystallization of eukaryotic membrane proteins, we examine the expression systems used for the eukaryotic IMPs whose structures have been determined, summarize and compare the advantages and disadvantages of the representative recombinant expression systems, and describe in detail the purification and crystallization conditions used for eukaryotic membrane proteins (Table 1).

RECOMBINANT EXPRESSION SYSTEMS FOR EUKARYOTIC MEMBRANE PROTEINS
The recombinantly expressed eukaryotic IMPs of known structure were obtained from four systems: E. coli, yeasts (Pichia pastoris and Saccharomyces cerevisiae), insect cells, and mammalian cells. Each of these expression systems has its own advantages and disadvantages. The choice of an appropriate expression system remains empirical and depends largely on the biochemical and biological properties of the target protein (Bernaudat et al., 2011). Among the recombinantly expressed eukaryotic IMPs whose structures have been solved, 4 were expressed in E. coli, 20 in yeast, 35 in insect cells, and 3 in mammalian cells. We discuss these four expression systems below.
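To put the counts quoted above in proportion, the short Python sketch below tallies them and prints each system's share. The four counts are the ones given in the text; the variable and function names are illustrative only.

```python
# Counts of recombinantly expressed eukaryotic IMP structures per host,
# exactly as quoted in the paragraph above.
structure_counts = {
    "E. coli": 4,
    "yeast": 20,
    "insect cells": 35,
    "mammalian cells": 3,
}

total = sum(structure_counts.values())


def share(host: str) -> float:
    """Return the percentage of all tallied structures obtained from `host`."""
    return 100.0 * structure_counts[host] / total


# List the hosts from most to least productive.
ranked = sorted(structure_counts, key=structure_counts.get, reverse=True)
for host in ranked:
    print(f"{host:16s} {structure_counts[host]:3d}  {share(host):5.1f}%")
print(f"{'total':16s} {total:3d}")
```

The tally gives 62 structures in total, which agrees with the statement in the Introduction that fewer than seventy have been obtained; insect cells and yeast together account for the large majority.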

E. coli
As the most frequently exploited recombinant expression system, E. coli BL21 (DE3) offers the obvious advantages of rapid replication, short turnaround time, low cost, and mature, easy genetic manipulation (Sahdev et al., 2008). The E. coli C43 (DE3) and C41 (DE3) strains were developed for over-expression of membrane proteins (Miroux and Walker, 1996; Dumon-Seignovert et al., 2004). Indeed, these E. coli strains were employed to over-express the large majority of the prokaryotic IMPs whose structures were eventually obtained. However, as prokaryotic expression systems, they may lack the essential lipids, molecular chaperones, and post-translational modifications that are required for the correct membrane insertion, folding, and function of eukaryotic IMPs (Sahdev et al., 2008). As a result, only 4 structures have been obtained for eukaryotic IMPs expressed in E. coli (Table 2). Researchers have attempted to overcome these hurdles with codon optimization (Burgess-Brown et al., 2008), fusion with a Mistic or GlpF tag to promote protein expression (Ae-geanSoftware, 2005; Drew et al., 2006; Neophytou et al., 2007), and co-expression of post-translational machinery to facilitate protein folding (Mironova et al., 2005; Mijakovic et al., 2006). Despite these efforts, E. coli may not be an ideal system for eukaryotic IMP expression.

Yeast
Among the many yeast species, Pichia pastoris (Pichia) and Saccharomyces cerevisiae (S. cerevisiae), both genetically well characterized, are the major systems used to overexpress eukaryotic IMPs (Strausberg and Strausberg, 2001; Bornert et al., 2012). Schizosaccharomyces pombe is also employed for overexpression of IMPs, but not as widely as Pichia and S. cerevisiae (Yang et al., 2009). During the past thirty years, yeast has proved to be a useful expression system: 15 eukaryotic IMP structures have been determined for proteins expressed in Pichia and 5 for proteins expressed in S. cerevisiae. Most of the structurally characterized eukaryotic channels, such as potassium channels and water channels, were expressed in yeast, as listed in Table 3.

REVIEW
Yuan He et al.

Pichia is considered the best expression system among yeast species (Cereghino and Cregg, 2000). Several features contribute to its increasing use, including the simplicity of genetic manipulation, the high yield of heterologous protein, the low cost of reagents, and the capacity for post-translational modification (Macauley-Patrick et al., 2005). For these reasons, Pichia is a more suitable expression system than E. coli for producing eukaryotic IMPs. Pichia shares the ease of molecular and genetic manipulation of S. cerevisiae, and has the added advantage of yielding 10- to 100-fold more biomass from the same culture volume (Macauley-Patrick et al., 2005).
Improved techniques and commercial availability have together promoted the adoption of Pichia (Cereghino and Cregg, 2000). Pichia is a methylotrophic yeast, capable of utilizing methanol as its sole carbon source (Yurimoto and Sakai, 2009). The promoter of the alcohol oxidase 1 gene (AOX1), which encodes the first enzyme in methanol metabolism, tightly controls the expression of foreign proteins (Macauley-Patrick et al., 2005). The commercial vector pPICZ (or pPICZα) uses the AOX1 promoter and is induced by methanol (Li et al., 2007). The AOX1 promoter is preferred over other promoters such as PMA1 and GPD1 because it is strong and highly inducible (Cereghino and Cregg, 2000). Once the vector has been prepared and transformed into competent cells, the target gene is inserted into the Pichia genome with high efficiency via homologous recombination to generate stable cell lines, and the multi-copy colonies that exhibit the highest protein expression level are then selected on zeocin-spread plates (Daly and Hearn, 2005). The zeocin selection marker makes genetic manipulation in yeast convenient. The complete procedure, from subcloning to protein expression, typically takes about 10-15 days. A potential disadvantage of yeast culture is the difficulty of cell disruption caused by the thick, hard cell wall.

Insect cell
The baculovirus-infected insect cell system is undoubtedly the dominant heterologous expression system for obtaining eukaryotic IMPs (Contreras-Gomez et al., 2014). The most common method for generating recombinant baculovirus is based on site-specific transposition of an expression cassette into a baculovirus shuttle vector (bacmid) that is propagated in E. coli (Ciccarone et al., 1998). The process is convenient: the target gene is cloned into the pFastBac vector, which uses the strong AcMNPV polyhedrin (PH) promoter for high-level protein expression, and the pFastBac construct is then transformed into DH10Bac E. coli competent cells. DH10Bac cells carry a bacmid with a transposon target site together with a helper plasmid, which allow the expression cassette of the pFastBac vector to transpose into the bacmid. Once transposition has occurred, the recombinant bacmid is isolated and purified for transfection. After the insect cells have been cultured to the desired confluence, they are transfected with the purified bacmid DNA to generate a recombinant baculovirus that is used for preliminary expression tests (Contreras-Gomez et al., 2014). The pFastBac vector carries an ampicillin resistance marker and the bacmid a kanamycin resistance marker, and these selection markers add to the convenience of the baculovirus expression system. It takes approximately 3-4 weeks to complete these procedures up to the initial protein expression test.
The two most popular insect cell lines for IMP expression are Sf9, derived from Spodoptera frugiperda, and Hi5, derived from Trichoplusia ni. A heterologous protein can differ markedly in yield and behavior when expressed in these two cell lines (Unger and Peleg, 2012). To date, 30 structures have been obtained for eukaryotic IMPs expressed in Sf9 cells and 5 for those expressed in Hi5 cells (Table 4). After IL-2 was first expressed on a large scale in baculovirus-infected insect cells in 1985, the system was quickly accepted and widely used (Smith et al., 1983; Maeda et al., 1985). Owing to its ease of scale-up, safety, and accuracy (Kost et al., 2005), the baculovirus-insect cell system has yielded the largest number of eukaryotic IMP structures to date (Table 4). Notably, of the 35 eukaryotic IMP structures, 23 are of G-protein coupled receptors (GPCRs) (Table 4). The insect cell system has thus been the prevailing expression system for eukaryotic IMPs. However, the cost of the culture medium may be a serious obstacle for most laboratories.

Mammalian cell
The mammalian expression system has become a popular recombinant protein production system because of its proper post-translational modification and its human-like protein assembly (Khan, 2013). HEK (human embryonic kidney) and CHO (Chinese hamster ovary) are two widely used cell lines for recombinant expression. Both are also used extensively for functional assays such as electrophysiological assays, and both can be used for transient and stable transfection (Zhu, 2012). With transient transfection, it is relatively easy to reach a reasonable protein expression level, but this level may vary from batch to batch. Stable transfection gives higher productivity and less variation, but is very time-consuming (at least one month) (Condreay et al., 1999; Baldwin et al., 2003). The choice between these two transfection methods is therefore a trade-off.
HEK293 is a specific cell line originally derived from HEK cells; the number "293" comes from Graham's habit of numbering his experiments (Louis et al., 1997). Large-scale transient transfection of HEK293 cells in suspension culture is a reliable way to generate milligram quantities of recombinant eukaryotic IMPs. The gene of interest is ligated into the vector pcDNA3 or pCMV5, the resulting plasmid is transfected into HEK293 cells, and the cells are harvested after 48 h (Thomas and Smart, 2005). The procedure is broadly similar to that for the insect cell system, with a few exceptions. For example, 5%-10% CO2 is required to maintain HEK293 cells, and the culture temperature is 37°C for HEK293 cells rather than the 27°C used for insect cells. The overall process usually takes one to two weeks from initial cloning to a small-scale test of transient expression. However, owing to the low yield, slow growth rate, and higher cost of complex media (Sunley and Butler, 2010), the number of eukaryotic IMP structures obtained from mammalian cells is very limited. So far, only three eukaryotic IMP structures have come from this system, two of them from HEK293 cells (Table 5).
The BacMam system deserves mention for its safety, reproducibility, and efficiency (Dukkipati et al., 2008). In this system, baculoviruses are engineered to carry a mammalian expression cassette so that they deliver foreign genes into mammalian cells. Because the viruses do not replicate in mammalian cells, they are safe and well tolerated. The BacMam system has gained widespread use for its safety and rapid manipulation (Reeves et al., 2002; Baconguis and Gouaux, 2012). Depending on the cell type, cell division rate, and transduction efficiency, it takes 5-14 days to detect gene expression (Dukkipati et al., 2008). The structure of the dopamine transporter was determined using protein produced with the BacMam system (Penmatsa et al., 2013).
From the foregoing discussion, it is clear that each expression system has distinctive properties for protein expression. We compare their relative merits in Table 6 to give an intuitive overview of each system and to help researchers choose the best system for expressing their proteins.
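As a rough illustration of one axis of this comparison, the Python sketch below collects the turnaround times quoted in the preceding sections and orders the systems by the midpoint of each range. The day ranges come from the text (weeks converted to days at 7 days per week, which is our own conversion); note that the endpoints are not identical between systems, so the ordering is indicative only. Stable mammalian transfection is left out because the text gives only a lower bound (at least one month).

```python
# (min_days, max_days) as quoted in the text. The endpoints differ between
# systems (named in each key), so this is an indicative comparison only.
timelines_days = {
    "Pichia (subcloning to protein expression)": (10, 15),
    "Baculovirus/insect cells (to initial expression test)": (21, 28),  # 3-4 weeks
    "HEK293 transient (cloning to small-scale test)": (7, 14),          # 1-2 weeks
    "BacMam (to detectable gene expression)": (5, 14),
}


def midpoint(bounds: tuple) -> float:
    """Midpoint of a (min, max) range, used here only as a sort key."""
    low, high = bounds
    return (low + high) / 2


# Order the systems from the shortest to the longest typical turnaround.
ranked = sorted(timelines_days.items(), key=lambda item: midpoint(item[1]))

for system, (low, high) in ranked:
    print(f"{low:2d}-{high:2d} days  {system}")
```

Turnaround time is only one criterion; yield, cost of media, and the quality of the expressed protein, as discussed above, often weigh more heavily in the final choice.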

HOMOLOGUE SCREEN
Eukaryotic membrane proteins are very difficult to produce in large quantities, and most of them tend to be unstable in the presence of detergents. Identifying well-expressed, stable proteins is therefore essential. Homologue screening is widely used to discover well-behaved proteins (Xiaowei Hou, 2012).
Fluorescence-detection size exclusion chromatography (FSEC) is a powerful method for homologue screening (Drew et al., 2006; Newstead et al., 2007). In contrast to conventional protocols, GFP-fused membrane proteins can be detected by measuring fluorescence in whole cells during overexpression, which saves time by allowing proteins with no or low expression to be excluded early. The integrity of the proteins is also much easier to assess, by detecting in-gel fluorescence in SDS polyacrylamide gels. Moreover, FSEC can be used in the initial detergent screen to identify the detergents in which the protein is most stable. Given these benefits, the technique is very widely applied (Jasti et al., 2007; Gonzales et al., 2009; Kawate et al., 2009; Sobolevsky et al., 2009). For the P2X receptor, for example, whose aggregation and instability had hindered structural work, researchers used this method to screen 35 orthologs and identified one that was suitable for crystallization. FSEC has proven to be one of the most robust methods for identifying appropriate candidates for structure determination of eukaryotic membrane proteins.
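How such a screen might be triaged computationally can be sketched as follows. This is a minimal illustration, not a published protocol: it assumes each FSEC trace is available as (elution volume, fluorescence) pairs and that signal eluting at or before a chosen void-volume cutoff reflects aggregated protein. The function name, the cutoff, and the two example traces are all hypothetical and invented purely to exercise the code.

```python
from typing import Dict, List, Sequence, Tuple

# One FSEC trace: (elution volume in mL, fluorescence in arbitrary units).
Trace = Sequence[Tuple[float, float]]


def aggregation_ratio(trace: Trace, void_end_ml: float) -> float:
    """Highest signal at or before the void cutoff divided by the highest signal after it.

    A low ratio suggests a mostly monodisperse sample; a high ratio suggests
    aggregation. Returns infinity when nothing elutes after the cutoff.
    """
    void_peak = max((f for v, f in trace if v <= void_end_ml), default=0.0)
    main_peak = max((f for v, f in trace if v > void_end_ml), default=0.0)
    if main_peak == 0.0:
        return float("inf")
    return void_peak / main_peak


def rank_candidates(traces: Dict[str, Trace], void_end_ml: float) -> List[str]:
    """Order candidate names from least to most aggregated."""
    return sorted(traces, key=lambda name: aggregation_ratio(traces[name], void_end_ml))


# Hypothetical traces for two orthologs (made-up numbers, not measured data).
example_traces = {
    "ortholog_A": [(8.0, 5.0), (9.0, 10.0), (13.0, 100.0), (14.0, 40.0)],
    "ortholog_B": [(8.0, 60.0), (9.0, 80.0), (13.0, 40.0), (14.0, 20.0)],
}

print(rank_candidates(example_traces, void_end_ml=10.0))
```

In practice the height of the main peak (expression level) and its symmetry would be weighed together with the void signal; the single ratio used here is a deliberate simplification.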

OPTIMAL CONSTRUCTS DESIGN
Optimizing constructs is very beneficial for obtaining well-packed crystals. One approach is to "cut off", that is, to truncate the protein. Limited proteolysis is a conventional method for finding the optimal construct boundaries. It is also worth noting that in most cases the N-terminal or C-terminal tag is removed before crystallization (Long et al., 2005; Long et al., 2007; Gonzales et al., 2009; Maeda et al., 2009; Sobolevsky et al., 2009; Tao et al., 2009). For instance, the desensitized ASIC1 was crystallized after removal of 25 N-terminal and 64 C-terminal residues (Jasti et al., 2007).
The opposite approach is to "add up". T4 lysozyme (T4L) insertion and Fab or nanobody binding are used to produce stable proteins. The T4L fragment is highly soluble and effectively increases the solvent-exposed area, thereby facilitating protein-protein interactions and generating novel crystal packing interfaces (Cherezov et al., 2007). Fab fragments and nanobodies, which are derived from antibodies, can reduce protein flexibility and improve conformational homogeneity (Zhou et al., 2001; Rasmussen et al., 2007). GPCRs are among the most successful examples in which T4L and Fab/nanobody were employed for structure determination (Rasmussen et al., 2011a, b).
Mutagenesis is an alternative approach to construct design. To improve the crystallization behavior and stabilize the tetrameric state of the glutamate receptor GluA2, point mutations were introduced that prevented non-specific aggregation and disulphide bond formation (Sobolevsky et al., 2009). Similarly, the E329Q mutation was introduced to stabilize GLUT1 in a particular conformation (Deng et al., 2014). In addition, glycosylation is the most common post-translational modification of eukaryotic membrane proteins and leads to protein heterogeneity. Mutation of glycosylation sites or enzymatic deglycosylation is therefore an important step towards crystallization (Deng et al., 2014).

DETERGENTS, LIPIDS AND CRYSTALLIZATION
Based on Table 1, we have summarized the detergents used for protein purification and crystallization. Fifty-one eukaryotic membrane proteins were extracted with DDM or DM (Fig. 2A), suggesting that DDM and DM are suitable for extracting the majority of eukaryotic membrane proteins. Likewise, nearly half of the eukaryotic membrane protein crystals were obtained in DDM or DM, indicating that these detergents are worth trying first for crystallization (Fig. 2B and Table 1). Apart from these conventional detergents, new detergents have been developed to meet new requirements. For example, when purifying the β2 adrenergic receptor-Gs protein complex, the authors stabilized the complex by exchanging DDM for a newly developed maltose neopentyl glycol detergent, MNG-3 (NG310, Anatrace), to prevent the complex from dissociating (Chae et al., 2010; Rasmussen et al., 2011a, b).
It is worth noting that additional lipids can help crystal packing. Lipids are incorporated in three ways. The first is to mix lipids with detergent(s) in the hanging or sitting drop during crystallization. For the mammalian voltage-dependent Shaker family potassium channel, for example, the authors used 0.1 mg/mL 3:1:1 POPC:POPE:POPG throughout purification and crystallization to obtain crystals (Long et al., 2005). The second approach is the lipidic cubic phase (LCP) method. The lipidic cubic phase is a dynamic structure consisting of a highly organized single lipid bilayer pervaded by an inter-connected aqueous channel (Landau and Rosenbusch, 1996). The LCP method has been discussed in detail elsewhere (Caffrey and Cherezov, 2009), and we do not elaborate on it here. The crystal structure of the β2AR-Gs complex was determined using 7.7 MAG as the host lipid for crystallization (Rasmussen et al., 2011a, b). The third is the bicelle method, which is regarded as intermediate between the traditional detergent crystallization method and the rigid LCP method. A bicelle can be considered a lipid bilayer disc formed by a long-chain lipid and a short-chain lipid or detergent (Agah and Faham, 2012). The usual composition is 3:1 DMPC:CHAPSO. Several protein structures have been determined using the bicelle method (Payandeh et al., 2011).
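For readers preparing such a lipid mixture, the arithmetic behind a ratio like 3:1:1 at a given total concentration can be sketched in Python as below. The sketch assumes the 3:1:1 ratio is by mass and that 0.1 mg/mL is the total lipid concentration; the text does not state the basis of the ratio, so this is our assumption, and the helper name is hypothetical.

```python
from typing import Dict


def split_by_ratio(total: float, ratio: Dict[str, float]) -> Dict[str, float]:
    """Divide `total` among components in proportion to their ratio parts."""
    parts = sum(ratio.values())
    if parts <= 0:
        raise ValueError("ratio parts must sum to a positive number")
    return {name: total * part / parts for name, part in ratio.items()}


# 0.1 mg/mL total lipid at 3:1:1 POPC:POPE:POPG.
# ASSUMPTION: the ratio is by mass (w/w); the source does not say.
mix = split_by_ratio(0.1, {"POPC": 3, "POPE": 1, "POPG": 1})

for lipid, conc in mix.items():
    print(f"{lipid}: {conc:.2f} mg/mL")
```

Under the mass-ratio assumption this corresponds to 0.06 mg/mL POPC and 0.02 mg/mL each of POPE and POPG. If the ratio were molar instead, each part would first have to be weighted by the molecular mass of the lipid.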
Last but not least, we draw out a few messages for the crystallization of eukaryotic membrane proteins from Table 1: (a) Protein concentration: in almost all cases the protein concentration used for crystallization is above 5 mg/mL. (b) Crystallization temperature: if we exclude the LCP method, which is routinely carried out at 20 ± 2°C, nearly half of the eukaryotic membrane proteins were crystallized at low temperature, especially at 4°C. At low temperature, a protein with "normal" solubility is more soluble in high salt and precipitates at lower concentrations of precipitant, and equilibration by diffusion proceeds more slowly. These observations indicate that crystallization trials at low temperature should not be omitted. (c) Crystallization methods: hanging drop or sitting drop crystallization is the main, conventional approach for most eukaryotic membrane proteins. The LCP method is a rising star that has been applied extensively to determine GPCR structures, as mentioned above. Remarkably, the LCP method is suitable not only for GPCRs but also for structure determination of non-GPCR proteins (Suzuki et al., 2014).

CONCLUSION
In this review, we discuss the benefits and drawbacks of different expression systems for eukaryotic membrane proteins, and describe general methods and recent advances in eukaryotic membrane protein purification and crystallization. We hope our work will help those who are interested in and work on eukaryotic membrane proteins. Although eukaryotic membrane protein structures determined by cryo-EM or NMR are beyond the scope of this review, the general methodologies and technical strategies summarized here can also help to increase protein yield and improve sample homogeneity for cryo-EM and NMR. Both are very powerful tools for solving structures; for instance, cryo-EM was used to determine TRPV1 structures (Liao et al., 2013). With the development of advanced technologies, more and more eukaryotic membrane protein structures will emerge to answer the most significant questions in the life sciences and to provide novel pharmaceutical targets for drug design.