Journal of Structural and Functional Genomics, Volume 10, Issue 3, pp 233–247

Heterologous expression of L. major proteins in S. cerevisiae: a test of solubility, purity, and gene recoding

Authors

  • Erin Quartley
    • Center for Pediatric Biomedical Research, University of Rochester Medical School
  • Andrei Alexandrov
    • Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry
    • Yale University School of Medicine, HHMI
  • Maryann Mikucki
    • Center for Pediatric Biomedical Research, University of Rochester Medical School
  • Frederick S. Buckner
    • Division of Allergy and Infectious Diseases, Department of Medicine, University of Washington School of Medicine
  • Wim G. Hol
    • Department of Biochemistry, University of Washington School of Medicine
  • George T. DeTitta
    • Hauptman-Woodward Medical Research Institute
    • Department of Structural Biology, SUNY at Buffalo
  • Eric M. Phizicky
    • Center for Pediatric Biomedical Research, University of Rochester Medical School
    • Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry

DOI: 10.1007/s10969-009-9068-9

Cite this article as:
Quartley, E., Alexandrov, A., Mikucki, M. et al. J Struct Funct Genomics (2009) 10: 233. doi:10.1007/s10969-009-9068-9

Abstract

High level expression of many eukaryotic proteins for structural analysis is likely to require a eukaryotic host since many proteins are either insoluble or lack essential post-translational modifications when expressed in E. coli. The well-studied eukaryote Saccharomyces cerevisiae possesses several attributes of a good expression host: it is simple and inexpensive to culture, has proven genetic tractability, and has excellent recombinant DNA tools. We demonstrate here that this yeast exhibits three additional characteristics that are desirable in a eukaryotic expression host. First, expression in yeast significantly improves the solubility of proteins that are expressed but insoluble in E. coli. The expression and solubility of 83 Leishmania major ORFs were compared in S. cerevisiae and in E. coli, with the result that 42 of the 64 ORFs with good expression and poor solubility in E. coli are highly soluble in S. cerevisiae. Second, the yield and purity of heterologous proteins expressed in yeast are sufficient for structural analysis, as demonstrated with both small scale purifications of 21 highly expressed proteins and large scale purifications of 2 proteins, which yield highly homogeneous preparations. Third, protein expression can be improved by altering codon usage, based on the observation that a codon-optimized construct of one ORF yields three-fold more protein. Thus, these results provide direct verification that high level expression and purification of heterologous proteins in S. cerevisiae is feasible and likely to improve expression of proteins whose solubility in E. coli is poor.

Keywords

Yeast; Structural genomics; Heterologous expression; Protein solubility; S. cerevisiae

Abbreviations

LIC: Ligation independent cloning

PSI: Protein Structure Initiative

PDB: Protein Data Bank

L. major: Leishmania major

Introduction

Development of heterologous expression systems has been and is key to efficient structural analysis, because the production and purification of large amounts of soluble, folded protein continues to be a rate limiting step for both NMR and x-ray crystallography [16, 38]. E. coli has been the host for the expression of a vast number of proteins for structural analysis, in large part due to the ease of genetic manipulation in E. coli, its rapid and inexpensive growth, as well as the ease of isotope and selenomethionine labeling of proteins for structural analysis [10, 38]. Moreover, regulated expression in E. coli can yield large quantities of highly purified protein from a single liter of culture: between 0.9 and 480 mg of purified protein per liter in one study in which 63 Plasmodium falciparum ORFs were purified [35].

Structural genomics initiatives, which developed as a response to the vast increase in the number of protein sequences from genome sequencing projects, have not only resulted in the solution of over 3,300 new protein structures comprising over half of the novel structures since 2004 [6, 8, 29, 36], but have also prompted a thorough investigation of the rate limiting steps in structural analysis. This investigation has, in turn, yielded novel high throughput methods to clone and express ORFs, and to analyze and purify proteins. Each of 14 worldwide structural genomics centers, which as of December 2007 had collectively targeted, expressed and purified 109,423 proteins, tracks the success or failure of each step required to obtain a structure for every target protein. Each center initially expresses most genes in E. coli, with the vast majority expressed under control of T7 or T5 promoters as His6 fusions [19]. Analysis of progress on this large number of diverse targets, which are chosen in large measure to provide novel structures, illustrates that obtaining purified protein is a rate limiting step for structural analysis. This problem is even more serious for eukaryotic proteins. Analysis of the total target set indicates that 36% of 8,043 targets from archaea and 30% of 58,806 targets from bacteria were purified, while only 19% of 42,439 targets from eukarya were purified by structural genomics centers [19]. Thus, there is tremendous attrition of targets due to inability to obtain purified protein, and this must be rectified to extend coverage of the structural landscape, as well as to obtain structural information on many medically and biologically important targets.

That the lack of solubility of proteins is a major obstacle to obtaining purified protein can be inferred from an examination of the fraction of proteins that are not soluble when expressed at high levels in E. coli. In 2002, the Northeast Structural Genomics Consortium reported that among 1,295 expressed proteins, only 773 were soluble [43]. Furthermore, since most of the solved structures were from bacterial proteins, Service [43] inferred that eukaryotic proteins were even more problematic. In a study of 424 non-membrane proteins from the thermophilic archaeon Methanobacterium thermoautotrophicum, Christendat et al. [10] found that while 80% of these proteins are expressed in E. coli, less than half are soluble, and only 20% are directly suitable for structural analysis, since a large fraction of the soluble proteins (57 of 100) display poor NMR spectra consistent with either non-specific aggregation or an unfolded state. Among eukaryotic proteins, approximately half of the cloned genes express protein, but solubility is markedly lower: on the order of 10–15% of full length human proteins are soluble when expressed in E. coli [3, 4], and about 30% of C. elegans genes, expressed in E. coli, produce soluble protein (1,536 soluble of 4,854 expressed ORFs in 10,167 attempted) [30]. In the parasitic protozoa, 19% (63 of 337) of expressed genes from Plasmodium falciparum produce soluble protein and 27% (655 of 2,406) of highly expressing genes from Trypanosoma cruzi, Trypanosoma brucei and Leishmania major are soluble [39] (E.Q. and E.M.P., unpublished data).

The magnitude of the problem with solubility can also be inferred from the number and variety of approaches that have been tried both to improve and to evaluate solubility. Numerous vectors, strains, and affinity purification tags, as well as technologies to accelerate expression and solubility screening, have all been developed to maximize heterologous protein expression in E. coli [9, 12, 38, 48]. In several systematic studies, various fusion tags (GST, MBP, NusA, thioredoxin, ubiquitin, His6, the Z domain of protein A, the GB1 domain of protein G, and SUMO) have been screened for their effects on solubility of multiple proteins [22, 34, 37]. Significant improvements in solubility of individual proteins are observed with various tags, but there is no single best solution for all proteins. To obtain soluble protein from five genes of the fish pathogen Vibrio salmonicida, Niiranen et al. [37] compared expression using six affinity tags, two different E. coli hosts, and induction at three temperatures. Systematic analysis of the effects of codon usage on expression of multiple genes [5] demonstrated improvement in expression for 22 of 30 genes tested and improved solubility for 11 genes either due to recoding or tRNA over-expression. Multiple truncations at the N and C termini [18, 28] as well as introduction of mutations [41, 42] are routinely used by many laboratories to obtain soluble protein. In addition, constructs are routinely screened at multiple temperatures in E. coli bearing mutations in genes that affect the redox environment of the cell. To facilitate disulfide bond formation, an E. coli strain bearing mutations in genes encoding both thioredoxin reductase and glutathione oxidoreductase was used to effect soluble expression of the extracellular N terminal domain of ISG75 from Trypanosoma brucei gambiense; in addition, efficient translation of the heterologous gene was facilitated by inclusion of a plasmid encoding three tRNA genes [45]. Furthermore, the problem with protein solubility can also be gauged from the numerous methods that have been developed to rapidly screen for folded, soluble protein, including fusion reporters such as GFP, CAT, LacZα and others [46] as well as the continued development of reporters with different sensitivities to protein misfolding [7].

These problems, as well as the fact that many eukaryotic proteins bear post-translational modifications which are not carried out correctly in E. coli, have spurred the development of other expression hosts, primarily eukaryotic hosts. The single cell yeast S. cerevisiae shares with E. coli many of the traits that have made E. coli an ideal expression system, such as ease of genetic manipulation and rapid, inexpensive growth. In addition, yeast has been used for high level protein expression and affinity purification, with yields of 2 mg of purified protein per liter of culture [17], as well as for heterologous expression, purification and structure determination of several proteins, including the catalytic domain of the human RNA editing protein ADAR2 [32] and the membrane Ca2+-ATPase protein from rabbit sarcoplasmic-endoplasmic reticulum [24].

We and others have begun to develop tools to improve the use of S. cerevisiae as a host for high level protein expression and purification. We recently demonstrated that genetic manipulation of the genes encoding methionine adenosyltransferase allows growth of yeast on toxic levels of selenomethionine, efficient incorporation of selenomethionine into proteins and solution of the structure of tryptophan tRNA synthetase by MAD phasing [33]. Holz et al. [23] demonstrated high throughput expression of 221 human genes in yeast, nearly half of which could be purified by IMAC affinity chromatography, but the yields and solubility were not quantified.

Since it is, in large measure, the problem with solubility that we hope to resolve with expression in a eukaryotic host, we addressed this issue directly by determining whether or not proteins that are insoluble in E. coli are soluble when expressed in S. cerevisiae. We show here that 42 of 64 ORFs from Leishmania major that exhibit good expression but little or no solubility when expressed in E. coli are expressed in yeast with solubility levels above 50%. Furthermore, both the yield and purity of many protein preparations are sufficient for structural analysis, since 21 highly expressed proteins were purified by affinity chromatography with good yields, and two large scale preparations yield approximately 50 mg of nearly homogeneous protein. Finally, we demonstrate that genetic recoding of one L. major gene with optimal yeast codons yields somewhat improved expression in S. cerevisiae, similar to results in E. coli. Thus, S. cerevisiae is a viable alternative to E. coli as a host for protein expression and purification since many of the same tools are operative in both organisms.

Materials and methods

Plasmids and strains

For expression in E. coli strain BL21DE3, L. major genes were PCR amplified and cloned in either BG1861 or AVA421 vectors using standard LIC (ligation independent cloning) procedures [2]; BG1861 has been described previously and is used to express proteins with an N terminal MAHHHHHH tag preceding the native methionine [1]. AVA421 is a LIC vector that is used to express protein with an N terminal fusion tag of MAHHHHHHMGTLEAQTQGPGS, which can be cleaved with Rhinovirus 3C protease leaving an N terminal GS preceding the native methionine. ORFs amplified with primer pairs containing the common sequences GGGTCCTGGTTCGATG and CTTGTTCGTGCTGTTTA on the 5′ and 3′ oligonucleotides respectively are treated with T4 DNA polymerase and dTTP, and annealed with AVA421 vector that has been digested with Nru1-Pme1, and treated with T4 DNA polymerase in the presence of dATP.
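The role of the common primer tails in LIC can be illustrated with a short sketch. It assumes the usual LIC mechanism, in which the 3′→5′ exonuclease activity of T4 DNA polymerase digests the strand paired with each tail until it reaches the first position where the single supplied nucleotide can be re-incorporated. The function name `lic_overhang` is hypothetical and introduced only for illustration; the two tail sequences are those quoted above for AVA421.

```python
# Minimal sketch of insert overhangs generated during LIC (illustrative only).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

FWD_TAIL = "GGGTCCTGGTTCGATG"   # common sequence on the 5' oligonucleotide
REV_TAIL = "CTTGTTCGTGCTGTTTA"  # common sequence on the 3' oligonucleotide


def lic_overhang(tail: str, dntp: str) -> str:
    """Return the single-stranded 5' overhang left on `tail` after chew-back.

    Assumption: the strand paired with `tail` is digested from its 3' end
    until the first position where `dntp` can be put back, which is the
    first base in `tail` complementary to `dntp`.
    """
    stop_base = COMPLEMENT[dntp]      # base in the tail that pairs with dntp
    index = tail.find(stop_base)
    return tail if index < 0 else tail[:index]


if __name__ == "__main__":
    for name, tail in (("5' tail", FWD_TAIL), ("3' tail", REV_TAIL)):
        overhang = lic_overhang(tail, "T")  # inserts are treated with dTTP
        print(f"{name}: {overhang} ({len(overhang)} nt)")
```

Under these assumptions, neither tail contains an A until near its end, so treatment with dTTP would expose most of each tail as a single-stranded overhang available for annealing to the vector treated with dATP.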

For expression in yeast, L. major ORFs were cloned under PGAL1 control into the previously described LIC vector BG2483, a 2 μ URA3 vector in which ORFs are expressed under control of the GAL1 promoter with their C terminus fused to a complex tag containing a 3C site, followed by an HA epitope, His6, and the ZZ domain of protein A [33]. ORFs are amplified with addition of common sequences: AATTCCATCAACCTTAAAATG and CTTCCAAACCACT to the 5′ and 3′ end of gene-specific oligonucleotides for cloning into this vector, and cloned into Pac1-BbrP1 digested BG2483 DNA by standard LIC procedures [2, 33]. ORFs are expressed in yeast strain BCY123 (MATa, pep4-3::HIS3, prb1::LEU2, bar1::HISG, lys2::GAL1/10-GAL4, can1, ade2, trp1, his3, ura3-52, leu2-3,112), obtained from M. Macbeth [31].

Protein expression, western detection and affinity purification

Yeast transformants, grown overnight in SD-uracil at 30°C, were diluted 20-fold into 5 ml Synthetic (S) dropout media—uracil (see [44]), with 2% raffinose, grown for 7 h at 30°C, diluted in 30 ml of the same media to OD600 of 0.02 and grown overnight to OD600 between 0.8 and 1.2, and then induced for protein expression by addition of 15 ml 3× YP media with 6% galactose (see [44]) and continued growth for 24 h, at which time cells were split, harvested and frozen.
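The induction step is a simple dilution, which the following sketch works through. It assumes that volumes are additive and that evaporation and cell volume are negligible; the function name `diluted_concentration` is hypothetical.

```python
# Worked dilution for the induction step (illustrative sketch).

def diluted_concentration(conc: float, volume_ml: float, total_ml: float) -> float:
    """Concentration of a component after its solution is diluted to total_ml."""
    return conc * volume_ml / total_ml


culture_ml = 30.0    # overnight culture in S dropout medium with 2% raffinose
addition_ml = 15.0   # 3x YP medium with 6% galactose
total_ml = culture_ml + addition_ml

galactose_pct = diluted_concentration(6.0, addition_ml, total_ml)  # % galactose
yp_fold = diluted_concentration(3.0, addition_ml, total_ml)        # x-fold YP
raffinose_pct = diluted_concentration(2.0, culture_ml, total_ml)   # % raffinose
half_culture_ml = total_ml / 2                                     # after splitting

print(f"final galactose: {galactose_pct:.2f}%")
print(f"final YP strength: {yp_fold:.2f}x")
print(f"final raffinose: {raffinose_pct:.2f}%")
print(f"volume per split sample: {half_culture_ml:.1f} ml")
```

With these volumes the induced culture is at 1× YP and 2% galactose, and splitting the 45 ml culture in two gives the 22.5 ml of growth used for each extract.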

Cells (from 22.5 ml growth) were resuspended in 1 ml extraction buffer A (50 mM Tris–Cl, pH 7.5, 1 mM EDTA, 4 mM MgCl2, 10% glycerol, 1 M NaCl, 5 mM β-mercaptoethanol) containing 2.5 μg/ml pepstatin, 2.5 μg/ml leupeptin and 1 mM pefabloc (Roche), were transferred to tubes containing 0.5 mm glass beads and 0.5 mM PMSF, lysed by bead beating, and the beads were removed as previously described [17] followed by addition of 0.5 mM PMSF. To obtain samples for total protein analysis, 1 μl of lysed cells was diluted into 50 μl prewarmed SDS loading buffer with 0.08 μg/μl PMSF held at 95°C; the mixture was vortexed, boiled for 1 min, vortexed, boiled and vortexed. To obtain soluble protein, lysed cells were centrifuged at maximum speed (13,000 rpm) for 10 min in 2 ml microcentrifuge tubes at 4°C and the supernatant was removed to a new tube containing PMSF (0.5 mM additional) and quick frozen on dry ice. Soluble protein was assessed by diluting 1 μl crude extract in 50 μl SDS loading dye.

For both total and soluble protein, 5 μl (~0.6–1 μg of total protein) was subjected to electrophoresis on 8–16% Tris–HCl SDS-PAGE Criterion gels (Bio-Rad), after which the protein in the gel was transferred to nitrocellulose membranes by electrophoresis at 100 V, 200 mA for 2 h at 4°C in Transfer Buffer (0.025 M Tris base, 0.192 M glycine, 0.02% SDS, 20% methanol). Membranes were rinsed in PBS (10 mM Sodium Phosphate pH 7.8, 150 mM NaCl), blocked in PBS containing 5% Calf Serum and 0.1% Tween overnight, rinsed twice for 5 min in PBS with 0.1% Tween, incubated with Rat anti-HA high affinity monoclonal antibody clone 3F10 (Roche 1 867 423) at 1:3,000 dilution in PBS containing 5% Calf Serum for 2 h, and washed five times with PBS containing 5% Calf Serum and 0.1% Tween. Membranes were incubated for 2 h with Peroxidase-conjugated AffiniPure Goat Anti-Rat IgG (Jackson ImmunoResearch 112-035-003) (1:5,000 dilution) in PBS containing 5% calf serum and 0.1% Tween, washed three times in PBS containing 0.1% Tween for 15 min, and developed using the ECL plus kit according to the manufacturer's instructions (GE Healthcare). To evaluate protein yields after purification of protein on IgG sepharose, protein was bound to IgG sepharose and either eluted by cleavage with GST-3C protease as previously described [17], or eluted by boiling of the IgG sepharose beads in SDS loading dye as described [33].

Large scale protein expression and purification of L. major 6976

Growth of the yeast was similar to that described above except in scale; a total of 43.5 l of cells were grown to an average OD600 of 9.3. To harvest, cultures were put on ice, cells were harvested by centrifugation, washed in 192 ml cold ddH2O, transferred in a thin layer to a Ziploc bag, quick frozen in two aliquots as a pellet on dry ice, and stored at −80°C.

To purify L. major 6976 protein, frozen cell pellets from 405 OD-l (OD-liter) were broken into fine chunks with a hammer, stirred into 607 ml extraction Buffer A at room temperature until thawed, moved immediately to ice and subjected to bead beating (12 rounds of 15 s, each followed by 1 min rest) in an ice-H2O cooled large bead beating apparatus filled with 0.5 mm Zirconia/Silica beads (Biospec Products, 11079105z), after which the liquid was separated from the beads, followed by addition of PMSF to 1 mM, and centrifugation for 10 min at 10,000 rpm in a Beckman JLA16.250 rotor to make the crude extract. Crude extracts (780 ml) were quick frozen and stored at −80°C.
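The OD-liter bookkeeping used for the large scale preparation can be checked with a few lines. This is a sketch that takes an OD-l to be the culture volume in liters multiplied by the OD600, which is an assumption about the unit rather than a definition given in the text; the function name `od_liters` is hypothetical.

```python
# Sketch of the OD-liter arithmetic for the large scale preparation.

def od_liters(volume_l: float, od600: float) -> float:
    """Total cell mass expressed as culture volume (l) times OD600."""
    return volume_l * od600


total_od_l = od_liters(43.5, 9.3)       # 43.5 l grown to an average OD600 of 9.3
buffer_ml_per_od_l = 607.0 / 405.0      # extraction Buffer A per OD-l of cells

print(f"total cell mass: {total_od_l:.1f} OD-l")
print(f"extraction buffer: {buffer_ml_per_od_l:.2f} ml per OD-l")
```

The product of the reported culture volume and density agrees with the 405 OD-l of cells used, and the extraction buffer volume corresponds to roughly 1.5 ml per OD-l.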

Crude extracts, thawed in the presence of 2 l IPP-0 buffer (10 mM Tris–Cl pH 8.0, 0.1% NP40), were mixed with an additional 2.3 l of IPP-0 buffer, 7.6 ml of 10% NP40, 13 ml of 10 mg/ml PMSF, 10 ml of 0.5 M EDTA, and 50 ml of IgG beads (GE Healthcare 17-0969) that had previously been washed 3 times in 240 ml IPP-150 buffer (IPP-0 buffer with 150 mM NaCl). The mixture was gently stirred for 2 h at 4°C, the resin was allowed to settle for at least 25 min, and the resin was then transferred into 14 × 50 ml conical tubes, which were subjected to low speed centrifugation for 2 min at 2 K at 4°C (JS 5.3 swinging bucket rotor), and the supernatant was discarded. The bound IgG resin was washed with 630 ml IPP-150 by nutating the tubes for 4 min, followed by low speed centrifugation for 2 min at 2 K at 4°C and removal of the supernatant. This wash step was repeated 5 times, followed by 5 washes in 3C Cleavage Buffer (10 mM Tris–Cl pH 8.0, 150 mM NaCl, 0.1% NP40, 2 mM β-mercaptoethanol), addition of a volume of 3C cleavage buffer equal to the bead volume, addition of 7.8 mg GST-3C protease, and gentle mixing overnight at 4°C. The next day, the eluted protein was separated from the beads by low speed centrifugation (2 K for 1 min at 4°C) and removal of the supernatant, followed by 2 washes in which an equal volume of 3C Cleavage buffer was added to the resin, followed by mixing for 20 min and low speed centrifugation. The GST-3C protease was removed from the eluted protein samples by incubation of the elution and wash supernatants with 0.6 ml equilibrated GSH resin (GE Healthcare, 27-4574) for 1 h at 4°C, and then filtration of the mixture through a Nalgene 0.45 PES filter unit to remove the GSH beads.

Prior to sizing the protein preparation, the concentration of NaCl in the combined elution and both washes was adjusted to 0.2 M NaCl by addition of an appropriate volume of 5 M NaCl, then the protein preparation was concentrated from 165 to 5 ml with 4 × 15 ml Amicon Ultra15 filters (Millipore UFC901024), spun at 4,000 rpm at 4°C and loaded onto a 120 ml bed volume Superdex 200 HiLoad 16/60 sizing column (GE Healthcare 17-1069, 10 × 300 mm bed dimension), and eluted overnight. Protein was visualized by SDS-PAGE and quantified with Bradford assays. Pooled fractions were concentrated to ~5 ml and centrifuged at 4°C for 10 min at maximum speed.
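The "appropriate volume of 5 M NaCl" follows from a mass balance, sketched below. The calculation assumes that the pooled eluate starts at the 150 mM NaCl of the 3C Cleavage Buffer and that the 165 ml figure is the volume before the salt addition; neither is stated explicitly, so the numbers are illustrative only. The function name `stock_volume_to_add` is hypothetical.

```python
# Sketch: volume of concentrated stock needed to raise a salt concentration.

def stock_volume_to_add(v0_ml: float, c0: float, c_target: float, c_stock: float) -> float:
    """Solve (v0*c0 + v*c_stock) / (v0 + v) = c_target for v."""
    if not (c0 <= c_target < c_stock):
        raise ValueError("target must lie between the starting and stock concentrations")
    return v0_ml * (c_target - c0) / (c_stock - c_target)


# Assumed starting point: 165 ml at 0.15 M NaCl, adjusted to 0.2 M with 5 M stock.
added_ml = stock_volume_to_add(165.0, 0.15, 0.20, 5.0)
concentration_factor = 165.0 / 5.0   # 165 ml concentrated to 5 ml

print(f"5 M NaCl to add: {added_ml:.2f} ml")
print(f"volume reduction on concentration: {concentration_factor:.0f}-fold")
```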

Results

Expression of many L. major ORFs in E. coli yields insoluble protein

To obtain proteins from pathogenic protozoa at the high levels necessary for x-ray crystallography, we cloned, expressed and analyzed over 4,000 ORFs from Leishmania major, Trypanosoma brucei, Trypanosoma cruzi, Plasmodium falciparum, and closely related organisms for both expression and solubility in E. coli [13]. The ORFs were PCR amplified and cloned using Ligation Independent Cloning (LIC) methods such that expression of the ORF was regulated by the T7 promoter and the ORF was expressed as a fusion with either an N terminal His6 tag or a His6 tag followed by a soluble 3C cleavage site [1]. Expression of the ORF fusion protein was induced at 18°C by the addition of IPTG to induce expression of T7 RNA polymerase in the BL21DE3 host strain, and continued for ~18 h. Expression of ORF-fusion proteins was evaluated by examining the protein composition of SDS whole cell lysates with SDS-PAGE while the amount of soluble protein expression was determined from preparations of a crude extract in which insoluble material was removed by centrifugation.

As shown in Fig. 1A, many strains express high levels of fusion protein that are readily detected in the whole cell extract, but as shown in Fig. 1B, only a fraction of these proteins are nearly as abundant in a soluble crude extract. As shown in Fig. 1, a fusion protein is expressed in 14 of the 16 strains as judged by the presence of dark bands in the SDS lysate (top gel), but the fusion protein is soluble in only a fraction of these strains. The fusion protein is observed in the crude extract at substantial levels in only four examples (marked with star in both gels), while in five cases no soluble protein is detectable (marked with the circle), and in five other cases only a small percentage of the expressed protein is found in the soluble crude extract (marked with the diamond). In an analysis of 4,254 cloned target genes from Trypanosoma cruzi, Trypanosoma brucei and Leishmania major, we found that 2,406 (56%) are expressed at high levels, easily visible by Coomassie staining of a whole cell SDS lysate, but only 655 (27%) of the expressing strains produce substantial levels of soluble protein (comparable to those marked with a star in Fig. 1B).
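The attrition from cloned targets to soluble protein can be tallied directly from the counts in the preceding paragraph. This is a minimal sketch; the overall fraction of cloned targets yielding soluble protein is derived here from those counts and is not a figure reported in the text.

```python
# Sketch: attrition of targets from cloning to soluble expression in E. coli.

def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


cloned, expressed, soluble = 4254, 2406, 655

expressed_pct = percent(expressed, cloned)   # cloned targets expressed at high level
soluble_pct = percent(soluble, expressed)    # expressing strains with soluble protein
overall_pct = percent(soluble, cloned)       # cloned targets yielding soluble protein

print(f"expressed / cloned:  {expressed_pct:.1f}%")
print(f"soluble / expressed: {soluble_pct:.1f}%")
print(f"soluble / cloned:    {overall_pct:.1f}%")
```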
Fig. 1

Expression and solubility of a set of L. major ORFs in E. coli: A Analysis of expressed protein in E. coli. To examine expression of L. major proteins fused to a cleavable N terminal affinity tag in E. coli, SDS lysates of cells were subjected to electrophoresis and Coomassie staining. Recombinant proteins were observed in SDS lysates in all lanes except m and o; position of recombinant protein is marked in each lane with either star, diamond or circle. B Analysis of soluble expressed protein in E. coli. Soluble crude extracts derived from the same cells examined in panel (A) were subjected to electrophoresis and Coomassie staining. The strains examined in lanes a, e, i and p were judged to produce substantial amounts of soluble proteins (marked with a star), those in lanes b, c, d, f, and g were judged to produce some soluble protein based on the presence of a band at the expected molecular weight (marked with a diamond) and the strains in lanes h, j, k, l, and n were judged to produce little or no soluble protein (marked with a circle)

Most of the L. major ORFs are expressed and soluble in S. cerevisiae

To learn if expression in Saccharomyces cerevisiae improves the solubility of L. major ORFs, we cloned and expressed 83 ORFs that had exhibited different expression and solubility characteristics in E. coli (Supplementary Table 1). We focused particularly on a set of 64 ORFs that were expressed well in E. coli but were poorly or not at all soluble in E. coli (generally significantly less than 25% soluble); this group is labeled as the Test set in all figures and tables (Table 1). We chose ORFs from the large set of 2,406 high expressors in E. coli, based on intensity of the Coomassie stained bands in the SDS lysate (see Fig. 1A, lanes b, e, f, g, h, and n), but with low solubility, based on the intensity of the Coomassie stained bands in the soluble crude extract (see Fig. 1B, lanes b, f, g, h and n). We clustered this entire group into one Test set without further subdivision, because, for 50 of the 64 Test ORFs, the band corresponding to the expressed gene in the soluble crude extract is either very light or nearly indistinguishable from background (similar to Fig. 1B, lanes h and n), which impedes more precise definition of solubility.
Table 1

Classification of L. major ORFs by expression and solubility in E. coli

| Set | Number of genes | Expression in E. coli | Solubility in E. coli |
|---|---|---|---|
| Test | 64 | Good | Poor |
| Positive control | 8 | Good | Good |
| Negative control | 11 | Poor | Poor |

In addition, we expressed and analyzed 8 L. major ORFs that had been both well expressed and highly soluble in E. coli (Positive Control [PC]), as well as 11 L. major ORFs that were poorly expressed in E. coli (Negative Control [NC]). The ORFs that had been both well expressed and highly soluble in E. coli were designated as a positive control because it seems likely that there is no inherent barrier to their expression or solubility, and thus we expected these ORFs to be expressed and soluble in yeast. All of these ORFs were cloned such that their expression was regulated by the yeast PGAL1 promoter and the ORF was fused at its carboxy terminus to a complex tag containing a site for 3C protease, followed by an HA epitope, His6, and the ZZ domain of protein A [33].

Expression of ORF fusion proteins was evaluated by SDS-PAGE analysis of cells after lysis with glass beads and solubilization in hot SDS buffer, while soluble ORF fusion proteins were evaluated by SDS-PAGE analysis of a crude extract derived from cells lysed with glass beads in standard extract buffer followed by centrifugation to remove insoluble material. After resolution by SDS-PAGE and transfer to nitrocellulose, the ORF fusion proteins were visualized by immuno-blotting with anti-HA antibody (Fig. 2A). As shown in Fig. 2A, the ORF fusion proteins from SDS lysates are loaded adjacent to the same volume of crude extract, both of which are made from the same cell lysate. The amount of ORF fusion protein in each sample was assessed by estimating the intensity of the signal in each lane, which was assigned a value between 0 and 6, as indicated in Fig. 2A. Solubility was estimated as the ratio of the protein present in the crude extract to the total protein detected in hot SDS, with proteins exhibiting >50% solubility considered to have good solubility. The fraction of proteins in each set with different solubility properties is illustrated in Fig. 2B and shown in Table 2. The estimates of expression and solubility for each L. major ORF fusion protein are reported in Table 3.
Fig. 2

Expression of L. major ORF-fusions in S. cerevisiae: A Analysis of expression and solubility of L. major ORF-fusions expressed in S. cerevisiae by immunoblot with anti-HA antibody. Expressed protein was evaluated from lysed cells to which hot SDS loading buffer was added, while soluble protein was evaluated from the same lysed cells after centrifugation to remove insoluble material prior to addition of SDS loading buffer. In both cases, proteins were separated by SDS-PAGE, transferred to nitrocellulose membranes and probed with anti-HA antibody. Each pair of lanes represents the comparison of total and soluble protein from the same cells with the identity of the L. major ORFs as well as the score assigned based on the band intensity indicated below the figure. InVitrogen Magic Markers Mix (20, 30, 40, 50, 60 kDa) is indicated. B Comparison of the solubility properties of L. major ORF-fusions expressed in yeast as a function of their class based on their characteristics when expressed in E. coli. The Test set were expressed but insoluble in E. coli; the PC (positive control) set were expressed and soluble in E. coli; the NC set was poorly expressed in E. coli. The fraction of ORF-fusions with good, partial, poor and no solubility in each group is indicated. Solubility was called good if the amount of protein in the soluble crude extract was greater than 50% of the protein present in the hot SDS lysate. Solubility was called partial when soluble protein was 25–50% of the level of the protein present in the hot SDS lysate. Solubility was called poor when the soluble protein was less than 25% of the level of the protein present in the hot SDS lysate. Proteins were called insoluble when no protein was detected in the crude extract
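The solubility categories defined in this legend can be expressed as a small classifier over the band-intensity scores (0 to 6). This is a minimal sketch and not the authors' procedure: the function names are hypothetical, the handling of values falling exactly on a threshold is an assumption, and the cap at 100% is an assumption made so that a soluble score slightly above the total score is still reported as 100%.

```python
# Sketch: solubility category from band-intensity scores (illustrative only).
from typing import Optional


def percent_soluble(total_score: float, soluble_score: float) -> Optional[float]:
    """Soluble signal as a percentage of total signal; None if nothing is expressed."""
    if total_score <= 0:
        return None
    return min(100.0, 100.0 * soluble_score / total_score)  # cap is an assumption


def classify(total_score: float, soluble_score: float) -> str:
    """Map a pair of scores onto the categories used in the legend."""
    pct = percent_soluble(total_score, soluble_score)
    if pct is None:
        return "not expressed"
    if soluble_score <= 0:
        return "insoluble"   # no protein detected in the crude extract
    if pct > 50:
        return "good"
    if pct >= 25:
        return "partial"
    return "poor"


if __name__ == "__main__":
    for total, soluble in [(6.0, 5.0), (2.0, 1.0), (3.0, 0.5), (5.0, 0.0), (0.0, 0.0)]:
        print(total, soluble, percent_soluble(total, soluble), classify(total, soluble))
```

Because the scores are visual estimates, the resulting percentages are coarse; the categories are better read as bins than as precise measurements.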

Table 2

Solubility of L. major ORF groups expressed in S. cerevisiae

| Set | Number | Insoluble | 10–25% soluble | 25–50% soluble | >50% soluble |
|---|---|---|---|---|---|
| Test | 63 | 4 | 0 | 17 | 42 |
| Positive control | 8 | 0 | 1 | 0 | 7 |
| Negative control | 11 | 0 | 1 | 3 | 7 |
| Total | 82 | 4 | 2 | 20 | 56 |

Table 3

L. major ORFs: expression and solubility in E. coli and in S. cerevisiae

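| L. major ORF ID | AA | E. coli expression (SDS lysate) | E. coli soluble (CE) | Set | S. cerevisiae expression (SDS lysate) | S. cerevisiae soluble (CE) | S. cerevisiae percent soluble |
|---|---|---|---|---|---|---|---|
| 1522 | 445 | 5.0 | 1.0 | T | 2.0 | 1.0 | 50.0 |
| 2393 | 656 | 5.0 | 1.0 | T | 6.0 | 5.0 | 83.3 |
| 2645 | 369 | 3.0 | 0.0 | T | 0.1 | 0.1 | 100.0 |
| 2694 | 337 | 3.5 | 2.0 | T | 4.0 | 4.0 | 100.0 |
| 2698 | 631 | 4.0 | 2.0 | T | 3.0 | 3.0 | 100.0 |
| 2759 | 633 | 4.0 | 2.0 | T | 6.0 | 6.0 | 100.0 |
| 2785 | 442 | 4.0 | 1.0 | T | 2.0 | 1.0 | 50.0 |
| 2827 | 451 | 3.0 | 1–2 | T | 6.0 | 3.0 | 50.0 |
| 2936 | 417 | 5.0 | 1.0 | T | 6.0 | 2.0 | 33.3 |
| 2937 | 323 | 3.0 | 0.5 | T | 5.0 | 3.0 | 60.0 |
| 2993 | 703 | 3.0 | 1.0 | T | 0.0 | 0.0 | ns |
| 3001 | 240 | 3.0 | 0.0 | T | 1.0 | 0.5 | 50.0 |
| 3157 | 705 | 2.5 | 0.5 | T | 0.5 | 0.0 | 0.0 |
| 3187 | 531 | 5.0 | 0.5 | T | 6.0 | 2.0 | 33.3 |
| 3463 | 201 | 4.0 | 0.0 | T | 3.0 | 1.0 | 33.3 |
| 3512 | 660 | 3.0 | 0.5 | T | 1.0 | 1.0 | 100.0 |
| 3538 | 656 | 3.5 | 1.0 | T | 0.5 | 0.5 | 100.0 |
| 3575 | 293 | 2–3 | 0–1 | T | 4.0 | 1.5 | 37.5 |
| 3577 | 299 | 3.0 | 0–1 | T | 3.0 | 3.0 | 100.0 |
| 3849 | 616 | 4.0 | 1.0 | T | 4.0 | 2.5 | 62.5 |
| 3864 | 741 | 3.0 | 0.5 | T | 5.0 | 0.0 | 0.0 |
| 3954 | 373 | 3.0 | 1–2 | T | 4.0 | 4.0 | 100.0 |
| 4089 | 382 | 4.0 | <2 | T | 6.0 | 5.0 | 83.3 |
| 4109 | 268 | 3–4 | 1.0 | T | 3.0 | 1.0 | 33.3 |
| 4172 | 476 | 3.0 | 0–1 | T | 1.5 | 1.0 | 66.7 |
| 4234 | 621 | 4.0 | 1–2 | T | 0.1 | 0.2 | 100.0 |
| 4235 | 498 | 4.0 | 1.0 | T | 3.0 | 3.0 | 100.0 |
| 4275 | 441 | 3.0 | 1.0 | T | 1.5 | 1.0 | 66.7 |
| 4305 | 323 | 3.5 | 1.5 | T | 0.2 | 0.2 | 100.0 |
| 4390 | 687 | 5.0 | 0–1 | T | 5.0 | 2.0 | 40.0 |
| 4396 | 186 | 2.0 | 1.0 | T | 4.0 | 3.0 | 75.0 |
| 4486 | 311 | 4.5 | 0.5 | T | 5.0 | 5.0 | 100.0 |
| 4487 | 405 | 5.0 | 2.0 | T | 6.0 | 6.0 | 100.0 |
| 4609 | 474 | 3.0 | 1.0 | T | 0.2 | 0.2 | 100.0 |
| 4634 | 557 | 5.0 | 1.0 | T | 0.5 | 0.2 | 40.0 |
| 4680 | 483 | 4.0 | 0–1 | T | 5.0 | 2.0 | 40.0 |
| 4763 | 406 | 4.0 | 0–1 | T | 6.0 | 5.0 | 83.3 |
| 4892 | 204 | 5.0 | 0.5 | T | 3.0 | 3.0 | 100.0 |
| 5361 | 338 | 3.0 | 0–1 | T | 3.0 | 1.0 | 33.3 |
| 5455 | 296 | 3.0 | 0.5 | T | 5.0 | 5.0 | 100.0 |
| 5499 | 335 | 3.0 | 1.0 | T | 6.0 | 6.0 | 100.0 |
| 5898 | 242 | 3.0 | 1.0 | T | 3.0 | 1.0 | 33.3 |
| 6106 | 420 | 2.0 | 0.5 | T | 0.1 | 0.0 | 0.0 |
| 6122 | 389 | 4.0 | 0.0 | T | 4.0 | 0.0 | 0.0 |
| 6168 | 297 | 3.0 | 1.0 | T | 6.0 | 6.0 | 100.0 |
| 6222 | 455 | 2.5 | 0.5 | T | 1.0 | 0.5 | 50.0 |
| 6265 | 453 | 3.0 | 0–1 | T | 3.0 | 2.0 | 66.7 |
| 6312 | 361 | 4.0 | 2.0 | T | 3.0 | 2.0 | 66.7 |
| 6348 | 377 | 4.0 | 1.0 | T | 4.0 | 3.0 | 75.0 |
| 6421 | 301 | 4.0 | 0–1 | T | 4.0 | 3.0 | 75.0 |
| 6586 | 411 | 3.0 | 1–2 | T | 6.0 | 6.0 | 100.0 |
| 6593 | 353 | 3.0 | 0.5 | T | 4.0 | 4.0 | 100.0 |
| 6598 | 266 | 5.0 | 0–1 | T | 6.0 | 6.0 | 100.0 |
| 6640 | 365 | 5.0 | 2.0 | T | 3.0 | 3.0 | 100.0 |
| 6679 | 400 | 3.0 | 1.0 | T | 6.0 | 4.0 | 66.7 |
| 6864 | 397 | 3.0 | 1.0 | T | 6.0 | 6.0 | 100.0 |
| 6976 | 335 | 5.0 | 2.0 | T | 4.0 | 4.0 | 100.0 |
| 6989 | 366 | 4.0 | 0–1 | T | 4.0 | 3.0 | 75.0 |
| 7177 | 533 | 4.0 | 1.0 | T | 1.0 | 1.0 | 100.0 |
| 7200 | 688 | 4.0 | 0.5 | T | 1.0 | 0.5 | 50.0 |
| 7489 | 409 | 3.0 | 1.0 | T | 6.0 | 6.0 | 100.0 |
| 7581 | 567 | 3.0 | 0–1 | T | 1.5 | 1.0 | 66.7 |
| 8264 | 513 | 4.0 | 0–1 | T | 6.0 | 5.0 | 83.3 |
| 8634 | 657 | 3.0 | 0–1 | T | 6.0 | 2.0 | 33.3 |
| 2566 | 405 | 4.0 | 4.0 | PC | 1.0 | 1.0 | 100.0 |
| 3393 | 264 | 4.0 | 4.0 | PC | 6.0 | 4.0 | 66.7 |
| 4219 | 607 | 4.0 | 5.0 | PC | 6.0 | 6.0 | 100.0 |
| 4367 | 429 | 4–5 | 4–5 | PC | 6.0 | 5.0 | 83.3 |
| 4542 | 329 | 3–4 | 5.0 | PC | 6.0 | 6.0 | 100.0 |
| 5388 | 243 | 3.0 | 4.0 | PC | 3.0 | 3.0 | 100.0 |
| 5479 | 271 | 3–4 | 4.0 | PC | 1.0 | 0.2 | 20.0 |
| 6157 | 309 | 5.0 | 5.0 | PC | 6.0 | 6.0 | 100.0 |
| 0401 | 486 | 1.0 | 1.0 | NC | 3.0 | 0.5 | 16.7 |
| 0503 | 364 | 1.0 | 1.5 | NC | 2.0 | 2.0 | 100.0 |
| 0978 | 251 | nd | 1.0 | NC | 4.0 | 4.0 | 100.0 |
| 2438 | 447 | 0.5 | 2.0 | NC | 2.0 | 2.0 | 100.0 |
| 2999 | 177 | 1.0 | 0.5 | NC | 3.0 | 3.0 | 100.0 |
| 3000 | 167 | 0.5 | 1.0 | NC | 2.0 | 2.0 | 100.0 |
| 3192 | 227 | 1.0 | nd | NC | 3.0 | 1.0 | 33.3 |
| 5821 | 277 | 0.5 | 2.0 | NC | 2.0 | 1.5 | 75.0 |
| 6443 | 582 | 0.5 | 1.0 | NC | 0.5 | 0.2 | 30.0 |
| 7247 | 247 | 0.5 | 1.0 | NC | 0.5 | 0.3 | 50.0 |
| 8109 | 244 | 0–1 | 1.0 | NC | 4.0 | 4.0 | 100.0 |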

In the entire set, most ORFs are expressed and highly soluble. All but one of the 83 ORFs were expressed at detectable levels in S. cerevisiae. The solubility levels of 56 ORFs (67% of the total set) are estimated to be greater than 50% of the total expressed level (Table 2). Moreover, only four ORFs are completely insoluble and only two other ORFs exhibit solubility of 10–25% of the expressed protein.

Solubility in the Test set of ORFs, which were insoluble in E. coli, is very high: 42 of 63 expressed ORFs exhibit greater than 50% solubility (Fig. 2B); in fact 40 of these ORFs were judged to yield >66% soluble protein. An additional 17 ORFs in the Test set (27%) are partially soluble, in the range of 25–50% of the total expressed protein, while four ORFs were expressed, but did not yield soluble protein. Thus, expression in S. cerevisiae dramatically improves solubility for 67% (42/63) to 94% (59/63) of the ORFs that were insoluble when expressed in E. coli.
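The Test-set counts quoted above can be reproduced from the percent-soluble column of Table 2. The following is a minimal sketch rather than the authors' procedure: the list is transcribed from the Test-set rows of Table 2, the function names are hypothetical, and the placement of the 25% and 50% boundaries is an assumption inferred from the ranges given in the text.

```python
# Percent-soluble values for the 63 expressed Test-set ORFs, transcribed from
# Table 2 in table order. ORF 2993 (scored "ns") is omitted because it showed
# no detectable expression.
TEST_SET_PERCENT_SOLUBLE = [
    50.0, 83.3, 100.0, 100.0, 100.0, 100.0, 50.0, 50.0, 33.3, 60.0,
    50.0, 0.0, 33.3, 33.3, 100.0, 100.0, 37.5, 100.0, 62.5, 0.0,
    100.0, 83.3, 33.3, 66.7, 100.0, 100.0, 66.7, 100.0, 40.0, 75.0,
    100.0, 100.0, 100.0, 40.0, 40.0, 83.3, 100.0, 33.3, 100.0, 100.0,
    33.3, 0.0, 0.0, 100.0, 50.0, 66.7, 66.7, 75.0, 75.0, 100.0,
    100.0, 100.0, 100.0, 66.7, 100.0, 100.0, 75.0, 100.0, 50.0, 100.0,
    66.7, 83.3, 33.3,
]


def solubility_class(percent):
    """Bin a percent-soluble value (boundary handling is an assumption)."""
    if percent > 50:
        return "high"        # greater than 50% soluble
    if percent >= 25:
        return "partial"     # 25-50% soluble
    if percent > 0:
        return "poor"        # detectable but below 25%
    return "insoluble"


def tally(values):
    """Count how many values fall into each solubility class."""
    counts = {"high": 0, "partial": 0, "poor": 0, "insoluble": 0}
    for value in values:
        counts[solubility_class(value)] += 1
    return counts


counts = tally(TEST_SET_PERCENT_SOLUBLE)
n = len(TEST_SET_PERCENT_SOLUBLE)
lower = counts["high"] / n                        # highly soluble only
upper = (counts["high"] + counts["partial"]) / n  # highly or partially soluble
print(counts, f"{lower:.0%} to {upper:.0%}")
```

The lower bound counts only ORFs above 50% solubility, and the upper bound also includes the partially soluble ORFs, which is how the 42/63 and 59/63 figures in the text arise.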

In examining the solubility of ORFs in the other sets, we find, as might be expected, that solubility is greatest among the Positive Control (PC) set of ORFs, which were highly expressed and soluble in E. coli, with 7 of the 8 ORFs (88%) in this group displaying greater than 50% solubility (Fig. 2B, Table 2). Furthermore, even in the Negative Control (NC) set of ORFs, which were poorly expressed in E. coli, 7 of the 11 ORFs (64%) are highly soluble when expressed in yeast. Thus, solubility among all of these ORFs is relatively high.

Expression of ORFs varies and correlates weakly with good solubility

As can be seen in Fig. 2A, expression of the ORF fusion protein is highly variable, as has been observed previously with expression in both E. coli and in yeast [17, 39]. The panels in Fig. 2A illustrate ORFs whose expression was classified as low, with a score of 2 or less (e.g. L. major 4172), medium, with a score of 3–4 (L. major 5361, 6265, 3463) and high, with a score of 5–6 (L. major 2759, 4367, 6598, 4763, 2393, 8634). The number of ORFs in each expression category, shown in Fig. 3A, is relatively even for the Test ORFs. Based on the yields of purified protein from several high expressors (described below), ORFs classified as high expressors produce 60–200 μg protein per liter of culture at OD600 of 1; since cultures are routinely grown to OD600 of ~8, this corresponds to 480–1,600 μg per liter of culture. Among the 72 ORFs in the Test and Positive Control groups, both of which were highly expressed in E. coli, 28 ORFs (39%) are expressed at high levels in yeast, while none of the 11 ORFs that were poorly expressed in E. coli (Negative Control) are expressed at high levels in yeast (Fig. 3A). Moreover, while 54% of the ORFs in the Negative Control group are expressed at low levels, only 29% of the ORFs in the Test and Positive Control groups are expressed at low levels. Thus, there may be a correlation between expression in S. cerevisiae and in E. coli, although additional factors might account for the differences in expression observed in this study.
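The expression categories described above can be illustrated with the Negative Control rows of Table 2. This is a sketch under stated assumptions: the scores are transcribed from what appears to be the yeast expression column of Table 2, the function name is hypothetical, and scores that fall between the stated bins (for example 2.5) are assigned to the lower bin, which the text does not specify.

```python
# Yeast expression scores (0-6 scale) for the 11 Negative Control ORFs,
# transcribed from Table 2 in table order (0401 ... 8109).
NC_EXPRESSION = [3.0, 2.0, 4.0, 2.0, 3.0, 2.0, 3.0, 2.0, 0.5, 0.5, 4.0]


def expression_class(score):
    """Map an expression score to the category used in the text."""
    if score >= 5:
        return "high"      # score of 5-6
    if score >= 3:
        return "medium"    # score of 3-4
    if score > 0:
        return "low"       # score of 2 or less (assumption: also covers 2-3)
    return "none"          # no detectable expression


nc_counts = {"high": 0, "medium": 0, "low": 0, "none": 0}
for score in NC_EXPRESSION:
    nc_counts[expression_class(score)] += 1

low_fraction = nc_counts["low"] / len(NC_EXPRESSION)
print(nc_counts, f"low expressors: {low_fraction:.1%}")
```

With these rows, six of the 11 Negative Control ORFs fall in the low class and none in the high class, in line with the proportions reported in the text.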
https://static-content.springer.com/image/art%3A10.1007%2Fs10969-009-9068-9/MediaObjects/10969_2009_9068_Fig3_HTML.gif
Fig. 3

Solubility as a function of expression. A Expression levels of L. major ORF-fusions from Test, PC and NC groups. The number of ORF fusion proteins in each set that display high, medium, low or no expression is plotted. B Solubility of L. major ORF-fusions from each group as a function of expression level. In each set, the fraction of ORF fusion proteins that display good, partial, poor or no solubility is examined as a function of their expression category

We further examined the relationship between solubility and expression in each group and among the total set (Fig. 3B). We find that more highly expressed proteins are slightly more likely to be soluble. In the Test set, 16 ORFs in each of the high and medium expressor classes, representing 70% and 73% of these ORFs respectively, are highly soluble, while only 10 ORFs (56%) in the low expressor class are highly soluble. Similarly, the only ORF from the Positive Control group that is poorly soluble is also poorly expressed. While analysis of a much larger group of ORFs would be required to determine the validity of this relationship, it is intriguing to note this trend towards higher solubility in ORFs that are expressed better.

Many L. major ORFs can be purified with reasonably high yields in single step affinity purification

To find out if the soluble L. major proteins are likely to be folded when expressed in yeast and to quantify the predicted yields of protein, we determined if several of the soluble high expressors from the Test group could bind to IgG sepharose. Retention of ORF fusion proteins on the IgG sepharose was evaluated in a stick and strip assay, in which proteins in the crude extract are bound to the IgG sepharose beads, which are then washed to remove unbound protein, followed by boiling in SDS loading buffer and analysis by SDS PAGE. As shown in Fig. 4A, we observe substantial amounts of polypeptides in 7 of the 9 experimental lanes (heavy arrows) in addition to the heavy and light IgG chains which are seen in each lane and in the no extract control (lane c). This suggests that the fusion protein is folded in a conformation that can bind the affinity resin. In addition, an ORF-fusion protein is detected in lane j partially occluded by the heavy IgG band (and substantiated below).
https://static-content.springer.com/image/art%3A10.1007%2Fs10969-009-9068-9/MediaObjects/10969_2009_9068_Fig4_HTML.gif
Fig. 4

Evaluation of soluble protein expression based on affinity purification of L. major ORFs on IgG sepharose. A Analysis of soluble L. major ORFs. S. cerevisiae cells with appropriate plasmids were induced to express L. major ORF fusion proteins by galactose addition and harvested after 24 h of further growth. Expressed protein was evaluated after extract preparation, and binding to IgG Sepharose, followed by SDS-PAGE of the IgG beads and staining with Coomassie: lane a, molecular weight markers (BioRad broad range markers, 0.4 μg each); lane b, His6-MBP-3C protease (5 μg); lane c, no extract, IgG beads only; lanes d–l, L. major ORF-fusions; lane d, L. major ORF-fusion 2759; lane e, L. major ORF-fusion 6864; lane f, L. major ORF-fusion 8264; lane g, L. major ORF-fusion 4487; lane h, L. major ORF-fusion 7489; lane i, L. major ORF-fusion 6586; lane j, L. major ORF-fusion 6598; lane k, L. major ORF-fusion 6168; lane l, L. major ORF-fusion 5499. B Purification of L. major ORF-fusions on IgG sepharose. Proteins were bound to IgG Sepharose and washed, and bound protein was eluted after cleavage of the ZZ tag with 3C protease: lane a, molecular weight markers (BioRad broad range markers, 0.4 μg each); lane b, His6-MBP-3C protease (5 μg); lanes c–g, L. major ORF-fusion 6976; lanes h–l, L. major ORF-fusion 4089; lanes m–q, L. major ORF-fusion 6586. Lanes c, h, m, sample bound to IgG beads; lanes d, i, n, protein eluted with 3C protease; lanes e, j, o, IgG beads after proteolytic cleavage; lanes f, k, p, second wash of the IgG beads after proteolytic cleavage; lanes g, l, q, IgG beads after second wash

For five of the nine ORF-fusion proteins, the amount of protein from this one-step purification is sufficient for structural analysis. We observe between 4 and 10 μg of ORF-protein fusion migrating near the expected molecular weight (indicated by arrows in lanes d, L. major 2759; e, L. major 6864; f, L. major 8264; g, L. major 4487; and h, L. major 7489), based on the intensity of their staining compared to that of the 5 μg of His6-MBP-3C protease in lane b and the ~0.4 μg of each molecular weight marker in lane a. Since each lane is loaded with protein from an equivalent number of cells (4.8 ml at OD600 of ~8.7; 42 OD-ml), we have calculated the yield of L. major ORF fusion proteins per OD600 liter (OD-l) of yeast. Thus, these ORFs produce up to ~240 μg of ORF fusion protein per OD-l (10 μg from 42 OD-ml), or 2 mg per liter at OD600 of 8.5.
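The unit conversion behind these figures can be sketched as follows. The function names are hypothetical and the calculation is an illustration based only on the numbers quoted in the text (4–10 μg per lane, 4.8 ml of culture at OD600 of ~8.7, and a final OD600 of 8.5); it is not code from the study.

```python
def yield_per_od_liter(band_ug, culture_ml, od600):
    """Return μg of protein per OD-l for a band estimated on a gel.

    band_ug    : estimated mass of the band in the lane (μg)
    culture_ml : volume of culture whose cells were loaded (ml)
    od600      : optical density of that culture
    """
    od_ml = culture_ml * od600          # cell amount in OD-ml
    return band_ug / od_ml * 1000.0     # 1 OD-l = 1000 OD-ml


def yield_per_liter(ug_per_od_l, final_od600):
    """Scale a per-OD-l yield to one liter of culture grown to final_od600."""
    return ug_per_od_l * final_od600


low = yield_per_od_liter(4, 4.8, 8.7)     # weakest of the five bands
high = yield_per_od_liter(10, 4.8, 8.7)   # strongest of the five bands
print(f"{low:.0f} to {high:.0f} μg per OD-l")
print(f"{yield_per_liter(high, 8.5) / 1000:.1f} mg per liter at OD600 of 8.5")
```

The same scaling gives the 480–1,600 μg per liter range quoted earlier for high expressors, that is, 60–200 μg per OD-l at an OD600 of 8.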

Since it is difficult to resolve some proteins from the IgG beads in this stick and strip assay and since some proteins display apparent heterogeneity, we further examined the yield of 16 proteins from the Test set in a purification that involved binding to IgG sepharose and release by cleavage with 3C protease (Fig. 4B). In each of the three examples shown in Fig. 4B, we observe one or two polypeptides released from the IgG resin after cleavage with 3C protease (lanes d, f, i, k, n, and p) as well as an ~17 kDa polypeptide generated by 3C cleavage that is retained on the IgG resin (compare lanes c, h, and m to lanes e, g, j, l, o and q). This 17 kDa polypeptide is the complex fusion tag, which is identical in all samples and retained on the IgG sepharose. Neither L. major 6976 ORF fusion protein nor L. major 4089 ORF fusion protein is detected prior to 3C cleavage (lanes c and h) on the IgG sepharose resin because both co-migrate with the larger IgG band. Similarly, we found that L. major 6598 ORF, which was not easily visible in the stick and strip assay (Fig. 4A, lane j), produces soluble, albeit heterogeneous, fusion protein with this two-step purification (Supplementary Fig. 2). By contrast, multiple high molecular weight polypeptides are observed in the stick and strip assay with L. major 6586 (Fig. 4A, lane i), but these bands are resolved into a single species of the correct molecular weight after cleavage with 3C protease (Fig. 4B, third panel). Additional two-step small scale purifications of 4 L. major ORF fusions not shown in Fig. 4A, as well as a purification of L. major 8264 (Fig. 4A, lane f), are shown in Supplementary Figs. 1 and 2, all of which yield a single major polypeptide.

Surprisingly, two polypeptides co-purify from the strain expressing L. major 4089 ORF (Fig. 4B, lanes j and l). We think it likely that the single gene product is cleaved into two polypeptides autocatalytically. This ORF is annotated as a putative S-adenosylmethionine decarboxylase proenzyme, and in yeast, the analogous gene product Spe2 is cleaved into 10,000 Da and 36,000 Da products, with the 10,000 Da product arising from the N terminus [25]. Moreover, generation of the two subunits is observed when the yeast protein is expressed in E. coli, suggesting that cleavage is autocatalytic.

Purification of L. major ORF fusion proteins yielded between 84 and 240 μg of ORF-fusion protein per OD-l, for all but one of the 16 L. major ORFs examined (see Table 4). Moreover, the single step stick and strip analysis in Fig. 4A confirms these expected yields in the seven cases in which the predicted polypeptide was observed. Since S. cerevisiae is routinely grown to OD600 of 8.5 as part of our preparations, the actual yield of proteins is 700 μg to 2 mg per liter of culture. Thus, the amount of protein is within the range required for x-ray crystallography. This claim is substantiated further below.
Table 4

Yield of L. major proteins from S. cerevisiae based on purification on IgG beads

L. major ORF ID  Protein function (inferred)                                  Yield (μg per OD-l)
Lmaj006976AAA    Cyclin 1                                                     200
Lmaj006593AAA    Sterol 24-c-methyltransferase                                88
Lmaj004486AAA    ‘Monoglyceride lipase’                                       120
Lmaj004089AAA    S-adenosylmethionine decarboxylase proenzyme                 240
Lmaj006586AAA    Glucokinase 1-like protein                                   140
Lmaj004763AAA    Serine peptidase                                             16
Lmaj002393AAA    GMP synthase                                                 160
Lmaj002759AAA    Phenylalanyl-tRNA synthetase                                 100
Lmaj006598AAA    Caltractin                                                   120
Lmaj006679AAA    ensangp 00000010174-like protein                             120
Lmaj006864AAA    Flagellar protofilament ribbon protein-like protein          160
Lmaj008264AAA    Beta-fructosidase                                            96
Lmaj004487AAA    n-Acyl-l-amino acid amidohydrolase                           96
Lmaj007489AAA    Anion-transporting ATPase-like protein                       140
Lmaj006168AAA    Small G-protein, putative                                    84
Lmaj005499AAA    d-isomer specific 2-hydroxyacid dehydrogenase-like protein   108

Large scale purification of two L. major ORF fusions yields highly purified protein with the expected yield

To determine if protein preparations with sufficient purity for structural analysis could be obtained from these L. major ORF fusions, we purified two L. major proteins from 44 l of cells through IgG sepharose binding and elution with 3C protease, followed by sizing chromatography and concentration of pooled samples. As shown in Fig. 5 and in Supplementary Fig. 3, we obtained nearly homogeneous preparations of L. major 6976 and L. major 4089 from this procedure. The yield of highly purified native L. major 6976 protein from 405 OD-l (44 l) after purification on IgG sepharose followed by elution with 3C protease is estimated to be 88 mg. After sizing chromatography, 51 mg of protein (4.8 ml at 10.7 mg/ml) was obtained from five pooled fractions, with an additional three fractions that contain substantial amounts of protein. We obtained similar results in the purification of L. major 4089 ORF-fusion from 384 OD-l, with a final yield of 47 mg after sizing and pooling of selected fractions (Supplementary Fig. 3) and an additional five fractions that contain similar amounts of protein, for an estimate of more than 90 mg total. As described above, this protein apparently undergoes autocleavage and the two subunits co-purify throughout, with the major contaminant likely corresponding to the full length polypeptide.
https://static-content.springer.com/image/art%3A10.1007%2Fs10969-009-9068-9/MediaObjects/10969_2009_9068_Fig5_HTML.gif
Fig. 5

Large scale purification of L. major 6976 ORF fusion protein. A Purification of L. major 6976 ORF-fusion from 405 OD-l. Proteins were bound to IgG sepharose, eluted by cleavage with 3C protease, followed by removal of 3C protease with GSH resin: lane a, molecular weight markers (BioRad broad range markers, 0.4 μg each); lane b, His6-MBP-3C protease (5 μg); lane c, GST-3C protease (~0.9 μg); lane d, sample bound to IgG beads; lane e, protein eluted with 3C protease; lane f, protein after binding GST-3C protease to GSH resin and filtration to remove GSH beads; lane g, IgG beads after proteolytic cleavage; lanes h and j, second and third washes of the IgG beads after proteolytic cleavage; lanes i and k, protein from second and third washes after removal of GST-3C protease with GSH resin and filtration; lane l, IgG beads after the third wash. B Purification of L. major 6976 by sizing chromatography. Lanes a–m contain 25 μl each of fractions 43–55 (2 ml per fraction). C Purified, concentrated L. major 6976 protein. Protein from fractions 47 to 51 (lanes e–i in B) was concentrated to ~5 ml and centrifuged for 10 min at maximum speed in a microfuge at 4°C

We compared the yields of protein in the large scale preparations to estimates from the small scale purification to find out if the small scale estimates of protein yield are accurate. The yield of L. major 6976 protein after purification on IgG sepharose is nearly identical to the estimated yield from the small scale purification (88 mg from 405 OD-l corresponds to 0.22 mg per OD-l while 20 μg from 100 OD-ml corresponds to 0.2 mg per OD-l). Likewise, L. major 4089 yields ~0.23 mg per OD-l in the large scale purification compared to 0.24 mg per OD-l in the small scale purification reported in Table 4.
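The comparison above is a matter of unit conversion, which can be written out as a worked calculation. The values are those quoted in the text; the 90 mg figure for L. major 4089 is the estimated total given for the large scale preparation, so the last line is approximate.

```latex
\begin{align*}
\text{L. major 6976, large scale:}\quad
  & \frac{88\ \text{mg}}{405\ \text{OD-l}} \approx 0.22\ \text{mg per OD-l} \\
\text{L. major 6976, small scale:}\quad
  & \frac{20\ \mu\text{g}}{100\ \text{OD-ml}}
    = 0.2\ \mu\text{g per OD-ml}
    = 0.2\ \text{mg per OD-l} \\
\text{L. major 4089, large scale:}\quad
  & \frac{90\ \text{mg}}{384\ \text{OD-l}} \approx 0.23\ \text{mg per OD-l}
\end{align*}
```

The small scale conversion uses 1 OD-l = 1000 OD-ml, so a yield in μg per OD-ml is numerically equal to the yield in mg per OD-l.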

Codon optimization improves the yield of L. major 6976 in S. cerevisiae

Expression of a large number of heterologous genes in E. coli is improved either by altering the codons that specify an identical polypeptide or by over-expression of rare tRNAs, with one recent study demonstrating improved expression of 22 of 30 human dehydrogenase/reductase genes using either a synthetic recoded gene or the native gene in a host that over-expresses a number of rare tRNAs [5]. In yeast, Keppler-Ross et al. [26] determined that poorly expressed mCherry RFP was converted to a well expressed derivative by codon-optimization. We noted that codon usage among the L. major ORFs deviated significantly from optimal codon usage in yeast, resulting in relatively poor codon adaptation index (CAI) scores: the L. major ORFs average 0.06, compared to an average of 0.172 among 5,565 yeast ORFs and of 0.26 among 319 highly expressed verified ORFs, cloned into a single expression vector [17]. Thus, we explored the effects of codon replacement on expression of L. major 6976 to learn if additional protein can be obtained in soluble form from this ORF. As shown in Fig. 6, the yield of L. major 6976 from a recoded version is 3-fold greater than that from the wild type gene, enhancing the yield of this ORF to 6 mg purified protein per liter of culture.
https://static-content.springer.com/image/art%3A10.1007%2Fs10969-009-9068-9/MediaObjects/10969_2009_9068_Fig6_HTML.gif
Fig. 6

Yield of L. major 6976 ORF-fusion protein from the native gene and a gene recoded with optimal S. cerevisiae codons. The yields of protein from 50 OD-ml of non-recoded and recoded L. major 6976 were compared after purification on IgG sepharose followed by elution with 3C protease, and SDS PAGE of different amounts of the eluate as shown. The amounts of protein in the eluate are determined by comparison to the mass markers (His6-MBP-3C)
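For readers unfamiliar with CAI scores, the index is commonly computed as the geometric mean of per-codon relative adaptiveness values, where each codon's value is its frequency in a reference set of highly expressed genes divided by the frequency of the most-used synonymous codon. The sketch below illustrates that definition only. The reference counts are hypothetical toy numbers, not real S. cerevisiae codon usage, and the function names and sequences are invented for illustration; the source does not state how its CAI scores were computed.

```python
import math

# Hypothetical reference codon counts for two amino acids. These numbers are
# illustrative only and are NOT real S. cerevisiae codon usage data.
REFERENCE_COUNTS = {
    "Lys": {"AAA": 20, "AAG": 80},
    "Glu": {"GAA": 90, "GAG": 10},
}


def relative_adaptiveness(reference_counts):
    """Return w for each codon: its count divided by the count of the
    most-used synonymous codon in the reference set."""
    weights = {}
    for codons in reference_counts.values():
        top = max(codons.values())
        for codon, count in codons.items():
            weights[codon] = count / top
    return weights


def cai(sequence, weights):
    """Geometric mean of w over the codons of `sequence` that have a weight.

    Codons without a weight (for example Met, Trp or stop codons, which are
    absent from the toy table) are skipped. Assumes all weights are > 0.
    """
    usable = len(sequence) - len(sequence) % 3
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("sequence contains no scorable codons")
    return math.exp(sum(logs) / len(logs))


WEIGHTS = relative_adaptiveness(REFERENCE_COUNTS)
native = "AAAGAGAAAGAG"    # Lys-Glu-Lys-Glu using the less frequent codons
recoded = "AAGGAAAAGGAA"   # same peptide using the most frequent codons
print(f"native CAI:  {cai(native, WEIGHTS):.3f}")
print(f"recoded CAI: {cai(recoded, WEIGHTS):.3f}")
```

Recoding replaces each codon with the synonymous codon of highest weight, so the encoded polypeptide is unchanged while the index rises toward 1.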

Discussion

We have provided direct evidence that heterologous proteins that are expressed but insoluble in E. coli are likely to be expressed and soluble in the simple eukaryote S. cerevisiae, and can be purified at levels sufficient for structural analysis in a significant fraction of the cases. Expression in S. cerevisiae resolves the solubility problem for a large fraction (42 of 63, 67%) of proteins that were expressed in both E. coli and yeast but were insoluble in E. coli. Furthermore, many of these proteins (15 of 63) can be purified by affinity chromatography in sufficient yield for structural analysis. Thus, this analysis of 63 insoluble targets generates 15 new candidates that are expressed at high levels, are nearly completely soluble, and can be purified by affinity chromatography. An additional 10 candidates are fully soluble but are not expressed as robustly, based on immunoblot analysis. Thus, the yield of viable candidates for structural analysis (24–40% of the 63 test genes) is at least as good as the results from a screen in E. coli, starting with a set of ORFs that failed solubility criteria in E. coli. Moreover, the protein preparations purified by affinity and sizing chromatography are nearly homogeneous, more so than many yeast protein preparations with similar yields (Quartley, Grayhack, and Phizicky, unpublished data). We suggest that native proteins may have evolved to interact, even weakly, with other proteins in the organism, and thus expression in a foreign host may be advantageous for improved purity. We conclude from these data that the yeast S. cerevisiae is an effective alternative organism for preparation of proteins for structural analysis.

Although we think that expression in S. cerevisiae is the primary cause of improved solubility for these L. major ORFs, it is conceivable that the differences in either the location or identity of the fusion tag contribute to solubility of some ORFs. For expression in E. coli, L. major ORFs were fused to either an N terminal His6 tag or an N terminal His6 tag followed by a soluble 3C cleavage site [1], while for expression in S. cerevisiae, ORFs were fused to a C-terminal tag consisting of a 3C cleavage site, followed by an HA epitope, His6, and the ZZ domain of protein A [33]. Identity of the fusion tags has been shown to affect solubility of many recombinant proteins but no single tag or condition that improves solubility of most or all proteins has been identified [9, 12, 38, 48]. In particular, in one study, fusion of the Z domain of protein A at the N terminus of seven proteins resulted in significantly soluble protein for only 1 of 7 proteins tested, and in only 2 of 21 conditions [37]. In our study, we have examined solubility in a single condition with one ZZ domain construct, resulting in nearly complete solubility of 42 of the 64 test proteins. Thus, both the fraction of proteins with improved solubility and the fraction of the protein that is soluble are significantly improved relative to studies in which fusion tags were altered in E. coli.

In addition to studies on protein expression and purification, the wealth of genetic, molecular and genomic information and of molecular tools in both S. cerevisiae and Pichia pastoris [40] contributes significantly to their growing use for protein expression. For example, production of secreted recombinant insulin in S. cerevisiae exploited information from years of study on the Golgi, ER and secretory mechanisms [27], and the recent production of both rat plasma membrane Na+/H+ antiporters [14] and a plant uracil transporter [15] was improved by expression in mutants deficient in a particular ubiquitin ligase, Rsp5. Similarly, the recent creation of engineered P. pastoris strains enabled production of recombinant proteins with a humanized glycan structure [20, 21]. In S. cerevisiae, there is an extensive tool-kit of plasmids for high level expression employing a variety of regulated and constitutive promoters, and appropriate GFP and RFP variants for study of expression [11, 26]. In addition, the rapid growth of the biopharmaceutical market of recombinant proteins and monoclonal antibodies is also driving the development of yeasts as hosts for expression. Although most of the 165 biopharmaceutical products on the market in 2006 [47] were expressed in either mammalian cell culture or in E. coli, expression in mammalian cell culture is prohibitively expensive, and yeast is increasingly viewed as an alternative host. Indeed, of these 165 products, 21 recombinant proteins were produced in S. cerevisiae, including Gardasil, a recombinant vaccine against human papillomavirus (from Merck), and Levemir, a long-acting rh insulin analog (from Novo Nordisk) [47]. The analysis described here suggests that expression in yeast is a suitable alternative for a large fraction of proteins, with the promise of an even larger fraction with the benefits of recoding.

Conclusions

Development of eukaryotic hosts for high level expression and purification of proteins for structural analysis is important because expression in the bacterium E. coli often results in improperly folded and insoluble proteins. We show here that the yeast Saccharomyces cerevisiae exhibits four qualities requisite to a eukaryotic expression host: improved solubility of proteins that are insoluble when expressed in E. coli, sufficient yields of protein for structural analysis, near homogeneity of purified protein preparations, and improved expression from altered codon usage.

Acknowledgments

We thank M. Dumont for advice and comments. This work was supported by National Institutes of Health (NIH) Grant NIH 1 U54 GM074899 establishing the Center for High Throughput Structural Biology.

Supplementary material

10969_2009_9068_MOESM1_ESM.doc (142 kb)
Supplementary Table 1 L. major ORF targets: Putative Function and Expression in E. coli (DOC 142 kb)
10969_2009_9068_MOESM2_ESM.tif (240 kb)
Supplementary Figure 1 Evaluation of soluble protein expression based on affinity purification of L. major ORFs on IgG sepharose. Purification of L. major ORF-fusions 8264, 4486 and 6593 on IgG sepharose. Proteins were bound to IgG Sepharose and washed, and bound protein was eluted after cleavage of the ZZ tag with 3C protease: lanes a–e, L. major ORF-fusion 8264; lanes f–j, L. major ORF-fusion 4486; lanes k–o, L. major ORF-fusion 6593. Lanes a, f, k, sample bound to IgG beads; lanes b, g, l, protein eluted with 3C protease; lanes c, h, m, IgG beads after proteolytic cleavage; lanes d, i, n, second wash of the IgG beads after proteolytic cleavage; lanes e, j, o, IgG beads after second wash. (TIFF 541 kb)
10969_2009_9068_MOESM3_ESM.tif (186 kb)
Supplementary Figure 2 Evaluation of soluble protein expression based on affinity purification of L. major ORFs on IgG sepharose. Purification of L. major ORF-fusions 6598, 2393 and 6679 on IgG sepharose. Methods and lanes are identical to Supplementary Figure 1. (TIFF 581 kb)
10969_2009_9068_MOESM4_ESM.tif (213 kb)
Supplementary Figure 3 Large scale purification of L. major 4089. A. Purification of L. major 4089 ORF-fusion from 384 OD-l on IgG sepharose, followed by cleavage with 3C protease, and removal of 3C protease with GSH resin. Lanes are identical to Figure 5. B. Purification of L. major 4089 by sizing chromatography. Lanes a–m contain 25 µl each of fractions 39 to 49 (1.8 ml per fraction). C. Purified, concentrated L. major 4089 protein. Protein from fractions 43 to 46 (lanes e–h in B) was concentrated to ~5 ml and centrifuged for 10 min at maximum speed in a micro-centrifuge at 4°C. (TIFF 1,752 kb)

Copyright information

© Springer Science+Business Media B.V. 2009