A microscale protein NMR sample screening pipeline
As part of efforts to develop improved methods for NMR protein sample preparation and structure determination, the Northeast Structural Genomics Consortium (NESG) has implemented an NMR screening pipeline for protein target selection, construct optimization, and buffer optimization, incorporating efficient microscale NMR screening of proteins using a micro-cryoprobe. The process is feasible because the newest generation probe requires only small amounts of protein, typically 30–200 μg in 8–35 μl volume. Extensive automation has been made possible by the combination of database tools, mechanization of key process steps, and the use of a micro-cryoprobe that gives excellent data while requiring little optimization and manual setup. In this perspective, we describe the overall process used by the NESG for screening NMR samples as part of a sample optimization process, assessing optimal construct design and solution conditions, as well as for determining protein rotational correlation times in order to assess protein oligomerization states. Database infrastructure has been developed to allow for flexible implementation of new screening protocols and harvesting of the resulting output. The NESG micro NMR screening pipeline has also been used for detergent screening of membrane proteins. Descriptions of the individual steps in the NESG NMR sample design, production, and screening pipeline are presented in the format of a standard operating procedure.
Keywords: NMR screening, Micro-cryoprobe, Structural genomics, Construct optimization
NMR spectroscopy is a powerful method for providing qualitative and quantitative information about biophysical properties of proteins in solution, including the tertiary structure, the secondary structure distribution, rotational correlation times, internal dynamics, and amide proton exchange rates. A large amount of information can be extracted from a few very simple experiments using a natural abundance or 15N-enriched sample. NMR sample screening provides a valuable approach for identifying protein constructs and solution conditions providing the best quality data, enabling resonance assignments and more extensive biophysical studies. As a result, NMR screening of multiple protein target constructs and solution conditions can greatly impact the efficiency, accuracy, speed, cost and ultimately the success of NMR research in structural biology and structural genomics. As NMR investigations shift more and more towards challenging larger proteins, multi-domain systems, and membrane proteins in detergent solutions, where many parameters and conditions require testing before a suitable sample is obtained, the role and value of screening NMR samples will become even more significant.
The Northeast Structural Genomics Consortium (NESG) (www.nesg.org) has implemented a largely “automated pipeline” for target selection, construct optimization, protein sample production, and efficient microscale screening of protein NMR targets using a micro-cryoprobe requiring only small amounts of protein, typically 10–200 μg of protein sample in 8–35 μl volume for each experiment. A remarkable degree of automation has been made possible by the combination of ad-hoc database tools, mechanization of key process steps and the use of a micro-cryoprobe that provides excellent data while requiring little optimization and manual setup thanks to the small coil diameter and enhanced mass sensitivity. During an initial exploratory phase, screening was conducted using a room temperature 600 MHz 1-mm probe (Bruker TXI 1 Microprobe). Subsequently, we switched to a more advanced 600 MHz 1.7-mm probe (Bruker TCI 1.7 MicroCryoprobe). This probe provides a mass sensitivity (S/N per μg of solute) that is one order of magnitude higher than conventional 5-mm probes. This impressive figure translates into a 1.7-mm probe that is as sensitive as a 5-mm room temperature probe with the sample volume requirement reduced by about an order of magnitude (~30 versus ~300 μl). Using this 1.7-mm micro NMR cryoprobe, a 600 MHz spectrometer system is used seamlessly for target screening, the acquisition of data necessary for backbone and side-chain chemical shift assignments and structure determination of [U-13C,15N] proteins up to 20 kDa.
Microcoil probe technology has been demonstrated to be very valuable for protein NMR studies, particularly for proteins for which only limited quantities are available or for which many conditions need screening. Wüthrich and co-workers have shown that microcoil probes are useful for NMR screening in a miniaturized high throughput structural genomics pipeline (Peti et al. 2005), and have also demonstrated the value of micro probes in screening detergent conditions for membrane protein structure determination by NMR (Zhang et al. 2008). Peti et al. (2004) have determined backbone and simultaneous aliphatic and aromatic side chain resonance assignments on <500 μg quantities of [U-13C, 15N]-labelled proteins using a flow-through HCN z-gradient CapNMR probe (MRM/Prostasis Inc.), and Aramini et al. (2007) have demonstrated complete 3D structure determination of small proteins using <100 μg of protein sample with a 1-mm Bruker Microprobe. In other applications, microcoil NMR probes have been combined with a micromixer to investigate solvent induced conformational transitions in ubiquitin (Kakuta et al. 2003) and capillary HPLC to characterize tryptic fragments of a protein kinase (Hentschel et al. 2005).
This article describes the general overall process of initial NMR sample characterization in the NESG, emphasizing the role of 1D and simple 2D NMR screening in selecting for optimal construct and solvent conditions, as well as determining oligomerization state, in order to validate protein targets for subsequent preparation with double- or triple-labeling (15N, 13C, 2H) for structure determination. We also outline the database infrastructure that has been put in place to allow for flexible implementation of our screening protocols and for harvesting and archiving of the resulting data. Descriptions of the individual steps in the NESG sample design, production and screening pipeline are presented in the following sections.
NESG high-throughput pipeline flow chart
Disorder prediction with DisMeta server
Bioinformatics methods provide means for rapid identification of disordered regions in proteins. As the several disorder prediction software packages that have been developed each approach the problem from a slightly different point of view, we have found it useful to combine a number of these programs under a server and to extract a more robust consensus disorder prediction. The DisMeta Server (www-nmr.cabm.rutgers.edu/bioinformatics/disorder) runs a wide range of disorder prediction software, including DISEMBL (Linding et al. 2003a), DISOPRED2 (Ward et al. 2004), DISPro (Cheng et al. 2005), DRIP-PRED (MacCallum 2006), FoldIndex (Prilusky et al. 2005), FoldUnfold (Galzitskaya et al. 2006), GlobPlot2 (Linding et al. 2003b), IUPred (Dosztanyi et al. 2005), Prelink (Coeytaux and Poupon 2005), RONN (Yang et al. 2005), and VSL2 (Peng et al. 2006). The server has been designed to run standalone or interfaced directly with our target database for batch prediction and parsing of all NESG targets. Fig. 2 shows a representative DisMeta output for the Staphylococcus saprophyticus SSP0609 protein (Rossi et al. 2009) (NESG ID: SyR11), a secreted bacterial antigen with an intrinsically disordered amino-terminal signal peptide that was identified and excluded by this approach.
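The consensus idea described above can be illustrated with a minimal sketch. It assumes each predictor has already been reduced to a per-residue True/False disorder call; the majority-vote threshold, the function names, and the toy predictor names are illustrative assumptions, not the DisMeta implementation.

```python
from typing import Dict, List, Tuple


def consensus_disorder(predictions: Dict[str, List[bool]],
                       min_fraction: float = 0.5) -> List[bool]:
    """Call a residue disordered when at least min_fraction of predictors agree.

    The 0.5 default (simple majority) is an assumption for illustration.
    """
    lengths = {len(calls) for calls in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictors must cover the same sequence length")
    n_predictors = len(predictions)
    n_residues = lengths.pop()
    consensus = []
    for i in range(n_residues):
        votes = sum(calls[i] for calls in predictions.values())
        consensus.append(votes / n_predictors >= min_fraction)
    return consensus


def disordered_segments(consensus: List[bool],
                        min_length: int = 1) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) ranges of consecutive disordered residues."""
    segments = []
    start = None
    for i, flag in enumerate(consensus, start=1):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_length:
                segments.append((start, i - 1))
            start = None
    if start is not None and len(consensus) - start + 1 >= min_length:
        segments.append((start, len(consensus)))
    return segments


# Toy per-residue calls from three hypothetical predictors (8-residue sequence).
toy = {
    "predictor_a": [True, True, True, False, False, False, False, True],
    "predictor_b": [True, True, False, False, False, False, True, True],
    "predictor_c": [True, False, False, False, False, False, False, True],
}
toy_consensus = consensus_disorder(toy)
toy_segments = disordered_segments(toy_consensus)
```

In this toy case only residues flagged by at least two of the three predictors survive, which is the sense in which a consensus is more robust than any single program.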
Construct design is carried out largely using automated tools developed by the NESG project. The software uses reports from DisMeta to identify the predicted secondary structure regions, signal peptides characteristic of secreted proteins, trans-membrane segments, and disordered regions. The construct design software generates multiple alternative constructs (at least two) for each ‘interest region’ of the structural core. If either the N- or C-terminus of an alternative construct is predicted to lie in the middle of a helix or strand, it is extended to the adjacent predicted loop region. Signal peptides, trans-membrane segments, and large disordered regions predicted from the DisMeta report are excluded from the construct design. For ‘interest regions’ with short disordered regions at the N- or C-terminal ends, additional constructs are generated that exclude these flexible region(s).
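The terminus-extension rule can be sketched as follows. The secondary-structure encoding (a string of 'H' for helix, 'E' for strand, 'C' for loop), the 0-based indexing, and the function name are assumptions made for illustration; the NESG construct design software itself is not reproduced here.

```python
from typing import Tuple


def extend_to_loop(ss: str, start: int, end: int) -> Tuple[int, int]:
    """Move construct boundaries outward until they sit on a loop residue.

    ss    : predicted secondary structure, one character per residue
            ('H' helix, 'E' strand, 'C' loop); a hypothetical encoding.
    start : 0-based index of the first residue of the construct.
    end   : 0-based index of the last residue of the construct (inclusive).

    A boundary that falls on a helix or strand residue is moved outward
    until it reaches a loop residue or the end of the sequence.
    """
    if not (0 <= start <= end < len(ss)):
        raise ValueError("construct boundaries must lie within the sequence")
    while start > 0 and ss[start] in "HE":
        start -= 1
    while end < len(ss) - 1 and ss[end] in "HE":
        end += 1
    return start, end


# Toy prediction: loop(0-2), helix(3-8), loop(9-11), strand(12-16), loop(17-19).
toy_ss = "CCCHHHHHHCCCEEEEECCC"
extended = extend_to_loop(toy_ss, 5, 14)    # both ends start inside elements
unchanged = extend_to_loop(toy_ss, 10, 11)  # both ends already in a loop
```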
The standard E. coli expression systems used in the NESG project produce proteins inside of the cell, and are not generally suitable for producing secreted proteins that may contain disulfide bonds. However, proteins (or domains) containing zero or one Cys residue can be successfully made in intracellular E. coli expression systems, and these are also identified by the construct design software.
Cloning, expression, and purification
Once boundaries of the ordered core of the protein targets are identified, a number of primers are designed using the automated primer design software Primer Prim’er (Everett et al. 2004) and cloned into a set of E. coli pET vectors containing short hexaHis tags at the N- or C-terminal regions. A detailed description of the robotic cloning and expression platform used for NMR protein sample production has been published (Acton et al. 2005). The primers generated for PCR amplification of the targeted coding sequences add 15 base pair regions on each end of the DNA fragment. These sequences overlap with the multiple cloning site of either our pET15 or pET21 T7 expression vector derivatives, allowing for high-throughput, high-efficiency Infusion-based ligation independent cloning (Clontech). Expression vectors are constructed in a high throughput fashion in 96-well format using a Qiagen BioRobot 8000 system (Acton et al. 2005).
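As a rough illustration of how 15 base pair vector-overlap regions can be added to gene-specific primers, the sketch below builds a forward and a reverse primer from a coding sequence and the two vector flanks. It is not the Primer Prim’er algorithm; the flank sequences, the annealing length, and the function names are placeholders chosen for the example, and no melting-temperature optimization is attempted.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def overlap_primers(gene: str,
                    upstream_flank: str,
                    downstream_flank: str,
                    anneal_len: int = 20,
                    overlap_len: int = 15) -> Tuple[str, str]:
    """Build primers whose 5' ends carry overlap_len bases of vector sequence.

    The amplified fragment (top strand) is upstream_flank + gene + downstream_flank,
    so the forward primer is the flank followed by the start of the gene, and the
    reverse primer is the reverse complement of the gene end plus the flank.
    """
    if len(upstream_flank) != overlap_len or len(downstream_flank) != overlap_len:
        raise ValueError("each vector flank must be exactly overlap_len bases")
    if len(gene) < anneal_len:
        raise ValueError("gene is shorter than the requested annealing length")
    gene = gene.upper()
    forward = upstream_flank.upper() + gene[:anneal_len]
    reverse = reverse_complement(gene[-anneal_len:] + downstream_flank)
    return forward, reverse


# Placeholder sequences, not real vector or target sequences.
toy_gene = "ATGAAACCCGGGTTTAAACCCGGGTTTTAA"
toy_up = "AAAAACCCCCGGGGG"
toy_down = "TTTTTGGGGGCCCCC"
fwd, rev = overlap_primers(toy_gene, toy_up, toy_down, anneal_len=10)
```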
The growth medium used for fermentation is MJ9 (Jansson et al. 1996), a modified minimal medium containing a stronger buffering system and supplemental vitamins and trace elements optimized for efficient isotopic enrichment of proteins. We have found that MJ9 medium can support the same cell density and protein expression levels as rich media such as LB (data not shown), although not as high as rich media such as Terrific Broth (Tartoff and Hobbs 1987). For NMR screening, samples are prepared with 100% 15N and 5% 13C enrichment. The fermentation process begins with transformation of the target expression vector into the appropriate BL21(DE3) strain of E. coli, followed by an LB preculture. This preculture is then used to inoculate an 8 ml overnight culture, which is grown to saturation. The entire volume of each overnight culture is then used to inoculate a 67 ml fermentation in a 100 ml tube (Midi Scale Fermentation) containing MJ9 supplemented with U-15NH4-salts (1–2 g/l) and a mixture of 100% U-13C-glucose (5%) and unenriched glucose (95%) (3–4 g/l) as the sole sources of nitrogen and carbon, respectively. The cultures are incubated with constant aeration with 100% O2 at 37°C until OD600 reaches ~1.0–1.5 units, equilibrated at 17°C, and induced with IPTG. Incubation with vigorous aeration in a 17°C room continues overnight, followed by harvesting by centrifugation. Aliquots of the induced cells are taken, sonicated, and analyzed by SDS polyacrylamide gel electrophoresis to assay for expression and solubility (Acton et al. 2005). The cell pellets are then stored at −20°C.
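The isotope quantities implied by these concentrations are simple to work out. The sketch below computes the mass of 15N ammonium salt and of labeled and unlabeled glucose for one fermentation; the function name and the choice of 1 g/l and 4 g/l within the quoted ranges are assumptions for the example.

```python
from typing import Dict


def isotope_reagents_mg(volume_ml: float,
                        nh4_g_per_l: float,
                        glucose_g_per_l: float,
                        labeled_fraction: float = 0.05) -> Dict[str, float]:
    """Return reagent masses in mg for one culture.

    g/l is numerically equal to mg/ml, so mass (mg) = concentration (g/l) * volume (ml).
    labeled_fraction is the share of the total glucose that is U-13C labeled.
    """
    if not 0.0 <= labeled_fraction <= 1.0:
        raise ValueError("labeled_fraction must be between 0 and 1")
    total_glucose = glucose_g_per_l * volume_ml
    return {
        "15N_ammonium_salt_mg": nh4_g_per_l * volume_ml,
        "13C_glucose_mg": total_glucose * labeled_fraction,
        "unlabeled_glucose_mg": total_glucose * (1.0 - labeled_fraction),
    }


# One 67 ml Midi Scale Fermentation at 1 g/l ammonium salt and 4 g/l total glucose.
midi = isotope_reagents_mg(volume_ml=67, nh4_g_per_l=1.0, glucose_g_per_l=4.0)
```

Under these assumed values a single 67 ml culture consumes 67 mg of labeled ammonium salt and about 13 mg of U-13C-glucose, which is why 5% 13C labeling is affordable at the screening stage.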
In the purification stage of sample preparation, cell pellets are disrupted by sonication and centrifuged to remove the insoluble fraction. The resulting supernatant is then applied to an ÄKTAxpress™ (GE Healthcare) system using a two-step protocol consisting of HisTrap HP affinity chromatography followed directly by HiLoad 16/60 Superdex 75 gel filtration chromatography. Samples are concentrated using an Amicon ultrafiltration concentrator (Millipore) to 0.3 to 1.0 mM in 95% H2O/5% 2H2O solution containing an appropriate screening buffer, e.g. 20 mM MES, 200 mM NaCl, 10 mM DTT, 5 mM CaCl2 at pH 6.5, or 20 mM NH4OAc, 200 mM NaCl, 10 mM DTT, 5 mM CaCl2 at pH 5.5 or pH 4.5 (see Supplementary Table S1 entries MJ001, MJ002 and MJ003 for complete reagent list). All buffers contain 50 μM DSS standard for internal referencing (Markley et al. 1998). Aliquots (8 or 35 μl) are then transferred to 1.0-mm or 1.7-mm SampleJet Tubes (Bruker) using a Gilson 96-well liquid handler.
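The amount of protein in each aliquot follows directly from the concentration, volume, and molecular weight. The short sketch below makes the unit conversion explicit; the 10 kDa molecular weight is an assumed example value, and the function name is illustrative.

```python
def protein_mass_ug(conc_mM: float, volume_ul: float, mw_kDa: float) -> float:
    """Return the protein mass in micrograms.

    mM * ul = nmol, and nmol * kDa (1 kDa = 1000 g/mol) = ug,
    so the three numbers simply multiply.
    """
    if min(conc_mM, volume_ul, mw_kDa) < 0:
        raise ValueError("inputs must be non-negative")
    return conc_mM * volume_ul * mw_kDa


# Assumed 10 kDa protein at the lower concentration limit of 0.3 mM.
mass_35ul = protein_mass_ug(0.3, 35, 10)  # 1.7-mm tube aliquot
mass_8ul = protein_mass_ug(0.3, 8, 10)    # 1.0-mm tube aliquot
```

For this assumed 10 kDa protein, a 35 μl aliquot at 0.3 mM holds about 105 μg and an 8 μl aliquot about 24 μg.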
Preparation of samples for NMR screening
Various protein samples to be screened are placed in 96-well plates, and 1.7-mm NMR tubes are robotically filled with 35 μl of protein. NMR spectra are then obtained using a 1.7-mm micro-cryoprobe on a Bruker 600 MHz instrument equipped with a Bruker B-ACS 60 sample handler. The device is controlled by Bruker IconNMR software. Integrated database tools have been developed to reduce operator intervention to a minimum during data acquisition and archival. The IconNMR run execution is scripted based on sample ID and conditions, which are manually entered only one time, upstream at the sample production stage. The robotic autosampler holds up to sixty samples and each sample is locked, tuned and shimmed prior to data acquisition (see image of the dedicated hardware in Supplementary Figure S1). The sample temperature is regulated at 20°C and in the first, completely automated step, 1D 1H NMR spectra with solvent presaturation are acquired using a standard template with optimized values for parameters such as the carrier position, proton sweep width, proton 90° pulse width, saturation power and delay for water-presaturation, and 256 scans for each spectrum. The only human involvement in running a set of 1D screening spectra is the loading of the samples into the autosampler.
1D NMR screening and scoring
2D NMR screening and scoring
2D 1H-15N HSQC spectra are queued with the best “good” samples heading the queue. Each of the 2D spectra is typically run for about 2 h at 20°C. An attempt to acquire 2D spectra on all the samples is made unless the protein shows no signal (poor classification), but a maximum of 10 h is allotted for the least concentrated sample. In addition, digital resolution in the 15N dimension is adjusted for larger proteins (18–20 kDa) as deemed necessary to obtain a better spectrum for scoring. Conversely, weak samples are run with emphasis on sensitivity (e.g. 64 points in 15N evolution and 4× the number of scans).
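The queueing logic can be sketched roughly as below. The score labels and their ranking, the baseline acquisition parameters, and the `weak` flag are assumptions for illustration only; just the rules that poor samples are skipped, that better samples go first, and that weak samples get 64 points in 15N with 4× the scans are taken from the text.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed ranking of 1D score labels; lower rank is queued first.
SCORE_RANK = {"good": 0, "promising": 1}

# Hypothetical baseline HSQC parameters, not the NESG template values.
BASE_N15_POINTS = 128
BASE_SCANS = 16


@dataclass
class Sample:
    sample_id: str
    score: str          # 1D screening classification
    weak: bool = False  # low signal-to-noise in the 1D spectrum


def build_hsqc_queue(samples: List[Sample]) -> List[Dict[str, object]]:
    """Order samples for 2D acquisition and assign per-sample parameters."""
    eligible = [s for s in samples if s.score in SCORE_RANK]  # drops "poor"
    eligible.sort(key=lambda s: SCORE_RANK[s.score])          # stable sort
    queue = []
    for s in eligible:
        if s.weak:
            # Emphasize sensitivity: fewer 15N points, four times the scans.
            n15_points, scans = 64, BASE_SCANS * 4
        else:
            n15_points, scans = BASE_N15_POINTS, BASE_SCANS
        queue.append({"sample_id": s.sample_id,
                      "n15_points": n15_points,
                      "scans": scans})
    return queue


toy_queue = build_hsqc_queue([
    Sample("A", "poor"),
    Sample("B", "promising", weak=True),
    Sample("C", "good"),
])
```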
Throughput and bottlenecks
Using the robotic sample changer with the 1.7-mm micro cryoprobe, about 60 1D spectra are acquired in ~24 h. These data are used to prioritize samples for 2D 1H-15N HSQC data collection. 2D spectra are each acquired for 2–6 h, depending on the sensitivity of the sample and size of the target molecule. Generally, 2D screening for a sixty-sample batch of targets is completed in about a week. For the NESG NMR structure production pipeline, scheduling on the spectrometer is adjusted to allow for the screening of about one hundred samples in each 2-week period of each month.
One of the limitations of the automatic screening is fluctuation of the sample volume, due to evaporation, that causes lock failure. Care must be taken to ensure that volume fluctuation be kept within 5 μl by acquiring data as quickly as possible given the available sample concentration or by sealing the microtube. The automation software is configured to send error notification via e-mail to the operator. One obvious solution can be to start with slightly larger sample volumes i.e., 40–45 μl samples. However, samples prepared with larger volumes have been observed to result in poorer performance of the autoshimming software that is run in setting up each sample.
For proteins providing marginal quality (e.g. “Promising”) HSQC spectra, several “salvage” processes have been developed to provide improved solvent conditions and/or construct design. Some of the most effective strategies include sample buffer optimization by NMR and amide hydrogen deuterium exchange with mass spectrometry detection (HDX-MS) for construct optimization (Sharma et al. 2009).
Amide hydrogen deuterium exchange with mass spectrometry detection (HDX-MS) for construct optimization
Using NMR to determine oligomerization states of proteins
A priori knowledge of the oligomerization state of a protein is critical to sample labeling choice and to accurate protein structure determination by solution NMR techniques. The principal approaches employed in the NESG for elucidation of the oligomerization state of targets selected for NMR structure determination include: (1) analytical gel filtration chromatography, (2) static light scattering, and (3) 1D 15N NMR relaxation measurements. Our standard protocol for analytical gel filtration with static light scattering detection has been described elsewhere (Acton et al. 2005). Here we discuss our standard procedure for measurements of rotational correlation times from 1H-detected 1D 15N relaxation measurements, which are executed on the NMR sample to be studied as part of the microscale NMR screening.
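A widely used approximation for estimating the rotational correlation time from 15N relaxation data, valid for slowly tumbling molecules and neglecting fast internal motions, is τc ≈ sqrt(6·T1/T2 − 7) / (4π·νN), where νN is the 15N resonance frequency in Hz. The sketch below assumes this approximation is appropriate; the text above does not state the exact expression used, and the T1 and T2 values in the example are invented for illustration.

```python
import math


def rotational_correlation_time(t1_s: float, t2_s: float, nu_n_hz: float) -> float:
    """Estimate tau_c in seconds from average 15N T1 and T2 (both in seconds).

    Uses tau_c ~ sqrt(6*T1/T2 - 7) / (4*pi*nu_N), an approximation that holds
    in the slow-tumbling limit. nu_n_hz is the 15N Larmor frequency in Hz.
    """
    if t1_s <= 0 or t2_s <= 0 or nu_n_hz <= 0:
        raise ValueError("T1, T2 and nu_N must be positive")
    radicand = 6.0 * t1_s / t2_s - 7.0
    if radicand < 0:
        raise ValueError("T1/T2 is too small for this approximation")
    return math.sqrt(radicand) / (4.0 * math.pi * nu_n_hz)


# 15N resonates at about 60.8 MHz on a 600 MHz (1H) spectrometer.
# T1 = 0.8 s and T2 = 0.08 s are made-up example values.
tau_c = rotational_correlation_time(t1_s=0.8, t2_s=0.08, nu_n_hz=60.8e6)
tau_c_ns = tau_c * 1e9
```

Comparing the resulting τc with the value expected for a monomer of the same molecular weight gives a quick indication of whether the protein is likely to be monomeric or oligomeric under the screening conditions.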
In this paper we have described the salient aspects of our microscale NMR screening pipeline. Sample production, automatic data setup and acquisition, data analysis and data archiving have been optimized and streamlined. The aim was to obtain the most accurate results in the shortest possible time and in a cost-effective manner, while minimizing operator intervention and error. This was made possible by (1) expanding the SPiNE database tools to oversee and coordinate the microscale NMR screening pipeline, (2) adopting state-of-the-art 600 MHz 1.7-mm micro cryoprobe technology to reduce sample requirements and to better utilize limited NMR resources, and (3) introducing bioinformatics tools (DisMeta and other construct optimization software) and experimental techniques (HDX-MS) to identify and remove disordered N- or C-terminal segments from the protein construct. In its current form, the NESG microscale NMR screening pipeline is a combination of dedicated in-house database development, commercially available hardware, established proteomics techniques, and optimal expert input. The NESG micro NMR screening pipeline has also been used for detergent screening of membrane proteins (Mao et al. 2009). The strategy has proven essential for the success of the NESG consortium NMR structure production, and may provide a useful template for structural biology programs exploring samples and conditions suitable for studies of complex systems that may require construct and/or buffer optimization.
We thank A. Eletski, K. Singarapu, Y. Tang and R. Mani for helpful discussions and for datasets used in the production of this manuscript. This work was supported by the National Institute of General Medical Sciences Protein Structure Initiative program, grant U54 GM074958.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
- Acton TB, Gunsalus KC, Xiao R, Ma LC, Aramini J, Baran MC, Chiang YW, Climent T, Cooper B, Denissova NG, Douglas SM, Everett JK, Ho CK, Macapagal D, Rajan PK, Shastry R, Shih LY, Swapna GV, Wilson M, Wu M, Gerstein M, Inouye M, Hunt JF, Montelione GT (2005) Robotic cloning and protein production platform of the northeast structural genomics consortium. Methods Enzymol 394:210–243
- Bertone P, Kluger Y, Lan N, Zheng D, Christendat D, Yee A, Edwards AM, Arrowsmith CH, Montelione GT, Gerstein M (2001) SPINE: an integrated tracking database and data mining approach for identifying feasible targets in high-throughput structural proteomics. Nucleic Acids Res 29:2884–2898
- Cheng J, Sweredoski MJ, Baldi P (2005) Accurate prediction of protein disordered regions by mining protein structure data. Data Mining and Knowledge Discovery 11
- MacCallum RM (2006) Order/disorder prediction with self organising maps
- Mao L, Tang Y, Vaiphei T, Shimazu T, Kim SG, Mani REW, Montelione GT, Inouye M (2009) Production of membrane proteins for NMR studies without purification. J Struc Funct Genom. doi: 10.1007/s10969-009-9072-0
- Markley JL, Bax A, Arata Y, Hilbers CW, Kaptein R, Sykes BD, Wright PE, Wuthrich K (1998) Recommendations for the presentation of NMR structures of proteins and nucleic acids. IUPAC-IUBMB-IUPAB inter-union task group on the standardization of data bases of protein and nucleic acid structures determined by NMR spectroscopy. J Biomol NMR 12:1–23
- Rossi P, Aramini JM, Xiao R, Chen CX, Nwosu C, Owens LA, Maglaqui M, Nair R, Fischer M, Acton TB, Honig B, Rost B, Montelione GT (2009) Structural elucidation of the Cys-His-Glu-Asn proteolytic relay in the secreted CHAP domain enzyme from the human pathogen Staphylococcus saprophyticus. Proteins 74:515–519
- Sharma S, Zheng H, Huang YJ, Ertekin A, Hamuro Y, Rossi P, Tejero R, Acton TB, Xiao R, Jiang M, Zhao L, Ma LC, Swapna GV, Aramini JM, Montelione GT (2009) Construct optimization for protein NMR structure analysis using amide hydrogen/deuterium exchange mass spectrometry. Proteins 76:882–894
- Tartoff KD, Hobbs CA (1987) Bethesda research laboratory focus, vol 9, p 12