Abstract
Magic-angle spinning solid-state NMR (MAS SSNMR) represents a fast developing experimental technique with great potential to provide structural and dynamics information for proteins not amenable to other methods. However, few automated analysis tools are currently available for MAS SSNMR. We present a methodology for automating protein resonance assignments of MAS SSNMR spectral data and its application to experimental peak lists of the β1 immunoglobulin binding domain of protein G (GB1) derived from a uniformly 13C- and 15N-labeled sample. This application to the 56 amino acid GB1 produced an overall 84.1% assignment of the N, CO, CA, and CB resonances with no errors using peak lists from NCACX 3D, CANcoCA 3D, and CANCOCX 4D experiments. This proof of concept demonstrates the tractability of this problem.
Introduction
Magic-angle spinning solid-state NMR (MAS SSNMR) represents a fast-developing experimental method with great potential to provide structural and dynamics information for proteins not amenable to solution NMR or X-ray crystallography. Many technical aspects of MAS SSNMR are rapidly developing, among them: (i) improvements in nano/microcrystalline and membrane protein sample preparation (Frericks et al. 2006; Li et al. 2007; Lorch et al. 2005), (ii) improvements in commercially available hardware, and (iii) development of pulse sequences for new and improved experiments (Sun et al. 1997; Li et al. 2007; Franks et al. 2007; Zhong et al. 2007; Hong 1999; Bockmann et al. 2003; Rienstra et al. 2000; Pauli et al. 2001; Igumenova et al. 2004; Astrof et al. 2001). In many cases, adaptation of tools and techniques from solution NMR has fueled this rapid development. However, the development of analysis software for MAS SSNMR lags far behind. In particular, the more sophisticated automated protein resonance assignment programs for solution NMR cannot be directly used on SSNMR data lacking hydrogen resonances. This is because leading protein resonance assignment programs (Zimmerman et al. 1997; Leutner et al. 1998; Atreya et al. 2000; Bartels et al. 1996, 1997, 2004; Moseley et al. 2001; Moseley and Montelione 1999; Moseley et al. 2004; Huang et al. 2005; Coggins and Zhou 2003; Jung and Zweckstetter 2004; Eghbalnia et al. 2005; Hyberts and Wagner 2003) are hard-wired with an amide 15N-1H double resonance spin system root definition (Fig. 1) and require hydrogen-based experiments. To address this deficiency, we present a methodology for automating protein resonance assignments of MAS SSNMR spectral data and its practical application to an experimental peak list dataset of the β1 immunoglobulin binding domain of protein G (GB1) as a proof of concept.
Our goals are: (i) to eventually provide the necessary software tools to automate the MAS SSNMR protein resonance assignment process, (ii) to improve the quality of this analysis, and (iii) to make this analysis more objective and reproducible.
Fig. 1 Standard dipeptide spin system definitions for sequential protein resonance assignments in solution and solid-state NMR. Spin system root resonances are in red. The solid red box indicates that the root resonances are found in all standard experiments used in dipeptide spin system assembly. The dashed red boxes indicate pairs of root resonances that are found in only a subset of the experiments used in dipeptide spin system assembly.
Figure 2 shows the protein resonance assignment problem represented as a bipartite graph. This assignment problem is essentially the same for both solution and solid-state NMR (Tycko 1996; Hong 1999), and solving it involves seven basic steps (Table 1). However, one critical difference between solution and solid-state NMR is the set of root resonances used to group peaks into spin systems. These resonances are dictated by the set of NMR experiments (i.e., the experimental strategy) used to solve this assignment problem. As shown in Fig. 1, common MAS SSNMR protein resonance assignment strategies use a partial triple resonance spin system root definition (Pauli et al. 2001; Igumenova et al. 2004; Franks et al. 2005; Balayssac et al. 2007; Hong 1999; Sperling et al. 2010), since not all three resonances may be present within each experiment in a given strategy. MAS SSNMR experimental strategies naturally fall into three categories (Table 2). In category I, two sets of experiments containing either Ni-C′i−1 or Ni-Cαi root resonances are combined into complete dipeptide spin systems using the single common amide nitrogen root resonance. In categories IIa and IIb, experiments containing either Ni-C′i−1 or Ni-Cαi root resonances are combined into complete dipeptide spin systems using two common root resonances. In category III, the listed 4D experiments contain all three root resonances, which represent a complete triple resonance spin system root definition. Labs have published assignment results using category I strategies, but only on small proteins (Hong 1999; Pauli et al. 2001; Igumenova et al. 2004; Franks et al. 2005; Balayssac et al. 2007). Labs are starting to use category II strategies for larger proteins (Frericks et al. 2006; Li et al. 2007; Li et al. 2008). We expect that labs will eventually explore category III strategies using newer G-matrix Fourier transformation (GFT) experiments (Szyperski et al. 1993a; Szyperski et al. 1993b; Kim and Szyperski 2003; Kim and Szyperski 2004; Astrof et al. 2001; Luca and Baldus 2002).
Moreover, category II and III strategies have strengths that could make them better suited to automation than even solution NMR strategies. First, the chemical shift dispersion in Euclidean space of Ni-Cαi, and especially C′i−1-Ni-Cαi, root resonance tuples is significantly greater than for Ni-Hi root resonance tuples. In other words, Ni-Cαi pairs of chemical shifts for a folded protein, plotted on a 2D graph as small circles with radii representing the uncertainty in their chemical shift values, will show less dense clumps (i.e., less overlapping of circles) than Ni-Hi pairs of chemical shifts plotted in the same way. This helps prevent the non-unique grouping of peaks into spin systems, which severely complicates resonance assignments. Second, category IIa and IIb strategies can be combined into a single strategy represented as a merged double bipartite graph. This representation may lead to the development of superior grouping and linking algorithms.
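The dispersion argument can be illustrated with a minimal Python sketch. It scales each dimension by its match tolerance, so that every root tuple becomes a circle of radius 1, and counts how many pairs of circles overlap. The helper name `count_overlaps`, the shift values, and the tolerances are all hypothetical placeholders chosen for illustration; they are not GB1 data and not part of the published software.

```python
import math
from itertools import combinations

def count_overlaps(points, tolerances):
    """Count pairs of root tuples whose uncertainty circles overlap.

    Each coordinate is divided by its tolerance, so every tuple becomes a
    circle (or sphere) of radius 1; two of them overlap when their centers
    are closer than 2 in the scaled space.
    """
    scaled = [[x / t for x, t in zip(p, tolerances)] for p in points]
    return sum(1 for a, b in combinations(scaled, 2) if math.dist(a, b) < 2.0)

# Made-up shifts (ppm) for four residues: (N, H) tuples and (N, CA) tuples.
n_h = [(120.0, 8.20), (120.3, 8.22), (121.0, 8.25), (119.8, 8.18)]
n_ca = [(120.0, 55.0), (120.3, 61.5), (121.0, 45.2), (119.8, 58.0)]

overlaps_nh = count_overlaps(n_h, (0.3, 0.03))   # assumed N and H tolerances
overlaps_nca = count_overlaps(n_ca, (0.3, 0.3))  # assumed N and CA tolerances
print(overlaps_nh, overlaps_nca)
```

In this toy data the nitrogen shifts are identical in both sets, so any difference in the overlap count comes from the second root dimension alone.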
However, MAS SSNMR spectra, especially of membrane proteins, often lack significant numbers of resonances at a given experimental condition (Andronesi et al. 2005; Li et al. 2007), which can confuse both global optimization and exhaustive search mapping algorithms. But spectroscopists are finding clever ways to optimize their experiments for higher sensitivity. For instance, dropping the temperature below 0 °C can improve signal intensity several-fold (Kloepper et al. 2007). Moreover, experiments can be collected under multiple conditions to improve detection of all resonances. Another historical problem in SSNMR experiments is large spectral line widths, which increase spectral crowding and peak overlap. However, improvements in magic-angle spinning techniques, pulse sequences, and micro/nanocrystalline sample preparations are greatly reducing observed line widths into the sub-ppm range (Franks et al. 2005; Pauli et al. 2000; McDermott et al. 2000; Martin and Zilm 2003). For example, a recent MAS SSNMR resonance assignment of the 20 kDa membrane protein DsbB had average 15N and 13C line widths of 0.7 and 0.5 ppm, respectively (Li et al. 2007, 2008). Furthermore, several labs have recently developed and used 3D and 4D experiments to reduce peak overlap in spectra of membrane proteins (Zhong et al. 2007; Kijac et al. 2007; Li et al. 2007, 2008; Frericks et al. 2006; Franks et al. 2007).
Materials and methods
We have implemented a prototype of alignment, grouping, and typing algorithms and combined them with the linking and mapping algorithms from the solution NMR assignment package AutoAssign (Moseley et al. 2001; Moseley and Montelione 1999; Moseley et al. 2004; Baran et al. 2004; Huang et al. 2005; Zimmerman et al. 1997) to provide a proof of concept. The alignment algorithm constructs and compares Euclidean distance matrices for “input” and “root” peak lists and is similar to the point pattern matching algorithm pioneered by Ranade and Rosenfeld (1980) and later improved for use in Landsat image registration (Ton and Jain 1989). We made three improvements over their algorithm: (i) the use of the Jaccard coefficient (i.e., the size of the set intersection divided by the size of the set union) in place of a simple support list count as the robustness score; (ii) the multiplication of the Jaccard coefficient by the probability of a support pair’s registration; and (iii) the use of a weighted standard deviation of registration in deriving support tolerances. The latter two improvements convert the algorithm into a stationary iterative method. The algorithm is optimized to a computational complexity of O(mn² log n), where m and n represent the lengths of the root and input peak lists, respectively, and we see a clear path to improving this to O(mn²). This alignment algorithm provides: (i) the best mapping of peaks from an “input” peak list to peaks in a “root” peak list for their comparable spectral dimensions; (ii) the registration needed to translate the input peak list to the root peak list in their comparable dimensions; and (iii) the standard deviation of this registration, which is needed to calculate match tolerances. While the alignment step is the most computationally intensive step, it only has to be performed once and provides the first set of major quality control measures for the given dataset.
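To make the three outputs of the alignment step concrete (peak mapping, registration, and its standard deviation), the following Python sketch aligns two peak lists in a single comparable dimension. It is a deliberately simplified stand-in, not the authors' algorithm: it tries every root/input pair as a candidate offset by brute force, matches peaks greedily, and scores each candidate with a plain Jaccard coefficient (matched peaks over all distinct peaks). It omits the distance matrices, the probability weighting, the weighted standard deviation, and the iteration described above. The function names, tolerance, and shift values are assumptions made for illustration.

```python
from statistics import mean, pstdev

def match_peaks(root, shifted, tol):
    """Greedy one-to-one matching of 1D peaks; returns (input_idx, root_idx) pairs."""
    pairs, used = [], set()
    for i, x in enumerate(shifted):
        candidates = [j for j, r in enumerate(root)
                      if j not in used and abs(r - x) <= tol]
        if candidates:
            j = min(candidates, key=lambda j: abs(root[j] - x))
            used.add(j)
            pairs.append((i, j))
    return pairs

def align(root, inp, tol):
    """Return (Jaccard score, registration, its std. dev., matched pairs)."""
    best_score, best_pairs = -1.0, []
    for r in root:
        for x in inp:
            offset = r - x  # candidate registration from one root/input pair
            pairs = match_peaks(root, [p + offset for p in inp], tol)
            # Jaccard: matched peaks (intersection) over all distinct peaks (union).
            score = len(pairs) / (len(root) + len(inp) - len(pairs))
            if score > best_score:
                best_score, best_pairs = score, pairs
    deltas = [root[j] - inp[i] for i, j in best_pairs]
    return best_score, mean(deltas), pstdev(deltas), best_pairs

# Made-up CA shifts (ppm): the input list is offset by about 0.4 ppm,
# lacks a partner for the 58.3 ppm root peak, and contains one noise peak.
root_ca = [45.2, 52.1, 55.0, 58.3, 61.7]
input_ca = [44.82, 51.68, 54.61, 61.30, 70.00]
score, registration, spread, pairs = align(root_ca, input_ca, tol=0.1)
print(round(score, 3), round(registration, 4), round(spread, 4))
```

The brute-force search in this sketch is far more expensive than the complexity quoted above and is only meant to show what a Jaccard-scored registration returns.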
The next step involves grouping of peaks into dipeptide spin systems using root resonances that all the peaks in the spin system have in common. Each dipeptide spin system is composed of intra-residue resonances and sequential-residue resonances organized as ladders. Our grouping algorithm uses a new bottom-up approach to dipeptide spin system grouping in contrast to the common top-down algorithms that use a single root spectrum as seeds for spin system creation. In this grouping algorithm, peak list-based and ladder-based groupings are done first before building the dipeptide spin systems. Peaks from a single spectrum are more self-consistent in their values than peaks between spectra. The new algorithm can use narrower tolerances to group peaks within a spectrum first and then average the root resonances of these intra-spectra peaks to improve their standard error. The same logic is applied to groups of peaks in the same ladder. The number of complete spin systems derived from the grouping algorithm provides the second major quality control measure for the given dataset.
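The bottom-up idea can be sketched in a few lines of Python: peaks are first grouped within each peak list using a narrow tolerance, the root shifts of each group are averaged, and only then are the averaged groups merged across peak lists with a wider tolerance. Averaging n consistent peaks reduces the standard error of the root estimate by roughly a factor of √n, which is the motivation for grouping within a spectrum first. This is a hedged illustration under simple assumptions (greedy first-fit clustering, made-up shifts and tolerances, hypothetical function names), not the published grouping algorithm, and it skips the intermediate ladder-based stage.

```python
from statistics import mean

def group_by_root(peaks, tol):
    """Cluster (root_tuple, payload) items whose roots agree within tol per dimension."""
    groups = []
    for root, payload in peaks:
        for g in groups:
            if all(abs(a - b) <= t for a, b, t in zip(root, g["root"], tol)):
                g["roots"].append(root)
                g["members"].append(payload)
                # Re-average so later comparisons use the improved root estimate.
                g["root"] = tuple(mean(dim) for dim in zip(*g["roots"]))
                break
        else:
            groups.append({"root": root, "roots": [root], "members": [payload]})
    return groups

def bottom_up(peak_lists, intra_tol, inter_tol):
    """Group within each peak list first, then merge averaged groups across lists."""
    stage1 = []
    for name, peaks in peak_lists.items():
        for g in group_by_root(peaks, intra_tol):
            stage1.append((g["root"], (name, g["members"])))
    return group_by_root(stage1, inter_tol)

# Made-up (N, CA) roots in ppm with a label for the remaining dimension.
peak_lists = {
    "NCACX": [((120.00, 55.00), "CB 30.1"), ((120.02, 55.03), "CO 175.2"),
              ((115.00, 60.00), "CB 68.9")],
    "CANcoCA": [((120.10, 55.12), "CA-1 58.2"), ((115.08, 59.90), "CA-1 52.0")],
}
spin_systems = bottom_up(peak_lists, intra_tol=(0.05, 0.05), inter_tol=(0.2, 0.2))
for s in spin_systems:
    print(s["root"], s["members"])
```

With these toy values the two NCACX peaks near (120.0, 55.0) are merged under the narrow tolerance before the less consistent CANcoCA peak is attached under the wider one.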
For the typing algorithm, we introduce the concept of a chemical shift tuple or ordered list of chemical shifts that have some support for being in the same ladder or dipeptide spin system. Using a heuristic, the algorithm constructs a set of possible carbon chemical shift tuples to calculate Bayesian typing probabilities. Doing so minimizes the deleterious effects of resonance misclassification, which can arise from a multitude of situations including overlapped spin systems, noise peaks, and missing peaks. Furthermore, we can constrain tuple creation using 4D information from category III experiments (Table 2) and bottom-up grouping. However, the probability densities are no longer comparable in this Bayesian statistical framework because the probability density function changes with the number of carbon chemical shifts or independent variables used. This variation in the number of independent variables across the 20 residue types requires the use of chi-square probabilities, or p-values of a chi-square statistic, instead of probability densities. In the future, we can use the tuple concept to improve the linking and mapping algorithms.
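The reason for switching from probability densities to chi-square p-values can be shown with a short Python sketch. A tuple with two carbon shifts and a tuple with one carbon shift have densities of different dimensionality, but the upper-tail probability of the chi-square statistic, evaluated with the matching number of degrees of freedom, is comparable across both. The sketch below is an illustration only: the reference means and standard deviations are rough placeholder values rather than curated database statistics, the rule that a residue type lacking an observed atom scores zero is an assumption, and `chi2_sf` and `type_tuple` are hypothetical names.

```python
import math

def chi2_sf(x, k):
    """Upper-tail chi-square probability for integer degrees of freedom k.

    Uses the recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    with h = x / 2, starting from Q(1, h) = exp(-h) or Q(1/2, h) = erfc(sqrt(h)).
    """
    h = x / 2.0
    if k % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < k / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Placeholder (mean, sd) values in ppm, for illustration only.
REFERENCE = {
    "ALA": {"CA": (53.1, 2.0), "CB": (19.0, 1.8)},
    "SER": {"CA": (58.7, 2.1), "CB": (63.8, 1.5)},
    "GLY": {"CA": (45.3, 1.3)},
}

def type_tuple(shifts):
    """Return a p-value per residue type for one carbon chemical shift tuple."""
    scores = {}
    for res, stats in REFERENCE.items():
        if any(atom not in stats for atom in shifts):
            scores[res] = 0.0  # assumption: type cannot explain an observed atom
            continue
        chi2 = sum(((v - stats[a][0]) / stats[a][1]) ** 2 for a, v in shifts.items())
        scores[res] = chi2_sf(chi2, len(shifts))  # dof = number of shifts used
    return scores

two_shift = type_tuple({"CA": 53.0, "CB": 19.2})
one_shift = type_tuple({"CA": 45.0})
print(max(two_shift, key=two_shift.get), max(one_shift, key=one_shift.get))
```

Because each score is a tail probability rather than a density, the two-shift and one-shift results can be ranked on the same scale, which is the property the typing step relies on.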
Results and discussion
Currently, our implementation handles only a limited set of experimental peak lists: (i) NCACX 3D (with 35 ms DARR mixing), (ii) CANcoCA 3D, and (iii) CANCOCX 4D (Franks et al. 2005; Franks et al. 2007). These peak lists represent a category IIb assignment strategy (Table 2), which uses a Ni-Cαi root to create dipeptide spin systems. The implementation takes these peak lists, aligns them, groups peaks into dipeptide spin systems in a bottom-up strategy, and then types each ladder to probable amino acids using the carbon shift tuples. The implementation then simulates a set of Ni-Hi rooted peak lists for AutoAssign with an artificial HN shift equal to the observed CA shift divided by 6 (HN = CA/6). This creation of artificial HN shifts is necessary because AutoAssign requires Ni-Hi rooted peak lists. We then use AutoAssign to perform the linking and mapping steps. From this, we obtain an overall 84.1% assignment of the N, CO, CA, and CB resonances with no errors (Fig. 3), as compared to manually determined and verified assignments (BMRB entry 15156). These results demonstrate the feasibility of automating protein resonance assignments of MAS SSNMR spectral data. They are easily reproduced by the software and lack significant human subjectivity in the grouping and typing of spin systems. Moreover, the input peak lists are not perfect; they are realistic peak lists that a spectroscopist used for manual assignment. There are only enough matching peaks to form 52 out of 56 dipeptide spin systems, and some CB peaks are simply missing. Since the CANCOCX experiment is a 4D experiment, the resolution of the CA dimension is very low, causing a matching standard deviation of ~0.5 ppm when aligned to the other two peak lists. Nevertheless, our implementation handled the missing information and resolution issues and assigned 43 out of 52 dipeptide spin systems.
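The pseudo-proton substitution is simple arithmetic, sketched here in Python. A CA shift between roughly 40 and 70 ppm maps to about 6.7 to 11.7 ppm after division by 6, which presumably keeps the artificial value in a range an amide-proton-rooted program will accept. The tuple layout and function names are hypothetical and do not reflect AutoAssign's actual peak list format.

```python
def to_pseudo_hn_peak(n_shift, ca_shift, other_shifts=()):
    """Build an N-H rooted peak with an artificial HN shift of CA / 6."""
    pseudo_hn = ca_shift / 6.0
    return (n_shift, pseudo_hn, *other_shifts)

def recover_ca(pseudo_hn):
    """Invert the substitution to get the CA shift back from the pseudo HN."""
    return pseudo_hn * 6.0

# Made-up peak: N = 120.0 ppm, CA = 54.0 ppm, with one extra carbon dimension.
peak = to_pseudo_hn_peak(120.0, 54.0, (30.0,))
print(peak, recover_ca(peak[1]))
```

Because the mapping is linear, any match tolerance applied to the pseudo HN dimension corresponds to six times that tolerance in the original CA dimension.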
There are three main reasons for these results: (i) better dispersion with a Ni-Cαi root; (ii) an improved bottom-up grouping algorithm that especially allows CANCOCX peaks to group around a common C’i-1-Ni-Cαi root before grouping with peaks from other peak lists; and (iii) improved amino acid typing algorithms that shrank the average “possible residue type list” to 5.7 residues with 0.9999 confidence (normally ~8 residues with Cα/Cβ typing). We expect even better results once improved linking and mapping algorithms are implemented, allowing the development of software that will improve the quality of analysis over manual assignment alone. This software is available at http://bioinformatics.chem.louisville.edu.
References:
Andronesi OC, Becker S, Seidel K, Heise H, Young HS, Baldus M (2005) Determination of membrane protein structure and dynamics by magic-angle-spinning solid-state NMR spectroscopy. J Am Chem Soc 127:12965–12974
Astrof NS, Lyon CE, Griffin RG (2001) Triple resonance solid state NMR experiments with reduced dimensionality evolution periods. J Magn Reson 152:303–307
Atreya HS, Sahu SC, Chary KV, Govil G (2000) A tracked approach for automated NMR assignments in proteins (TATAPRO). J Biomol NMR 17:125–136
Balayssac S, Bertini I, Falber K, Fragai M, Jehle S, Lelli M, Luchinat C, Oschkinat H, Yeo KJ (2007) Solid-state NMR of matrix metalloproteinase 12: an approach complementary to solution NMR. Chembiochem 8:486–489
Baran MC, Huang YJ, Moseley HN, Montelione GT (2004) Automated analysis of protein NMR assignments and structures. Chem Rev 104:3541–3556
Bartels C, Billeter M, Güntert P, Wüthrich K (1996) Automated sequence-specific NMR assignment of homologous proteins using the program GARANT. J Biomol NMR 7:207–213
Bartels C, Güntert P, Billeter M, Wüthrich K (1997) GARANT—a general algorithm for resonance assignment of multidimensional nuclear magnetic resonance spectra. J Comp Chem 18:139–149
Bockmann A, Lange A, Galinier A, Luca S, Giraud N, Juy M, Heise H, Montserret R, Penin F, Baldus M (2003) Solid state NMR sequential resonance assignments and conformational analysis of the 2 × 10.4 kDa dimeric form of the Bacillus subtilis protein Crh. J Biomol NMR 27:323–339
Coggins BE, Zhou P (2003) PACES: protein sequential assignment by computer-assisted exhaustive search. J Biomol NMR 26:93–111
Eghbalnia HR, Bahrami A, Wang L, Assadi A, Markley JL (2005) Probabilistic identification of spin systems and their assignments including coil-helix inference as output (PISTACHIO). J Biomol NMR 32:219–233
Franks WT, Zhou DH, Wylie BJ, Money BG, Graesser DT, Frericks HL, Sahota G, Rienstra CM (2005) Magic-angle spinning solid-state NMR spectroscopy of the beta1 immunoglobulin binding domain of protein G (GB1): 15N and 13C chemical shift assignments and conformational analysis. J Am Chem Soc 127:12291–12305
Franks W, Kloepper K, Wylie B, Rienstra C (2007) Four-dimensional heteronuclear correlation experiments for chemical shift assignment of solid proteins. J Biomol NMR 39:107–131
Frericks HL, Zhou DH, Yap LL, Gennis RB, Rienstra CM (2006) Magic-angle spinning solid-state NMR of a 144 kDa membrane protein complex: E. coli cytochrome bo3 oxidase. J Biomol NMR 36:55–71
Hong M (1999) Resonance assignment of 13C/15N labeled solid proteins by two- and three-dimensional magic-angle-spinning NMR. J Biomol NMR 15:1–14
Huang YJ, Moseley HN, Baran MC, Arrowsmith C, Powers R, Tejero R, Szyperski T, Montelione GT (2005) An integrated platform for automated analysis of protein NMR structures. Methods Enzymol 394:111–141
Hyberts SG, Wagner G (2003) IBIS–a tool for automated sequential assignment of protein spectra from triple resonance experiments. J Biomol NMR 26:335–344
Igumenova TI, Wand AJ, McDermott AE (2004) Assignment of the backbone resonances for microcrystalline ubiquitin. J Am Chem Soc 126:5323–5331
Jung YS, Zweckstetter M (2004) Mars—robust automatic backbone assignment of proteins. J Biomol NMR 30:11–23
Kijac AZ, Li Y, Sligar SG, Rienstra CM (2007) Magic-angle spinning solid-state NMR spectroscopy of nano disc-embedded human CYP3A4. Biochemistry 46:13696–13703
Kim S, Szyperski T (2003) GFT NMR, a new approach to rapidly obtain precise high-dimensional NMR spectral information. J Am Chem Soc 125:1385–1393
Kim S, Szyperski T (2004) GFT NMR experiments for polypeptide backbone and 13Cbeta chemical shift assignment. J Biomol NMR 28:117–130
Kloepper K, Zhou D, Li Y, Winter K, George J, Rienstra C (2007) Temperature-dependent sensitivity enhancement of solid-state NMR spectra of α-synuclein fibrils. J Biomol NMR 39:197–211
Leutner M, Gschwind RM, Liermann J, Schwarz C, Gemmecker G, Kessler H (1998) Automated backbone assignment of labeled proteins using the threshold accepting algorithm. J Biomol NMR 11:31–43
Li Y, Berthold DA, Frericks HL, Gennis RB, Rienstra CM (2007) Partial (13)C and (15)N chemical-shift assignments of the disulfide-bond-forming enzyme DsbB by 3D magic-angle spinning NMR spectroscopy. Chembiochem 8:434–442
Li Y, Berthold D, Gennis R, Rienstra C (2008) Chemical shift assignment of the transmembrane helices of DsbB, a 20 kDa integral membrane enzyme, by 3D magic-angle spinning NMR spectroscopy. Protein Sci 17:199
Lorch M, Fahem S, Kaiser C, Weber I, Mason AJ, Bowie JU, Glaubitz C (2005) How to prepare membrane proteins for solid-state NMR: A case study on the alpha-helical integral membrane protein diacylglycerol kinase from E. coli. Chembiochem 6:1693–1700
Luca S, Baldus M (2002) Enhanced spectral resolution in immobilized peptides and proteins by combining chemical shift sum and difference spectroscopy. J Magn Reson 159:243–249
Martin RW, Zilm KW (2003) Preparation of protein nanocrystals and their characterization by solid state NMR. J Magn Reson 165:162–174
McDermott A, Polenova T, Bockmann A, Zilm K, Paulsen E, Martin R, Montelione G (2000) Partial NMR assignments for uniformly (13C, 15N)-enriched BPTI in the solid state. J Biomol NMR 16:209–219
Moseley HN, Montelione GT (1999) Automated analysis of NMR assignments and structures for proteins. Curr Opin Struct Biol 9:635–642
Moseley HN, Monleon D, Montelione GT (2001) Automatic determination of protein backbone resonance assignments from triple resonance nuclear magnetic resonance data. Methods Enzymol 339:91–108
Moseley HN, Sahota G, Montelione GT (2004) Assignment validation software suite for the evaluation and presentation of protein resonance assignment data. J Biomol NMR 28:341–355
Pauli J, van Rossum B, Förster H, de Groot H, Oschkinat H (2000) Sample optimization and identification of signal patterns of amino acid side chains in 2D RFDR spectra of the α-spectrin SH3 domain. J Magn Reson 143:411–416
Pauli J, Baldus M, van Rossum B, de Groot H, Oschkinat H (2001) Backbone and side-chain 13C and 15N signal assignments of the alpha-spectrin SH3 domain by magic angle spinning solid-state NMR at 17.6 Tesla. Chembiochem 2:272–281
Ranade S, Rosenfeld A (1980) Point pattern matching by relaxation. Pattern Recogn 12:269–275
Rienstra CM, Hohwy M, Hong M, Griffin RG (2000) 2D and 3D 15N-13C-13C NMR chemical shift correlation spectroscopy of solids: assignment of MAS spectra of peptides. J Am Chem Soc 122:10979–10990
Sperling LJ, Berthold DA, Sasser TL, Jeisy-Scott V, Rienstra CM (2010) Assignment strategies for large proteins by magic-angle spinning NMR: the 21 kDa disulfide-bond-forming enzyme DsbA. J Mol Biol 399:268–282
Sun BQ, Rienstra CM, Costa PR, Williamson JR, Griffin RG (1997) 3D 15N–13C–13C chemical shift correlation spectroscopy in rotating solids. J Am Chem Soc 119:8540–8546
Szyperski T, Wider G, Bushweller JH, Wüthrich K (1993a) Reduced dimensionality in triple-resonance NMR experiments. J Am Chem Soc 115:9307–9308
Szyperski T, Wider G, Bushweller JH, Wüthrich K (1993b) 3D 13C–15N-heteronuclear two-spin coherence spectroscopy for polypeptide backbone assignments in 13C–15N-double-labeled proteins. J Biomol NMR 3:127–132
Ton J, Jain AK (1989) Registering landsat images by point matching. IEEE Trans Geosci Remote Sens 27:642–651
Tycko R (1996) Prospects for resonance assignments in multidimensional solid-state NMR spectra of uniformly labeled proteins. J Biomol NMR 8:239–251
Zhong L, Bamm V, Ahmed M, Harauz G, Ladizhansky V (2007) Solid-state NMR spectroscopy of 18.5 kDa myelin basic protein reconstituted with lipid vesicles: spectroscopic characterisation and spectral assignments of solvent-exposed protein fragments. BBA-Biomembranes 1768:3193–3205
Zimmerman DE, Kulikowski CA, Huang Y, Feng W, Tashiro M, Shimotakahara S, Chien C, Powers R, Montelione GT (1997) Automated analysis of protein NMR assignments using methods from artificial intelligence. J Mol Biol 269:592–610
Acknowledgments
We would like to acknowledge Dr. David A. Snyder’s help during the evolution of the peak list alignment algorithm. This work was supported in part by DOE DE-EM0000197 (to H.M.) and NIH R01-GM075937 (to C.M.R.).
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Moseley, H.N.B., Sperling, L.J. & Rienstra, C.M. Automated protein resonance assignments of magic angle spinning solid-state NMR spectra of β1 immunoglobulin binding domain of protein G (GB1). J Biomol NMR 48, 123–128 (2010). https://doi.org/10.1007/s10858-010-9448-2
DOI: https://doi.org/10.1007/s10858-010-9448-2