Automated protein resonance assignments of magic angle spinning solid-state NMR spectra of β1 immunoglobulin binding domain of protein G (GB1)
DOI: 10.1007/s10858-010-9448-2
- Cite this article as:
- Moseley, H.N.B., Sperling, L.J. & Rienstra, C.M. J Biomol NMR (2010) 48: 123. doi:10.1007/s10858-010-9448-2
Abstract
Magic-angle spinning solid-state NMR (MAS SSNMR) is a fast-developing experimental technique with great potential to provide structural and dynamics information for proteins not amenable to other methods. However, few automated analysis tools are currently available for MAS SSNMR. We present a methodology for automating protein resonance assignments of MAS SSNMR spectral data and its application to experimental peak lists of the β1 immunoglobulin binding domain of protein G (GB1) derived from a uniformly ^{13}C- and ^{15}N-labeled sample. This application to the 56 amino acid GB1 produced an overall 84.1% assignment of the N, CO, CA, and CB resonances with no errors using peak lists from NCACX 3D, CANcoCA 3D, and CANCOCX 4D experiments. This proof of concept demonstrates the tractability of this problem.
Keywords
Automated resonance assignments Magic angle spinning Solid-state Protein GB1
Introduction
Table 1 Protein resonance assignment process
Step |
---|
1. Peak list registration |
2. Peak list quality assessment |
3. Spin system grouping |
4. Amino acid typing |
5. Linking |
6. Mapping |
7. Resonance assignment quality assessment |
Table 2 MAS SSNMR experimental strategies for protein resonance assignment
Category I | Category IIa | Category IIb | Category III |
---|---|---|---|
Cα_{i}-N_{i}-C′_{i−1}^{a,b} | C′_{i−1}-N_{i}-Cα_{i}^{a,b} | Cα_{i}-N_{i}-C′_{i−1}^{a,b} | Cα_{i}-N_{i}-C′_{i−1}-CX_{i−1}^{c} |
N_{i}-Cα_{i}-CX_{i}^{d,e,f} | C′_{i−1}-N_{i}-(Cα_{i})-CX_{i}^{g} | Cα_{i}-N_{i}-(C′_{i−1})-CX_{i−1}^{c} | C′_{i−1}-N_{i}-Cα_{i}-CX_{i} |
N_{i}-C′_{i−1}-CX_{i−1}^{d,e,h} | N_{i}-C′_{i−1}-CX_{i−1}^{d,e,h} | N_{i}-Cα_{i}-CX_{i}^{d,e,f,h} | Cα_{i}-N_{i}-C′_{i−1}-Cα_{i−1}^{c} |
N_{i}-C′_{i−1}-Cα_{i−1}^{f,i} | N_{i}-C′_{i−1}-Cα_{i−1}^{f,i} | N_{i}-Cα_{i}-Cα_{i}Cβ_{i}^{i,j} | C′_{i−1}-N_{i}-Cα_{i}-Cβ_{i} |
N_{i}-Cα_{i}-Cα_{i}Cβ_{i}^{i,j} | C′_{i−1}-N_{i}-(Cα_{i})-Cβ_{i} | N_{i}-Cα_{i}-Cβ_{i}^{b} | C′_{i−1}-N_{i}-Cα_{i}-Cα_{i}Cβ_{i} |
N_{i}-Cα_{i}-Cβ_{i}^{b} | C′_{i−1}-N_{i}-(Cα_{i})-C′_{i} | N_{i}-Cα_{i}-C′_{i}^{f} | C′_{i−1}-N_{i}-Cα_{i}-C′_{i} |
N_{i}-C′_{i−1}-(Cα_{i−1})-Cα_{i−1}Cβ_{i−1}^{j} | N_{i}-C′_{i−1}-(Cα_{i−1})-Cα_{i−1}Cβ_{i−1}^{j} | Cα_{i}-N_{i}-(C′_{i−1})-Cα_{i−1} | |
N_{i}-C′_{i−1}-(Cα_{i−1})-Cβ_{i−1} | N_{i}-C′_{i−1}-(Cα_{i−1})-Cβ_{i−1} | | |
N_{i}-Cα_{i}-C′_{i}^{f} | | | |
However, MAS SSNMR spectra, especially of membrane proteins, often lack significant numbers of resonances at a given experimental condition (Andronesi et al. 2005; Li et al. 2007), which can confuse both global optimization and exhaustive search mapping algorithms. Spectroscopists are nevertheless finding clever ways to optimize their experiments for higher sensitivity. For instance, dropping the temperature below 0 °C can improve signal intensity several-fold (Kloepper et al. 2007). Moreover, experiments can be collected under multiple conditions to improve detection of all resonances. Another historical problem in SSNMR experiments is large spectral line widths, which increase spectral crowding and peak overlap. However, improvements in magic-angle spinning techniques, pulse sequences, and micro/nano crystalline sample preparations are greatly reducing observed line widths into the sub-ppm range (Franks et al. 2005; Pauli et al. 2000; McDermott et al. 2000; Martin and Zilm 2003). For example, a recent MAS SSNMR resonance assignment of the 20 kDa membrane protein DsbB had average ^{15}N and ^{13}C line widths of 0.7 and 0.5 ppm, respectively (Li et al. 2007, 2008). Furthermore, several labs have recently developed and used 3D and 4D experiments to reduce peak overlap in spectra of membrane proteins (Zhong et al. 2007; Kijac et al. 2007; Li et al. 2007, 2008; Frericks et al. 2006; Franks et al. 2007).
Materials and methods
We have implemented a prototype of alignment, grouping, and typing algorithms and combined them with the linking and mapping algorithms from the solution NMR assignment package AutoAssign (Moseley et al. 2001; Moseley and Montelione 1999; Moseley et al. 2004; Baran et al. 2004; Huang et al. 2005; Zimmerman et al. 1997) to provide a proof of concept. The alignment algorithm constructs and compares Euclidean distance matrices for “input” and “root” peak lists and is similar to the point pattern match algorithm pioneered by Ranade and Rosenfeld (Ranade and Rosenfeld 1980) and later improved for use in Landsat image registration (Ton and Jain 1989). We made three improvements to their algorithm: (i) the use of the Jaccard coefficient (i.e. set intersection divided by set union) in place of a simple support list count as the robustness score; (ii) the multiplication of the Jaccard coefficient by the probability of a support pair’s registration; and (iii) the use of a weighted standard deviation of registration in deriving support tolerances. The latter two improvements convert the algorithm into a stationary iterative method. The algorithm is optimized to a computational complexity of O(mn^{2} log n), where m and n represent the lengths of the root and input peak lists, respectively, and we see a clear path to improving this to O(mn^{2}). This alignment algorithm provides: (i) the best mapping of peaks from an “input” peak list to peaks in a “root” peak list for their comparable spectral dimensions; (ii) the registration needed to translate the input peak list to the root peak list in their comparable dimensions; and (iii) the standard deviation of this registration, which is needed to calculate match tolerances. While the alignment step is the most computationally intensive step, it only has to be performed once and provides the first set of major quality control measures for the given dataset.
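The idea of scoring a candidate registration with a Jaccard coefficient can be illustrated with a deliberately naive sketch. It is not the authors' implementation: the function name `register`, the greedy first-match rule, and the choice to take matched peaks as the intersection and all distinct peaks across both lists as the union are assumptions made for illustration. It also omits the probability weighting, the iterative tolerance refinement, and the complexity optimizations described above.

```python
from itertools import product

def register(root, inp, tol):
    """Brute-force translation search between two peak lists.

    root, inp: lists of tuples of shifts (ppm) in the comparable dimensions.
    tol: per-dimension match tolerance (ppm).
    Returns (jaccard_score, offset, matched_pairs) for the best candidate.
    """
    best_score, best_offset, best_pairs = 0.0, None, []
    # Every (root peak, input peak) pairing proposes one candidate offset.
    for r, p in product(root, inp):
        offset = tuple(rc - pc for rc, pc in zip(r, p))
        used, pairs = set(), []
        for j, q in enumerate(inp):
            shifted = tuple(qc + oc for qc, oc in zip(q, offset))
            # Greedy: take the first unused root peak within tolerance.
            for i, s in enumerate(root):
                if i not in used and all(
                    abs(a - b) <= t for a, b, t in zip(s, shifted, tol)
                ):
                    used.add(i)
                    pairs.append((i, j))
                    break
        k = len(pairs)
        # Jaccard coefficient: |intersection| / |union| (assumed definition).
        score = k / (len(root) + len(inp) - k)
        if score > best_score:
            best_score, best_offset, best_pairs = score, offset, pairs
    return best_score, best_offset, best_pairs

# Hypothetical (N, CA) roots; the input list is offset and has one noise peak.
root_peaks = [(120.0, 55.0), (115.0, 60.0), (110.0, 45.0)]
input_peaks = [(119.7, 55.2), (114.7, 60.2), (109.7, 45.2), (130.0, 70.0)]
score, offset, pairs = register(root_peaks, input_peaks, tol=(0.1, 0.1))
```

Unlike a raw support count, the Jaccard score penalizes unmatched peaks in either list, so a registration that matches three of three root peaks but leaves one input peak unexplained scores below 1.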
The next step involves grouping of peaks into dipeptide spin systems using root resonances that all the peaks in the spin system have in common. Each dipeptide spin system is composed of intra-residue resonances and sequential-residue resonances organized as ladders. Our grouping algorithm uses a new bottom-up approach to dipeptide spin system grouping, in contrast to the common top-down algorithms that use a single root spectrum as seeds for spin system creation. In this grouping algorithm, peak list-based and ladder-based groupings are done first before building the dipeptide spin systems. Because peaks from a single spectrum are more self-consistent in their values than peaks from different spectra, the new algorithm can use narrower tolerances to group peaks within a spectrum first and then average the root resonances of these intra-spectrum peaks to improve their standard error. The same logic is applied to groups of peaks in the same ladder. The number of complete spin systems derived from the grouping algorithm provides the second major quality control measure for the given dataset.
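The bottom-up order of operations can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the published algorithm: the helper names `cluster` and `group_bottom_up`, the greedy clustering rule, the peak layout (N, CA, other shift), and all tolerances and shifts are hypothetical, and the ladder-level grouping stage is left out.

```python
from statistics import mean

def cluster(items, tol, key):
    """Greedy clustering: an item joins the first cluster whose mean root
    lies within tol in every dimension; otherwise it starts a new cluster."""
    clusters = []
    for item in items:
        root = key(item)
        for c in clusters:
            centre = tuple(mean(d) for d in zip(*(key(x) for x in c)))
            if all(abs(a - b) <= t for a, b, t in zip(root, centre, tol)):
                c.append(item)
                break
        else:
            clusters.append([item])
    return clusters

def group_bottom_up(spectra, intra_tol, inter_tol):
    """spectra: dict mapping spectrum name to a list of (N, CA, other) peaks."""
    groups = []
    # Stage 1: group within each spectrum using the narrow tolerance,
    # then average the root shifts of each group.
    for name, peaks in spectra.items():
        for c in cluster(peaks, intra_tol, key=lambda p: p[:2]):
            root = tuple(mean(d) for d in zip(*(p[:2] for p in c)))
            groups.append({"root": root, "peaks": [(name, p) for p in c]})
    # Stage 2: merge the averaged groups across spectra with a wider tolerance.
    return cluster(groups, inter_tol, key=lambda g: g["root"])

# Hypothetical peak lists for two residues observed in two spectra.
spectra = {
    "NCACX": [(120.00, 55.00, 30.1), (120.02, 55.01, 175.2), (110.0, 45.0, 173.0)],
    "CANcoCA": [(120.20, 55.15, 58.3), (110.1, 45.2, 60.0)],
}
systems = group_bottom_up(spectra, intra_tol=(0.05, 0.05), inter_tol=(0.3, 0.3))
```

Averaging within a spectrum before merging is what lets the second stage compare better-determined roots, even when one spectrum (such as a low-resolution 4D) is offset from the others.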
For the typing algorithm, we introduce the concept of a chemical shift tuple: an ordered list of chemical shifts that have some support for being in the same ladder or dipeptide spin system. Using a heuristic, the algorithm constructs a set of possible carbon chemical shift tuples to calculate Bayesian typing probabilities. Doing so minimizes the deleterious effects of resonance misclassification, which can arise from a multitude of situations including overlapped spin systems, noise peaks, and missing peaks. Furthermore, we can constrain tuple creation using 4D information from category III experiments (Table 2) and bottom-up grouping. However, the probability densities are no longer comparable in this Bayesian statistical framework because the probability density function changes with the number of carbon chemical shifts, or independent variables, used. This variation in the number of independent variables across the 20 residue types requires the use of chi-square probabilities, or p-values of a chi-square statistic, instead of probability densities. In the future, we can use the tuple concept to improve the linking and mapping algorithms.
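A small sketch shows why chi-square p-values remain comparable when tuples contain different numbers of shifts. It assumes independent, normally distributed shifts for each atom, which the source does not state. The names `chi2_sf` and `typing_p_value` are hypothetical, and the mean and standard deviation values are rough placeholders rather than the statistics used by the authors.

```python
import math

def chi2_sf(x, k):
    """Upper-tail probability of a chi-square variable with integer k >= 1
    degrees of freedom, using the closed-form series for integer k."""
    h = x / 2.0
    if k % 2 == 0:
        series = sum(h**j / math.factorial(j) for j in range(k // 2))
        return math.exp(-h) * series
    series = sum(h**(j + 0.5) / math.gamma(j + 1.5) for j in range(k // 2))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * series

def typing_p_value(shifts, expected):
    """shifts: observed carbon shift tuple (ppm).
    expected: one (mean, sd) pair per shift for a candidate residue type.
    The degrees of freedom equal the number of shifts in the tuple."""
    if len(shifts) != len(expected):
        raise ValueError("tuple length must match the residue's expected atoms")
    stat = sum(((s - m) / sd) ** 2 for s, (m, sd) in zip(shifts, expected))
    return chi2_sf(stat, len(shifts))

# Placeholder statistics (illustrative only): Ala has CA and CB, Gly only CA.
ALA = [(53.1, 2.0), (19.0, 1.8)]
GLY = [(45.3, 1.3)]

p_ala = typing_p_value((53.0, 19.5), ALA)   # two degrees of freedom
p_gly = typing_p_value((45.0,), GLY)        # one degree of freedom
```

Both results are tail probabilities on the same 0 to 1 scale, so a two-shift alanine tuple and a one-shift glycine tuple can be ranked against each other, which is not the case for the underlying probability densities.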
Results and discussion
Currently, our implementation handles only a limited set of experimental peak lists: (i) NCACX 3D (with 35 ms DARR mixing); (ii) CANcoCA 3D; and (iii) CANCOCX 4D (Franks et al. 2005; Franks et al. 2007). These peak lists represent a category IIb assignment strategy (Table 2), which uses a N_{i}-Cα_{i} root to create dipeptide spin systems. The implementation takes these peak lists, aligns them, groups peaks into dipeptide spin systems in a bottom-up strategy, and then types each ladder to probable amino acids using the carbon shift tuples. The implementation then simulates a set of N_{i}-H_{i} rooted peak lists for AutoAssign with an artificial H^{N} shift equal to the observed CA shift divided by 6 (H^{N} = CA/6). This creation of artificial H^{N} shifts is necessary because AutoAssign requires N_{i}-H_{i} rooted peak lists. We then use AutoAssign to perform the linking and mapping steps. This yields an overall 84.1% assignment of the N, CO, CA, and CB resonances with no errors (Fig. 3), as compared to manually determined and verified assignments (BMRB entry 15156). These results demonstrate the feasibility of automating protein resonance assignments of MAS SSNMR spectral data. They are easily reproduced by the software and lack significant human subjectivity in the grouping and typing of spin systems. Moreover, the input peak lists are not perfect; they are the realistic peak lists that a spectroscopist used for manual assignment. They contain enough matching peaks to form only 52 of the 56 dipeptide spin systems, and some CB peaks are simply missing. Since the CANCOCX experiment is a 4D experiment, the resolution of the CA dimension is very low, causing a matching standard deviation of ~0.5 ppm when aligned to the other two peak lists. Nevertheless, our implementation handled the missing information and resolution issues and assigned 43 out of 52 dipeptide spin systems.
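The H^{N} = CA/6 substitution amounts to a one-line transformation of each peak. The sketch below assumes a hypothetical peak layout of (N, CA, CX) tuples and a hypothetical function name `to_nh_rooted`; the actual peak list format that AutoAssign expects is not reproduced here.

```python
def to_nh_rooted(peaks):
    """Convert (N, CA, CX) peaks into (HN, N, CX) peaks by replacing the CA
    root dimension with an artificial amide proton shift, HN = CA / 6."""
    return [(ca / 6.0, n, cx) for n, ca, cx in peaks]

# Hypothetical NCACX-style peaks: (N, CA, CX) in ppm.
ncacx = [(120.0, 54.0, 30.5), (108.5, 45.0, 174.1)]
nh_rooted = to_nh_rooted(ncacx)
```

Dividing by 6 maps CA shifts of roughly 42 to 66 ppm onto 7 to 11 ppm, which presumably keeps the simulated values within the range a solution NMR tool expects for amide protons while preserving the relative dispersion of the CA dimension.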
There are three main reasons for these results: (i) better dispersion with a N_{i}-Cα_{i} root; (ii) an improved bottom-up grouping algorithm that especially allows CANCOCX peaks to group around a common C′_{i−1}-N_{i}-Cα_{i} root before grouping with peaks from other peak lists; and (iii) improved amino acid typing algorithms that shrank the average “possible residue type list” to 5.7 residues with 0.9999 confidence (normally ~8 residues with Cα/Cβ typing). We expect even better results once improved linking and mapping algorithms are implemented, allowing the development of software that will improve the quality of analysis over manual assignment alone. This software is available at http://bioinformatics.chem.louisville.edu.
Acknowledgments
We would like to acknowledge Dr. David A. Snyder’s help during the evolution of the peak list alignment algorithm. This work was supported in part by DOE DE-EM0000197 (to H.M.) and NIH R01-GM075937 (to C.M.R.).
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.