A software framework for analysing solid-state MAS NMR data
Solid-state magic-angle-spinning (MAS) NMR of proteins has undergone many rapid methodological developments in recent years, enabling detailed studies of protein structure, function and dynamics. Software development, however, has not kept pace with these advances and data analysis is mostly performed using tools developed for solution NMR, which do not directly address solid-state specific issues. Here we present additions to the CcpNmr Analysis software package which enable easier identification of spinning side bands, straightforward analysis of double quantum spectra, automatic consideration of non-uniform labelling schemes, as well as extension of other existing features to the needs of solid-state MAS data. To underpin this, we have updated and extended the CCPN data model and experiment descriptions to include transfer types and nomenclature appropriate for solid-state NMR experiments, as well as a set of experiment prototypes covering the experiments commonly employed by solid-state MAS protein NMR spectroscopists. This work not only improves solid-state MAS NMR data analysis but provides a platform for anyone who uses the CCPN data model for programming, data transfer, or data archival involving solid-state MAS NMR data.
Keywords: Solid-state MAS NMR, CCPN, Experiment nomenclature, Software, Data model
During the past decade solid-state magic-angle-spinning (MAS) NMR has developed into a useful tool for the study of protein structure and dynamics. With no intrinsic size limit on the structure studied and no requirement for long range order, it forms a highly complementary technique to solution NMR and X-ray crystallography. It has enabled structure determination in cases which are difficult to tackle by other methods, such as amyloid fibrils (Jaroniec et al.2004; Petkova et al.2004; Ferguson et al.2006; Wasmer et al.2008), dynamic oligomeric protein complexes (Jehle et al.2010) as well as membrane proteins in their native lipid environment (Andronesi et al.2005; Cady et al.2010). So far, data analysis and structure calculations have mainly been carried out using software tools developed for solution NMR. These include spectrum visualisation programs used for resonance and distance restraint assignments (Goddard and Kneller; Johnson and Blevins 1994), structure calculation programs (Brünger 1992; Güntert et al.1997; Brünger et al.1998; Schwieters et al.2003; Rieping et al.2007) as well as tools to extract torsion angle restraints from chemical shifts (Shen et al.2009; Cheung et al.2010) and to validate structures (Hooft et al.1996; Bhattacharya et al.2007) (http://nmr.cmbi.ru.nl/cing/). Over the years, increasingly sophisticated software tools have become available to solution NMR spectroscopists which enable (semi-) automated data analysis and significantly speed up the overall process of protein structure determination (Zimmerman et al.1997; Herrmann et al.2002; Jung and Zweckstetter 2004; Huang et al. 2005; Rieping et al.2007; Lemak et al.2011). The Collaborative Computation Project for the NMR community (CCPN) has the aim of streamlining these processes even further. To this end a data model describing all the information used during NMR analysis and in related disciplines was developed (Vranken et al.2005). 
Based on this data model, the CCPN group has further developed a spectrum analysis program (CcpNmr Analysis) and a format converter. Importantly, the data model forms an informatics system which other programs can build on and which facilitates data storage and exchange: it enables movement from one software package to another without encountering difficulties with data conversion or loss of information. Initiatives to connect a wide range of NMR software packages with the CCPN data model have already interfaced several programs (www.extend-nmr.eu) and more are currently ongoing.
To date, few software packages include modules which have been designed specifically with solid-state MAS NMR data analysis in mind (Fossi et al.2005; Loquet et al. 2008; Moseley et al. 2010; Tycko and Hu 2010). However, this is a definite requirement if solid-state NMR spectroscopists are to move towards an efficient, user-friendly and (semi-) automated data analysis pipeline such as that available for solution NMR. Solid-state MAS NMR has several requirements which are not met by the current solution-NMR based software tools. Spectral features such as side bands, the frequent use of experiments with double-quantum chemical shift axes or complicated isotope labelling schemes, such as those obtained when using 1,3-13C- or 2-13C-labelled glycerol as the sole carbon source in the bacterial growth medium, call for further software features to enable the spectral information to be captured efficiently, and for the spectral analysis to be made user-friendly. Finally, solid-state NMR uses different experiments from those used in solution, with a different set of magnetisation/coherence transfer types, different preferences for excitation and measurement, and a separate vocabulary to describe them. Descriptions of these experiments are useful because resonance assignment routines and data filtering facilities require knowledge of magnetisation transfer in the NMR experiments used; while the information has been available for some time within the CCPN data model for solution NMR experiments, solid-state MAS NMR experiments have not been included until now. Here we present extensions to the CCPN data model and CcpNmr Analysis software to address the needs of solid-state MAS NMR spectroscopists. We provide a formal description of solid-state MAS NMR experiment types which concentrates on spectral features and downstream data analysis requirements, rather than on detailed transfer mechanisms. 
Descriptions of user-specified labelling patterns are introduced into the data model and these can be used as a filter within a variety of operations in CcpNmr Analysis. Furthermore, we have incorporated a double quantum chemical shift axis type into CcpNmr Analysis which enables automatic calculation of chemical shifts in the double quantum dimension and improved visual comparison with other spectra. In addition, a number of smaller features make the program especially user-friendly for solid-state NMR spectroscopists, and many features already available for solution-NMR data analysis have been extended to deal with solid-state MAS NMR data. Importantly, the extensions which describe MAS experiments and isotope labelling schemes in the CCPN data model have not only allowed new features to be introduced to CcpNmr Analysis, but are of benefit to anybody who uses the CCPN data model for programming, data transfer, or data archival involving solid-state MAS NMR data.
Reference data descriptions
Classification of NMR experiment types
The CCPN data model contains a description of NMR experiments at the magnetisation transfer level, with an associated naming convention. An earlier version of the system has been described previously (Fogh et al. 2006); this paper serves as a reference for a new and improved version of the naming convention. The data model and naming convention are described in detail in the supplementary material (Figure S1). The description abstracts the key information required for resonance assignment and other downstream data analysis from the physics of the NMR experiment. Thus magnetisation transfer by HSQC, HMQC or SPECIFIC cross-polarisation is uniformly described as ‘one-bond’, and NOESY, ROESY and long mixing time DARR transfer is uniformly described as ‘through-space’ (any deviations from this due to the isotopic labelling pattern used are dealt with downstream). The core of the data model represents the atom sites over which the magnetisation passes, the magnetisation transfers connecting these atoms, and the NMR measurements being recorded (which potentially arise from more than one atom site). The description of the 2D HH J-resolved experiment provides the simplest example: there are two atom sites, H1 and H2, connected by a J coupling magnetisation transfer. Here the NMR parameters being measured are the H1 chemical shift and the H1H2 J coupling value. NMR experiment descriptions are organised by magnetisation/coherence transfer pathway. For example, the HCACONNH experiment is potentially a 5D experiment, although spectroscopists usually acquire this experiment with fewer observed dimensions, e.g. a 4D HCA(CO)NNH, 3D H(CACO)NNH or 3D (H)CA(CO)NNH (where the frequencies of nuclei in brackets are not measured). In the data model the magnetisation transfer for this experiment is described only once, in terms of the 5D experiment with five atom sites, and the lower dimensional experiments are all described by reference to the top-level description.
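The three core elements described above (atom sites, magnetisation transfers and measurements) can be pictured with a small sketch. The class and field names below are hypothetical and chosen only for illustration; they are not the classes of the CCPN data model or its API.

```python
from dataclasses import dataclass

# Hypothetical illustration only: these are NOT the CCPN data model classes.

@dataclass
class AtomSite:
    name: str      # label of the site on the transfer pathway, e.g. "H1"
    isotope: str   # e.g. "1H"

@dataclass
class Transfer:
    site_a: str
    site_b: str
    transfer_type: str   # e.g. "Jcoupling", "onebond", "through-space"

@dataclass
class Measurement:
    kind: str      # "shift" or "Jcoupling"
    sites: tuple   # atom sites contributing to the measured value

@dataclass
class ExperimentDescription:
    name: str
    sites: list
    transfers: list
    measurements: list

    def dimensions(self):
        # One spectral dimension per recorded measurement.
        return len(self.measurements)

# The 2D HH J-resolved example from the text: two sites, one J coupling
# transfer, and two measurements (H1 shift and H1H2 coupling).
j_resolved = ExperimentDescription(
    name="2D HH J-resolved",
    sites=[AtomSite("H1", "1H"), AtomSite("H2", "1H")],
    transfers=[Transfer("H1", "H2", "Jcoupling")],
    measurements=[
        Measurement("shift", ("H1",)),
        Measurement("Jcoupling", ("H1", "H2")),
    ],
)
```

Note that the second measurement arises from two atom sites, which is why measurements are kept separate from sites in this sketch.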
The model corresponds to a convention for systematically naming experiments that is a more precise extension of the traditional NMR experiment names. The names reflect the underlying experiment descriptions in the data model and the general magnetisation transfer pathways found in the actual experiments. In brief, the atoms involved in the magnetisation transfer are listed in order; sites whose chemical shifts are not evolved are written in lower case; out-and-back transfers are indicated by square brackets; when the transfer is neither a one-bond transfer nor mediated by J-couplings, an underscore is inserted between the atom names, and the transfer types (e.g. ‘relayed’ or ‘through-space’) are listed in order after a full stop. For instance, a 1H-15N HSQC, an H(CACO)NNH and a 15N-NOESY-HSQC are represented as H[N], HcacoNH and H_H[N].through-space, respectively.
‘onebond’—transfer between directly bonded atoms, regardless of transfer mechanism
‘relayed’—multiple stepwise transfer along chemical bonds (replaces ‘TOCSY’)
‘relayed-alternate’—like ‘relayed’, but peaks from alternating atoms along the bond network have alternating sign
‘through-space’—transfer through space, not limited to chemical bonds (replaces ‘NOESY’)
‘Jcoupling’—J coupling transfer, not limited to a particular number of bonds
‘Jmultibond’—J coupling transfer, excluding one-bond couplings
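As a rough illustration of how such systematic names can be taken apart, the sketch below splits a name into its atom pathway and its list of transfer types. The function names are hypothetical, and the underscore-count check is an assumption that happens to hold for the names quoted in this text; it is not a statement of how CcpNmr Analysis parses names.

```python
def split_experiment_name(name):
    """Split 'pathway.type1,type2' into (pathway, [type1, type2])."""
    pathway, _, suffix = name.partition(".")
    transfer_types = suffix.split(",") if suffix else []
    return pathway, transfer_types

def has_unobserved_sites(pathway):
    """Lower-case letters mark sites whose chemical shifts are not evolved."""
    return any(ch.islower() for ch in pathway)

def underscores_match_types(name):
    """Assumption: one listed transfer type per underscore in the pathway."""
    pathway, transfer_types = split_experiment_name(name)
    return pathway.count("_") == len(transfer_types)

hsqc = split_experiment_name("H[N]")
noesy_hsqc = split_experiment_name("H_H[N].through-space")
ncacx = split_experiment_name("N_CA_C.onebond,through-space")
```

Here `hsqc` carries no explicit transfer types because only default transfers are involved, whereas `ncacx` lists one type for each underscore.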
The default transfer type is generally ‘Jcoupling’ for H–H, H–F, and F–F transfers and ‘onebond’ for all other transfers (regardless of mechanism), but exceptions are made in order to follow normal spectroscopic usage. The HNCA experiment is commonly understood to involve J coupling transfer between the protein backbone N and either Cαi or Cαi-1. To remain consistent with this usage transfer between atoms named N and CA defaults to ‘Jcoupling’. The solid state experiment known as ‘NCA’, which actually involves a SPECIFIC CP transfer between the N and Cα, is therefore given the official name N_CA.onebond (whereas the official name for the ‘NCO’ experiment remains simply NCO). Three of the transfer types are new to CCPN: (1) ‘relayed’ refers to intra-residue transfers and replaces the previous ‘TOCSY’ transfer type. (2) ‘through-space’ refers to any through-space transfer, regardless of the mechanism (e.g. TEDOR, cross polarisation, NOE, coupling through hydrogen bonds etc.) and replaces the previous ‘NOESY’ transfer type. (3) ‘relayed-alternate’ is a transfer type which has been created to deal with double-quantum transfers which produce ‘relayed’ magnetisation transfers, but with the added information that n + 1 and n bond transfers yield cross peaks of alternate sign.
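The default rule described in this paragraph can be written down in a few lines. This is a minimal sketch, assuming that the element can be read from the first letter of the atom name and that the N/CA pair is the only exception that needs handling; the function name is hypothetical and the real CCPN rules may contain further exceptions.

```python
# Elements whose mutual transfers default to 'Jcoupling' (H-H, H-F, F-F).
J_DEFAULT_ELEMENTS = {"H", "F"}

def default_transfer_type(atom_a, atom_b):
    """Return the default transfer type for a pair of atom names.

    Assumption: the element is the first letter of the atom name, which is
    adequate for H, C, N and F sites but not for two-letter element symbols.
    """
    a, b = atom_a.upper(), atom_b.upper()
    # Exception following spectroscopic usage (HNCA): N to CA is J coupling.
    if {a, b} == {"N", "CA"}:
        return "Jcoupling"
    if a[0] in J_DEFAULT_ELEMENTS and b[0] in J_DEFAULT_ELEMENTS:
        return "Jcoupling"
    return "onebond"
```

Under this rule the solid-state ‘NCA’ experiment has to state its one-bond transfer explicitly (N_CA.onebond), while ‘NCO’ needs no suffix because N to CO already defaults to ‘onebond’.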
The CCPN experiment types provide input to assignment and validation routines, but neither data model nor assignment software insists on absolute consistency. A few peaks that are not compatible with the description in the experiment type can be accommodated in the assignment process. In problematic cases it is up to the spectroscopist to choose the experiment description that best fits the data, e.g. by replacing a ‘relayed-alternate’ experiment type by a less restrictive ‘relayed’ or even ‘through-space’ alternative.
New solid-state MAS NMR Experiment Prototypes included in CcpNmr Analysis, showing for each experiment prototype the systematic name, the maximum possible number of dimensions, a synonym and examples of experiments included in the prototype
Systematic experiment name
CC (onebond); CC COSY: DOAM, CMAR, CMR, CTUC
CC (relayed); CC TOCSY: TOBSY, short mixing time PDSD, DARR, RFDR, PAR
CC (through-space); CC NOESY: long mixing time PDSD, DARR, RFDR, PAR
POST-C7, INADEQUATE etc.
Short mixing time TEDOR, PAIN
Long mixing time TEDOR, PAIN
HCC (relayed); HCC TOCSY
Users who develop new pulse sequences requiring new experiment descriptions can add these using the graphical Experiment Prototypes module within CcpNmr Analysis. It is hoped that users will then share these experiment descriptions with the wider NMR community by making them available for future distributions of CCPN.
Once an experiment has been allocated an experiment prototype within CcpNmr Analysis, the information contained within the prototype is used by the program at several different stages. For instance, in an NCACX spectrum recorded with a long mixing time (N_CA_C.onebond,through-space), the experiment prototype description indicates that the N and Cα must derive from the same spin system; when looking at assignment options for a peak the program can filter the assignment options accordingly. Similarly, the descriptions of a DARR and CHHC experiment (C_C.through-space and Ch_hC.through-space, respectively) allow the software to distinguish automatically whether distance restraints need to be generated between carbon atoms (in the case of a DARR spectrum) or their bonded hydrogen atoms (in the case of a CHHC spectrum), despite the fact that the peak lists will in both cases simply consist of pairs of carbon atom chemical shifts.
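The DARR/CHHC distinction can be sketched as follows. The helper names and the small bonded-hydrogen table are hypothetical and purely illustrative; the sketch only handles names with a single through-space transfer and does not reflect the actual CcpNmr Analysis code.

```python
# Illustrative subset of carbon-to-bonded-hydrogen names (hypothetical table).
BONDED_H = {"CA": ["HA"], "CB": ["HB2", "HB3"]}

def transfer_is_between_hydrogens(experiment_name):
    """True if the through-space step links unobserved hydrogens (e.g. Ch_hC)."""
    pathway, _, suffix = experiment_name.partition(".")
    if suffix != "through-space" or pathway.count("_") != 1:
        raise ValueError("sketch only handles a single through-space transfer")
    left, right = pathway.split("_")
    # Lower-case 'h' on both sides of the underscore marks unobserved hydrogens.
    return left.endswith("h") and right.startswith("h")

def restraint_atom_sets(experiment_name, peak):
    """Map a carbon-carbon peak assignment to the atoms to be restrained."""
    (res_a, atom_a), (res_b, atom_b) = peak
    if transfer_is_between_hydrogens(experiment_name):
        return ([(res_a, h) for h in BONDED_H[atom_a]],
                [(res_b, h) for h in BONDED_H[atom_b]])
    return ([(res_a, atom_a)], [(res_b, atom_b)])

peak = ((10, "CA"), (25, "CB"))  # invented assignment for illustration
darr_restraint = restraint_atom_sets("C_C.through-space", peak)
chhc_restraint = restraint_atom_sets("Ch_hC.through-space", peak)
```

The same carbon-carbon peak therefore yields a carbon-carbon restraint for the DARR description and a restraint between the attached hydrogens for the CHHC description.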
Isotope labelling schemes
Solid-state MAS NMR studies often benefit from a variety of non-uniform labelling schemes (van Gammeren et al. 2004; Etzkorn et al. 2007; Hiller et al. 2008; Higman et al. 2009; Schneider et al. 2010; Hefke et al. 2010). Selective and extensive isotope labelling with 1,3-13C or 2-13C-labelled glycerol, for instance, results in few neighbouring atom sites being simultaneously spin-labelled. The advantages of these labelling schemes include the narrowing of signals, a reduction in signal overlap, an increase in observable long-range correlations and the possibility of accurately identifying intermolecular contacts (Hong and Jakes 1999; Castellani et al. 2002; Loquet et al. 2010). However, the labelling schemes are generally not straightforward, especially for amino acids synthesised via the citric acid cycle, for which several different isotopomers are produced. The labelling scheme contains significant amounts of useful information for resonance and distance constraint assignment, but harnessing this information is not trivial. The program SOLARIA was developed to filter for the labelling scheme when assigning distance constraints for iterative automated structure calculations (Fossi et al. 2005) and this approach has also been used in ARIA (Rieping et al. 2007) from version 2.3 onwards. The labelling information can also be useful at many other stages of data analysis.
We have introduced a comprehensive Isotopic Labelling module into the data model and a corresponding graphical interface in CcpNmr Analysis which allows users to specify any type of macromolecular labelling. This will benefit both solid-state MAS and solution NMR applications. The system can describe uniform labelling (e.g. [U-13C,15N] labelling), amino-acid specific labelling (e.g. from cell-free protein synthesis or reverse labelling), labelling which results in a mixture of isotopomers for some amino acids (e.g. labelling with [1,3-13C]-glycerol), or segmental labelling which results in only part of the molecule being labelled (e.g. obtained by using solid-phase peptide synthesis or ligation techniques).
Reference isotopic labelling schemes are provided for many commonly used isotopic labelling strategies including natural abundance labelling, [U-13C,15N], [U-2H,13C,15N], [U-15N] with [1,3-13C]-glycerol labelling, SAIL labelling (Kainosho et al. 2006) and others. Copies of these standard schemes can be altered for any amino acid type on an atom by atom basis, or new schemes can be created from scratch. It is possible to introduce multiple isotopomers for any amino acid type and, if known, the relative residue isotopomer populations can be specified (this is required e.g. for the glycerol-based labelling schemes).
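A labelling scheme with several isotopomers per amino acid type can be pictured as a weighted list of per-atom isotope fractions. The sketch below is a hypothetical representation with invented populations and fractions; it is not the CCPN Isotopic Labelling module and the numbers are not taken from any reference scheme.

```python
from dataclasses import dataclass

@dataclass
class Isotopomer:
    weight: float        # relative population of this isotopomer
    c13_fraction: dict   # atom name -> fraction of 13C at that site

def average_incorporation(isotopomers, atom):
    """Population-weighted 13C fraction at one atom site."""
    total = sum(iso.weight for iso in isotopomers)
    labelled = sum(iso.weight * iso.c13_fraction.get(atom, 0.0)
                   for iso in isotopomers)
    return labelled / total

# Invented two-isotopomer scheme for a single residue type (illustration only).
example_scheme = [
    Isotopomer(weight=1.0, c13_fraction={"CA": 1.0, "CB": 0.0, "C": 0.0}),
    Isotopomer(weight=1.0, c13_fraction={"CA": 0.0, "CB": 1.0, "C": 1.0}),
]

ca_fraction = average_incorporation(example_scheme, "CA")
```

Weights are treated as relative populations and normalised inside the function, so they do not need to sum to one.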
Analysis of solid-state MAS NMR spectra
Double quantum chemical shift axes
Spinning side bands
Isotope abundance and correlation filtering
CcpNmr Analysis (and ARIA, to which it has been coupled) makes use of the isotope labelling information associated with a particular NMR experiment in order to generate label-aware assignments and distance restraints from through-space correlations. The labelling information is specified simply in the graphical interface by allocation of an experiment to a ‘labelled sample’ (with isotopic labels derived from a standard scheme, a modified scheme or a mixture of different schemes, see above). The abundances of magnetically active nuclei that give rise to a peak are then automatically calculated according to the experiment from which the peak derives. Any operation that requires the coincidence of two magnetically active nuclei is aware of both the average isotopic incorporation at each atom site and the proportion of situations where both sites are labelled at the same time; for example, Cα and Cβ atoms in a given residue may both be labelled, but never at the same time within the same isotopomer. The incorporation of isotopic labelling information allows easier assignment within CcpNmr Analysis since inappropriate resonance assignment possibilities are removed. Importantly, when generating distance restraints by matching chemical shifts to peak positions, ambiguity is reduced. Knowledge of the particular isotopic abundance also allows for a calibration of through-space peak intensities, such that the underlying intensity of the peak, as if it were 100% spin labelled, can be calculated prior to the application of an intensity-to-distance relation. This method is also used for NMR experiments where the through-space correlated resonances are not observed directly, for example in a CHHC experiment where only 13C resonances are observed, but the actual magnetisation transfer (and therefore the generated distance restraints) is between the unobserved hydrogen atoms. In this case the labelling of both the carbon and hydrogen nuclei is considered.
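The difference between average incorporation and simultaneous labelling, and its use for intensity calibration, can be sketched as follows. Isotopomers are represented as hypothetical (weight, fractions) tuples with invented values; treating atoms in different residues as independently labelled, and calibrating by simple division by the joint fraction, are assumptions of this sketch rather than a description of the CcpNmr Analysis implementation.

```python
def site_fraction(isotopomers, atom):
    """Average labelling fraction at one site over all isotopomers."""
    total = sum(w for w, _ in isotopomers)
    return sum(w * f.get(atom, 0.0) for w, f in isotopomers) / total

def pair_fraction_same_residue(isotopomers, atom_a, atom_b):
    """Fraction of molecules in which both sites of ONE residue are labelled."""
    total = sum(w for w, _ in isotopomers)
    return sum(w * f.get(atom_a, 0.0) * f.get(atom_b, 0.0)
               for w, f in isotopomers) / total

def pair_fraction_different_residues(iso_a, atom_a, iso_b, atom_b):
    """Assumption: labelling of different residues is statistically independent."""
    return site_fraction(iso_a, atom_a) * site_fraction(iso_b, atom_b)

def calibrated_intensity(observed, joint_fraction):
    """Scale an intensity to what it would be at 100% labelling (assumed linear)."""
    if joint_fraction <= 0.0:
        raise ValueError("peak is not expected for this labelling pattern")
    return observed / joint_fraction

# Invented scheme: CA and CB are each 50% labelled, but never together.
residue = [(0.5, {"CA": 1.0, "CB": 0.0}), (0.5, {"CA": 0.0, "CB": 1.0})]

intra = pair_fraction_same_residue(residue, "CA", "CB")
inter = pair_fraction_different_residues(residue, "CA", residue, "CB")
scaled = calibrated_intensity(100.0, inter)
```

With these invented numbers an intra-residue Cα-Cβ correlation is excluded outright, whereas the same atom pair in two different residues is jointly labelled in a quarter of cases and its intensity is scaled up accordingly.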
Synthetic peak lists
The use of the experiment types and labelling schemes becomes very powerful when combined with the ability of CcpNmr Analysis to create synthetic peak lists. One possibility, for instance, is to import chemical shift lists from the BMRB or shift prediction programs such as SHIFTX (Neal et al. 2003) and SPARTA (Shen and Bax 2007) using the CcpNmr Format Converter. These shift lists can then be used to create peaks for any type of spectrum with any type of isotopic labelling in order to compare predicted and actual peak positions. Alternatively, once a protein has been assigned, it is possible to create a synthetic peak list which contains only intra-residue peaks or peaks between neighbouring residues. This allows rapid identification of long-range peaks required for structure calculations in spectra containing both long- and short-range correlations. Such peak lists can also be used as an efficient tool to check whether all assignments have been made in a spectrum.
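The idea of using a synthetic short-range peak list to pick out long-range peaks can be sketched as below. The function names, the shift values and the 0.3 ppm matching tolerance are all hypothetical, and the sketch ignores experiment type and isotope labelling, which the real synthetic peak lists take into account.

```python
from itertools import product

def synthetic_short_range_peaks(shifts, max_separation=1):
    """Build 2D peak positions for atom pairs at most max_separation residues apart.

    shifts maps (residue_number, atom_name) to a chemical shift in ppm.
    """
    peaks = []
    for site_a, site_b in product(shifts, repeat=2):
        if site_a == site_b:
            continue  # skip diagonal positions
        if abs(site_a[0] - site_b[0]) <= max_separation:
            peaks.append((shifts[site_a], shifts[site_b]))
    return peaks

def unexplained_peaks(observed, synthetic, tolerance=0.3):
    """Return observed peaks that match no synthetic peak within the tolerance."""
    return [
        p for p in observed
        if not any(abs(p[0] - s[0]) <= tolerance and abs(p[1] - s[1]) <= tolerance
                   for s in synthetic)
    ]

# Invented shift list and observed peaks, for illustration only.
shifts = {(1, "CA"): 55.0, (1, "CB"): 30.0, (2, "CA"): 60.0, (5, "CA"): 52.0}
short_range = synthetic_short_range_peaks(shifts)
observed = [(55.0, 30.1), (52.0, 60.0)]
long_range_candidates = unexplained_peaks(observed, short_range)
```

Peaks left over after this comparison are candidates for long-range correlations, or for assignments that have not yet been made.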
In addition to the features described above which have been developed specifically for solid-state MAS NMR data analysis, several other features in CcpNmr Analysis are particularly useful in the analysis of solid-state MAS NMR data. The easy way in which 2D and 3D spectra can be overlaid is well suited to making comparisons between spectra recorded with different mixing times or on samples with different labelling schemes. The option to arrange strips horizontally rather than vertically suits many solid-state MAS NMR spectra, such as NCACX or NCOCX spectra, whose peaks conventionally occur in horizontal rather than vertical strips. The ability to make tentative assignments is a helpful way of keeping track electronically of multiple assignment possibilities in ambiguous situations. Many other routines within CcpNmr Analysis have also been updated and tested to ensure their compatibility with solid-state MAS NMR data, enabling the easy display of build-up curves, the creation of distance restraints involving carbon or nitrogen atoms or assignment statistics in the quality reports which exclude hydrogen atoms.
In conclusion, we present additions to the CCPN data model and new features within the CcpNmr Analysis program which significantly improve solid-state MAS NMR data analysis and streamline the resonance assignment and structure calculation processes. The structure calculation program, ARIA (Rieping et al. 2007), for example, uses the CCPN data model and movement between ARIA and CcpNmr Analysis is almost seamless. This is now also the case when using solid-state MAS NMR data. The information about labelling schemes and experiment types which has been entered into CcpNmr Analysis can be accessed by ARIA and exploited during its structure calculation routines. The expanded data model also provides a platform for the development of new, interconnected software geared towards solid-state MAS NMR data analysis, e.g. automated assignment programs (Moseley et al. 2010; Tycko and Hu 2010) or line-shape fitting routines for the extraction of dipolar couplings to produce angular structure restraints (Franks et al. 2008).
Several of the features presented here also provide improvements for solution NMR data analysis. The double quantum axes, for instance, can be used to analyse reduced dimensionality experiments and filtering according to isotope labelling schemes is useful when using SAIL-labelled proteins (Kainosho et al.2006). Overall, the expanded data model and new features in CcpNmr Analysis provide an important contribution towards the analysis of protein solid-state MAS NMR data to match the rapid methodological developments that have occurred over the past decade.
Versions 2.2.1 and above of CcpNmr Analysis contain all the described solid-state MAS NMR extensions, together with the latest CCPN data model. APIs for Python, C and Java languages are available, storing data in the form of the updated data model, which includes the new experiment descriptions and isotope labelling schemes. Subsequent CCPN software releases will include further experiment descriptions and labelling schemes provided by the user community. The software can be downloaded free of charge for non-profit institutions at http://www.ccpn.ac.uk. Detailed documentation is available at the CCPN web site via http://www.ccpn.ac.uk/documentation/ and http://www.ccpn.ac.uk/wiki/.
We thank Johanna Becker-Baldus, Patrick van der Wel and Trent Franks for helpful discussions. We acknowledge the Deutsche Forschungsgemeinschaft and the Biotechnology and Biological Sciences Research Council (UK) for funding.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
- Brünger AT (1992) XPLOR version 3.1: a system for X-ray crystallography and NMR. Yale University Press, New Haven
- Brünger AT, Adams PD, Clore GM, DeLano WL, Gros P, Grosse-Kunstleve RW, Jiang JS, Kuszewski J, Nilges M, Pannu NS, Read RJ, Rice LM, Simonson T, Warren GL (1998) Crystallography & NMR system: a new software suite for macromolecular structure determination. Acta Cryst D 54:905–921
- Cady SD, Schmidt-Rohr K, Wang J, Soto CS, DeGrado WF, Hong M (2010) Structure of the amantadine binding site of influenza M2 proton channels in lipid bilayers. Nature 463:689–692
- Goddard TD, Kneller DG. SPARKY 3. University of California, San Francisco. http://www.cgl.ucsf.edu/home/sparky/
- Hefke F, Bagaria A, Reckel S, Ullrich S, Dötsch V, Glaubitz C, Güntert P (2010) Optimization of amino acid type-specific 13C and 15N labeling for the backbone assignment of membrane proteins by solution- and solid-state NMR with the UPLABEL algorithm. J Biomol NMR. doi:10.1007/s10858-010-9462-4
- Jehle S, Rajagopal P, Bardiaux B, Markovic S, Kühne R, Stout JR, Higman VA, Klevit RE, van Rossum BJ, Oschkinat H (2010) Solid-state NMR and SAXS studies provide a structural basis for the activation of alpha B-crystallin oligomers. Nature Struct Mol Biol 17:1027–1042