Journal of Biomolecular NMR, Volume 64, Issue 2, pp 115–130

Conformationally selective multidimensional chemical shift ranges in proteins from a PACSY database purged using intrinsic quality criteria

  • Keith J. Fritzsching
  • Mei Hong
  • Klaus Schmidt-Rohr


We have determined refined multidimensional chemical shift ranges for intra-residue correlations (13C–13C, 15N–13C, etc.) in proteins, which can be used to gain type-assignment and/or secondary-structure information from experimental NMR spectra. The chemical-shift ranges are the result of a statistical analysis of the PACSY database of >3000 proteins with 3D structures (1,200,207 13C chemical shifts and >3 million chemical shifts in total); these data were originally derived from the Biological Magnetic Resonance Data Bank. Using relatively simple non-parametric statistics to find peak maxima in the distributions of helix, sheet, coil and turn chemical shifts, and without the use of limited “hand-picked” data sets, we show that ~94 % of the 13C NMR data and almost all 15N data are quite accurately referenced and assigned, with smaller standard deviations (0.2 and 0.8 ppm, respectively) than recognized previously. On the other hand, approximately 6 % of the 13C chemical shift data in the PACSY database are shown to be clearly misreferenced, mostly by ca. −2.4 ppm. The removal of the misreferenced data and other outliers by this purging by intrinsic quality criteria (PIQC) allows for reliable identification of secondary maxima in the two-dimensional chemical-shift distributions already pre-separated by secondary structure. We demonstrate that some of these correspond to specific regions in the Ramachandran plot, including left-handed helix dihedral angles, reflect unusual hydrogen bonding, or are due to the influence of a following proline residue. With appropriate smoothing, significantly more tightly defined chemical shift ranges are obtained for each amino acid type in the different secondary structures. 
These chemical shift ranges, which may be defined at any statistical threshold, can be used for amino-acid type assignment and secondary-structure analysis of chemical shifts from intra-residue cross peaks by inspection or by using a provided command-line Python script (PLUQin), which should be useful in protein structure determination. The refined chemical shift distributions are utilized in a simple quality test (SQAT) that should be applied to new protein NMR data before deposition in a databank, and they could benefit many other chemical-shift based tools.
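The idea of detecting misreferenced entries from the data themselves can be illustrated with a minimal sketch. It is not the PIQC procedure of the paper: it assumes that each database entry has already been reduced to a single 13C offset (the median deviation of its shifts from hypothetical expected values) and flags entries whose offset lies far from the bulk, using the median absolute deviation as a robust scale. The function names, the cutoff parameters `n_mad` and `floor`, and all numbers in the demo are illustrative assumptions.

```python
from statistics import median


def entry_offset(observed, expected):
    """Median deviation (ppm) of one entry's shifts from expected values.

    `expected` is a hypothetical list of reference shifts, one per observed shift.
    """
    return median(o - e for o, e in zip(observed, expected))


def flag_misreferenced(offsets, n_mad=5.0, floor=0.5):
    """Return indices of entries whose offset is far from the bulk of entries.

    The spread is estimated robustly from the median absolute deviation (MAD),
    scaled by 1.4826 so that it matches a standard deviation for normal data.
    `n_mad` and `floor` (ppm) are assumed cutoffs, not values from the paper.
    """
    center = median(offsets)
    mad = median(abs(x - center) for x in offsets)
    cutoff = max(n_mad * 1.4826 * mad, floor)
    return [i for i, x in enumerate(offsets) if abs(x - center) > cutoff]


# Synthetic per-entry offsets in ppm (made up for illustration only):
# most entries scatter around 0, two are shifted by roughly -2.4 ppm.
offsets = [0.1, -0.2, 0.0, 0.15, -2.4, -0.05, 0.2, -2.5, 0.05, -0.1]
flagged = flag_misreferenced(offsets)
print("entries flagged as possibly misreferenced:", flagged)
```

A median-based center and scale are used here because a small fraction of shifted entries would otherwise inflate a mean and standard deviation and hide the very outliers being sought.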


Protein chemical shift; Databases; Protein secondary structure; Data mining; PIQC; PACSY; PLUQin; SQAT



K. S. R. gratefully acknowledges Brandeis University for support. This work was partly supported by NIH Grant GM066976 to M. H.

Supplementary material

Supplementary material 1 (PDF 1428 kb)



Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  1. Department of Chemistry, Brandeis University, Waltham, USA
  2. Department of Chemistry, Massachusetts Institute of Technology, Cambridge, USA
