Abstract
Chemical shifts provide not only peak identities for analyzing nuclear magnetic resonance (NMR) data, but also an important source of conformational information for studying protein structures. Current structural studies that require Hα chemical shifts suffer from two limitations. (1) For large proteins, Hα chemical shifts can be difficult to assign using conventional NMR triple-resonance experiments, mainly because the fast transverse relaxation of Cα limits signal sensitivity. (2) Previous chemical shift prediction approaches either require homologous models with high sequence similarity or rely heavily on accurate backbone and side-chain structural coordinates. When neither sequence homologues nor structural coordinates are available, Hα chemical shifts must be predicted from other information. Predicting accurate Hα chemical shifts from other obtainable data, such as the chemical shifts of nearby backbone atoms (i.e., atoms adjacent in the sequence), can overcome both limitations and hence advance NMR-based structural studies of proteins. By specifically exploiting the dependence of Hα chemical shifts on the chemical shifts of nearby backbone atoms, we propose a novel machine learning algorithm, called Hash, to predict Hα chemical shifts. Hash combines a new fragment-based chemical shift search approach with a non-parametric regression model, the generalized additive model, to solve the prediction problem effectively. We demonstrate that the chemical shifts of nearby backbone atoms provide a reliable source of information for predicting accurate Hα chemical shifts. Our testing results on different possible combinations of input data indicate that Hash has a wide range of potential NMR applications in structural and biological studies of proteins.
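The two-stage idea described above (search a database for fragments whose neighboring backbone shifts resemble the query, then regress Hα on those shifts with an additive model) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The feature layout (one chemical shift per column, for example Cα, Cβ and C′ of the residue and its sequence neighbors), the Euclidean nearest-neighbor fragment search, the Gaussian Nadaraya-Watson smoother, and the names `predict_ha`, `fit_additive`, `k` and `h` are all illustrative assumptions. The additive model is fitted with the standard backfitting algorithm.

```python
import numpy as np

def _gauss_smooth(x_train, r, x_eval, h):
    # Nadaraya-Watson estimate of E[r | x] at each x_eval, Gaussian kernel of width h
    logw = -0.5 * ((x_eval[:, None] - x_train[None, :]) / h) ** 2
    w = np.exp(logw - logw.max(axis=1, keepdims=True))  # max-shift avoids all-zero rows
    return (w @ r) / w.sum(axis=1)

def fit_additive(X, y, h=0.5, n_iter=25):
    """Fit y ~ alpha + sum_j f_j(X[:, j]) by backfitting; returns a predict function."""
    n, p = X.shape
    alpha = y.mean()
    F = np.zeros((n, p))  # current f_j evaluated at the training points
    R = np.zeros((n, p))  # partial residuals each f_j was smoothed from
    c = np.zeros(p)       # centering constants so that each f_j has mean zero
    for _ in range(n_iter):
        for j in range(p):
            # partial residual: remove every component except f_j
            R[:, j] = y - alpha - F.sum(axis=1) + F[:, j]
            s = _gauss_smooth(X[:, j], R[:, j], X[:, j], h)
            c[j] = s.mean()
            F[:, j] = s - c[j]

    def predict(X_new):
        out = np.full(len(X_new), alpha)
        for j in range(p):
            out += _gauss_smooth(X[:, j], R[:, j], X_new[:, j], h) - c[j]
        return out

    return predict

def predict_ha(query, db_X, db_ha, k=50, h=0.5):
    """HYPOTHETICAL pipeline: fragment search, then a local additive model for H-alpha."""
    mu, sd = db_X.mean(axis=0), db_X.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant columns
    Z = (db_X - mu) / sd  # standardize so ppm scales of C and N shifts are comparable
    zq = (np.asarray(query, dtype=float) - mu) / sd
    # fragment search: the k database entries whose shift vectors are closest to the query
    nearest = np.argsort(np.linalg.norm(Z - zq, axis=1))[:k]
    model = fit_additive(Z[nearest], db_ha[nearest], h=h)
    return float(model(zq[None, :])[0])
```

Called as `predict_ha([57.0, 36.0, 176.5], db_X, db_ha)`, with a hypothetical database `db_X` of neighboring backbone shifts (one row per fragment) and matching Hα shifts `db_ha`, it returns a single Hα estimate in ppm. The Gaussian kernel smoother keeps the sketch dependent only on NumPy; the R `gam` package instead provides smoothing-spline (`s()`) and loess (`lo()`) terms.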
Acknowledgments
We thank all members of the Donald and Zhou labs for helpful discussions and comments. This work is supported by the following grants from the National Institutes of Health: R01 GM-65982 to B.R.D. and R01 GM-079376 to P.Z.
Additional information
Hash is distributed open-source under the GNU Lesser General Public License (Gnu 2002). Its source code can be freely downloaded or obtained by contacting the authors.
About this article
Cite this article
Zeng, J., Zhou, P. & Donald, B.R. Hash: a program to accurately predict protein Hα shifts from neighboring backbone shifts. J Biomol NMR 55, 105–118 (2013). https://doi.org/10.1007/s10858-012-9693-7