Prediction of hydrogen and carbon chemical shifts from RNA using database mining and support vector regression
The Biological Magnetic Resonance Data Bank (BMRB) contains NMR chemical shift depositions for over 200 RNAs and RNA-containing complexes. We have analyzed the 1H and 13C NMR chemical shifts reported for non-exchangeable protons, and the carbons bonded to them, in 187 of these RNAs. Software was developed that downloads BMRB datasets and the corresponding PDB structure files, and then generates residue-specific attributes based on the calculated secondary structure. Attributes represent properties present in each sequential stretch of five adjacent residues and include variables such as nucleotide type, base-pair presence and type, and tetraloop type. The attributes, together with the 1H and 13C NMR chemical shifts of the central nucleotide, are then used as input to train predictive models using support vector regression. These models can then be used to predict shifts for new sequences. The new software tools, available as stand-alone scripts or integrated into the NMR visualization and analysis program NMRViewJ, should facilitate NMR assignment and/or validation of RNA 1H and 13C chemical shifts. In addition, our findings enabled the re-calibration of a ring-current shift model using published NMR chemical shifts and high-resolution X-ray structural data as guides.
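The five-residue attribute window described above can be illustrated with a minimal sketch. It is not the authors' software: the function names (`pair_partners`, `window_attributes`), the attribute labels, the dot-bracket input format, and the `-` padding symbol are all assumptions made for illustration, and tetraloop attributes are omitted.

```python
from typing import Dict, List

PAD = "-"  # hypothetical placeholder for window positions beyond the sequence ends


def pair_partners(dot_bracket: str) -> List[int]:
    """Return the index of each residue's base-pair partner, or -1 if unpaired."""
    partners = [-1] * len(dot_bracket)
    stack: List[int] = []
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            j = stack.pop()
            partners[i], partners[j] = j, i
    return partners


def window_attributes(sequence: str, dot_bracket: str, center: int,
                      half_width: int = 2) -> Dict[str, str]:
    """Collect nucleotide type and base-pair type for the residues around `center`.

    With half_width=2 the window spans five adjacent residues, and the
    attributes describe the environment of the central nucleotide.
    """
    partners = pair_partners(dot_bracket)
    attributes: Dict[str, str] = {}
    for offset in range(-half_width, half_width + 1):
        pos = center + offset
        if 0 <= pos < len(sequence):
            nt = sequence[pos]
            partner = partners[pos]
            # Base-pair type is written as residue followed by its partner, e.g. "GC".
            pair = nt + sequence[partner] if partner >= 0 else "unpaired"
        else:
            nt, pair = PAD, PAD
        attributes[f"nt[{offset:+d}]"] = nt
        attributes[f"pair[{offset:+d}]"] = pair
    return attributes


if __name__ == "__main__":
    # A small hairpin: three base pairs closing a GAAA loop.
    seq = "GGCGAAAGCC"
    structure = "(((....)))"
    for key, value in window_attributes(seq, structure, center=2).items():
        print(key, value)
```

In a complete pipeline, categorical attributes such as these would typically be converted to numeric form (for example by one-hot encoding) and paired with the measured shift of the central nucleotide before being passed to a support vector regression implementation.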
Keywords: RNA, Chemical shift, Secondary structure, NMR signal assignment and validation
This research was supported by grants from the National Institute of General Medical Sciences of the National Institutes of Health (NIGMS; R01 GM42561 to MFS and P50 GM 103297 to BAJ). JDB was supported by an NIGMS grant for maximizing student diversity (R25 GM 055036). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.