Abstract
Protein secondary structure carries rich structural information, so the description and understanding of protein structure rely heavily on it. Identifying or predicting secondary structure therefore plays an important role in protein research. In protein NMR studies, predicting secondary structure from chemical shifts is more convenient than the traditional determination based on inter-nuclear distances obtained from NOESY experiments. Deep neural networks have improved significantly in recent years and have been applied in many research fields. Here we propose a deep neural network based on bidirectional long short-term memory (biLSTM) that predicts protein 3-state secondary structure from the NMR chemical shifts of backbone nuclei. Compared with existing methods, the proposed method showed better prediction accuracy. A web server based on the proposed method has been built to provide a protein secondary structure prediction service.
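The abstract reports prediction accuracy without defining the metric. For 3-state prediction, a common choice is the per-residue accuracy (often called Q3): the fraction of residues whose predicted state matches the reference. The sketch below illustrates that metric only; the function name `q3_accuracy`, the H/E/C letter coding, and the example strings are assumptions made for illustration, not taken from the paper or its web server.

```python
def q3_accuracy(predicted: str, reference: str) -> float:
    """Return the fraction of residues whose predicted state equals the reference.

    Both arguments are strings of per-residue state labels, assumed here to be
    'H' (helix), 'E' (strand) and 'C' (coil).
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have equal length")
    if not reference:
        raise ValueError("sequences must be non-empty")
    # Count positions where the two label strings agree.
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical 16-residue example: the two strings differ at one position.
reference = "CCHHHHHHCCEEEECC"
predicted = "CCHHHHHCCCEEEECC"
print(f"Q3 = {q3_accuracy(predicted, reference):.4f}")
```

Because it is a per-residue average, this metric weights every position equally, so long proteins dominate if the label strings of several proteins are concatenated before scoring.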
Data availability
The supporting information is attached as supplementary information.
Software availability
The service is provided by an online web server: http://www.proteindeeplearning.info/.
Acknowledgements
This research was funded by the National Key R&D Program of China (Grant Nos. 2018YFA0704002, 2018YFE0202300, 2017YFA0505400), the National Natural Science Foundation of China (Grant Nos. 21735007, 21991080, 21921004), and the CAS Key Research Program of Frontier Sciences (Grant No. QYZDJ-SSW-SLH027).
Ethics declarations
Conflict of interest
The authors declare no competing financial interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
10858_2021_383_MOESM1_ESM.pdf
The Supplementary Information includes the PDB IDs, BMRB entries, residue numbers, and numbers of assigned backbone chemical shifts of the proteins used in the training and test datasets; the secondary structure prediction accuracies of three methods on the test dataset; and the secondary structure information and prediction accuracies for the training and validation datasets. The Supplementary Information is available free of charge on the publication's website. Supplementary material 1 (PDF 2050.9 kb)
About this article
Cite this article
Miao, Z., Wang, Q., Xiao, X. et al. CSI-LSTM: a web server to predict protein secondary structure using bidirectional long short term memory and NMR chemical shifts. J Biomol NMR 75, 393–400 (2021). https://doi.org/10.1007/s10858-021-00383-9
DOI: https://doi.org/10.1007/s10858-021-00383-9