Prediction of the Bonding State of Cysteine Residues in Proteins with Machine-Learning Methods

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNBI, volume 6685)

Abstract

In this paper we evaluate machine-learning methods for predicting the bonding state of cysteines from protein sequence, the first step towards identifying disulfide bonds in proteins. We score three approaches: 1) Hidden Support Vector Machines (HSVMs), which integrate SVM predictions with a Hidden Markov Model; 2) SVM-HMMs, which discriminatively train models that are isomorphic to a kth-order hidden Markov model; and 3) Grammatical-Restrained Hidden Conditional Random Fields (GRHCRFs), which we recently introduced. We compare two encodings of evolutionary information, both computed with the PSI-BLAST program: the sequence profile and the position-specific scoring matrix (PSSM). All methods perform better with the PSSM encoding than with the sequence profile. Among the methods, GRHCRFs perform slightly better than the others, achieving a per-protein accuracy of 87% with a Matthews correlation coefficient (C) of 0.73. Finally, we investigate the difference between disulfide bonding state predictions in Eukaryotes and Prokaryotes. The per-protein accuracy is higher for Prokaryotic proteins than for Eukaryotic ones (0.88 vs 0.83). However, because bonded cysteines are much rarer in Prokaryotes than in Eukaryotes, the Matthews correlation coefficient is drastically lower (0.48 vs 0.80).
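
The two scores quoted in the abstract can diverge when one class is rare, which is the effect reported for Prokaryotes. The sketch below is only an illustration of that effect, not the authors' evaluation code. It assumes that a protein counts as correct when every one of its cysteines is predicted correctly, and it uses the standard confusion-matrix form of the Matthews correlation coefficient. The function names and the toy labels (1 = bonded, 0 = free) are hypothetical and do not come from the paper's data.

```python
from math import sqrt


def confusion_counts(proteins):
    """Count TP, TN, FP, FN over all cysteines (1 = bonded, 0 = free)."""
    tp = tn = fp = fn = 0
    for true, pred in proteins:
        for t, p in zip(true, pred):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 0:
                tn += 1
            elif t == 0 and p == 1:
                fp += 1
            else:
                fn += 1
    return tp, tn, fp, fn


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is 0."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def per_protein_accuracy(proteins):
    """Assumed definition: fraction of proteins with all cysteines correct."""
    correct = sum(1 for true, pred in proteins if list(true) == list(pred))
    return correct / len(proteins)


# Toy data set (hypothetical): bonded cysteines are rare, and the
# predictor labels every cysteine as free.
toy = [
    ([0, 0, 0], [0, 0, 0]),
    ([0, 0], [0, 0]),
    ([1, 1, 0], [0, 0, 0]),
    ([0, 0, 0, 0], [0, 0, 0, 0]),
]

accuracy = per_protein_accuracy(toy)
mcc = matthews_cc(*confusion_counts(toy))
print(accuracy, mcc)
```

On this toy set three of the four proteins are fully correct, yet the coefficient is zero because no bonded cysteine is ever recovered. A high per-protein accuracy can therefore coexist with a low correlation coefficient when bonded cysteines are scarce.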

Keywords

  • Machine Learning
  • Conditional Random Fields
  • Disulfide Prediction
  • Disulfide Bonding State
  • Protein Structure Prediction
  • Protein Folding

Chapter length: 14 pages

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Savojardo, C., Fariselli, P., Martelli, P.L., Shukla, P., Casadio, R. (2011). Prediction of the Bonding State of Cysteine Residues in Proteins with Machine-Learning Methods. In: Rizzo, R., Lisboa, P.J.G. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2010. Lecture Notes in Computer Science, vol 6685. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21946-7_8

  • DOI: https://doi.org/10.1007/978-3-642-21946-7_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21945-0

  • Online ISBN: 978-3-642-21946-7

  • eBook Packages: Computer Science, Computer Science (R0)