Prediction of the Bonding State of Cysteine Residues in Proteins with Machine-Learning Methods

  • Conference paper
Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2010)

Abstract

In this paper we evaluate the performance of machine-learning methods in predicting the bonding state of cysteines from protein sequences. This task is the first step toward identifying disulfide bonds in proteins. We score the performance of three approaches: 1) Hidden Support Vector Machines (HSVMs), which integrate SVM predictions with a Hidden Markov Model; 2) SVM-HMMs, which discriminatively train models that are isomorphic to a kth-order hidden Markov model; and 3) Grammatical-Restrained Hidden Conditional Random Fields (GRHCRFs), which we recently introduced. We evaluate two encoding schemes, one based on the sequence profile and one on the position-specific scoring matrix (PSSM), both computed with the PSI-BLAST program, and we show that all the methods perform better when the evolutionary information is encoded with the PSSM than with the sequence profile. Among the methods, GRHCRFs perform slightly better than the others, achieving a per-protein accuracy of 87% with a Matthews correlation coefficient (C) of 0.73. Finally, we investigate the difference between disulfide bonding state predictions in Eukaryotes and Prokaryotes. Our analysis shows that the per-protein accuracy for Prokaryotic proteins is higher than that for Eukaryotic proteins (0.88 vs 0.83). However, given the paucity of bonded cysteines in Prokaryotes as compared to Eukaryotes, the Matthews correlation coefficient for Prokaryotes is drastically lower (0.48 vs 0.80).
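The two scores quoted in the abstract can diverge when bonded cysteines are rare. The sketch below is a minimal illustration, not the authors' evaluation code. It assumes that labels are binary (1 = bonded, 0 = free), that per-protein accuracy counts a protein as correct only when every one of its cysteines is predicted correctly, and that C is the standard Matthews correlation coefficient computed over all cysteines. The function names and the toy data are hypothetical.

```python
from math import sqrt


def matthews_cc(true_labels, pred_labels):
    """Matthews correlation coefficient over per-cysteine binary labels."""
    pairs = list(zip(true_labels, pred_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def per_protein_accuracy(true_by_protein, pred_by_protein):
    """Fraction of proteins whose cysteines are all predicted correctly (assumed definition)."""
    correct = sum(1 for t, p in zip(true_by_protein, pred_by_protein) if t == p)
    return correct / len(true_by_protein)


# Toy data set (hypothetical): 10 proteins with 2 cysteines each,
# only 2 of which contain bonded cysteines.
true_by_protein = [[0, 0]] * 8 + [[1, 1], [1, 1]]
# The predictor gets every free protein right but misses one bonded protein.
pred_by_protein = [[0, 0]] * 8 + [[1, 1], [0, 0]]

# Flatten to per-cysteine labels for the correlation coefficient.
flat_true = [label for protein in true_by_protein for label in protein]
flat_pred = [label for protein in pred_by_protein for label in protein]

accuracy = per_protein_accuracy(true_by_protein, pred_by_protein)
mcc = matthews_cc(flat_true, flat_pred)
print(f"per-protein accuracy = {accuracy:.2f}, C = {mcc:.2f}")
```

In this toy case a single missed protein leaves the per-protein accuracy at 0.90 while C falls to about 0.67, because the few bonded cysteines dominate the correlation term. This is the same kind of effect the abstract reports for Prokaryotes.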




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Savojardo, C., Fariselli, P., Martelli, P.L., Shukla, P., Casadio, R. (2011). Prediction of the Bonding State of Cysteine Residues in Proteins with Machine-Learning Methods. In: Rizzo, R., Lisboa, P.J.G. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2010. Lecture Notes in Computer Science, vol 6685. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21946-7_8

  • DOI: https://doi.org/10.1007/978-3-642-21946-7_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21945-0

  • Online ISBN: 978-3-642-21946-7

  • eBook Packages: Computer Science (R0)
