Abstract
In this paper we evaluate the performance of machine-learning methods in predicting the bonding state of cysteines from protein sequences, which is the first step in identifying disulfide bonds in proteins. We score three different approaches: 1) Hidden Support Vector Machines (HSVMs), which integrate SVM predictions with a Hidden Markov Model; 2) SVM-HMMs, which discriminatively train models that are isomorphic to a kth-order hidden Markov model; 3) Grammatical-Restrained Hidden Conditional Random Fields (GRHCRFs), which we recently introduced. We evaluate two encoding schemes based on sequence profile and position-specific scoring matrix (PSSM), as computed with the PSI-BLAST program, and we show that all the methods perform better when the evolutionary information is encoded with the PSSM than with the sequence profile. Among the methods, GRHCRFs perform slightly better than the others, achieving a per-protein accuracy of 87% with a Matthews correlation coefficient (C) of 0.73. Finally, we investigate the difference between disulfide bonding state predictions in Eukaryotes and Prokaryotes. Our analysis shows that the per-protein accuracy is higher for Prokaryotic proteins than for Eukaryotic ones (0.88 vs 0.83). However, given the paucity of bonded cysteines in Prokaryotes as compared to Eukaryotes, the Matthews correlation coefficient is drastically lower for Prokaryotes (0.48 vs 0.80).
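The two scores quoted above can diverge when bonded cysteines are rare. The following is a minimal sketch of how they might be computed, not the evaluation code used in the paper. It assumes that per-protein accuracy counts a protein as correct only when every one of its cysteines is predicted correctly, and that C is the standard binary Matthews correlation coefficient computed over all cysteines. The function names and the toy data are hypothetical and chosen only for illustration.

```python
from math import sqrt


def matthews_cc(true_labels, pred_labels):
    """Binary Matthews correlation coefficient (1 = bonded, 0 = free)."""
    pairs = list(zip(true_labels, pred_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def per_protein_accuracy(proteins):
    """Fraction of proteins whose cysteines are ALL predicted correctly.

    proteins: list of (true_labels, pred_labels) pairs, one pair per protein.
    This definition is an assumption made for this sketch.
    """
    correct = sum(1 for t, p in proteins if list(t) == list(p))
    return correct / len(proteins)


# Hypothetical toy set: nine proteins with only free cysteines, all predicted
# correctly, and one protein with two bonded cysteines, one of them missed.
toy = [([0, 0], [0, 0]) for _ in range(9)] + [([1, 1], [1, 0])]

acc = per_protein_accuracy(toy)
all_true = [label for t, _ in toy for label in t]
all_pred = [label for _, p in toy for label in p]
mcc = matthews_cc(all_true, all_pred)

print(f"per-protein accuracy = {acc:.2f}, C = {mcc:.2f}")
```

In this toy set a single missed bonded cysteine costs one protein out of ten in accuracy, but it halves the number of true positives, so C falls much further. This is the same qualitative effect described for the Prokaryotic set.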
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Savojardo, C., Fariselli, P., Martelli, P.L., Shukla, P., Casadio, R. (2011). Prediction of the Bonding State of Cysteine Residues in Proteins with Machine-Learning Methods. In: Rizzo, R., Lisboa, P.J.G. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2010. Lecture Notes in Computer Science, vol 6685. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21946-7_8
DOI: https://doi.org/10.1007/978-3-642-21946-7_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21945-0
Online ISBN: 978-3-642-21946-7
eBook Packages: Computer Science, Computer Science (R0)