Predicting the Disulfide Bonding State of Cysteines with Combinations of Kernel Machines

  • Alessio Ceroni
  • Paolo Frasconi
  • Andrea Passerini
  • Alessandro Vullo


Cysteines may form covalent bonds, known as disulfide bridges, that play an important role in stabilizing the native conformation of proteins. Several methods have been proposed for predicting the bonding state of cysteines, using either local context or global protein descriptors. In this paper we introduce an SVM-based predictor that operates in two stages. The first stage is a multi-class classifier that operates at the protein level, using either standard Gaussian or spectrum kernels. The second stage is a binary classifier that refines the prediction by exploiting local context enriched with evolutionary information in the form of multiple alignment profiles. At both stages, we enriched the profile encoding with information about cysteine conservation. The prediction accuracy of the system is 85%, measured by 5-fold cross-validation on a set of 716 proteins from the September 2001 PDB Select dataset.

Keywords: bonding state of cysteines, kernel machines, machine learning, structural genomics
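
The spectrum kernel named in the abstract as one option for the first stage compares two sequences through the counts of their length-k substrings (k-mers): the kernel value is the inner product of the two count vectors. The following is a minimal sketch of that definition only. The function names, the default k = 3, and the toy sequences are illustrative assumptions, not the settings or the implementation used in the paper.

```python
from collections import Counter


def spectrum_features(sequence: str, k: int) -> Counter:
    """Count every contiguous substring of length k (k-mer) in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def spectrum_kernel(seq_a: str, seq_b: str, k: int = 3) -> int:
    """Inner product of the k-mer count vectors of two sequences.

    k = 3 is an arbitrary illustrative default, not a value from the paper.
    """
    feats_a = spectrum_features(seq_a, k)
    feats_b = spectrum_features(seq_b, k)
    # Only k-mers that occur in both sequences contribute to the product.
    return sum(
        count * feats_b[kmer]
        for kmer, count in feats_a.items()
        if kmer in feats_b
    )


if __name__ == "__main__":
    # Toy sequences chosen only to exercise the function.
    print(spectrum_kernel("ACCA", "CCAC", k=2))
```

A Gram matrix built from such pairwise values could be passed to any SVM implementation that accepts precomputed kernels; whether the original system was built that way is not stated in the abstract.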





Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Alessio Ceroni (1)
  • Paolo Frasconi (1)
  • Andrea Passerini (1)
  • Alessandro Vullo (1)
  1. Dipartimento di Sistemi e Informatica, Università di Firenze, Italy
