Improving Prediction of Zinc Binding Sites by Modeling the Linkage Between Residues Close in Sequence

  • Sauro Menchetti
  • Andrea Passerini
  • Paolo Frasconi
  • Claudia Andreini
  • Antonio Rosato
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3909)


We describe and empirically evaluate machine learning methods for the prediction of zinc binding sites from protein sequences. We start by observing that a data set consisting of single residues as examples is affected by autocorrelation, and we propose an ad hoc remedy in which sequentially close pairs of candidate residues are classified as being jointly involved in the coordination of a zinc ion. We develop a kernel for this particular type of data that can handle variable-length gaps between candidate coordinating residues. Our empirical evaluation on a data set of non-redundant protein chains shows that explicitly modeling the correlation between residues close in sequence yields a significant improvement in prediction performance.
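
The pairing idea described above can be illustrated with a minimal sketch. The candidate residue set (Cys, His, Asp, Glu, which commonly coordinate zinc) and the `max_gap` cutoff are assumptions made here for illustration, since the abstract specifies neither; the function name `candidate_pairs` is hypothetical, and this is not the authors' implementation.

```python
from typing import List, Tuple

# Assumption: residues that commonly coordinate zinc (Cys, His, Asp, Glu).
CANDIDATES = frozenset("CHDE")


def candidate_pairs(sequence: str, max_gap: int) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, of candidate residues that have
    at most `max_gap` residues between them in the sequence."""
    positions = [i for i, aa in enumerate(sequence) if aa in CANDIDATES]
    pairs = []
    for a, i in enumerate(positions):
        for j in positions[a + 1:]:
            # Positions are sorted, so once the gap is too large we can stop.
            if j - i - 1 > max_gap:
                break
            pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    # Toy sequence, not real data: candidates sit at indices 1, 4 and 10.
    print(candidate_pairs("ACAAHAAAAAC", max_gap=5))
```

Each returned pair would then be one example for a classifier, in place of the single residues whose labels are autocorrelated; the gap between the two residues varies from pair to pair, which is why a kernel over such pairs has to cope with variable-length gaps.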


Support Vector Machine, Protein Data Bank, Site Type, Bonding State, Metal Binding Site
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Sauro Menchetti (1)
  • Andrea Passerini (1)
  • Paolo Frasconi (1)
  • Claudia Andreini (2)
  • Antonio Rosato (2)

  1. Machine Learning and Neural Networks Group, Dipartimento di Sistemi e Informatica, Università degli Studi di Firenze, Italy
  2. Magnetic Resonance Center (CERM), Dipartimento di Chimica, Università degli Studi di Firenze, Italy
