Structured Output Prediction of Novel Enzyme Function with Reaction Kernels

  • Katja Astikainen
  • Liisa Holm
  • Esa Pitkänen
  • Sandor Szedmak
  • Juho Rousu
Part of the Communications in Computer and Information Science book series (CCIS, volume 127)

Abstract

Enzyme function prediction is an important problem in post-genomic bioinformatics, needed for reconstructing the metabolic networks of organisms. Currently, there are two general approaches to the problem: annotation transfer from a similar annotated protein, and machine learning methods that treat the problem as classification against a fixed taxonomy, such as Gene Ontology or the EC hierarchy. These methods are suitable when the function of the new protein has already been characterized and is included in the taxonomy. However, given a new function that has not been previously described, they offer little assistance to the human expert. The goal of this paper is to bring forward structured output learning approaches for the case where the exactly correct function of the enzyme to be annotated may not be contained in the training set. Our approach hinges on a fine-grained representation of enzyme function via so-called reaction kernels, which allow interpolation and extrapolation in the output (reaction) space. A kernel-based structured output prediction model is used to predict enzymatic reactions from sequence motifs. We present several choices for constructing reaction kernels and experiment with them in the remote homology case, where the functions in the test set have not been seen in the training phase.
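
The idea of predicting in a kernel-defined output space can be illustrated with a minimal sketch. This is not the model used in the paper: it swaps in a simple regularized least-squares estimator (kernel ridge style) and picks the best-scoring reaction from a candidate list. The function names, the cosine kernel, the regularization value `lam`, and all feature vectors are hypothetical and chosen only for illustration.

```python
import numpy as np

def cosine_kernel(A, B):
    """Normalized linear kernel between the rows of A and the rows of B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

def predict_reaction(X_train, Y_train, x_new, candidates, lam=0.1):
    """Score candidate reactions for a new enzyme.

    X_train:    (n, d) motif features of training enzymes (toy data)
    Y_train:    (n, p) reaction features of their known reactions (toy data)
    x_new:      (d,)   motif features of the enzyme to annotate
    candidates: (m, p) reaction features of candidate reactions, which
                may include reactions absent from the training set
    """
    K_x = cosine_kernel(X_train, X_train)                 # input kernel
    k_new = cosine_kernel(X_train, x_new[None, :])[:, 0]  # new vs. training
    # Regularized least-squares weights over the training examples.
    alpha = np.linalg.solve(K_x + lam * np.eye(len(X_train)), k_new)
    # Output kernel between training reactions and candidates.
    K_y = cosine_kernel(Y_train, candidates)
    scores = alpha @ K_y                                   # one score per candidate
    return int(np.argmax(scores)), scores

# Toy example: three training enzymes, each with one motif and one reaction.
X_train = np.eye(3)
Y_train = np.eye(3)
# The new enzyme carries the motifs of training enzymes 0 and 1.
x_new = np.array([1.0, 1.0, 0.0])
# Candidates: the three training reactions plus one unseen reaction that
# shares features with training reactions 0 and 1.
candidates = np.vstack([Y_train, [1.0, 1.0, 0.0]])

best, scores = predict_reaction(X_train, Y_train, x_new, candidates)
print(best, np.round(scores, 3))
```

In this toy setting the unseen candidate (index 3) receives the highest score because the output kernel places it between the two training reactions associated with the matching motifs. A classifier over a fixed label set could not return it.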

Keywords

Kernel Density Estimation; Polynomial Kernel; Output Kernel; Protein Function Prediction; Annotation Transfer
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Katja Astikainen
    • 1
  • Liisa Holm
    • 2
  • Esa Pitkänen
    • 1
  • Sandor Szedmak
    • 3
  • Juho Rousu
    • 1
  1. Department of Computer Science, University of Helsinki, Helsinki, Finland
  2. Institute of Biotechnology and Department of Biological Sciences, University of Helsinki, Helsinki, Finland
  3. Electronics and Computer Science, University of Southampton, Southampton, U.K.