Structured Output Prediction of Novel Enzyme Function with Reaction Kernels
Enzyme function prediction is an important problem in post-genomic bioinformatics, needed for the reconstruction of the metabolic networks of organisms. Currently, there are two general approaches to the problem: annotation transfer from a similar annotated protein, and machine learning approaches that treat the problem as classification against a fixed taxonomy, such as Gene Ontology or the EC hierarchy. These methods are suitable when the function of the new protein has already been characterized and is included in the taxonomy. However, when the new function has not been described before, they offer little assistance to the human expert. The goal of this paper is to bring forward structured output learning approaches for the case where the exact function of the enzyme to be annotated may not be contained in the training set. Our approach hinges on a fine-grained representation of enzyme function via so-called reaction kernels, which allow interpolation and extrapolation in the output (reaction) space. A kernel-based structured output prediction model is used to predict enzymatic reactions from sequence motifs. We present several choices for constructing reaction kernels and experiment with them in the remote homology case, where the functions in the test set have not been seen in the training phase.
Keywords: Kernel Density Estimation; Polynomial Kernel; Output Kernel; Protein Function Prediction; Annotation Transfer
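The idea in the abstract of scoring candidate reactions with an input kernel over sequence motifs and an output kernel over reactions can be illustrated with a small sketch. This is not the model from the paper: it is a simple kernel-weighted scoring rule chosen for illustration, and the motif names, bond-change features, and function names are all hypothetical. It assumes that each enzyme is described by a set of motifs and each reaction by a set of features, and it shows how a candidate reaction absent from the training set can still receive the highest score.

```python
# Illustrative sketch only: a kernel-weighted scoring rule, not the paper's model.
# All motif names, reaction features, and function names are hypothetical.

def motif_kernel(a, b):
    """Input kernel: linear kernel on binary motif indicators (shared motif count)."""
    return len(a & b)

def reaction_kernel(a, b):
    """Output kernel: Tanimoto similarity between two reaction feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def score(x, y, train):
    """Score candidate reaction y for enzyme x against all training pairs."""
    return sum(motif_kernel(x, xi) * reaction_kernel(y, yi) for xi, yi in train)

def predict(x, candidates, train):
    """Return the name of the highest-scoring candidate reaction."""
    return max(candidates, key=lambda name: score(x, candidates[name], train))

# Toy training set: (motif set of an enzyme, feature set of its reaction).
train = [
    (frozenset({"m1", "m2"}), frozenset({"a", "b"})),
    (frozenset({"m2", "m3"}), frozenset({"a", "c"})),
    (frozenset({"m1", "m3"}), frozenset({"b", "c"})),
    (frozenset({"m4"}), frozenset({"d"})),
]

# Candidate reactions; "unseen_abc" does not occur in the training set.
candidates = {
    "seen_ab": frozenset({"a", "b"}),
    "seen_ac": frozenset({"a", "c"}),
    "seen_bc": frozenset({"b", "c"}),
    "seen_d": frozenset({"d"}),
    "unseen_abc": frozenset({"a", "b", "c"}),
}

# An enzyme sharing motifs with the first three training enzymes.
novel_enzyme = frozenset({"m1", "m2", "m3"})
prediction = predict(novel_enzyme, candidates, train)

# An enzyme resembling only the fourth training enzyme.
familiar_enzyme = frozenset({"m4"})
familiar_prediction = predict(familiar_enzyme, candidates, train)
```

In this toy setting the unseen reaction wins for the first enzyme because it is partially similar to three well-matched training reactions, which is a rough analogue of interpolation in the output space. The candidate set is given by hand here; how candidates are generated and how the model is trained are outside the scope of this sketch.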