An Automated Combination of Kernels for Predicting Protein Subcellular Localization
Protein subcellular localization is a crucial ingredient for many important inferences about cellular processes, including the prediction of protein function and protein interactions. While many predictive computational tools have been proposed, they tend to have complicated architectures and require many design decisions from the developer.
Here we use the multiclass support vector machine (m-SVM) method to predict protein subcellular localization directly, without resorting to the common approach of splitting the problem into several binary classification problems. We further propose a general class of protein sequence kernels that considers all motifs, including motifs with gaps. Instead of heuristically selecting one or a few kernels from this family, we use a recent extension of SVMs that optimizes over multiple kernels simultaneously. In this way, we automatically search over families of possible amino acid motifs.
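The idea of a kernel family over gapped motifs, combined by weights, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`motif_counts`, `motif_kernel`, `combined_kernel`), the count-based dot-product kernel, and the hand-set weights are all assumptions made for illustration. In multiple kernel learning the weights would be optimized together with the SVM rather than fixed in advance.

```python
from collections import Counter
from typing import Dict, Tuple

Offsets = Tuple[int, ...]  # hypothetical encoding: (0, 2) means "residue, gap, residue"


def motif_counts(seq: str, offsets: Offsets) -> Counter:
    """Count gapped motifs: the residues found at positions i + o for each o in offsets."""
    span = offsets[-1] + 1  # window length covered by the pattern
    return Counter(
        tuple(seq[i + o] for o in offsets)
        for i in range(len(seq) - span + 1)
    )


def motif_kernel(a: str, b: str, offsets: Offsets) -> int:
    """Dot product of the motif count vectors of two sequences (an assumed, simple kernel)."""
    counts_a = motif_counts(a, offsets)
    counts_b = motif_counts(b, offsets)
    return sum(n * counts_b[motif] for motif, n in counts_a.items())


def combined_kernel(a: str, b: str, weights: Dict[Offsets, float]) -> float:
    """Weighted sum of motif kernels; here the weights are fixed by hand, not learned."""
    return sum(w * motif_kernel(a, b, offsets) for offsets, w in weights.items())


# Illustrative weights over two patterns: contiguous 2-mers and 2-mers with one gap.
example_weights = {(0, 1): 0.5, (0, 2): 0.5}
value = combined_kernel("ACDA", "ACDA", example_weights)
```

A Gram matrix built from `combined_kernel` over all pairs of training sequences could then be passed to any SVM implementation that accepts precomputed kernels.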
We compare our automated approach to three other predictors on four different datasets, and show that it performs better than the current state of the art. Further, our method provides some insights into which sequence motifs are most useful for determining subcellular localization, and these insights agree with biological reasoning. Data files, kernel matrices and open source software are available at http://www.fml.mpg.de/raetsch/projects/protsubloc.
Keywords: Support Vector Machine, Amino Acid Motif, Multiple Kernel Learning, Protein Subcellular Localization, Multiclass Support Vector Machine