Classification of Phylogenetic Profiles for Protein Function Prediction: An SVM Approach
Predicting the function of an uncharacterized protein is a major challenge in the post-genomic era because of the problem's complexity and scale. Knowledge of protein function is a crucial link in the development of new drugs, better crops, and biochemicals such as biofuels. Numerous high-throughput experimental procedures have recently been developed to investigate the mechanisms underlying a protein's function, and phylogenetic profiling is one of them. A phylogenetic profile is a representation of a protein that encodes its evolutionary history. In this paper we propose a method for classifying phylogenetic profiles with a supervised machine learning method, support vector machine (SVM) classification with a radial basis function kernel, in order to identify functionally linked proteins. We experimentally evaluated the performance of the classifier with the linear and polynomial kernels as well, and compared the results with those of the existing tree kernel. Our study uses proteins from the genome of the budding yeast Saccharomyces cerevisiae. We generated phylogenetic profiles for 2465 yeast genes and used the functional annotations available in the MIPS database. Our experiments show that the radial basis kernel performs similarly to the polynomial kernel in some functional classes, that both perform better than the linear and tree kernels, and that overall the radial basis kernel outperformed the polynomial, linear, and tree kernels. From these results we conclude that an SVM classifier with a radial basis function kernel is a feasible way to predict gene function from phylogenetic profiles.
Keywords: Protein function prediction, support vector machine, phylogenetic profiles
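
To make the kernel comparison in the abstract concrete, the following minimal sketch computes the linear, polynomial, and radial basis function kernels between phylogenetic profiles. It is an illustration only, not the authors' implementation: it assumes binary presence/absence profiles, and the example profiles, the function names, and the parameter values (`gamma`, `degree`, `coef0`) are all hypothetical choices that do not come from the paper. The tree kernel is not shown.

```python
import math

def linear_kernel(x, y):
    """Dot product of two profiles."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """(x . y + coef0) ** degree; degree and coef0 are illustrative values."""
    return (linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """exp(-gamma * ||x - y||^2); gamma is an illustrative value."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Hypothetical binary profiles over six reference genomes:
# 1 = a homolog of the protein is present in that genome, 0 = absent.
profile_a = [1, 0, 1, 1, 0, 1]
profile_b = [1, 0, 1, 0, 0, 1]  # differs from profile_a in one genome
profile_c = [0, 1, 0, 0, 1, 0]  # differs from profile_a in every genome

for name, kernel in [("linear", linear_kernel),
                     ("polynomial", polynomial_kernel),
                     ("rbf", rbf_kernel)]:
    print(name, kernel(profile_a, profile_b), kernel(profile_a, profile_c))
```

Under the radial basis function kernel, proteins with similar patterns of presence and absence across genomes receive a similarity close to 1 and dissimilar ones a value close to 0. An SVM then uses these pairwise similarities to separate the members of a functional class from non-members.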