Classification of Phylogenetic Profiles for Protein Function Prediction: An SVM Approach

  • Appala Raju Kotaru
  • Ramesh C. Joshi
Part of the Communications in Computer and Information Science book series (CCIS, volume 40)


Predicting the function of an uncharacterized protein is a major challenge in the post-genomic era because of the problem's complexity and scale. Knowledge of protein function is a crucial link in the development of new drugs, better crops, and biochemicals such as biofuels. Numerous high-throughput experimental procedures have recently been developed to investigate the mechanisms underlying a protein's function, and phylogenetic profiling is one of them. A phylogenetic profile is a representation of a protein that encodes its evolutionary history. In this paper we propose a method for classifying phylogenetic profiles with a supervised machine learning technique, support vector machine (SVM) classification with a radial basis function kernel, in order to identify functionally linked proteins. We experimentally evaluated the classifier with the linear and polynomial kernels as well, and compared the results with those of the existing tree kernel. Our study uses proteins from the genome of the budding yeast Saccharomyces cerevisiae. We generated phylogenetic profiles for 2465 yeast genes and used the functional annotations available in the MIPS database. Our experiments show that the radial basis kernel performs similarly to the polynomial kernel in some functional classes, where both are better than the linear and tree kernels, and that overall the radial basis kernel outperforms the polynomial, linear, and tree kernels. From these results we conclude that an SVM classifier with a radial basis function kernel is a feasible way to predict gene function from phylogenetic profiles.
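To make the kernel comparison concrete, the following is a minimal sketch of three of the kernels named above (linear, polynomial, and radial basis function) applied to phylogenetic profiles. It assumes binary presence/absence profiles, which the abstract does not specify; the gene names, profile values, and the parameters `degree`, `coef0`, and `gamma` are hypothetical illustrations, not values from the paper. The tree kernel is not reproduced here.

```python
import math

def linear_kernel(x, y):
    """Dot product: counts genomes in which both proteins have a homolog."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel (x.y + coef0)^degree; parameters are illustrative."""
    return (linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def gram_matrix(kernel, vectors):
    """Pairwise kernel values, the form an SVM solver consumes."""
    return [[kernel(u, v) for v in vectors] for u in vectors]

# Hypothetical toy profiles over six genomes (assumption: binary encoding,
# 1 = homolog present in that genome, 0 = absent).
profiles = {
    "geneA": [1, 1, 0, 1, 0, 0],
    "geneB": [1, 1, 0, 1, 0, 1],
    "geneC": [0, 0, 1, 0, 1, 1],
}

if __name__ == "__main__":
    vectors = list(profiles.values())
    for name, kernel in [("linear", linear_kernel),
                         ("polynomial", polynomial_kernel),
                         ("rbf", rbf_kernel)]:
        print(name, gram_matrix(kernel, vectors))
```

In this toy data, geneA and geneB differ in a single genome while geneA and geneC differ in all six, so the radial basis kernel assigns the first pair a much higher similarity. A Gram matrix of this kind could then be passed to any SVM implementation that accepts precomputed kernels.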


Protein function prediction, support vector machine, phylogenetic profiles





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Appala Raju Kotaru (1)
  • Ramesh C. Joshi (1)
  1. Department of Electronics and Computer Engineering, Indian Institute of Technology, Roorkee, India
