Comparing Sequence Classification Algorithms for Protein Subcellular Localization

  • Fabrizio Costa
  • Sauro Menchetti
  • Paolo Frasconi
Part of the Studies in Computational Intelligence book series (SCI, volume 77)

We discuss and experimentally compare several alternative classification algorithms for biological sequences. The methods presented in this chapter are all based on some form of statistical learning, ranging from support vector machines with string kernels to nearest-neighbour classification using biologically motivated distances. We report an extensive empirical comparison of these methods on the problem of protein subcellular localization.
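As a rough illustration of the two families of methods named above, the following Python sketch computes a k-mer spectrum kernel between two sequences (one common kind of string kernel) and uses its normalized value as the similarity in a 1-nearest-neighbour rule. This is a minimal sketch under stated assumptions, not the chapter's implementation: the function names, the choice k = 3, the toy sequences, and the labels are all hypothetical, and the normalized kernel stands in for the biologically motivated distances, which the abstract does not specify.

```python
from collections import Counter
from math import sqrt


def spectrum(seq, k=3):
    """Count every contiguous k-mer in seq (empty if len(seq) < k)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s, t, k=3):
    """Inner product of the k-mer count vectors of s and t."""
    cs, ct = spectrum(s, k), spectrum(t, k)
    if len(cs) > len(ct):
        cs, ct = ct, cs  # iterate over the smaller spectrum
    return sum(n * ct[kmer] for kmer, n in cs.items())


def normalized_kernel(s, t, k=3):
    """Cosine-style normalization, so that K(s, s) = 1 for non-trivial s."""
    denom = sqrt(spectrum_kernel(s, s, k) * spectrum_kernel(t, t, k))
    return spectrum_kernel(s, t, k) / denom if denom else 0.0


def nearest_neighbour(query, labelled, k=3):
    """Return the label of the training sequence most similar to query.

    labelled is a list of (sequence, label) pairs; similarity is the
    normalized spectrum kernel (an assumption made for this sketch).
    """
    return max(labelled, key=lambda pair: normalized_kernel(query, pair[0], k))[1]


if __name__ == "__main__":
    # Toy sequences and labels, invented for illustration only.
    train = [("MKKLLA", "class_a"), ("GGGSSG", "class_b")]
    print(nearest_neighbour("MKKLLV", train))
```

The same kernel values could instead be collected into a Gram matrix and passed to a support vector machine that accepts precomputed kernels; the nearest-neighbour rule is used here only because it needs no external library.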


Keywords (machine-generated, not supplied by the authors): Polypeptide, Macromolecule, Convolution, Sorting, Archaea





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Fabrizio Costa (1)
  • Sauro Menchetti (1)
  • Paolo Frasconi (1)
  1. Machine Learning and Neural Networks Group, Dipartimento di Sistemi e Informatica, Università degli Studi di Firenze, Italy
