Comparing Sequence Classification Algorithms for Protein Subcellular Localization

Part of the book series: Studies in Computational Intelligence ((SCI,volume 77))

We discuss and experimentally compare several alternative classification algorithms for biological sequences. The methods presented in this chapter are all based on forms of statistical learning, ranging from support vector machines with string kernels to nearest neighbour classification with biologically motivated distances. We report an extensive empirical comparison on the problem of protein subcellular localization.
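
As a rough illustration of the two families of methods named in the abstract, the sketch below computes a k-mer spectrum kernel (one well-known string kernel) between protein sequences and uses the distance it induces for nearest neighbour classification. This is an assumption made for illustration, not the chapter's implementation: the function names, the choice k = 3, and the toy sequences and labels are all hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k=3):
    """Count every contiguous k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Inner product of the k-mer count vectors of two sequences."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(n * cb[kmer] for kmer, n in ca.items())


def kernel_distance(a, b, k=3):
    """Distance induced by the kernel: sqrt(k(a,a) - 2 k(a,b) + k(b,b))."""
    d2 = (spectrum_kernel(a, a, k)
          - 2 * spectrum_kernel(a, b, k)
          + spectrum_kernel(b, b, k))
    return sqrt(max(d2, 0))


def nearest_neighbour(query, labelled, k=3):
    """Return the label of the labelled sequence closest to the query."""
    return min(labelled, key=lambda pair: kernel_distance(query, pair[0], k))[1]


# Toy, made-up sequences and labels; not data from the chapter.
train = [("MKKLLPTAAAG", "secreted"), ("MSDEEKRRKAE", "nuclear")]
predicted = nearest_neighbour("MKKLLPTA", train)
```

The same kernel function could instead be used to fill a Gram matrix for a support vector machine; the nearest neighbour variant is shown here only because it needs no external library.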

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Costa, F., Menchetti, S., Frasconi, P. (2007). Comparing Sequence Classification Algorithms for Protein Subcellular Localization. In: Hammer, B., Hitzler, P. (eds) Perspectives of Neural-Symbolic Integration. Studies in Computational Intelligence, vol 77. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73954-8_2

  • DOI: https://doi.org/10.1007/978-3-540-73954-8_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73953-1

  • Online ISBN: 978-3-540-73954-8

  • eBook Packages: Engineering, Engineering (R0)
