g-MARS: Protein Classification Using Gapped Markov Chains and Support Vector Machines

  • Xiaonan Ji
  • James Bailey
  • Kotagiri Ramamohanarao
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5265)

Abstract

Classifying protein sequences has important applications in areas such as disease diagnosis, treatment development and drug design. In this paper we present a highly accurate classifier called the g-MARS (gapped Markov Chain with Support Vector Machine) protein classifier. It models the structure of a protein sequence by measuring the transition probabilities between pairs of amino acids. This results in a Markov chain style model for each protein sequence. Then, to capture the similarity among non-exactly matching protein sequences, we show that this model can be generalized to incorporate gaps in the Markov chain. We perform a thorough experimental study and compare g-MARS to several other state-of-the-art protein classifiers. Overall, we demonstrate that g-MARS has superior accuracy and operates efficiently on a diverse range of protein families.
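The abstract describes the model only at a high level, so the following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes that a "gap" of g means pairing each residue with the residue g+1 positions later, that transition probabilities are estimated by simple row-normalised counts, and that the per-gap matrices are concatenated into one fixed-length feature vector for an SVM. The names `gapped_transition_probs`, `feature_vector`, and `max_gap` are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def gapped_transition_probs(seq, gap=0):
    """Estimate P(b | a) for residue pairs separated by `gap` positions.

    gap=0 gives the ordinary first-order Markov chain over adjacent
    residues; gap=g pairs seq[i] with seq[i + g + 1]. This definition
    of a gap is an assumption, not taken from the paper.
    """
    pair_counts = Counter()
    row_totals = Counter()
    step = gap + 1
    for i in range(len(seq) - step):
        a, b = seq[i], seq[i + step]
        # Skip non-standard residue codes such as 'X' or 'B'.
        if a in AMINO_ACIDS and b in AMINO_ACIDS:
            pair_counts[(a, b)] += 1
            row_totals[a] += 1
    return {(a, b): c / row_totals[a] for (a, b), c in pair_counts.items()}


def feature_vector(seq, max_gap=2):
    """Concatenate the 20x20 transition matrices for gaps 0..max_gap."""
    vec = []
    for gap in range(max_gap + 1):
        probs = gapped_transition_probs(seq, gap)
        vec.extend(probs.get((a, b), 0.0)
                   for a in AMINO_ACIDS for b in AMINO_ACIDS)
    return vec
```

Under these assumptions every sequence, whatever its length, maps to a vector of 400 × (max_gap + 1) values, which is the kind of fixed-length input an SVM library such as LIBSVM expects.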

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Xiaonan Ji (1)
  • James Bailey (1)
  • Kotagiri Ramamohanarao (1)
  1. NICTA Victoria Laboratory, Department of Computer Science and Software Engineering, University of Melbourne, Australia
