g-MARS: Protein Classification Using Gapped Markov Chains and Support Vector Machines
Classifying protein sequences has important applications in areas such as disease diagnosis, treatment development, and drug design. In this paper we present a highly accurate classifier called g-MARS (gapped Markov Chain with Support Vector Machine). It models the structure of a protein sequence by measuring the transition probabilities between pairs of amino acids, which yields a Markov-chain-style model for each protein sequence. To capture the similarity among protein sequences that do not match exactly, we show that this model can be generalized to incorporate gaps in the Markov chain. We perform a thorough experimental study and compare g-MARS to several other state-of-the-art protein classifiers. Overall, we demonstrate that g-MARS has superior accuracy and operates efficiently on a diverse range of protein families.
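As a rough illustration of the idea of gapped transition probabilities, the following Python sketch estimates, for one sequence, the probability of seeing amino acid b a fixed number of positions after amino acid a, and concatenates these tables for several gap sizes into one feature vector. The abstract does not specify the estimation or normalization details, so the function names (`gapped_transition_probs`, `feature_vector`), the row-wise normalization, the 20-letter alphabet, and the `max_gap` parameter are all assumptions made for illustration, not the method as published.

```python
# Hypothetical sketch: names, normalization and alphabet are assumptions,
# not taken from the g-MARS paper itself.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def gapped_transition_probs(seq, gap=0):
    """Estimate P(b | a) for residues a and b separated by `gap` positions.

    gap=0 gives the ordinary first-order Markov chain over adjacent residues.
    Returns a 20x20 list of lists; rows with no observations stay all zero.
    """
    n = len(AMINO_ACIDS)
    counts = [[0] * n for _ in range(n)]
    step = gap + 1
    for i in range(len(seq) - step):
        a, b = seq[i], seq[i + step]
        # Skip non-standard symbols (e.g. X, B, Z) rather than guess at them.
        if a in INDEX and b in INDEX:
            counts[INDEX[a]][INDEX[b]] += 1
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs


def feature_vector(seq, max_gap=2):
    """Flatten the transition tables for gaps 0..max_gap into one vector."""
    vec = []
    for gap in range(max_gap + 1):
        for row in gapped_transition_probs(seq, gap):
            vec.extend(row)
    return vec
```

Under these assumptions each sequence maps to a fixed-length vector of 400 entries per gap size, regardless of sequence length, which is the kind of input a support vector machine can be trained on. The SVM stage itself is omitted here.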