Abstract
A new kernel is developed for vectors derived from the tri-peptide composition coding of protein sequences. The kernel defines sequence similarity through a mapping that transforms a tri-peptide coding vector into a new vector, using a matrix formed from the high BLOSUM scores of tri-peptide pairs. Used with support vector machines, the new kernel is evaluated against the conventional k-peptide (k ≤ 3) coding schemes for predicting the subcellular localization of proteins in Gram-negative bacteria. In 5-fold cross-validation, the new method outperforms all the other methods compared.
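The mapping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that a tri-peptide pair's score is the sum of its three position-wise substitution scores, that the matrix M is binary (1 where the pair score reaches a hypothetical `threshold`, else 0), and that the kernel is the inner product of the mapped vectors, K(x, y) = (Mx)·(My). The helper names (`kpeptide_composition`, `mapped_vector`, `hsp_kernel`) and the scoring function `toy_sub` are hypothetical; `toy_sub` stands in for a real BLOSUM62 table, which is available, for example, through Biopython's `Bio.Align.substitution_matrices` module.

```python
from collections import Counter
from itertools import product
from typing import Callable, Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SubFn = Callable[[str, str], int]  # residue-pair substitution score, e.g. BLOSUM62


def kpeptide_composition(seq: str, k: int, alphabet: str = AMINO_ACIDS) -> Counter:
    """Sparse k-peptide count vector; k-mers containing letters outside `alphabet` are skipped."""
    allowed = set(alphabet)
    return Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= allowed
    )


def tripeptide_score(a: str, b: str, sub: SubFn) -> int:
    """Assumed pair score: sum of the three position-wise substitution scores."""
    return sum(sub(x, y) for x, y in zip(a, b))


def mapped_vector(x: Counter, sub: SubFn, threshold: int,
                  alphabet: str = AMINO_ACIDS) -> Dict[str, int]:
    """phi(x) = M x, with the assumed binary M[c][a] = 1 if score(c, a) >= threshold, else 0."""
    all_tripeptides: List[str] = ["".join(t) for t in product(alphabet, repeat=3)]
    phi: Dict[str, int] = {}
    for a, count in x.items():
        # Every tri-peptide c that scores high against a receives a's count.
        for c in all_tripeptides:
            if tripeptide_score(c, a, sub) >= threshold:
                phi[c] = phi.get(c, 0) + count
    return phi


def hsp_kernel(s1: str, s2: str, sub: SubFn, threshold: int,
               alphabet: str = AMINO_ACIDS) -> int:
    """K(s1, s2) = <M x1, M x2>, where x1, x2 are tri-peptide count vectors."""
    p1 = mapped_vector(kpeptide_composition(s1, 3, alphabet), sub, threshold, alphabet)
    p2 = mapped_vector(kpeptide_composition(s2, 3, alphabet), sub, threshold, alphabet)
    return sum(v * p2.get(c, 0) for c, v in p1.items())


def toy_sub(x: str, y: str) -> int:
    """Hypothetical stand-in for BLOSUM62: +4 on a match, -1 on a mismatch."""
    return 4 if x == y else -1


if __name__ == "__main__":
    # With toy_sub and threshold 7, a tri-peptide pair qualifies if it differs in at most one position.
    print(hsp_kernel("ACD", "ACE", toy_sub, threshold=7))  # 20
```

Because the kernel is an inner product of explicit feature vectors, it is positive semidefinite and can be handed to an SVM as a precomputed Gram matrix (for instance scikit-learn's `SVC(kernel="precomputed")`). If `threshold` is high enough that only identical tri-peptides qualify, M becomes the identity and the kernel reduces to the plain tri-peptide composition dot product used by the conventional coding scheme.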
This research is partially supported by the National Science Foundation (EIA-022-0301) and the Naval Research Laboratory (N00173-03-1-G016). The authors thank Deepa Vijayraghavan for assistance with the computing environment.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lei, Z., Dai, Y. (2005). A New Kernel Based on High-Scored Pairs of Tri-peptides and Its Application in Prediction of Protein Subcellular Localization. In: Sunderam, V.S., van Albada, G.D., Sloot, P.M.A., Dongarra, J.J. (eds) Computational Science – ICCS 2005. ICCS 2005. Lecture Notes in Computer Science, vol 3515. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11428848_115
DOI: https://doi.org/10.1007/11428848_115
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26043-1
Online ISBN: 978-3-540-32114-9
eBook Packages: Computer Science, Computer Science (R0)