Abstract
Predicting transmembrane (TM) proteins from their sequence facilitates the functional study of genomes and the search for potential membrane-associated therapeutic targets. Several computational methods for predicting TM sequences have been developed. They achieve high prediction accuracy for many TM proteins, but some are less effective for specific classes of TM proteins. Moreover, their performance has been tested on relatively small sets of TM and non-membrane (NM) proteins. It is therefore useful to evaluate TM protein prediction methods on a more diverse set of proteins and to test their performance on specific classes of TM proteins. This work extensively evaluated the capability of support vector machine (SVM) classification systems to predict TM proteins and proteins of several TM classes. The SVM systems were trained and tested using 14962 TM and 12168 NM proteins from Pfam protein families. An independent set of 3389 TM and 6063 NM proteins from curated Pfam families was used to further evaluate their performance. 90.1% of TM proteins and 86.7% of NM proteins were correctly predicted, which is comparable to results from other studies. The prediction accuracies for proteins of specific TM classes are 95.6%, 90.0%, 92.7%, and 73.9% for G-protein coupled receptors, envelope proteins, outer membrane proteins, and transporters/channels, respectively, and 98.1%, 99.5%, 86.4%, and 98.6% for non-G-protein coupled receptors, non-envelope proteins, non-outer membrane proteins, and non-transporters/non-channels, respectively. Tested on a significantly larger and more diverse set of proteins than in previous studies, SVM systems appear capable of predicting TM proteins and proteins of specific TM classes at accuracies comparable to those reported previously. Our SVM system, SVMProt, can be accessed at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi.
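To make the reported percentages easier to interpret, the following minimal Python sketch converts the two per-class rates into approximate confusion-matrix counts and derives an overall accuracy and a Matthews correlation coefficient. It assumes that the 90.1% and 86.7% figures refer to the independent set of 3389 TM and 6063 NM proteins, which the abstract implies but does not state outright. The function names are hypothetical, and the derived overall accuracy and correlation coefficient are illustrative values that are not reported in the abstract.

```python
import math


def confusion_from_rates(n_pos, n_neg, sensitivity, specificity):
    """Approximate confusion-matrix counts from class sizes and per-class rates.

    Counts are rounded to whole proteins, so they are estimates only.
    """
    tp = round(n_pos * sensitivity)   # TM proteins predicted as TM
    fn = n_pos - tp                   # TM proteins predicted as NM
    tn = round(n_neg * specificity)   # NM proteins predicted as NM
    fp = n_neg - tn                   # NM proteins predicted as TM
    return tp, fn, tn, fp


def overall_accuracy(tp, fn, tn, fp):
    """Fraction of all proteins that were classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def matthews_cc(tp, fn, tn, fp):
    """Matthews correlation coefficient; returns 0.0 if it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Assumption: the rates below apply to the independent evaluation set.
tp, fn, tn, fp = confusion_from_rates(3389, 6063, 0.901, 0.867)
acc = overall_accuracy(tp, fn, tn, fp)
mcc = matthews_cc(tp, fn, tn, fp)

print(f"TP={tp} FN={fn} TN={tn} FP={fp}")
print(f"overall accuracy={acc:.3f}  MCC={mcc:.3f}")
```

Because the NM class is almost twice the size of the TM class, a single combined figure weights NM performance more heavily, which is one reason to report the two rates separately as the abstract does.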
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cai, C.Z., Yuan, Q.F., Xiao, H.G., Liu, X.H., Han, L.Y., Chen, Y.Z. (2006). Prediction of Transmembrane Proteins from Their Primary Sequence by Support Vector Machine Approach. In: Huang, DS., Li, K., Irwin, G.W. (eds) Computational Intelligence and Bioinformatics. ICIC 2006. Lecture Notes in Computer Science, vol 4115. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11816102_56
DOI: https://doi.org/10.1007/11816102_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37277-6
Online ISBN: 978-3-540-37282-0
eBook Packages: Computer Science (R0)