Prediction of Transmembrane Proteins from Their Primary Sequence by Support Vector Machine Approach

  • C. Z. Cai
  • Q. F. Yuan
  • H. G. Xiao
  • X. H. Liu
  • L. Y. Han
  • Y. Z. Chen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4115)


Prediction of transmembrane (TM) proteins from their sequence facilitates the functional study of genomes and the search for potential membrane-associated therapeutic targets. Computational methods for predicting TM sequences have been developed. These methods achieve high prediction accuracy for many TM proteins, but some of them are less effective for specific classes of TM proteins. Moreover, their performance has been tested on relatively small sets of TM and non-membrane (NM) proteins. It is therefore useful to evaluate TM protein prediction methods on a more diverse set of proteins and to test their performance on specific classes of TM proteins. This work extensively evaluated the capability of support vector machine (SVM) classification systems to predict TM proteins and proteins of several TM classes. These SVM systems were trained and tested by using 14962 TM and 12168 NM proteins from Pfam protein families. An independent set of 3389 TM and 6063 NM proteins from curated Pfam families was used to further evaluate the performance of these SVM systems. In this independent set, 90.1% of TM proteins and 86.7% of NM proteins were correctly predicted, which is comparable to the results of other studies. The prediction accuracies for proteins of specific TM classes are 95.6%, 90.0%, 92.7% and 73.9% for G-protein coupled receptors, envelope proteins, outer membrane proteins, and transporters/channels respectively; and 98.1%, 99.5%, 86.4%, and 98.6% for non-G-protein coupled receptors, non-envelope proteins, non-outer membrane proteins, and non-transporters/non-channels respectively. Tested on a significantly larger and more diverse set of proteins than in previous studies, SVM systems appear capable of predicting TM proteins and proteins of specific TM classes at accuracies comparable to those of previous studies. Our SVM systems, SVMProt, can be accessed at
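The two per-class rates reported for the independent set can be combined into a single overall figure. The following is a minimal sketch, not the authors' evaluation code: it assumes the 90.1% and 86.7% figures apply to the 3389 TM and 6063 NM proteins of the independent set, and it reconstructs approximate confusion-matrix counts by rounding, so the counts are illustrative rather than reported values. The function name `summarize` is hypothetical. The Matthews correlation coefficient is included because it is a common summary for two-class predictors with unequal class sizes.

```python
import math


def summarize(n_pos, n_neg, sensitivity, specificity):
    """Rebuild approximate confusion counts from per-class accuracies.

    n_pos / n_neg: number of TM / NM proteins in the test set.
    sensitivity:   fraction of TM proteins predicted as TM.
    specificity:   fraction of NM proteins predicted as NM.
    Counts are rounded, so they are estimates, not reported data.
    """
    tp = round(n_pos * sensitivity)   # TM predicted as TM
    fn = n_pos - tp                   # TM predicted as NM
    tn = round(n_neg * specificity)   # NM predicted as NM
    fp = n_neg - tn                   # NM predicted as TM

    accuracy = (tp + tn) / (n_pos + n_neg)

    # Matthews correlation coefficient; defined as 0 if any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"tp": tp, "fn": fn, "tn": tn, "fp": fp,
            "accuracy": accuracy, "mcc": mcc}


# Independent evaluation set sizes and rates as quoted in the abstract.
result = summarize(n_pos=3389, n_neg=6063,
                   sensitivity=0.901, specificity=0.867)
print(f"overall accuracy: {result['accuracy']:.3f}")
print(f"MCC:              {result['mcc']:.3f}")
```

Because the NM proteins outnumber the TM proteins in this set, the overall accuracy sits closer to the NM rate than to the TM rate.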



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • C. Z. Cai
    • 1
    • 2
  • Q. F. Yuan
    • 1
    • 2
  • H. G. Xiao
    • 1
    • 2
  • X. H. Liu
    • 1
  • L. Y. Han
    • 2
  • Y. Z. Chen
    • 2
  1. Department of Applied Physics, Chongqing University, Chongqing, China
  2. Department of Pharmacy, National University of Singapore, Singapore
