An ensemble of support vector machines for predicting the membrane protein type directly from the amino acid sequence

  • Original Article
  • Amino Acids

Abstract

Given a particular membrane protein, it is very important to know which membrane type it belongs to, because this information can provide clues for better understanding its function. In this work, we propose a system for predicting the membrane protein type directly from the amino acid sequence. The feature extraction step is based on an encoding technique that combines physicochemical amino acid properties with the residue couple model. The residue couple model is a method inspired by Chou’s quasi-sequence-order model that extracts features by exploiting the sequence-order effect indirectly. A set of support vector machines, each trained using a different physicochemical amino acid property combined with the residue couple model, is combined by the vote rule. The success rate obtained by our system on a difficult dataset, in which the sequences of a given membrane type have low sequence identity to any other protein of the same membrane type, is quite high. This indicates that the proposed method, in which the features are extracted directly from the amino acid sequence, is a feasible system for predicting the membrane protein type.

Fig. 1
Fig. 2
Fig. 3

Notes

  1. This database currently contains 544 such indices and 94 substitution matrices.

  2. To ease re-implementation of the method, the Matlab code is provided in the Appendix of this paper.

  3. Implemented with the PRTools 3.1.7 Matlab toolbox.

  4. The support vector machine is implemented with the OSU SVM Matlab toolbox; the parameters of the RSVM are C = 0.1 and gamma = 100.

  5. In the vote rule, all votes from the classifiers are tallied, and the class with the most votes represents the final prediction (Nanni and Lumini 2006c).

  6. The properties selected for ENS2-BORDA are: normalized hydrophobicity scales for alpha/beta-proteins; a parameter of charge transfer capability; short- and medium-range non-bonded energy per residue; hydrophobicity factor; net charge; free energy of solution in water (kcal/mole); long-range non-bonded energy per atom; partition coefficient; hydropathy index; transfer free energy; average non-bonded energy per atom; weights for beta-sheet at the window position of −1; positive charge; transfer energy, organic solvent/water; weights for alpha-helix at the window position of −2; direction of hydrophobic moment.

  7. Borda count maps a set of individual rankings to a combined ranking, whose top-ranked class is taken as the final decision. Each class gets one point for each last-place vote it receives, two points for each next-to-last-place vote, and so on, up to M points for each first-place vote (where M is the number of candidates, here the classes).
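
For readers re-implementing the fusion step, the vote rule (note 5) and the Borda count (note 7) can be sketched in Python as follows. This is a minimal illustration, not the authors' Matlab code: the function names `vote_rule`, `scores_to_ranking` and `borda_count` are hypothetical, turning each SVM's per-class decision values into a ranking is an assumption, and the tie-breaking rules are assumptions because the notes do not specify them. The class labels in the demo are hypothetical.

```python
from collections import Counter


def vote_rule(predictions):
    """Majority vote: each classifier casts one vote for its predicted class.

    The class with the most votes wins. Ties are broken in favour of the tied
    class that appears first in `predictions` (an assumption).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    return next(label for label in predictions if counts[label] == best)


def scores_to_ranking(scores):
    """Turn one classifier's per-class scores (a dict) into a best-to-worst list.

    Using SVM decision values as the scores is an assumption for illustration.
    """
    return sorted(scores, key=scores.get, reverse=True)


def borda_count(rankings):
    """Borda count over rankings that each list all M classes, best first.

    A class earns M points for each first place, M - 1 for each second place,
    ..., and 1 point for each last place. The highest total wins; ties are
    broken by the smallest label (an assumption).
    """
    totals = Counter()
    for ranking in rankings:
        m = len(ranking)
        for position, label in enumerate(ranking):
            totals[label] += m - position
    best = max(totals.values())
    return min(label for label, score in totals.items() if score == best)


if __name__ == "__main__":
    # Three hypothetical classifiers scoring three hypothetical membrane types.
    outputs = [
        {"type I": 0.9, "type II": 0.2, "multipass": -0.4},
        {"type I": 0.1, "type II": 0.6, "multipass": -0.3},
        {"type I": -0.5, "type II": 0.8, "multipass": 0.3},
    ]
    print(vote_rule([max(s, key=s.get) for s in outputs]))
    print(borda_count([scores_to_ranking(s) for s in outputs]))
```

The vote rule only looks at each classifier's top class, whereas the Borda count also rewards classes that are ranked second or third, so it can use more of the information produced by each SVM.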

References

  • Cai YD, Liu XJ, Xu XB, Chou KC (2000) Support vector machines for prediction of protein subcellular location. Mol Cell Biol Res Commun 4:230–233

  • Cai YD, Liu XJ, Xu XB, Chou KC (2002) Support vector machines for prediction of protein subcellular location by incorporating quasi-sequence-order effect. J Cell Biochem 84:343–348

  • Cao Y, Liu S, Zhang L, Qin J, Wang J, Tang K (2006) Prediction of protein structural class with Rough Sets. BMC Bioinformatics 7:20

  • Cedano J, Aloy P, Pérez-Pons JA, Querol E (1997) Relation between amino acid composition and cellular location of proteins. J Mol Biol 266:594–600

  • Chen YL, Li QZ (2007) Prediction of apoptosis protein subcellular location using improved hybrid approach and pseudo amino acid composition. J Theor Biol 248:377–381

  • Chen C, Tian YX, Zou XY, Cai PX, Mo JY (2006a) Using pseudo-amino acid composition and support vector machine to predict protein structural class. J Theor Biol 243:444–448

  • Chen C, Zhou X, Tian Y, Zou X, Cai P (2006b) Predicting protein structural class with pseudo-amino acid composition and support vector machine fusion network. Anal Biochem 357:116–121

  • Chen J, Liu H, Yang J, Chou KC (2007) Prediction of linear B-cell epitopes using amino acid pair antigenicity scale. Amino Acids 33:423–428

  • Chou KC (2000) Prediction of protein subcellular locations by incorporating quasi-sequence-order effect. Biochem Biophys Res Commun 278:477–483

  • Chou KC (2001) Prediction of protein cellular attributes using pseudo amino acid composition. Proteins Struct Funct Genet 43:246–255 (Erratum: 44:60)

  • Chou KC (2004) Review: structural bioinformatics and its impact to biomedical science. Curr Med Chem 11:2105–2134

  • Chou KC (2005) Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics 21:10–19

  • Chou KC, Cai YD (2002) Using functional domain composition and support vector machines for prediction of protein subcellular location. J Biol Chem 277:45765–45769

  • Chou KC, Cai YD (2003) A new hybrid approach to predict subcellular localization of proteins by incorporating gene ontology. Biochem Biophys Res Commun 311:743–747

  • Chou KC, Cai YD (2004a) Predicting subcellular localization of proteins by hybridizing functional domain composition and pseudo-amino acid composition. J Cell Biochem 91:1197–1203

  • Chou KC, Cai YD (2004b) Prediction of protein subcellular locations by GO-FunD-PseAA predictor. Biochem Biophys Res Commun 320:1236–1239

  • Chou KC, Cai YD (2005) Predicting protein localization in budding yeast. Bioinformatics 21:944–950

  • Chou KC, Elrod DW (1998) Using discriminant function for prediction of subcellular location of prokaryotic proteins. Biochem Biophys Res Commun 252:63–68

  • Chou KC, Elrod DW (1999a) Protein subcellular location prediction. Protein Eng 12:107–118

  • Chou KC, Elrod DW (1999b) Prediction of membrane protein types and subcellular locations. Proteins Struct Funct Genet 34:137–153

  • Chou KC, Shen HB (2006) Hum-PLoc: a novel ensemble classifier for predicting human protein subcellular localization. Biochem Biophys Res Commun 347:150–157

  • Chou KC, Shen HB (2007a) Euk-mPLoc: a fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites. J Proteome Res 6:1728–1734

  • Chou KC, Shen HB (2007b) Large-scale plant protein subcellular location prediction. J Cell Biochem 100:665–678

  • Chou KC, Shen HB (2007c) MemType-2L: a web server for predicting membrane proteins and their types by incorporating evolution information through Pse-PSSM. Biochem Biophys Res Commun 360:339–345

  • Chou KC, Shen HB (2007d) Review: recent progresses in protein subcellular location prediction. Anal Biochem 370:1–16

  • Chou KC, Shen HB (2008) Cell-PLoc: a package of web-servers for predicting subcellular localization of proteins in various organisms. Nat Protoc 3:153–162

  • Chou KC, Zhang CT (1995) Review: prediction of protein structural classes. Crit Rev Biochem Mol Biol 30:275–349

  • Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, Cambridge

  • Diao Y, Li M, Feng Z, Yin J, Pan Y (2007a) The community structure of human cellular signaling network. J Theor Biol 247:608–615

  • Diao Y, Ma D, Wen Z, Yin J, Xiang J, Li M (2007b) Using pseudo amino acid composition to predict transmembrane regions in protein: cellular automata and Lempel-Ziv complexity. Amino Acids. doi:10.1007/s00726-007-0550-z

  • Ding YS, Zhang TL, Chou KC (2007) Prediction of protein structure classes with pseudo amino acid composition and fuzzy support vector machine network. Protein Pept Lett 14:811–815

  • Douglas SM, Chou JJ, Shih WM (2007) DNA-nanotube-induced alignment of membrane proteins for NMR structure determination. Proc Natl Acad Sci USA 104:6644–6648

  • Doyle DA, Morais CJ, Pfuetzner RA, Kuo A, Gulbis JM, Cohen SL, Chait BT, MacKinnon R (1998) The structure of the potassium channel: molecular basis of K+ conduction and selectivity. Science 280:69–77

  • Du P, Li Y (2006) Prediction of protein submitochondria locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence. BMC Bioinformatics 7:518

  • Fang Y, Guo Y, Feng Y, Li M (2007) Predicting DNA-binding proteins: approached from Chou’s pseudo amino acid composition and other specific sequence features. Amino Acids. doi:10.1007/s00726-007-0568-2

  • Gao QB, Wang ZZ (2006) Classification of G-protein coupled receptors at four levels. Protein Eng Des Sel 19:511–516

  • Gao QB, Wang ZZ, Yan C, Du YH (2005a) Prediction of protein subcellular location using a combined feature of sequence. FEBS Lett 579:3444–3448

  • Gao Y, Shao SH, Xiao X, Ding YS, Huang YS, Huang ZD, Chou KC (2005b) Using pseudo amino acid composition to predict protein subcellular location: approached with Lyapunov index, Bessel function, and Chebyshev filter. Amino Acids 28:373–376

  • Guo J, Lin Y, Liu X (2006a) GNBSL: a new integrative system to predict the subcellular location for Gram-negative bacteria proteins. Proteomics 6:5099–5105

  • Guo YZ, Li M, Lu M, Wen Z, Wang K, Li G, Wu J (2006b) Classifying G protein-coupled receptors and nuclear receptors based on protein power spectrum from fast Fourier transform. Amino Acids 30:397–402

  • Huang Y, Li Y (2004) Prediction of protein subcellular locations using fuzzy k-NN method. Bioinformatics 20:21–28

  • Jahandideh S, Abdolmaleki P, Jahandideh M, Asadabadi EB (2007) Novel two-stage hybrid neural discriminant model for predicting proteins structural classes. Biophys Chem 128:87–93

  • Kawashima S, Kanehisa M (2000) AAindex: amino acid index database. Nucleic Acids Res 28:374

  • Kedarisetti KD, Kurgan LA, Dick S (2006) Classifier ensembles for protein structural class prediction with varying homology. Biochem Biophys Res Commun 348:981–988

  • Lee K, Kim DW, Na D, Lee KH, Lee D (2006) PLPD: reliable protein localization prediction from imbalanced and overlapped datasets. Nucleic Acids Res 34:4655–4666

  • Li FM, Li QZ (2007) Using pseudo amino acid composition to predict protein subnuclear location with improved hybrid approach. Amino Acids. doi:10.1007/s00726-007-0545-9

  • Lin H, Li QZ (2007a) Predicting conotoxin superfamily and family by using pseudo amino acid composition and modified Mahalanobis discriminant. Biochem Biophys Res Commun 354:548–551

  • Lin H, Li QZ (2007b) Using pseudo amino acid composition to predict protein structural class: approached by incorporating 400 dipeptide components. J Comput Chem 28:1463–1466

  • Liu H, Wang M, Chou KC (2005) Low-frequency Fourier spectrum for predicting membrane protein types. Biochem Biophys Res Commun 336:737–739

  • Liu DQ, Liu H, Shen HB, Yang J, Chou KC (2007) Predicting secretory protein signal sequence cleavage sites by fusing the marks of global alignments. Amino Acids 32:493–496

  • Lodish H, Baltimore D, Berk A, Zipursky SL, Matsudaira P, Darnell J (1995) Molecular cell biology, 3rd edn. Scientific American, New York

  • Mondal S, Bhavna R, Mohan Babu R, Ramakumar S (2006) Pseudo amino acid composition and multi-class support vector machines approach for conotoxin superfamily classification. J Theor Biol 243:252–260

  • Mundra P, Kumar M, Kumar KK, Jayaraman VK, Kulkarni BD (2007) Using pseudo amino acid composition to predict protein subnuclear localization: approached with PSSM. Pattern Recognition Lett 28:1610–1615

  • Nakai K, Horton P (1999) PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Trends Biochem Sci 24:34–36

  • Nakai K, Kanehisa M (1992) A knowledge base for predicting protein localization sites in eukaryotic cells. Genomics 14:897–911

  • Nanni L (2006) Comparison among feature extraction methods for HIV-1 protease cleavage site prediction. Pattern Recognition 39:711–713

  • Nanni L, Lumini A (2006a) An ensemble of K-local hyperplane for predicting protein-protein interactions. Bioinformatics 22:1207–1210

  • Nanni L, Lumini A (2006b) MppS: an ensemble of support vector machine based on multiple physicochemical properties of amino-acids. Neurocomputing 69:1688–1690

  • Nanni L, Lumini A (2006c) Detector of image orientation based on Borda-count. Pattern Recognition Lett 27:180–186

  • Nanni L, Lumini A (2008a) Combing ontologies and dipeptide composition for predicting DNA-binding proteins. Amino Acids. doi:10.1007/s00726-007-0018-1

  • Nanni L, Lumini A (2008b) Genetic programming for creating Chou’s pseudo amino acid based features for submitochondria localization. Amino Acids. doi:10.1007/s00726-007-0016-3

  • Niu B, Cai YD, Lu WC, Zheng GY, Chou KC (2006) Predicting protein structural class with AdaBoost learner. Protein Pept Lett 13:489–492

  • Pu X, Guo J, Leung H, Lin Y (2007) Prediction of membrane protein types from sequences and position-specific scoring matrices. J Theor Biol 247:259–265

  • Pugalenthi G, Tang K, Suganthan PN, Archunan G, Sowdhamini R (2007) A machine learning approach for the identification of odorant binding proteins from sequence-derived properties. BMC Bioinformatics 8:351

  • Schnell JR, Chou JJ (2008) Structure and mechanism of the M2 proton channel of influenza A virus. Nature 451:591–595

  • Shen HB, Chou KC (2007a) Hum-mPLoc: an ensemble classifier for large-scale human protein subcellular location prediction by incorporating samples with multiple sites. Biochem Biophys Res Commun 355:1006–1011

  • Shen HB, Chou KC (2007b) Using ensemble classifier to identify membrane protein types. Amino Acids 32:483–488

  • Shen HB, Yang J, Chou KC (2007) Euk-PLoc: an ensemble classifier for large-scale eukaryotic protein subcellular location prediction. Amino Acids 33:57–67

  • Shi JY, Zhang SW, Pan Q, Cheng Y-M, Xie J (2007a) Prediction of protein subcellular localization by support vector machines using multi-scale energy and pseudo amino acid composition. Amino Acids 33:69–74

  • Shi JY, Zhang SW, Pan Q, Zhou GP (2007b) Using pseudo amino acid composition to predict protein subcellular location: approached with amino acid composition distribution. Amino Acids. doi:10.1007/s00726-007-0623-z

  • Sun XD, Huang RB (2006) Prediction of protein structural classes using support vector machines. Amino Acids 30:469–475

  • Tan F, Feng X, Fang Z, Li M, Guo Y, Jiang L (2007) Prediction of mitochondrial proteins based on genetic algorithm—partial least squares and support vector machine. Amino Acids 33:669–675

  • Wang M, Yang J, Liu GP, Xu ZJ, Chou KC (2004) Weighted-support vector machines for predicting membrane protein types based on pseudo amino acid composition. Protein Eng Des Sel 17:509–516

  • Wang M, Yang J, Chou KC (2005) Using string kernel to predict signal peptide cleavage site based on subsite coupling model. Amino Acids 28:395–402 (Erratum: 29:301)

  • Wen Z, Li M, Li Y, Guo Y, Wang K (2007) Delaunay triangulation with partial least squares projection to latent structures: a model for G-protein coupled receptors classification and fast structure recognition. Amino Acids 32:277–283

  • Xiao X, Chou KC (2007) Digital coding of amino acids based on hydrophobic index. Protein Pept Lett 14:871–875

  • Xiao X, Shao S, Ding Y, Huang Z, Huang Y, Chou KC (2005) Using complexity measure factor to predict protein subcellular location. Amino Acids 28:57–61

  • Xiao X, Shao SH, Ding YS, Huang ZD, Chou KC (2006) Using cellular automata images and pseudo amino acid composition to predict protein subcellular location. Amino Acids 30:49–54

  • Yuan Z (1999) Prediction of protein subcellular locations using Markov chain models. FEBS Lett 451:23–26

  • Zhang TL, Ding YS (2007) Using pseudo amino acid composition and binary-tree support vector machines to predict protein structural classes. Amino Acids 33:623–629

  • Zhang SW, Pan Q, Zhang HC, Shao ZC, Shi JY (2006a) Prediction protein homo-oligomer types by pseudo amino acid composition: approached with an improved feature extraction and naive Bayes feature fusion. Amino Acids 30:461–468

  • Zhang ZH, Wang ZH, Zhang ZR, Wang YX (2006b) A novel method for apoptosis protein subcellular localization prediction combining encoding based on grouped weight and support vector machine. FEBS Lett 580:6169–6174

  • Zhang SW, Zhang YL, Yang HF, Zhao CH, Pan Q (2007) Using the concept of Chou’s pseudo amino acid composition to predict protein subcellular localization: an approach by incorporating evolutionary information and von Neumann entropies. Amino Acids. doi:10.1007/s00726-007-0010-9

  • Zhou GP (1998) An intriguing controversy over protein structural class prediction. J Protein Chem 17:729–738

  • Zhou GP, Assa-Munt N (2001) Some insights into protein structural class prediction. Proteins Struct Funct Genet 44:57–59

  • Zhou GP, Doctor K (2003) Subcellular location prediction of apoptosis proteins. Proteins Struct Funct Genet 50:44–48

  • Zhou XB, Chen C, Li ZC, Zou XY (2007a) Improved prediction of subcellular location for apoptosis proteins by the dual-layer support vector machine. Amino Acids. doi:10.1007/s00726-007-0608-y

  • Zhou XB, Chen C, Li ZC, Zou XY (2007b) Using Chou’s amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes. J Theor Biol 248:546–551

Author information

Correspondence to Loris Nanni.

Appendix: Matlab code of the quasi residue couple

The following function implements the base feature extraction method as detailed in Materials and methods.

Figures a and b: Matlab code listing of the quasi residue couple feature extraction.
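
The following Python sketch illustrates the general residue-couple idea: residues separated by a gap k are counted as ordered couples, so that sequence-order information enters the features only indirectly. It is an illustration under stated assumptions, not a transcription of the authors' Matlab function: the function name `residue_couple_features`, the default `max_gap=3`, the normalization by the number of couples at each gap, and the optional weighting of each couple by the product of the two residues' values of a physicochemical property `prop` (for example, an index taken from AAindex) are all assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def residue_couple_features(seq, max_gap=3, prop=None):
    """Hypothetical sketch of residue-couple features.

    For each gap k in 1..max_gap and each ordered residue pair (a, b), sum the
    contributions of all couples (seq[p], seq[p + k]) equal to (a, b) and divide
    by the number of couples at that gap, len(seq) - k. Each couple contributes
    1, or prop[a] * prop[b] if a property dict is given; this weighting is an
    illustrative assumption, not the paper's formula. Returns 400 * max_gap values.
    """
    seq = seq.upper()
    # Map each ordered pair ('A', 'A'), ('A', 'C'), ... to a column index 0..399.
    index = {pair: i for i, pair in enumerate(product(AMINO_ACIDS, repeat=2))}
    features = []
    for k in range(1, max_gap + 1):
        block = [0.0] * len(index)
        n_couples = len(seq) - k
        for p in range(max(n_couples, 0)):
            pair = (seq[p], seq[p + k])
            if pair not in index:  # skip non-standard residues such as X
                continue
            weight = 1.0 if prop is None else prop[pair[0]] * prop[pair[1]]
            block[index[pair]] += weight
        if n_couples > 0:
            block = [v / n_couples for v in block]
        features.extend(block)
    return features
```

For example, `residue_couple_features("ACAC", max_gap=1)` returns a 400-element vector in which the (A, C) entry is 2/3 and the (C, A) entry is 1/3, since the three gap-1 couples are AC, CA and AC. In an ensemble like the one described in the abstract, one such vector would be computed per physicochemical property and fed to a separate SVM.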

About this article

Cite this article

Nanni, L., Lumini, A. An ensemble of support vector machines for predicting the membrane protein type directly from the amino acid sequence. Amino Acids 35, 573–580 (2008). https://doi.org/10.1007/s00726-008-0083-0

  • DOI: https://doi.org/10.1007/s00726-008-0083-0