Prediction of protein homo-oligomer types by pseudo amino acid composition: Approached with an improved feature extraction and Naive Bayes Feature Fusion

Summary.

Oligomers form through the interaction of non-covalently bound monomeric protein subunits. Oligomeric proteins are superior to monomers in terms of the functional evolution of biomacromolecules, and such complexes play an important role in various biological processes. It is therefore highly desirable to predict oligomer types automatically from sequence. Here, based on the concept of pseudo amino acid composition, we propose an improved feature extraction method that uses a weighted auto-correlation function of amino acid residue indices, together with a Naive Bayes multi-feature fusion algorithm, and apply both to the prediction of protein homo-oligomer types. Support vector machines (SVMs) serve as the base classifiers. In the jackknife test, the total accuracies of the A, B, C, D and E sets based on the improved feature extraction method are 77.63, 77.16, 76.46, 76.70 and 75.06%, respectively. These are 6.39, 5.92, 5.22, 5.46 and 3.82 percentage points higher than the accuracy of the G set (71.24%), which is based on the conventional amino acid composition method with the same SVM. Compared with Chou's feature extraction method incorporating the quasi-sequence-order effect, our method increases the total accuracy by 1.01 to 3.51 percentage points. Using the Naive Bayes Feature Fusion algorithm improves the total accuracy from 79.66 to 80.83%. These results show that: 1) the improved feature extraction method is effective and feasible, and the feature vectors based on it may contain more protein quaternary structure information and appear to capture essential information about the composition and hydrophobicity of residues in the surface patches that are buried in the interfaces of associated subunits; 2) the Naive Bayes Feature Fusion algorithm combined with SVM can serve as a powerful computational tool for predicting protein homo-oligomer types.
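To make the feature extraction idea concrete, the following Python sketch builds a pseudo amino acid composition vector from the 20 amino acid frequencies plus weighted auto-correlation terms of a residue index. It is only an illustration under stated assumptions, not the authors' implementation: the summary does not give the exact auto-correlation formula, the weight, the number of lags, or the residue indices used, so the Kyte-Doolittle hydropathy scale, the function names `autocorrelation` and `pseudo_aac`, and the parameters `max_lag` and `weight` are all hypothetical choices made for this example.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here only as a stand-in residue index
# (assumption: the summary does not name the indices actually used).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def autocorrelation(values, lag):
    """Mean product of index values that are `lag` positions apart."""
    n = len(values) - lag
    return sum(values[i] * values[i + lag] for i in range(n)) / n


def pseudo_aac(sequence, max_lag=5, weight=0.1):
    """Return 20 composition terms followed by `max_lag` weighted correlation terms.

    Hypothetical, simplified form: the weighting and normalization in the
    original paper may differ.
    """
    if len(sequence) <= max_lag:
        raise ValueError("sequence must be longer than max_lag")

    # Conventional amino acid composition: fraction of each residue type.
    counts = Counter(sequence)
    composition = [counts[aa] / len(sequence) for aa in AMINO_ACIDS]

    # Standardize the index over the 20 amino acids (zero mean, unit variance).
    mean = sum(HYDROPATHY.values()) / len(HYDROPATHY)
    std = (sum((v - mean) ** 2 for v in HYDROPATHY.values()) / len(HYDROPATHY)) ** 0.5
    values = [(HYDROPATHY[aa] - mean) / std for aa in sequence]

    # Sequence-order information: one weighted auto-correlation term per lag.
    correlations = [weight * autocorrelation(values, lag)
                    for lag in range(1, max_lag + 1)]
    return composition + correlations


# Arbitrary example sequence made of the 20 standard residues.
features = pseudo_aac("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", max_lag=5, weight=0.1)
print(len(features))  # 20 composition terms + 5 correlation terms
```

A vector of this kind could then be passed to any SVM implementation as the input for one base classifier. Vectors built from different residue indices would give the several feature sets whose classifier outputs a fusion step combines.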

Cite this article

Zhang, SW., Pan, Q., Zhang, HC. et al. Prediction of protein homo-oligomer types by pseudo amino acid composition: Approached with an improved feature extraction and Naive Bayes Feature Fusion. Amino Acids 30, 461–468 (2006). https://doi.org/10.1007/s00726-006-0263-8

  • DOI: https://doi.org/10.1007/s00726-006-0263-8
