Using the concept of Chou’s pseudo amino acid composition to predict protein subcellular localization: an approach by incorporating evolutionary information and von Neumann entropies

Abstract

The rapidly increasing number of sequences entering genome databanks calls for automated methods to analyze them. Information on the subcellular localization of newly found protein sequences helps reveal their functions in a timely manner and supports systems biology studies at the cellular level. Based on the concept of Chou’s pseudo-amino acid composition, several kinds of information and techniques, such as residue conservation scores, von Neumann entropies, multi-scale energy, and a weighted auto-correlation function, were used to generate the pseudo-amino acid components that represent the protein samples. On this basis, a hybridization predictor was developed for assigning uncharacterized proteins to one of the following 12 subcellular localizations: chloroplast, cytoplasm, cytoskeleton, endoplasmic reticulum, extracellular, Golgi apparatus, lysosome, mitochondria, nucleus, peroxisome, plasma membrane, and vacuole. Compared with the results reported by previous investigators, higher success rates were obtained, suggesting that the current approach is promising and may become a useful high-throughput tool in the relevant areas.
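To make the idea of a pseudo-amino acid representation concrete, the following is a minimal sketch in Python rather than the authors' implementation. It builds a vector of 20 amino acid composition values followed by a few extra sequence-order components. The extra components here are plain auto-correlation values of a hydropathy profile. The Kyte-Doolittle scale, the function names, the `max_lag` and `weight` parameters, and the absence of any final normalization are all assumptions made for illustration; the paper's own components (conservation scores, von Neumann entropies, multi-scale energy, weighted auto-correlation) are not reproduced.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values; an assumed, illustrative property scale.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def amino_acid_composition(seq):
    """Return the 20 occurrence frequencies of the standard amino acids."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def autocorrelation(seq, max_lag):
    """Return mean-centered auto-correlation of the hydropathy profile
    for lags 1..max_lag (the sequence must be longer than max_lag)."""
    values = [HYDROPATHY[r] for r in seq]  # KeyError on non-standard residues
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    result = []
    for lag in range(1, max_lag + 1):
        n = len(centered) - lag
        total = sum(centered[i] * centered[i + lag] for i in range(n))
        result.append(total / n)
    return result


def pseudo_composition(seq, max_lag=3, weight=0.05):
    """Concatenate the 20 composition values with max_lag weighted
    sequence-order components (hypothetical weighting, not normalized)."""
    extra = [weight * v for v in autocorrelation(seq, max_lag)]
    return amino_acid_composition(seq) + extra


if __name__ == "__main__":
    features = pseudo_composition("MKTAYIAKQRQISFVKSHFSRQ", max_lag=3)
    print(len(features))  # 20 composition values + 3 extra components
```

A vector of this kind could then be passed to any standard classifier. The design point is only that the first 20 entries discard sequence order while the appended entries retain some of it.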

Abbreviations

  • Chou’s PseAA composition: Chou’s pseudo-amino acid composition
  • MSA: Multiple sequence alignments
  • VNE: von Neumann entropy
  • IS: Information score
  • MSE: Multi-scale energy
  • AAC: Amino acid composition
  • JACK: Jackknife tests
  • INDE: Independent dataset tests
  • MD: Moment descriptors
  • SVM: Support vector machine

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (Nos. 60775012 and 60634030), the Technological Innovation Foundation of Northwestern Polytechnical University (No. KC02), and the Science Technology Research and Development Program of Shaanxi (No. 2006k04-G14).

Author information

Corresponding author

Correspondence to Shao-Wu Zhang.

About this article

Cite this article

Zhang, SW., Zhang, YL., Yang, HF. et al. Using the concept of Chou’s pseudo amino acid composition to predict protein subcellular localization: an approach by incorporating evolutionary information and von Neumann entropies. Amino Acids 34, 565–572 (2008). https://doi.org/10.1007/s00726-007-0010-9

  • DOI: https://doi.org/10.1007/s00726-007-0010-9

Keywords

  • Chou’s pseudo-amino acid composition
  • Residue evolutionary conservation
  • von Neumann entropies
  • Multi-scale energy
  • Weighted auto-correlation function