Amino Acids, Volume 34, Issue 4, pp 565–572

Using the concept of Chou’s pseudo amino acid composition to predict protein subcellular localization: an approach by incorporating evolutionary information and von Neumann entropies

  • Shao-Wu Zhang
  • Yun-Long Zhang
  • Hui-Fang Yang
  • Chun-Hui Zhao
  • Quan Pan
Original Article


The rapidly increasing number of sequences entering the genome databanks calls for automated methods to analyze them. Information on the subcellular localization of newly found protein sequences is important for revealing their functions in a timely manner and for studying systems biology at the cellular level. Based on the concept of Chou’s pseudo-amino acid composition, several kinds of information and techniques, namely residue conservation scores, von Neumann entropies, multi-scale energy, and a weighted auto-correlation function, were used to generate the pseudo-amino acid components that represent the protein samples. On this basis, a hybridization predictor was developed to assign uncharacterized proteins to one of the following 12 subcellular locations: chloroplast, cytoplasm, cytoskeleton, endoplasmic reticulum, extracellular space, Golgi apparatus, lysosome, mitochondria, nucleus, peroxisome, plasma membrane, and vacuole. The success rates obtained were higher than those reported by previous investigators, suggesting that the current approach is promising and may become a useful high-throughput tool in the relevant areas.
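As a rough illustration of what a pseudo-amino acid composition vector looks like, the following sketch computes Chou’s original (type-1) PseAA composition: 20 normalized amino acid frequencies followed by λ sequence-order correlation factors, all divided by a common normalizer so the components sum to 1. It is not the authors’ feature set; their extra components come from conservation scores, von Neumann entropies, multi-scale energy, and a weighted auto-correlation function, none of which are reproduced here. The Kyte–Doolittle hydropathy scale is swapped in as the single physicochemical property, and the function name `pseaac` and the defaults `lam=3` and `w=0.05` are illustrative assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values; an assumed stand-in property,
# not one of the properties used in the paper.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(prop):
    """Scale a 20-value property to zero mean and unit (population) std."""
    values = [prop[a] for a in AMINO_ACIDS]
    mean = sum(values) / 20
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / 20)
    return {a: (prop[a] - mean) / std for a in AMINO_ACIDS}


def pseaac(seq, props, lam=3, w=0.05):
    """Hypothetical helper: type-1 PseAA composition with 20 + lam components."""
    seq = seq.upper()
    if any(a not in AMINO_ACIDS for a in seq):
        raise ValueError("sequence contains non-standard residues")
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    std_props = [standardize(p) for p in props]

    def theta_pair(a, b):
        # Mean squared difference of the standardized properties of two residues.
        return sum((h[b] - h[a]) ** 2 for h in std_props) / len(std_props)

    # 20 normalized occurrence frequencies f_1..f_20.
    freqs = [seq.count(a) / n for a in AMINO_ACIDS]
    # Sequence-order correlation factors theta_1..theta_lam (tier j pairs residues j apart).
    thetas = [
        sum(theta_pair(seq[i], seq[i + j]) for i in range(n - j)) / (n - j)
        for j in range(1, lam + 1)
    ]
    # Common normalizer: sum(f) + w * sum(theta), so all components sum to 1.
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


vec = pseaac("MKTAYIAKQRQISFVKSHFSRQ", [KYTE_DOOLITTLE], lam=3, w=0.05)
print(len(vec), round(sum(vec), 6))  # prints: 23 1.0
```

Raising `w` gives the sequence-order factors more weight relative to the plain composition; in Chou’s formulation the weight factor is a tunable parameter chosen empirically. A homopolymer such as `"AAAAAA"` yields all-zero correlation factors, because every residue pair has identical property values.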


Keywords: Chou’s pseudo-amino acid composition · Residue evolutionary conservation · von Neumann entropies · Multi-scale energy · Weighted auto-correlation function


Abbreviations

  • Chou’s PseAA composition: Chou’s pseudo-amino acid composition
  • Multiple sequence alignments
  • von Neumann entropy
  • Information score
  • Multi-scale energy
  • Amino acid composition
  • Jackknife tests
  • Independent dataset tests
  • Moment descriptors
  • Support vector machine



This work was supported in part by the National Natural Science Foundation of China (Nos. 60775012 and 60634030), the Technological Innovation Foundation of Northwestern Polytechnical University (No. KC02), and the Science Technology Research and Development Program of Shaanxi (No. 2006k04-G14).



Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  • Shao-Wu Zhang (1)
  • Yun-Long Zhang (2)
  • Hui-Fang Yang (1)
  • Chun-Hui Zhao (1)
  • Quan Pan (1)

  1. College of Automation, Northwestern Polytechnical University, Xi’an, China
  2. Department of Computer, First Aeronautical Institute of Air Force, Xinyang, China
