Amino Acids, Volume 34, Issue 4, pp 653–660

Genetic programming for creating Chou’s pseudo amino acid based features for submitochondria localization

  • Loris Nanni
  • Alessandra Lumini
Original Article


Given a protein that is localized in the mitochondria, knowing its submitochondria localization is very important for understanding its function. In this work, we propose a submitochondria localizer whose feature extraction method is based on Chou’s pseudo-amino acid composition. The pseudo-amino acid based features are obtained by combining pseudo-amino acid compositions with hundreds of amino-acid indices and amino-acid substitution matrices; from this huge set of features, a small set of 15 “artificial” features is then created. The feature creation is performed by genetic programming, which combines one or more “original” features by means of mathematical operators. Finally, the set of combined features is used to train a radial basis function support vector machine. This method is named GP-Loc. We also propose a method with very few parameters, named ALL-Loc, in which all the “original” features are used to train a linear support vector machine. The overall prediction accuracy obtained by GP-Loc is 89% under jackknife cross-validation, which outperforms the accuracy previously reported in the literature (85.2%) on the same dataset. The overall prediction accuracy obtained by ALL-Loc is 83.9%.
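As an illustration of the kind of “original” feature the abstract refers to, the following is a minimal sketch of a pseudo-amino acid composition vector built from a single amino-acid index, following the commonly cited formulation by Chou (20 composition terms plus λ sequence-order correlation terms). It is not the authors' implementation: the function names, the choice of the Kyte-Doolittle hydropathy scale as the index, and the values of `lam` and `weight` are assumptions made only for this example.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale, used here only as an example index
# (an assumption; the paper combines hundreds of indices and matrices).
KYTE_DOOLITTLE = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}


def standardize(index):
    """Rescale an amino-acid index to zero mean and unit variance."""
    values = [index[a] for a in AMINO_ACIDS]
    mu, sigma = mean(values), pstdev(values)
    return {a: (index[a] - mu) / sigma for a in AMINO_ACIDS}


def pseudo_aac(sequence, index, lam=5, weight=0.05):
    """Return a (20 + lam)-dimensional pseudo-amino acid composition.

    lam and weight are hypothetical defaults chosen for illustration.
    """
    if any(residue not in AMINO_ACIDS for residue in sequence):
        raise ValueError("sequence contains a non-standard residue")
    if len(sequence) <= lam:
        raise ValueError("sequence must be longer than lam")

    h = standardize(index)
    n = len(sequence)

    # Conventional amino-acid composition: 20 occurrence frequencies.
    freqs = [sequence.count(a) / n for a in AMINO_ACIDS]

    # Sequence-order correlation factors theta_1 .. theta_lam:
    # mean squared difference of the index between residues k apart.
    thetas = []
    for k in range(1, lam + 1):
        diffs = [(h[sequence[i]] - h[sequence[i + k]]) ** 2
                 for i in range(n - k)]
        thetas.append(mean(diffs))

    # Joint normalization so that all components sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


if __name__ == "__main__":
    features = pseudo_aac("MKTAYIAKQRQISFVKSHFSRQ", KYTE_DOOLITTLE, lam=3)
    print(len(features))
```

Repeating this computation with a different index or substitution matrix yields a different feature vector, which is how a large pool of “original” features can arise before any combination or selection step.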


Submitochondria localization, Chou’s pseudo amino acid, Genetic programming



Copyright information

© Springer-Verlag 2008

Authors and Affiliations

  1. DEIS, IEIIT - CNR, Università di Bologna, Bologna, Italy
