Amino Acids, Volume 42, Issue 4, pp 1309–1316

Using increment of diversity to predict mitochondrial proteins of malaria parasite: integrating pseudo-amino acid composition and structural alphabet

  • Ying-Li Chen
  • Qian-Zhong Li (corresponding author)
  • Li-Qing Zhang
Original Article


Because of the complexity of the Plasmodium falciparum (PF) genome, predicting mitochondrial proteins is more difficult for PF than for other species. In this study, the increment of diversity (ID) is applied for the first time to the prediction of mitochondrial proteins, using as feature parameters the n-peptide composition of a reduced amino acid alphabet (RAAA) derived from the structural alphabet known as Protein Blocks. With the 1-peptide compositions of the 20 N-terminal residues as the only input vector, the method achieves 86.86% accuracy with a Matthews correlation coefficient (MCC) of 0.69 in the jackknife test. Moreover, by combining these features with the hydropathy distribution along the protein sequence and several reduced amino acid alphabets, the ID model reaches a maximum MCC of 0.82 with 92% accuracy in the jackknife test. On an independent dataset, our method performs better than existing methods. The results indicate that ID is a simple and efficient method for predicting mitochondrial proteins of the malaria parasite.
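As a rough illustration of the idea summarized above, the following Python sketch classifies a sequence by the smallest increment of diversity between its N-terminal 1-peptide composition and each class. It rests on several assumptions that are not taken from this abstract: the measure of diversity is written in its commonly used form D = N ln N − Σ nᵢ ln nᵢ, with ID(X, Y) = D(X + Y) − D(X) − D(Y); the full 20-letter amino acid alphabet stands in for the reduced alphabet, whose grouping is not given here; and the function names and toy sequences are hypothetical, invented only for the example.

```python
import math
from collections import Counter

# Assumption: the standard 20-letter alphabet is used in place of the
# reduced amino acid alphabet (RAAA), whose grouping is not given here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq, n_term=20, alphabet=AMINO_ACIDS):
    """Count single residues (1-peptides) in the first n_term residues."""
    counts = Counter(seq[:n_term].upper())
    return [counts.get(a, 0) for a in alphabet]


def diversity(counts):
    """Measure of diversity D = N*ln(N) - sum(n_i*ln(n_i)), with 0*ln(0) = 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); smaller values mean more similar."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def class_profile(sequences):
    """Sum the N-terminal compositions of all training sequences of a class."""
    profiles = [composition(s) for s in sequences]
    return [sum(col) for col in zip(*profiles)]


def predict(seq, profiles):
    """Assign the class whose profile gives the minimum ID with the query."""
    query = composition(seq)
    return min(profiles, key=lambda label: increment_of_diversity(query, profiles[label]))


# Hypothetical toy sequences, made up for illustration (not the paper's data).
TOY_TRAINING = {
    "mitochondrial": ["MLRRLSRALRRSLLKRAFST", "MLSRLRRALSKRLLSRAFTT"],
    "non-mitochondrial": ["MDEEKDNEEDKNDEIENKDE", "MEEDNKDDEEKNEDIDKEEN"],
}
TOY_PROFILES = {label: class_profile(seqs) for label, seqs in TOY_TRAINING.items()}

if __name__ == "__main__":
    print(predict("MLRKLSRALRRSLLRRAFSS", TOY_PROFILES))
```

Under these assumptions, two count vectors with identical proportions give an ID of zero, and vectors with no shared residues give the largest values, which is why the minimum-ID rule acts as a similarity-based classifier. A jackknife evaluation would rebuild the class profiles with each test sequence left out in turn.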


Keywords: Plasmodium falciparum; Mitochondrial proteins; Increment of diversity; Reduced amino acid alphabet; Hydropathy distribution



The authors would like to thank the reviewers for their comments, which helped improve the manuscript. This work was supported by the National Natural Science Foundation of China (No. 61063016), the Natural Science Foundation of Inner Mongolia Autonomous Region (No. 200607010101, 20080404MS0105) and a National Science Foundation grant of the United States (No. IIS-0710945).



Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  1. Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
  2. Department of Computer Science, Virginia Tech, Blacksburg, USA
  3. Program in Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, USA
