Amino Acids, Volume 39, Issue 1, pp 101–110

Prediction of mitochondrial proteins of malaria parasite using split amino acid composition and PSSM profile

  • Ruchi Verma
  • Grish C. Varshney
  • G. P. S. Raghava
Original Article


The death rate from malaria is increasing day by day, and the malaria-causing parasite Plasmodium falciparum (PF) remains a cause of concern. With the wealth of data now available, it is imperative to understand the localization of its proteins in order to gain deeper insight into their functional roles. In this study, we describe a machine-learning method for predicting mitochondrial proteins of the malaria parasite. All models were trained and tested on 175 proteins (40 mitochondrial and 135 non-mitochondrial) and evaluated using five-fold cross-validation. Support Vector Machine (SVM) models based on amino acid composition and dipeptide composition achieved maximum Matthews correlation coefficient (MCC) values of 0.38 and 0.51, respectively. We also used split amino acid composition (SAAC), in which the composition of the N-terminus, the C-terminus, and the rest of the protein is computed separately. The performance of the SVM model improved significantly, from MCC 0.38 to 0.73, when SAAC was used as input instead of simple amino acid composition. In addition, an SVM model developed using the composition of the PSSM profile achieved MCC 0.75 and accuracy 91.38%. A hybrid model combining the PSSM profile and SAAC achieved the maximum MCC of 0.81 with accuracy 92%. When evaluated on an independent dataset, our method performs better than existing methods. A web server, PFMpred, has been developed for predicting mitochondrial proteins of malaria parasites.
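The split amino acid composition idea can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the 25-residue terminal windows, the percent scaling, and the names `composition` and `split_composition` are hypothetical choices that the abstract does not specify.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment: str) -> list[float]:
    """Percent composition of the 20 standard amino acids in a segment."""
    counts = Counter(segment.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        # Empty segment (or only non-standard residues): all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]


def split_composition(sequence: str, n_term: int = 25, c_term: int = 25) -> list[float]:
    """60-dimensional vector: N-terminal, middle, and C-terminal compositions.

    The window sizes are assumptions made for illustration only.
    """
    seq = sequence.upper()
    if len(seq) < n_term + c_term:
        raise ValueError("sequence is shorter than the two terminal windows")
    head = seq[:n_term]
    middle = seq[n_term:len(seq) - c_term]
    tail = seq[len(seq) - c_term:]
    return composition(head) + composition(middle) + composition(tail)
```

A simple amino acid composition yields 20 features per protein, whereas this split variant yields 60, so residue biases confined to either terminus are not averaged out over the whole sequence. Such a vector could then be passed to any SVM implementation as input.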


Keywords: Plasmodium falciparum; Mitochondria; Support vector machine; Position-specific scoring matrix; Online server



The authors gratefully acknowledge the financial support provided by the Council of Scientific and Industrial Research (CSIR) and the Department of Biotechnology (DBT), Government of India. This paper has IMTECH communication number 048/2007.



Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  • Ruchi Verma (1)
  • Grish C. Varshney (2)
  • G. P. S. Raghava (1)

  1. Bioinformatics Centre, Institute of Microbial Technology, Chandigarh, India
  2. Cell Biology and Immunology, Institute of Microbial Technology, Chandigarh, India
