Skip to main content
Log in

Mito-GSAAC: mitochondria prediction using genetic ensemble classifier and split amino acid composition

  • Original Article
  • Published:
Amino Acids Aims and scope Submit manuscript

Abstract

Mitochondria are all-important organelles of eukaryotic cells since they are involved in processes associated with cellular mortality and human diseases. Therefore, trustworthy techniques are highly required for the identification of new mitochondrial proteins. We propose Mito-GSAAC system for prediction of mitochondrial proteins. The aim of this work is to investigate an effective feature extraction strategy and to develop an ensemble approach that can better exploit the advantages of this feature extraction strategy for mitochondria classification. We investigate four kinds of protein representations for prediction of mitochondrial proteins: amino acid composition, dipeptide composition, pseudo amino acid composition, and split amino acid composition (SAAC). Individual classifiers such as support vector machine (SVM), k-nearest neighbor, multilayer perceptron, random forest, AdaBoost, and bagging are first trained. An ensemble classifier is then built using genetic programming (GP) for evolving a complex but effective decision space from the individual decision spaces of the trained classifiers. The highest prediction performance for Jackknife test is 92.62% using GP-based ensemble classifier on SAAC features, which is the highest accuracy, reported so far on the Mitochondria dataset being used. While on the Malaria Parasite Mitochondria dataset, the highest accuracy is obtained by SVM using SAAC and it is further enhanced to 93.21% using GP-based ensemble. It is observed that SAAC has better discrimination power for mitochondria prediction over the rest of the feature extraction strategies. Thus, the improved prediction performance is largely due to the better capability of SAAC for discriminating between mitochondria and non-mitochondria proteins at the N and C terminus and the effective combination capability of GP. Mito-GSAAC can be accessed at http://111.68.99.218/Mito-GSAAC. It is expected that the novel approach and the accompanied predictor will have a major impact to Molecular Cell Biology, Proteomics, Bioinformatics, System Biology, and Drug Development.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3

Similar content being viewed by others

References

  • Bendtsen JD, Nielsen H, von Heijne G, Brunak S (2004) Improved prediction of signal peptides: SignalP 3.0. J Mol Biol 340:783–795

    Article  PubMed  Google Scholar 

  • Breiman L (1996) Bagging predictors. Mach Learn 24:123–140

    Google Scholar 

  • Breiman L (2001) Random forests. Mach Learn 45:5–32

    Article  Google Scholar 

  • Cai YD, Zhou GP, Chou KC (2003) Support vector machines for predicting membrane protein types by using functional domain composition. Biophys J 84:3257–3263

    Article  PubMed  CAS  Google Scholar 

  • Cameron JM, Hurd T, Robinson BH (2005) Computational identification of human mitochondrial proteins based on homology to yeast mitochondrially targeted proteins. Bioinformatics 21:1825–1830

    Article  PubMed  CAS  Google Scholar 

  • Chou KC (2000) Prediction of protein subcellular locations by incorporating quasi-sequence-order effect. Biochem Biophys Res Commun 278:477–483

    Article  PubMed  CAS  Google Scholar 

  • Chou KC (2001) Prediction of protein cellular attributes using pseudo amino acid composition. Proteins Struct Funct Genet 43:246–255, erratum 44:60

    Google Scholar 

  • Chou KC (2005a) Review: progress in protein structural class prediction and its impact to bioinformatics and proteomics. Curr Protein Pept Sci 6:423–436

    Google Scholar 

  • Chou KC (2005b) Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes. Bioinformatics 21:10–19

    Google Scholar 

  • Chou KC, Cai YD (2002) Using functional domain composition and support vector machines for prediction of protein subcellular location. J Biol Chem 277:45765–45769

    Article  PubMed  CAS  Google Scholar 

  • Chou KC, Cai YD (2006) Predicting protein–protein interactions from sequences in a hybridization space. J Proteome Res 5:316–322

    Article  PubMed  CAS  Google Scholar 

  • Chou KC, Elrod DW (1999) Protein subcellular location prediction. Protein Eng 12:107–118

    Article  PubMed  CAS  Google Scholar 

  • Chou KC, Shen HB (2006a) Predicting protein subcellular location by fusing multiple classifiers. J Cell Biochem 99:517–527

    Article  PubMed  CAS  Google Scholar 

  • Chou KC, Shen HB (2006b) Hum-PLoc: a novel ensemble classifier for predicting human protein subcellular localization. Biochem Biophys Res Commun 347:150–157

    Article  PubMed  CAS  Google Scholar 

  • Chou KC, Shen HB (2007) Euk-mPLoc: a fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites. J Proteome Res 6:1728–1734

    Article  PubMed  CAS  Google Scholar 

  • Chou KC, Shen HB (2008) Cell-PLoc: a package of web-servers for predicting subcellular localization of proteins in various organisms. Nat Protoc 3:153–162

    Google Scholar 

  • Claros MG, Vincens P (1996) Computational method to predict mitochondrial proteins and their targeting sequences. Eur J Biochem 241:779–786

    Article  PubMed  CAS  Google Scholar 

  • Du P, Li Y (2006) Prediction of protein submitochondria locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence. BMC Bioinform 7:518

    Article  Google Scholar 

  • Emanuelsson O, Nielsen H, Brunak S, von Heijne G (2000) Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol 300:1005–1016

    Article  PubMed  CAS  Google Scholar 

  • Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In: Proceedings of the Thirteenth International Conference on Machine Learning. Morgan Kaufmann, Massachusetts, pp 48–156

  • Gerbitz KD, Gempel K, Brdiczka D (1996) Mitochondria and diabetes: genetic, biochemical, and clinical implications of the cellular energy circuit. Diabetes 45:113–126

    Article  PubMed  CAS  Google Scholar 

  • Gottlieb RA (2000) Programmed cell death. Drug News Perspect 13:471–476

    PubMed  CAS  Google Scholar 

  • Guda C, Fahy E, Subramaniam S (2004) MITOPRED: a genome-scale method for prediction of nucleus-encoded mitochondrial proteins. Bioinformatics 20:1785–1794

    Article  PubMed  CAS  Google Scholar 

  • Guo YZ, Li M, Lu M, Wen Z, Wang K, Li G, Wu J (2006) Classifying GPCRs and NRs based on protein power spectrum from fast fourier transform. Amino Acids 30:397–402

    Article  PubMed  CAS  Google Scholar 

  • Hayat M, Khan A (2010) Predicting membrane protein types by fusing composite protein sequence features into pseudo amino acid composition. J Theor Biol 271(1):10–17

    Article  Google Scholar 

  • Höglund A, Dönnes P, Blum T, Adolph HW, Kohlbacher O (2006) MultiLoc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs and amino acid composition. Bioinformatics 22:1158–1165

    Article  PubMed  Google Scholar 

  • Horton P, Park KJ, Obayashi T, Nakai K (2006) Protein subcellular localization prediction with WoLF PSORT. In: Proceedings of the fourth Annual Asia Pacific Bioinformatics Conference APBC06, Taipei, Taiwan, pp 39–48

  • Hu J, Fan Z (2009) Improving protein localization prediction using amino acid group based physiochemical encoding. BICoB 2009. LNBI 5462:248–258

    CAS  Google Scholar 

  • Hua SJ, Sun ZR (2001) Support vector machine approach for protein subcellular localization prediction. Bioinformatics 17:721–728

    Article  PubMed  CAS  Google Scholar 

  • Huang Y, Li Y (2004) Prediction of protein subcellular locations using fuzzy k-NN method. Bioinformatics 20:21–28

    Article  PubMed  CAS  Google Scholar 

  • Huang WL, Tung CW, Ho SW (2008) ProLoc-GO: utilizing informative gene ontology terms for sequences-based prediction of protein subcellular localization. BMC Bioinform 9:80

    Article  Google Scholar 

  • Hutchin T, Cortopassi GA (1995) A mitochondrial DNA clone is associated with increased risk for Alzheimer disease. Proc Natl Acad Sci USA 92:6892–6895

    Article  PubMed  CAS  Google Scholar 

  • Jassem W, Fuggle SV, Rela M, Koo DD, Heaton ND (2002) The role of mitochondria in ischemia/reperfusion injury. Transplantation 73:493–499

    Article  PubMed  CAS  Google Scholar 

  • Jiang L, Li ML, Wen ZN, Wang KL, Diao YB, Guo YZ, Liu LX (2006) Prediction of mitochondrial proteins using discrete wavelet transform. Protein J 25:241–249

    Article  PubMed  CAS  Google Scholar 

  • Khan A, Mirza AM (2007) Genetic perceptual shaping: utilizing cover image and conceivable attack information using genetic programming. Inform Fus 8(4):354–365

    Article  Google Scholar 

  • Khan A, Majid A, Mirza AM (2005) Combination and optimization of classifiers in gender classification using genetic programming. Int J Knowl Based Intell Eng Syst 9:11

    Google Scholar 

  • Khan A, Khan MF, Choi TS (2008a) Proximity based GPCRs prediction in transform domain. Biochem Biophys Res Commun 371(3):411–415

    Google Scholar 

  • Khan A, Tahir SF, Majid A, Choi TS (2008b) Machine learning based adaptive watermark decoding in view of an anticipated attack. Pattern Recogn 41:2594–2610

    Article  Google Scholar 

  • Khan A, Majid A, Choi TS (2010) Predicting protein subcellular location: exploiting amino acid based sequence of feature spaces and fusion of diverse classifiers. Amino Acids 38:347–350

    Article  PubMed  CAS  Google Scholar 

  • Koza JR (1992) Genetic programming: on the programming of computers by means of natural selection. MIT Press, Cambridge

  • Kumar M, Verma R, Raghava GPS (2006) Prediction of mitochondrial proteins using support vector machine and hidden Markov model. J Biol Chem 281:5357–5363

    Article  PubMed  CAS  Google Scholar 

  • Nanni L, Lumini A (2008a) Using ensemble of classifiers in bioinformatics. In: Peters H, Vogel M Machine Learning Research Progress. Nova publisher, New York

  • Nanni L, Lumini A (2008b) Genetic programming for creating Chou’s pseudo amino acid based features for submitochondria localization. Amino Acids 34(4):653–660

    Article  PubMed  CAS  Google Scholar 

  • Nanni L, Brahnam S, Lumini A (2010) High performance set of PseAAC and sequence based descriptors for protein classification. J Theor Biol 266(1):1–10

    Article  PubMed  CAS  Google Scholar 

  • Rodríguez JJ, Ludmila IK, Carlos JA (2006) Rotation forest: a new classifier ensemble method. IEEE Trans Pattern Anal Mach Intell 28(10):1619–1630

    Article  PubMed  Google Scholar 

  • Shen HB, Chou KC (2007) Hum-mPLoc: an ensemble classifier for large-scale human protein subcellular location prediction by incorporating samples with multiple sites. Biochem Biophys Res Commun 355:1006–1011

    Article  PubMed  CAS  Google Scholar 

  • Tan F, Feng X, Fang Z, Li M, Guo Y, Jiang L (2006) Prediction of mitochondrial proteins based on genetic algorithm—partial least squares and support vector machine. Amino Acids (published online Oct 15 2006. doi:10.1007/s00726-006-0465-0)

  • Vapnik VN (1998) Statistical learning theory. Wiley, New York

    Google Scholar 

  • Verma R, Varshney Grish C, Raghava GPS (2009) Prediction of mitochondrial proteins of malaria parasite using split amino acid composition and PSSM profile 39(1):101–110

  • Wooten GF, Currie LJ, Bennett JP, Harrison MB, Trugman JM, Parker WD Jr (1997) Maternal inheritance in Parkinson’s disease. Ann Neurol 41:265–268

    Article  PubMed  CAS  Google Scholar 

  • Xiao X, Shao S, Ding Y, Huang Z, Chen X, Chou KC (2005) An application of gene comparative image for predicting the effect on replication ratio by HBV virus gene missense mutation. J Theor Biol 235:555–565

    Article  PubMed  CAS  Google Scholar 

  • Xiao X, Shao SH, Chou KC (2006a) A probability cellular automaton model for hepatitis B viral infections. Biochem Biophys Res Commun 342:605–610

    Article  PubMed  CAS  Google Scholar 

  • Xiao X, Shao S, Ding Y, Huang Z, Chou KC (2006b) Using cellular automata images and pseudo amino acid composition to predict protein sub-cellular location. Amino Acids 30:49–54

    Article  PubMed  CAS  Google Scholar 

  • Zhang CX, Zhang JS (2008) RotBoost: a technique for combining Rotation Forest and AdaBoost. Pattern Recogn Lett. doi:10.1016/j.patrec.2008.03.006

Download references

Acknowledgments

This work was supported by the Department of Computer and Information Sciences (DCIS) at Pakistan Institute of Engineering and Applied Science (PIEAS), Pakistan.

Conflict of interest

The authors declare that they have no conflict of interest.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yeon Soo Lee.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Afridi, T.H., Khan, A. & Lee, Y.S. Mito-GSAAC: mitochondria prediction using genetic ensemble classifier and split amino acid composition. Amino Acids 42, 1443–1454 (2012). https://doi.org/10.1007/s00726-011-0888-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00726-011-0888-0

Keywords

Navigation