Abstract
The mitochondrion is a key organelle of eukaryotic cells that provides the energy for cellular activities. Correctly identifying the submitochondrial locations of proteins provides valuable information for understanding their functions. However, determining these locations with wet-lab experimental methods is time-consuming and costly, so a computational method for predicting the submitochondrial locations of mitochondrial proteins is highly desirable. In this work, a novel method based on a support vector machine (SVM) was developed to predict the submitochondrial locations of mitochondrial proteins, using as features over-represented tetrapeptides selected according to the binomial distribution. A reliable and rigorous benchmark dataset of 495 mitochondrial proteins with pairwise sequence identity ≤25 % was constructed to test and evaluate the proposed model. In jackknife cross-validation, 91.1 % of the 495 mitochondrial proteins were correctly predicted. The model was then evaluated on three existing benchmark datasets, achieving overall accuracies of 94.0, 94.7 and 93.4 %, respectively, which suggests that it is potentially useful for mitochondrial proteome research. Based on this model, we built a predictor called TetraMito, which is freely available at http://lin.uestc.edu.cn/server/TetraMito.
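The abstract does not spell out how the binomial distribution is used to pick tetrapeptides. The sketch below assumes one common formulation, which may differ from TetraMito's actual procedure. A tetrapeptide i is treated as over-represented in class j when the upper-tail probability of observing at least n_ij of its N_i total occurrences in that class is small. The null model assumes each occurrence falls in class j with probability q_j = m_j / M, that class's share of all tetrapeptides. The function names, the null probability and the `alpha` threshold are hypothetical illustrations, not the authors' code.

```python
from collections import Counter
from math import exp, lgamma, log, log1p


def tetrapeptides(seq):
    """Return all overlapping length-4 windows of a protein sequence."""
    return [seq[i:i + 4] for i in range(len(seq) - 3)]


def binom_upper_tail(k, n, q):
    """P(X >= k) for X ~ Binomial(n, q), summed in log space to avoid overflow."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    total = 0.0
    for m in range(k, n + 1):
        log_term = (lgamma(n + 1) - lgamma(m + 1) - lgamma(n - m + 1)
                    + m * log(q) + (n - m) * log1p(-q))
        total += exp(log_term)
    return min(total, 1.0)


def select_overrepresented(seqs_by_class, alpha=0.01):
    """Return (tetrapeptide, class, p) triples with p below alpha, sorted by p.

    Assumed null model: each of the N_i occurrences of tetrapeptide i lands
    in class j with probability q_j = m_j / M. Needs at least two classes
    that contain tetrapeptides, otherwise q_j would equal 1.
    """
    class_counts = {c: Counter(tp for s in seqs for tp in tetrapeptides(s))
                    for c, seqs in seqs_by_class.items()}
    m = {c: sum(cnt.values()) for c, cnt in class_counts.items()}  # m_j
    M = sum(m.values())
    total_counts = Counter()  # N_i summed over all classes
    for cnt in class_counts.values():
        total_counts.update(cnt)

    selected = []
    for c, cnt in class_counts.items():
        q = m[c] / M
        for tp, n_ij in cnt.items():
            p = binom_upper_tail(n_ij, total_counts[tp], q)
            if p < alpha:
                selected.append((tp, c, p))
    return sorted(selected, key=lambda t: (t[2], t[0], t[1]))


def feature_vector(seq, selected_tps):
    """Relative frequency of each selected tetrapeptide in one sequence."""
    windows = tetrapeptides(seq)
    counts = Counter(windows)
    n_windows = max(len(windows), 1)
    return [counts[tp] / n_windows for tp in selected_tps]
```

In this sketch the resulting vectors would be the inputs to an SVM classifier, which is not shown. Under jackknife testing, the selection step would have to be repeated on the training proteins of each round so that the held-out protein does not influence which tetrapeptides are chosen.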
Acknowledgments
We are grateful to Dr. Loris Nanni for his help. This work was supported by the National Natural Science Foundation of China (No. 61202256, 61100092), the Project of Education Department in Sichuan (12ZA112), the Fundamental Research Funds for the Central Universities (ZYGX2012J113) and the Scientific Research Startup Foundation of UESTC.
Cite this article
Lin, H., Chen, W., Yuan, LF. et al. Using Over-Represented Tetrapeptides to Predict Protein Submitochondria Locations. Acta Biotheor 61, 259–268 (2013). https://doi.org/10.1007/s10441-013-9181-9
DOI: https://doi.org/10.1007/s10441-013-9181-9