Abstract
Knowing the submitochondria location of a protein is integral to understanding its function and is a necessity in the proteomics era. In this work, a new submitochondria data set is constructed, and an approach for predicting protein submitochondria locations is proposed that combines amino acid composition, dipeptide composition, reduced physicochemical properties, gene ontology, evolutionary information, and pseudo-average chemical shift. Using the increment of diversity algorithm combined with a support vector machine, the overall prediction accuracy is 93.57% for the submitochondria location and 97.79% for the three membrane protein types in the mitochondria inner membrane. The pseudo-average chemical shift descriptor performs particularly well. For comparison, the method is also applied to the submitochondria data set constructed by Du and Li (2006), where it reaches an accuracy of 94.95%, higher than that of other existing methods.
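As an illustration of the two simplest descriptors named above, the following is a minimal sketch of how amino acid composition (20 values) and dipeptide composition (400 values) could be computed and concatenated into one feature vector. It is not the authors' implementation: the function names, the residue ordering, and the choice to skip non-standard residues are assumptions made here, and the remaining descriptors (reduced physicochemical properties, gene ontology, evolutionary information, pseudo-average chemical shift) as well as the increment of diversity and support vector machine steps are not covered.

```python
from itertools import product

# The 20 standard residues; the ordering is an arbitrary choice for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the frequency of each standard residue (20 values)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return the frequency of each adjacent residue pair (400 values).

    Pairs containing a non-standard residue are skipped (an assumption).
    """
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts.values()]


def sequence_features(seq):
    """Concatenate both compositions into a 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


if __name__ == "__main__":
    features = sequence_features("ACAC")
    print(len(features))
```

A vector of this kind would then be extended with the other descriptors before being passed to a classifier; how the descriptors are weighted and fused is specific to the paper and is not reproduced here.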
References
Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P (2002) Molecular biology of the cell, 4th edn. Garland, New York
Andrade MA, O’Donoghue SI, Rost B (1998) Adaption of protein surface to subcellular location. J Mol Biol 276:517–525
Ashburner M, Ball CA et al (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25:25–29
Berman HM, Westbrook J et al (2000) The protein data bank. Nucleic Acids Res 28:235–242
Bhasin M, Raghava GP (2004) ESLpred: SVM-based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST. Nucleic Acids Res 32:W414–W419 (Web Server issue)
Bi J, Yang H, Yan H, Song R, Fan J (2011) Knowledge-based virtual screening of HLA-A*0201-restricted CD8(+) T-cell epitope peptides from herpes simplex virus genome. J Theor Biol 281:133–139
Cai YD, Chou KC (2000) Using neural networks for prediction of subcellular location of prokaryotic and eukaryotic proteins. Mol Cell Biol Res Commun 4:172–173
Cai YD, Chou KC (2003) Nearest neighbour algorithm for predicting protein subcellular location by combining functional domain composition and pseudo-amino acid composition. Biochem Biophys Res Commun 305:407–411
Cai YD, Liu XJ et al (2000) Support vector machines for prediction of protein subcellular location. Mol Cell Biol Res Commun 4:230–233
Cai YD, Liu XJ et al (2002a) Support vector machines for the classification and prediction of β-turn types. J Pept Sci 8:297–301
Cai YD, Liu XJ, Xu XB, Chou KC (2002b) Support vector machines for predicting HIV protease cleavage sites in protein. J Comput Chem 23:267–274
Cai YD, Liu XJ, Xu XB, Chou KC (2002c) Support vector machines for predicting the specificity of GalNAc-transferase. Peptides 23:205–208
Cai YD, Liu XJ et al (2002d) Prediction of protein structural classes by support vector machines. Comput Chem 26:293–296
Cai YD, Lin S, Chou KC (2003a) Support vector machines for prediction of protein signal sequences and their cleavage sites. Peptides 24:159–161
Cai YD, Zhou GP, Chou KC (2003b) Support vector machines for predicting membrane protein types by using functional domain composition. Biophys J 84:3257–3263
Cai YD, Feng KY, Li YX, Chou KC (2003c) Support vector machine for predicting α-turn types. Peptides 24:629–630
Cai YD, Zhou GP, Jen CH, Lin SL, Chou KC (2004a) Identify catalytic triads of serine hydrolases by support vector machines. J Theor Biol 228:551–557
Cai YD, Pong-Wong R, Feng K, Jen JCH, Chou KC (2004b) Application of SVM to predict membrane protein types. J Theor Biol 226:373–376
Cai YD, Ricardo PW et al (2004c) Application of SVM to predict membrane protein types. J Theor Biol 226:373–376
Cai YD, Lu L et al (2010) Predicting subcellular location of proteins using integrated-algorithm method. Mol Divers 14:551–558
Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Transact Intell Syst Technol 2:27:1–27:27. doi: 10.1145/1961189.1961199. http://www.csie.ntu.edu.tw/~cjlin/libsvm
Chen YL, Li QZ (2007a) Prediction of apoptosis protein subcellular location using improved hybrid approach and pseudo-amino acid composition. J Theor Biol 248:377–381
Chen YL, Li QZ (2007b) Prediction of the subcellular location of apoptosis proteins. J Theor Biol 245:775–783
Chen C, Chen L, Zou X, Cai P (2009) Prediction of protein secondary structure content by using the concept of Chou’s pseudo amino acid composition and support vector machine. Protein Pept Lett 16:27–31
Chou KC (2001) Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins 43:246–255
Chou KC (2009) Pseudo amino acid composition and its applications in bioinformatics, proteomics and system biology. Curr Proteomics 6:262–274
Chou KC (2011) Some remarks on protein attribute prediction and pseudo amino acid composition. J Theor Biol 273:236–247
Chou KC, Cai YD (2002) Using functional domain composition and support vector machines for prediction of protein subcellular location. J Biol Chem 277:45765–45769
Chou KC, Cai YD (2003) A new hybrid approach to predict subcellular localization of proteins by incorporating gene ontology. Biochem Biophys Res Commun 311:743–747
Chou KC, Cai YD (2004) Prediction of protein subcellular locations by GO-FunD-PseAA predictor. Biochem Biophys Res Commun 320:1236–1239
Chou KC, Cai YD (2005) Using GO-PseAA predictor to identify membrane proteins and their types. Biochem Biophys Res Commun 327:845–847
Chou KC, Shen HB (2006a) Predicting eukaryotic protein subcellular location by fusing optimized evidence-theoretic K-nearest neighbor classifiers. J Proteome Res 5:1888–1897
Chou KC, Shen HB (2006b) Predicting protein subcellular location by fusing multiple classifiers. J Cell Biochem 99:517–527
Chou KC, Shen HB (2007) Recent progress in protein subcellular location prediction. Anal Biochem 370:1–16
Chou KC, Shen HB (2008) Cell-PLoc: a package of web servers for predicting subcellular localization of proteins in various organisms. Nat Protoc 3:153–162
Chou KC, Shen HB (2009) Review: recent advances in developing web-servers for predicting protein attributes. Nat Sci 2:63–92 (openly accessible at http://www.scirp.org/journal/NS/)
Chou KC, Shen HB (2010a) Cell-PLoc 2.0: an improved package of web-servers for predicting subcellular localization of proteins in various organisms. Nat Sci 2:1090–1103
Chou KC, Shen HB (2010b) A new method for predicting the subcellular localization of eukaryotic proteins with both single and multiple sites: Euk-mPLoc 2.0. PLoS One 5:e9931
Chou KC, Shen HB (2010c) Plant-mPLoc: a top-down strategy to augment the power for predicting plant protein subcellular localization. PLoS One 5:e11335
Chou KC, Zhang CT (1995) Prediction of protein structural classes. Crit Rev Biochem Mol Biol 30:275–349
Chou KC, Wu ZC, Xiao X (2011) iLoc-Euk: a multi-label classifier for predicting the subcellular localization of singleplex and multiplex eukaryotic proteins. PLoS One 6:e18258 (50th Anniversary Year Review)
Cotter D, Guda P et al (2004) MitoProteome: mitochondrial protein sequence database and annotation system. Nucleic Acids Res 32:D463–D467 (Database issue)
Ding YS, Zhang TL, Chou KC (2007) Prediction of protein structure classes with pseudo amino acid composition and fuzzy support vector machine network. Protein Pept Lett 14:811–815
Ding H, Luo L, Lin H (2009) Prediction of cell wall lytic enzymes using Chou’s amphiphilic pseudo amino acid composition. Protein Pept Lett 16:351–355
Ding H, Liu L, Guo FB, Huang J, Lin H (2011) Identify Golgi protein types with modified mahalanobis discriminant algorithm and pseudo amino acid composition. Protein Pept Lett 18:58–63
Du P, Li YD (2006) Prediction of protein submitochondria locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence. BMC Bioinforma 7:518–525
Esmaeili M, Mohabatkar H, Mohsenzadeh S (2010) Using the concept of Chou’s pseudo amino acid composition for risk type prediction of human papillomaviruses. J Theor Biol 263:203–209
Feng ZP (2002) An overview on predicting the subcellular location of a protein. In Silico Biol 2:291–303
Fyshe A, Liu Y et al (2008) Improving subcellular localization prediction using text classification and the gene ontology. Bioinformatics 24:2512–2517
Gao QB, Ye XF et al (2010) Improving discrimination of outer membrane proteins by fusing different forms of pseudo amino acid composition. Anal Biochem 398:52–59
Georgiou DN, Karakasidis TE, Nieto JJ, Torres A (2009) Use of fuzzy clustering technique and matrices to classify amino acids and its impact to Chou’s pseudo amino acid composition. J Theor Biol 257:17–26
Gottlieb RA (2000) Programmed cell death. Drug News Perspect 13:471–476
Gu Q, Ding YS, Zhang TL (2010a) Prediction of G-protein-coupled receptor classes in low homology using Chou’s pseudo amino acid composition with approximate entropy and hydrophobicity patterns. Protein Pept Lett 17:559–567
Gu Q, Ding YS et al (2010b) Prediction of subcellular location apoptosis proteins with ensemble classifier and feature selection. Amino Acids 38:975–983
Hayat M, Khan A (2011) Predicting membrane protein types by fusing composite protein sequence features into pseudo amino acid composition. J Theor Biol 271:10–17
Hu L, Zheng L, Wang Z, Li B, Liu L (2011) Using pseudo amino acid composition to predict protease families by incorporating a series of protein biological features. Protein Pept Lett 18:552–558
Huang WL, Tung CW et al (2008) ProLoc-GO: utilizing informative Gene Ontology terms for sequence-based prediction of protein subcellular localization. BMC Bioinforma 9:80
Jassem W, Heaton ND (2004) The role of mitochondria in ischemia/reperfusion injury in organ transplantation. Kidney Int 66:514–517
Jiang X, Wei R, Zhang TL, Gu Q (2008a) Using the concept of Chou’s pseudo amino acid composition to predict apoptosis proteins subcellular location: an approach by approximate entropy. Protein Pept Lett 15:392–396
Jiang X, Wei R et al (2008b) Using Chou’s pseudo amino acid composition based on approximate entropy and an ensemble of AdaBoost classifiers to predict protein subnuclear location. Amino Acids 34:669–675
Joshi RR, Sekharan S (2010) Characteristic peptides of protein secondary structural motifs. Protein Pept Lett 17:1198–1206
Kandaswamy KK, Pugalenthi G, Moller S, Hartmann E, Kalies KU, Suganthan PN, Martinetz T (2010) Prediction of apoptosis protein locations with genetic algorithms and support vector machines through a new mode of pseudo amino acid composition. Protein Pept Lett 17:1473–1479
Kandaswamy KK, Chou KC, Martinetz T, Moller S, Suganthan PN, Sridharan S, Pugalenthi G (2011) AFP-Pred: a random forest approach for predicting antifreeze proteins from sequence-derived properties. J Theor Biol 270:56–62
Lee K, Chuang HY et al (2008) Protein networks markedly improve prediction of subcellular localization in multiple eukaryotic species. Nucleic Acids Res 36:e136
Li FM, Li QZ (2008a) Predicting protein subcellular location using Chou’s pseudo amino acid composition and improved hybrid approach. Protein Pept Lett 15:612–616
Li FM, Li QZ (2008b) Using pseudo amino acid composition to predict protein subnuclear location with improved hybrid approach. Amino Acids 34:119–125
Li QZ, Lu ZQ (2001) The prediction of the structural class of protein: application of the measure of diversity. J Theor Biol 213:493–502
Li W, Jaroszewski L et al (2001) Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 17:282–283
Lin H (2008) The modified Mahalanobis discriminant for predicting outer membrane proteins by using Chou’s pseudo amino acid composition. J Theor Biol 252:350–356
Lin H, Ding H (2011) Predicting ion channels and their types by the dipeptide mode of pseudo amino acid composition. J Theor Biol 269:64–69
Lin H, Ding H et al (2008) Predicting subcellular localization of mycobacterial proteins by using Chou’s pseudo amino acid composition. Protein Pept Lett 15:739–744
Liu T, Zheng X, Wang C, Wang J (2010) Prediction of subcellular location of apoptosis proteins using pseudo amino acid composition: an approach from auto covariance transformation. Protein Pept Lett 17:1263–1269
Luginbuhl P, Szyperski T, Wuthrich K (1995) Statistical basis for the use of 13Cα chemical shifts in protein structure determination. J Magn Reson B 109:229–233
Matsuda S, Vert JP et al (2005) A novel representation of protein sequences for prediction of subcellular location using support vector machines. Protein Sci 14:2804–2813
Matthews BW (1975) Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta 405:442–451
Mielke SP, Krishnan VV (2003) Protein structural class identification directly from NMR spectra using averaged chemical shifts. Bioinformatics 19:2054–2064
Mohabatkar H (2010) Prediction of cyclin proteins using Chou’s pseudo amino acid composition. Protein Pept Lett 17:1207–1214
Mohabatkar H, Beigi MM, Esmaeili A (2011) Prediction of GABA(A) receptor proteins using the concept of Chou’s pseudo-amino acid composition and support vector machine. J Theor Biol 281:18–23
Nair R, Rost B (2003) Better prediction of sub-cellular localization by combining evolutionary and structural information. Proteins 53:917–930
Nanni L, Lumini A (2008) Genetic programming for creating Chou’s pseudo amino acid based features for submitochondria localization. Amino Acids 34:653–660
Nanni L, Brahnam S, Lumini A (2010) High performance set of PseAAC and sequence based descriptors for protein classification. J Theor Biol 266:1–10
Park KJ, Kanehisa M (2003) Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs. Bioinformatics 19:1656–1663
Pollastri G, McLysaght A (2005) Porter: a new, accurate server for protein secondary structure prediction. Bioinformatics 21:1719–1720
Pollastri G, Martin AJ et al (2007) Accurate prediction of protein secondary structure and solvent accessibility by consensus combiners of sequence and structure information. BMC Bioinforma 8:201
Qiu JD, Huang JH, Shi SP, Liang RP (2010) Using the concept of Chou’s pseudo amino acid composition to predict enzyme family classes: an approach with support vector machine based on discrete wavelet transform. Protein Pept Lett 17:715–722
Reinhardt A, Hubbard T (1998) Using neural networks for prediction of the subcellular location of proteins. Nucleic Acids Res 26:2230–2236
Schaffer AA, Aravind L et al (2001) Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Res 29:2994–3005
Scharfe C, Zaccaria P et al (2000) MITOP, the mitochondrial proteome database: 2000 update. Nucleic Acids Res 28:155–158
Seavey BR, Farr EA et al (1991) A relational database for sequence-specific protein NMR data. J Biomol NMR 1:217–236
Shi JY, Zhang SW et al (2007) Prediction of protein subcellular localization by support vector machines using multi-scale energy and pseudo amino acid composition. Amino Acids 33:69–74
Sibley AB, Cosman M, Krishnan VV (2003) An empirical correlation between secondary structure content and averaged chemical shifts in proteins. Biophys J 84(2):1223–1227
Spera S, Bax A (1991) Empirical correlation between protein backbone conformation and Cα and Cβ 13C nuclear magnetic resonance chemical shifts. J Am Chem Soc 113:5490–5492
Vapnik V (1998) Statistical learning theory. Wiley, New York
Wang W, Geng XB et al (2011) Predicting protein subcellular localization by pseudo amino acid composition with a segment-weighted and features-combined approach. Protein Pept Lett (e-pub ahead of print)
Wishart DS, Sykes BD, Richards FM (1991) Relationship between nuclear magnetic resonance chemical shift and protein secondary structure. J Mol Biol 222:311–333
Wu CH, Apweiler R et al (2006) The universal protein resource (UniProt): an expanding universe of protein information. Nucleic Acids Res 34:D187–D191 (Database issue)
Xiao X, Wu ZC, Chou KC (2011a) A multi-label classifier for predicting the subcellular localization of gram-negative bacterial proteins with both single and multiple sites. PLoS One 6:e20592
Xiao X, Wu ZC, Chou KC (2011b) iLoc-Virus: a multi-label learning classifier for identifying the subcellular localization of virus proteins with both single and multiple sites. J Theor Biol 284:42–51
Yu L, Guo Y, Li Y, Li G, Li M, Luo J, Xiong W, Qin W (2010) SecretP: identifying bacterial secreted proteins by fusing new features into Chou’s pseudo-amino acid composition. J Theor Biol 267:1–6
Zakeri P, Moshiri B, Sadeghi M (2011) Prediction of protein submitochondria locations based on data fusion of various features of sequences. J Theor Biol 269:208–216
Zeng YH, Guo YZ et al (2009) Using the augmented Chou’s pseudo amino acid composition for predicting protein submitochondria locations based on auto covariance approach. J Theor Biol 259:366–372
Zhang GY, Fang BS (2008) Predicting the cofactors of oxidoreductases based on amino acid composition distribution and Chou’s amphiphilic pseudo-amino acid composition. J Theor Biol 253:310–315
Zhang GY, Li HC et al (2008) Predicting lipase types by improved Chou’s pseudo-amino acid composition. Protein Pept Lett 15:1132–1137
Zhao Y, Alipanahi B et al (2010) Protein secondary structure prediction using NMR chemical shift data. J Bioinform Comput Biol 8:867–884
Zhou GP (1998) An intriguing controversy over protein structural class prediction. J Protein Chem 17:729–738
Zhou GP, Assa-Munt N (2001) Some insights into protein structural class prediction. Proteins 44:57–59
Zhou GP, Doctor K (2003) Subcellular location prediction of apoptosis proteins. Proteins 50:44–48
Zhou XB, Chen C et al (2007) Using Chou’s amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes. J Theor Biol 248:546–551
Acknowledgments
The authors would like to thank the reviewers for their helpful comments on the manuscript. This work was supported by grants from the National Natural Science Foundation of China (61063016, 31160188), the Research Fund for the Doctoral Program of Higher Education of China (no. 20101501110004), the Project for ‘211’ Innovative Talents of Inner Mongolia University (no. 2-1.2.1_035), and the Inner Mongolia University Fund for Young Scholars (no. 208152).
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Fan, GL., Li, QZ. Predicting protein submitochondria locations by combining different descriptors into the general form of Chou’s pseudo amino acid composition. Amino Acids 43, 545–555 (2012). https://doi.org/10.1007/s00726-011-1143-4
DOI: https://doi.org/10.1007/s00726-011-1143-4