Abstract
Mitochondrion is the key organelle of eukaryotic cell, which provides energy for cellular activities. Submitochondrial locations of proteins play crucial role in understanding different biological processes such as energy metabolism, program cell death, and ionic homeostasis. Prediction of submitochondrial locations through conventional methods are expensive and time consuming because of the large number of protein sequences generated in the last few decades. Therefore, it is intensively desired to establish an automated model for identification of submitochondrial locations of proteins. In this regard, the current study is initiated to develop a fast, reliable, and accurate computational model. Various feature extraction methods such as dipeptide composition (DPC), Split Amino Acid Composition, and Composition and Translation were utilized. In order to overcome the issue of biasness, oversampling technique SMOTE was applied to balance the datasets. Several classification learners including K-Nearest Neighbor, Probabilistic Neural Network, and support vector machine (SVM) are used. Jackknife test is applied to assess the performance of classification algorithms using two benchmark datasets. Among various classification algorithms, SVM achieved the highest success rates in conjunction with the condensed feature space of DPC, which are 95.20 % accuracy on dataset SML3-317 and 95.11 % on dataset SML3-983. The empirical results revealed that our proposed model obtained the highest results so far in the literatures. It is anticipated that our proposed model might be useful for future studies.
This is a preview of subscription content, access via your institution.
References
Ahmad S, Kabir M, Hayat M (2015) Identification of Heat Shock Protein families and J-protein types by incorporating Dipeptide Composition into Chou’s general PseAAC. Comput Methods Programs Biomed 122:165–174
Ali S, Majid A, Khan A (2014) IDM-PhyChm-Ens: intelligent decision-making ensemble methodology for classification of human breast cancer using physicochemical properties of amino acids. Amino Acids 46:977–993
Asifullah K, Tahir SF (2008) Intelligent extraction of a digital watermark from a distorted image. IEICE Trans Inf Syst 91:2072–2075
Bartenhagen C, Klein H-U, Ruckert C, Jiang X, Dugas M (2010) Comparative study of unsupervised dimension reduction techniques for the visualization of microarray gene expression data. BMC Bioinformatics 11:567
Berardi MJ, Chou JJ (2014) Fatty acid flippase activity of UCP2 is essential for its proton transport in mitochondria. Cell Metab 20:541–552
Berardi MJ, Shih WM, Harrison SC, Chou JJ (2011) Mitochondrial uncoupling protein 2 structure determined by NMR molecular fragment searching. Nature 476:109–113
Cao D-S, Xu Q-S, Liang Y-Z (2013) Propy: a tool to generate various modes of Chou’s PseAAC. Bioinformatics 29:960–962
Chen W, Feng P-M, Deng E-Z, Lin H, Chou K-C (2014a) iTIS-PseTNC: a sequence-based predictor for identifying translation initiation site in human genes using pseudo trinucleotide composition. Anal Biochem 462:76–83
Chen W, Feng P-M, Lin H, Chou K-C (2014b) iSS-PseDNC: identifying splicing sites using pseudo dinucleotide composition. BioMed Res Int. doi:10.1155/2014/623149
Chen W, Feng P, Ding H, Lin H, Chou K-C (2015) iRNA-Methyl: identifying N 6-methyladenosine sites using pseudo nucleotide composition. Anal Biochem 490:26–33
Chou K-C (2001a) Using subsite coupling to predict signal peptides. Protein Eng 14:75–79
Chou KC (2001b) Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins 43:246–255
Chou K-C (2004) Structural bioinformatics and its impact to biomedical science. Curr Med Chem 11:2105–2134
Chou K-C (2013) Some remarks on predicting multi-label attributes in molecular biosystems. Mol BioSyst 9:1092–1100
Chou K-C (2015) Impacts of bioinformatics to medicinal chemistry. Med Chem 11:218–234
Chou K-C, Shen H-B (2008) Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organisms. Nat Protoc 3:153–162
Chou K-C, Zhang C-T (1995) Prediction of protein structural classes. Crit Rev Biochem Mol Biol 30:275–349
Chou K-C, Wu Z-C, Xiao X (2011) iLoc-Euk: a multi-label classifier for predicting the subcellular localization of singleplex and multiplex eukaryotic proteins. PLoS One 6:e18258
Chou K-C, Wu Z-C, Xiao X (2012) iLoc-Hum: using the accumulation-label scale to predict subcellular locations of human proteins with both single and multiple sites. Mol BioSyst 8:629–641
Ding H, Deng E-Z, Yuan L-F, Liu L, Lin H, Chen W, Chou K-C (2014) iCTX-Type: a sequence-based predictor for identifying the types of conotoxins in targeting ion channels. BioMed Res Int. doi:10.1155/2014/286419
Du P, Li Y (2006) Prediction of protein submitochondria locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence. BMC Bioinformatics 7:518
Du P, Yu Y (2013) SubMito-PSPCP: predicting protein submitochondrial locations by hybridizing positional specific physicochemical properties with pseudoamino acid compositions. BioMed Res Int. doi:10.1155/2013/263829
Du P, Wang X, Xu C, Gao Y (2012) PseAAC-Builder: a cross-platform stand-alone program for generating various special Chou’s pseudo-amino acid compositions. Anal Biochem 425:117–119
Du P, Gu S, Jiao Y (2014) PseAAC-General: fast building various modes of general form of Chou’s pseudo-amino acid composition for large-scale protein datasets. Int J Mol Sci 15:3495–3506
Duda R (2001) PE hart and DG Stork, pattern classification. Wiley-Interscience, New York
Fan G-L, Li Q-Z (2012) Predicting protein submitochondria locations by combining different descriptors into the general form of Chou’s pseudo amino acid composition. Amino Acids 43:545–555
Feng P-M, Chen W, Lin H, Chou K-C (2013) iHSP-PseRAAAC: identifying the heat shock protein families using pseudo reduced amino acid alphabet composition. Anal Biochem 442:118–125
Gao Q-B, Ye X-F, Jin Z-C, He J (2010) Improving discrimination of outer membrane proteins by fusing different forms of pseudo amino acid composition. Anal Biochem 398:52–59
Georgiou V, Pavlidis N, Parsopoulos K, Alevizos PD, Vrahatis M (2004) Optimizing the performance of probabilistic neural networks in a bioinformatics task. In: Proceedings of the EUNITE 2004 Conference, pp 34–40
Gottlieb RA (2000) Programmed cell death. Drug News Perspect 13:471–476
Han J, Kamber M, Pei J (2006) Data mining, southeast asia edition: concepts and techniques Morgan kaufmann
Hayat M, Iqbal N (2014) Discriminating protein structure classes by incorporating pseudo average chemical shift to Chou’s general PseAAC and support vector machine. Comput Methods Programs Biomed 116:184–192
Hayat M, Khan A (2012a) MemHyb: predicting membrane protein types by hybridizing SAAC and PSSM. J Theor Biol 292:93–102
Hayat M, Khan A (2012b) Prediction of membrane protein types by using dipeptide and pseudo amino acid composition-based composite features. Commun IET 6:3257–3264
He X, Han K, Hu J, Yan H, Yang J-Y, Shen H-B, Yu D-J (2015) TargetFreeze: identifying antifreeze proteins via a combination of weights using sequence evolutionary information and pseudo amino acid composition. J Membr Biol 248:1–10
Huang T, Shi X-H, Wang P, He Z, Feng K-Y, Hu L, Kong X, Li Y-X, Cai Y-D, Chou K-C (2010) Analysis and prediction of the metabolic stability of proteins based on their sequential features, subcellular locations and interaction networks. PLoS One 5:e10972
Huang T, Wan S, Xu Z, Zheng Y, Feng K-Y, Li H-P, Kong X, Cai Y-D (2011) Analysis and prediction of translation rate based on sequence and functional features of the mRNA. PLoS One 6:e16036
Jassem W, Fuggle SV, Rela M, Koo DD, Heaton ND (2002) The role of mitochondria in ischemia/reperfusion injury. Transplantation 73:493–499
Kabir Muhammad HM (2015). iRSpot-GAEnsC: identifing recombination spots via ensemble classifier and extending the concept of Chou’s PseAAC to formulate DNA samples. Mol Genet Genomics 1–12
Kabir M, Iqbal M, Ahmad S, Hayat M (2015) iTIS-PseKNC: identification of Translation Initiation Site in human genes using pseudo k-tuple nucleotides composition. Comput Biol Med 66:252–257
Khan A, Khan M, Choi T-S (2008) Proximity based GPCRs prediction in transform domain. Biochem Biophys Res Commun 371:411–415
Khan ZU, Hayat M, Khan MA (2015) Discrimination of acidic and alkaline enzyme using Chou’s pseudo amino acid composition in conjunction with probabilistic neural network model. J Theor Biol 365:197–203
Lakhina S, Joseph S, Verma B (2010) Feature reduction using principal component analysis for effective anomaly–based intrusion detection on NSL-KDD. Int J Eng Sci Technol 2(6):1790–1799
Li Z-R, Lin HH, Han L, Jiang L, Chen X, Chen YZ (2006) PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence. Nucleic Acids Res 34:W32–W37
Li W-C, Deng E-Z, Ding H, Chen W, Lin H (2015) iORI-PseKNC: a predictor for identifying origin of replication with pseudo k-tuple nucleotide composition. Chemometr Intell Lab Syst 141:100–106
Lin H, Chen W, Yuan L-F, Li Z-Q, Ding H (2013a) Using over-represented tetrapeptides to predict protein submitochondria locations. Acta Biotheor 61:259–268
Lin W-Z, Fang J-A, Xiao X, Chou K-C (2013b) iLoc-Animal: a multi-label learning classifier for predicting subcellular localization of animal proteins. Mol BioSyst 9:634–644
Liu W, Chou K (1999) Protein secondary structural content prediction. Protein Eng 12:1041–1050
Liu B, Chen J, Wang X (2015a) Protein remote homology detection by combining Chou’s distance-pair pseudo amino acid composition and principal component analysis. Mol Genet Genomics. doi:10.1007/s00438-015-1044-4
Liu B, Fang L, Chen J, Liu F, Wang X (2015b) miRNA-dis: microRNA precursor identification based on distance structure status pairs. Mol BioSyst 11:1194–1204
Liu B, Fang L, Liu F, Wang X, Chou K-C (2015c) iMiRNA-PseDPC: microRNA precursor identification with a pseudo distance-pair composition approach. J Biomol Struct Dyn 3:1–13
Liu B, Fang L, Long R, Lan X, Chou K-C (2015d) iEnhancer-2L: a two-layer predictor for identifying enhancers and their strength by pseudo k-tuple nucleotide composition. Bioinformatics. doi:10.1093/bioinformatics/btv604
Liu B, Liu F, Wang X, Chen J, Fang L, Chou K-C (2015e) Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences. Nucleic Acids Res. doi:10.1093/nar/gkv458
Liu Z, Xiao X, Qiu W-R, Chou K-C (2015f) iDNA-Methyl: identifying DNA methylation sites via pseudo trinucleotide composition. Anal Biochem 474:69–77
Nanni L, Lumini A (2008) Genetic programming for creating Chou’s pseudo amino acid based features for submitochondria localization. Amino Acids 34:653–660
Qiu W-R, Xiao X, Chou K-C (2014a) iRSpot-TNCPseAAC: identify recombination spots with trinucleotide composition and pseudo amino acid components. Int J Mol Sci 15:1746–1766
Qiu W-R, Xiao X, Lin W-Z, Chou K-C (2014b) iUbiq-Lys: prediction of lysine ubiquitination sites in proteins by extracting sequence evolution information via a gray system model. J Biomol Struct Dyn 33:1–12
Shi S-P, Qiu J-D, Sun X-Y, Huang J-H, Huang S-Y, Suo S-B, Liang R-P, Zhang L (2011) Identify submitochondria and subchloroplast locations with pseudo amino acid composition: approach from the strategy of discrete wavelet transform feature extraction. Biochim Biophy Acta 1813:424–430
Sounier R, Bellot G, Chou JJ (2015) Mapping conformational heterogeneity of mitochondrial nucleotide transporter in uninhibited states. Angew Chem 127:2466–2471
Specht DF (1990) Probabilistic neural networks. Neural networks 3:109–118
Vapnik V (1998) Statistical learning theory. Wiley, New York
Vapnik V (2000) The nature of statistical learning theory. Springer Science & Business Media, Berlin
Wu C, Apweiler R, Bairoch A, Natale D, Barker W, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R (2005) The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res 34:187–191
Xiao X, Wang P, Lin W-Z, Jia J-H, Chou K-C (2013) iAMP-2L: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types. Anal Biochem 436:168–177
Xiao X, Hui M-J, Liu Z, Qiu W-R (2015a) iCataly-PseAAC: identification of enzymes catalytic sites using sequence evolution information with grey model GM (2, 1). J Membr Biol 248:1–9
Xiao X, Min J-L, Lin W-Z, Liu Z, Cheng X, Chou K-C (2015b) iDrug-Target: predicting the interactions between drug compounds and target proteins in cellular networking via benchmark dataset optimization approach. J Biomol Struct Dyn 33:1–13
Xu Y, Ding J, Wu L-Y, Chou K-C (2013) iSNO-PseAAC: predict cysteine S-nitrosylation sites in proteins by incorporating position specific amino acid propensity into pseudo amino acid composition. PLoS ONE 8:e55844
Yang Q, Brüschweiler S, Chou JJ (2014) A self-sequestered calmodulin-like Ca2+ sensor of mitochondrial SCaMC carrier and its implication to Ca2+-dependent ATP-Mg/P i transport. Structure 22:209–217
Zakeri P, Moshiri B, Sadeghi M (2011) Prediction of protein submitochondria locations based on data fusion of various features of sequences. J Theor Biol 269:208–216
Zeng Y-h, Guo Y-z, Xiao R-q, Yang L, Yu L-z, Li M-l (2009) Using the augmented Chou’s pseudo amino acid composition for predicting protein submitochondria locations based on auto covariance approach. J Theor Biol 259:366–372
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of Interest
Authors have no conflict of interest.
Rights and permissions
About this article
Cite this article
Ahmad, K., Waris, M. & Hayat, M. Prediction of Protein Submitochondrial Locations by Incorporating Dipeptide Composition into Chou’s General Pseudo Amino Acid Composition. J Membrane Biol 249, 293–304 (2016). https://doi.org/10.1007/s00232-015-9868-8
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00232-015-9868-8
Keywords
- Mitochondria
- Dipeptide composition
- SAAC
- SVM
- SMOTE
- PCA