Prediction of Protein Submitochondrial Locations by Incorporating Dipeptide Composition into Chou’s General Pseudo Amino Acid Composition

Abstract

The mitochondrion is a key organelle of the eukaryotic cell that provides energy for cellular activities. The submitochondrial locations of proteins play a crucial role in understanding biological processes such as energy metabolism, programmed cell death, and ionic homeostasis. Predicting submitochondrial locations through conventional methods is expensive and time-consuming because of the large number of protein sequences generated in the last few decades. An automated model for identifying the submitochondrial locations of proteins is therefore highly desirable. In this regard, the current study was initiated to develop a fast, reliable, and accurate computational model. Several feature extraction methods were used, including dipeptide composition (DPC), Split Amino Acid Composition, and Composition and Translation. To overcome class bias, the oversampling technique SMOTE was applied to balance the datasets. Several classification learners were evaluated, including K-Nearest Neighbor, Probabilistic Neural Network, and support vector machine (SVM). The jackknife test was applied to assess the performance of the classification algorithms on two benchmark datasets. Among these algorithms, SVM achieved the highest success rates in conjunction with the condensed feature space of DPC: 95.20 % accuracy on dataset SML3-317 and 95.11 % on dataset SML3-983. The empirical results show that our proposed model obtained the highest results reported so far in the literature. It is anticipated that our proposed model might be useful for future studies.
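
As an illustration of two steps named in the abstract, the following minimal Python sketch computes a 400-dimensional dipeptide composition vector and runs a jackknife (leave-one-out) test. It is not the authors' implementation. The function names, the decision to skip dipeptides containing non-standard residues, the normalization by the number of valid dipeptides, and the use of a simple 1-nearest-neighbor rule in place of the SVM are all assumptions made for brevity.

```python
from itertools import product
import math

# The 20 standard amino acids; all ordered pairs give 400 dipeptides.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence):
    """Return the 400-dimensional relative frequency of adjacent residue pairs.

    Assumption: pairs containing a non-standard residue (e.g. 'X') are
    skipped, and frequencies are normalized by the number of valid pairs.
    """
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(seq) - 1):
        j = INDEX.get(seq[i:i + 2])
        if j is not None:
            counts[j] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [c / total for c in counts]


def jackknife_accuracy(features, labels):
    """Leave-one-out accuracy: each sample is predicted from all the others.

    Assumption: a 1-nearest-neighbor rule (Euclidean distance) stands in
    for the classifiers compared in the study.
    """
    correct = 0
    for i, x in enumerate(features):
        best_label, best_dist = None, math.inf
        for j, y in enumerate(features):
            if j == i:
                continue  # the held-out sample never votes for itself
            d = math.dist(x, y)
            if d < best_dist:
                best_dist, best_label = d, labels[j]
        correct += best_label == labels[i]
    return correct / len(features)


if __name__ == "__main__":
    # Hypothetical toy sequences, used only to exercise the functions.
    toy_sequences = ["ACACAC", "ACACACAC", "WYWYWY", "WYWYWYWY"]
    toy_labels = ["a", "a", "b", "b"]
    toy_features = [dipeptide_composition(s) for s in toy_sequences]
    print(len(toy_features[0]), jackknife_accuracy(toy_features, toy_labels))
```

In a fuller pipeline, oversampling and dimensionality reduction would sit between feature extraction and classification. If the jackknife estimate is meant to reflect performance on unseen proteins, a common precaution is to oversample only the training portion of each leave-one-out split.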

Author information

Corresponding author

Correspondence to Maqsood Hayat.

Ethics declarations

Conflict of Interest

Authors have no conflict of interest.

About this article

Cite this article

Ahmad, K., Waris, M. & Hayat, M. Prediction of Protein Submitochondrial Locations by Incorporating Dipeptide Composition into Chou’s General Pseudo Amino Acid Composition. J Membrane Biol 249, 293–304 (2016). https://doi.org/10.1007/s00232-015-9868-8

  • DOI: https://doi.org/10.1007/s00232-015-9868-8

Keywords

  • Mitochondria
  • Dipeptide composition
  • SAAC
  • SVM
  • SMOTE
  • PCA