Protein remote homology detection by combining Chou’s distance-pair pseudo amino acid composition and principal component analysis

Original Paper
Molecular Genetics and Genomics

Abstract

Protein remote homology detection is an important task in computational proteomics, with value for both basic research and practical applications. Discriminative methods based on support vector machines (SVMs) currently show superior performance. However, existing feature vectors still do not represent protein sequences adequately, and the resulting predictors often lack an interpretable model for analyzing characteristic features. Previous studies have shown that sequence-order effects and physicochemical properties are important for representing protein sequences, but how to exploit this information when constructing predictors remains a challenging problem. To incorporate sequence-order information and physicochemical properties into the prediction, this study proposes a method called disPseAAC, in which the feature vector is constructed by incorporating the occurrences of amino acid distance pairs into Chou's pseudo amino acid composition (PseAAC). Principal component analysis (PCA) is then applied to further improve predictive performance and reduce computational cost. Experiments on a benchmark dataset show that disPseAAC achieves an ROC score of 0.922, outperforming several existing state-of-the-art methods. Furthermore, the learnt model can easily be analyzed in terms of its discriminative features, and its computational cost is much lower than that of profile-based methods.
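
The distance-pair idea can be illustrated with a short Python sketch. It is not the authors' implementation: the function name `distance_pair_features`, the parameter `max_dist`, and the normalization (each pair count divided by the number of residue pairs at that distance) are assumptions chosen for illustration. The vector concatenates the 20 amino acid frequencies with, for every distance d from 1 to `max_dist`, the frequencies of the 400 ordered residue pairs whose second member lies d positions downstream of the first. How disPseAAC weights these terms or folds in physicochemical properties is not described in the abstract, so that part is omitted.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def distance_pair_features(seq: str, max_dist: int = 3) -> list[float]:
    """Hypothetical distance-pair composition sketch (not the paper's exact formula).

    Returns 20 amino acid frequencies followed by, for each distance
    d = 1..max_dist, the 400 frequencies of ordered pairs (a, b) where
    b occurs d positions after a.
    """
    # Keep only the 20 standard residues.
    seq = "".join(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    n = len(seq)
    features = []

    # Single-residue composition.
    for aa in AMINO_ACIDS:
        features.append(seq.count(aa) / n if n else 0.0)

    # Pair composition at each distance d (assumed normalization: pairs at d).
    for d in range(1, max_dist + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = max(n - d, 0)  # number of residue pairs separated by d
        for i in range(total):
            counts[seq[i] + seq[i + d]] += 1
        for pair in PAIRS:
            features.append(counts[pair] / total if total else 0.0)
    return features


vec = distance_pair_features("MKTAYIAKQR", max_dist=3)
print(len(vec))  # 1220
```

With `max_dist=3` this yields 20 + 3 × 400 = 1,220 features per sequence, which illustrates why a dimensionality-reduction step such as PCA can lower the computational cost.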

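Principal component analysis can likewise be sketched in a few lines of NumPy, assuming the feature vectors are stacked as rows of a matrix. The helper names `pca_fit` and `pca_transform` and the use of an SVD of the mean-centred data are illustrative assumptions; the abstract does not say how many components the paper retains, so `n_components` is left as a free parameter.

```python
import numpy as np


def pca_fit(X, n_components: int):
    """Fit PCA to the rows of X (n_samples x n_features).

    Returns the feature means, the top principal axes
    (n_components x n_features), and the fraction of total variance
    explained by each retained axis.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes, sorted by decreasing singular value.
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    variance = S ** 2
    explained = variance[:n_components] / variance.sum()
    return mean, Vt[:n_components], explained


def pca_transform(X, mean, components):
    """Project samples onto previously fitted principal axes."""
    return (np.asarray(X, dtype=float) - mean) @ components.T


# Toy example: four points lying on a single line in 2-D.
X = np.array([[1, 2], [2, 4], [3, 6], [4, 8]], dtype=float)
mean, components, explained = pca_fit(X, n_components=1)
Z = pca_transform(X, mean, components)
print(Z.shape, explained)
```

In a train/test setting, the mean and principal axes should be fitted on the training sequences only and then reused to project the test sequences, so that no information from the test set leaks into the SVM input.
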
Fig. 1
Fig. 2

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 61300112 and 61272383), the Scientific Research Innovation Foundation in Harbin Institute of Technology (Project No. HIT.NSRIF.2013103), the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, the Natural Science Foundation of Guangdong Province (2014A030313695), and Shenzhen Municipal Science and Technology Innovation Council (Grant No. CXZZ20140904154910774).

Conflict of interest

The authors declare that they have no competing interests.

Ethical standard

This article does not contain any studies with human participants or animals performed by any of the authors.

Author information

Correspondence to Bin Liu.

Additional information

Communicated by S. Hohmann.

About this article

Cite this article

Liu, B., Chen, J. & Wang, X. Protein remote homology detection by combining Chou’s distance-pair pseudo amino acid composition and principal component analysis. Mol Genet Genomics 290, 1919–1931 (2015). https://doi.org/10.1007/s00438-015-1044-4

  • DOI: https://doi.org/10.1007/s00438-015-1044-4
