Wavelet images and Chou’s pseudo amino acid composition for protein classification

  • Original Article
  • Amino Acids

Abstract

The last decade has seen an explosion in the collection of protein data. To realize the potential offered by this wealth of data, it is important to develop machine systems capable of classifying proteins and extracting features from them. Reliable machine systems for protein classification offer many benefits, including the promise of finding novel drugs and vaccines. In developing our system, we analyze and compare several feature extraction methods for protein classification that compute texture descriptors from a wavelet representation of the protein. We then feed these texture-based representations of the protein into an AdaBoost ensemble of neural networks or a support vector machine classifier. In addition, we perform experiments that combine our feature extraction methods with a standard method based on Chou’s pseudo amino acid composition. Using several datasets, we show that our best approach outperforms standard methods. The MATLAB code of the proposed protein descriptors is available at http://bias.csr.unibo.it/nanni/wave.rar.
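
To make the idea of a wavelet representation of a protein concrete, the following is a minimal Python sketch and not the authors' MATLAB implementation. It assumes that each residue is encoded with the Kyte-Doolittle hydropathy index and that the resulting numeric signal is convolved with a Ricker (Mexican hat) wavelet at eight scales; the encoding, the wavelet family, the scales, and the names `ricker` and `wavelet_image` are all illustrative assumptions that the abstract does not specify.

```python
import numpy as np

# Kyte-Doolittle hydropathy index: an ASSUMED residue encoding, used only for illustration.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def ricker(width, scale):
    """Sample a Ricker (Mexican hat) wavelet of the given scale at `width` points."""
    t = np.arange(width) - (width - 1) / 2.0
    norm = 2.0 / (np.sqrt(3.0 * scale) * np.pi ** 0.25)
    return norm * (1.0 - (t / scale) ** 2) * np.exp(-(t ** 2) / (2.0 * scale ** 2))


def wavelet_image(sequence, scales=range(1, 9)):
    """Return a (number of scales) x (sequence length) matrix of wavelet coefficients."""
    # Encode the sequence as a numeric signal and remove its mean.
    signal = np.array([KYTE_DOOLITTLE[aa] for aa in sequence], dtype=float)
    signal -= signal.mean()
    n = signal.size
    rows = []
    for scale in scales:
        width = 10 * scale + 1  # odd width keeps the wavelet centered
        full = np.convolve(signal, ricker(width, scale), mode="full")
        start = (full.size - n) // 2  # crop the central part, aligned with the signal
        rows.append(full[start:start + n])
    return np.vstack(rows)


# Hypothetical example sequence; each row of the result is one scale.
image = wavelet_image("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
print(image.shape)
```

Under these assumptions, the resulting matrix can be treated as a grayscale image and passed to any texture descriptor before classification.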

Notes

  1. Available at http://www.genome.jp/dbget/aaindex.html. We have not considered the properties where the amino acids have value 0 or 1.

  2. http://sourceforge.net/projects/svm/.

  3. The IDs of the properties are available at http://bias.csr.unibo.it/nanni/IDw.docx.

  4. http://www.cse.oulu.fi/Downloads/LPQMatlab.

  5. http://www.cse.oulu.fi/MVG/Downloads/LBPMatlab.

  6. Implemented as in DDtool 0.95 Matlab Toolbox.

  7. It is performed 10 times and the average results are reported.

  8. For multi-class classification with two-class classifiers, the one-versus-one or one-versus-all approach should be used (Cristianini and Shawe-Taylor 2000).

  9. Before the fusion, the scores of each method are normalized to mean 0 and standard deviation 1.

  10. We tested both linear and Gaussian kernels; the parameters are estimated using a grid search on the training set.
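
The score normalization described in note 9 can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names `zscore` and `fuse_by_sum` are hypothetical, the sum rule is an assumed fusion rule (the notes do not state which rule is used), and the statistics are computed on the scores passed in, whereas in practice they might be estimated on a separate set.

```python
import numpy as np


def zscore(scores):
    """Normalize one method's scores to mean 0 and standard deviation 1."""
    scores = np.asarray(scores, dtype=float)
    std = scores.std()
    if std == 0.0:
        # Constant scores carry no ranking information; map them to zero.
        return np.zeros_like(scores)
    return (scores - scores.mean()) / std


def fuse_by_sum(score_lists):
    """Fuse several methods by summing their normalized scores (ASSUMED sum rule)."""
    return np.sum([zscore(s) for s in score_lists], axis=0)


# Two hypothetical classifiers whose raw scores lie on very different ranges.
method_a = [0.10, 0.40, 0.90]
method_b = [120.0, 80.0, 300.0]
print(fuse_by_sum([method_a, method_b]))
```

Without the normalization step, the method with the larger score range would dominate the fused score regardless of how reliable it is.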

References

  • Ahonen T et al (2009) Rotation invariant image description with local binary pattern histogram Fourier features, Image Analysis, SCIA 2009. Lect Notes Comp Sci 5575:61–70

  • Althaus IW et al (1993) Steady-state kinetic studies with the non-nucleoside HIV-1 reverse transcriptase inhibitor U-87201E. J Biol Chem 268:6119–6124

  • Andraos J (2008) Kinetic plasticity and the determination of product ratios for kinetic schemes leading to multiple products without rate laws: new methods based on directed graphs. Can J Chem 86:342–357

  • Bairoch A, Apweiler R (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL. Nucl Acids Res 28:45–48

  • Ben-Gal I et al (2005) Identification of transcription factor binding sites with variable-order bayesian networks. Bioinformatics 21(11):2657–2666

  • Bock J, Gough D (2003) Whole-proteome interaction mining. Bioinformatics 19:125–135

  • Bulashevska A, Eils R (2006) Predicting protein subcellular locations using hierarchical ensemble of Bayesian classifiers based on Markov chains. BMC Bioinform 7:298

  • Chen YL, Li QZ (2007) Prediction of apoptosis protein subcellular location using improved hybrid approach and pseudo-amino acid composition. J Theor Biol 248:377–381

  • Chen L et al (2005) VFDB: a reference database for bacterial virulence factors. Nucl Acids Res 33:D325–D328

  • Chen C et al (2009) Prediction of protein secondary structure content by using the concept of Chou’s pseudo amino acid composition and support vector machine. Protein Peptide Lett 16:27–31

  • Chou KC (1985) Low-frequency motions in protein molecules: beta-sheet and beta-barrel. Biophys J 48:289–297

  • Chou KC (1988) Review: low-frequency collective motion in biomacromolecules and its biological functions. Biophys Chem 30:3–48

  • Chou KC (1989a) Graphic rules in steady and non-steady enzyme kinetics. J Biol Chem 264:12074–12079

  • Chou KC (1989b) Low-frequency resonance and cooperativity of hemoglobin. Trends Biochem Sci 14:212

  • Chou KC (1990) Review: applications of graph theory to enzyme kinetics and protein folding kinetics: steady and non-steady state systems. Biophys Chem 35:1–24

  • Chou KC (2001) Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins Struct Funct Genet 43:246–255

  • Chou KC (2010) Graphic rule for drug metabolism systems. Curr Drug Metab 11:369–378

  • Chou KC (2011) Some remarks on protein attribute prediction and pseudo amino acid composition (50th Anniversary Year Review). J Theor Biol 273:236–247

  • Chou KC, Shen HB (2007a) Review: recent progresses in protein subcellular location prediction. Anal Biochem 370:1–16

  • Chou KC, Shen HB (2007b) MemType-2L: a web server for predicting membrane proteins and their types by incorporating evolution information through Pse-PSSM. Biochem Biophys Res Commun 360:339–345

  • Chou KC, Shen HB (2009) Review: recent advances in developing web-servers for predicting protein attributes. Nat Sci 2:63–92. (openly accessible at http://www.scirp.org/journal/NS/)

  • Chou KC, Shen HB (2010a) Cell-PLoc 2.0: an improved package of web-servers for predicting subcellular localization of proteins in various organisms. Nat Sci 2:1090–1103

  • Chou KC, Shen HB (2010b) Plant-mPLoc: a top–down strategy to augment the power for predicting plant protein subcellular localization. PLoS ONE 5:e11335

  • Chou KC, Zhang CT (1995) Review: prediction of protein structural classes. Crit Rev Biochem Mol Biol 30:275–349

  • Chou KC, Kezdy FJ, Reusser F (1994) Review: steady-state inhibition kinetics of processive nucleic acid polymerases and nucleases. Anal Biochem 221:217–230

  • Chou KC, Zhang CT, Maggiora GM (1997) Disposition of amphiphilic helices in heteropolar environments. Proteins Struct Funct Genet 28:99–108

  • Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, Cambridge

  • Daras P et al (2006) Three-dimensional shape-structure comparison method for protein classification. IEEE Trans Comput Biol Bioinform 3(3):193–207

  • Demsar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  • Ding YS, Zhang TL (2008) Using Chou’s pseudo amino acid composition to predict subcellular localization of apoptosis proteins: an approach with immune genetic algorithm-based ensemble classifier. Pattern Recognit Lett 29:1887–1892

  • Ding H, Luo L, Lin H (2009) Prediction of cell wall lytic enzymes using Chou’s amphiphilic pseudo amino acid composition. Protein Peptide Lett 16:351–355

  • Du PF, Li YD (2006) Prediction of protein submitochondria locations by hybridizing pseudoamino acid composition with various physicochemical. BMC Bioinform 7:518

  • Du PF, Cao SJ, Li YD (2009a) SubChlo: predicting protein subchloroplast locations with pseudo-amino acid composition and the evidence-theoretic K-nearest neighbor (ET-KNN) algorithm. J Theor Biol 261:330–335

  • Du P, Cao S, Li Y (2009b) SubChlo: predicting protein subchloroplast locations with pseudo-amino acid composition and the evidence-theoretic K-nearest neighbor (ET-KNN) algorithm. J Theor Biol 261(2):330–335

  • Fang Y et al (2008) Predicting DNA-binding proteins: approached from Chou’s pseudo amino acid composition and other specific sequence features. Amino Acids 34(1):103–109

  • Fawcett T (2004) ROC graphs: notes and practical considerations for researchers. HP Laboratories, Palo Alto

  • Garg A, Gupta D (2008) VirulentPred: a SVM based prediction method for virulent proteins in bacterial pathogens. BMC Bioinform 9:62. doi:10.1186/1471-2105-9-62

  • Hayat M, Khan A (2011) Predicting membrane protein types by fusing composite protein sequence features into pseudo amino acid composition. J Theor Biol 271:10–17

  • Hu L et al (2011) Predicting functions of proteins in mouse based on weighted protein–protein interaction network and protein hybrid properties. PLoS ONE 6:e14556

  • Jaakkola T, Diekhans M, Haussler D (1999) Using the Fisher kernel method to detect remote protein homologies. In: Seventh international conference on intelligent systems for molecular biology. AAAI Press, Menlo Park, pp 149–158

  • Jiang X et al (2008) Using the concept of Chou’s pseudo amino acid composition to predict apoptosis proteins subcellular location: an approach by approximate entropy. Protein Peptide Lett 15:392–396

  • Kandaswamy KK et al (2011) AFP-Pred: a random forest approach for predicting antifreeze proteins from sequence-derived properties. J Theor Biol 270:56–62

  • Kawashima S, Kanehisa M (2000) AAindex: amino acid index database. Nucl Acids Res 20:1

  • Lei Z, Dai Y (2005) An SVM-based system for predicting protein subnuclear localizations. BMC Bioinform 6:291

  • Leslie CS et al (2004) Mismatch string kernels for discriminative protein classification. Bioinformatics 20:467–476

  • Li FM, Li QZ (2008) Predicting protein subcellular location using Chou’s pseudo amino acid composition and improved hybrid approach. Protein Peptide Lett 15:612–616

  • Liao S, Law MWK, Chung ACS (2009) Dominant local binary patterns for texture classification. IEEE Trans Image Process 18(5):1107–1118

  • Lin H (2008) The modified Mahalanobis discriminant for predicting outer membrane proteins by using Chou’s pseudo amino acid composition. J Theor Biol 252:350–356

  • Lin MT, Beal MF (2006) Mitochondrial dysfunction and oxidative stress in neurodegenerative diseases. Nature 443:787–795

  • Lin H et al (2008) Predicting subcellular localization of mycobacterial proteins by using Chou’s pseudo amino acid composition. Protein Peptide Lett 15:739–744

  • Lowell BB, Shulman GI (2005) Mitochondrial dysfunction and type 2 diabetes. Science 307:384–387

  • Masso M, Vaisman II (2010) Knowledge-based computational mutagenesis for predicting the disease potential of human non-synonymous single nucleotide polymorphisms. J Theor Biol 266:560–568

  • Mohabatkar H (2010) Prediction of cyclin proteins using Chou’s pseudo amino acid composition. Protein Peptide Lett 17:1207–1214

  • Nanni L, Lumini A (2006) An ensemble of K-local hyperplane for predicting protein–protein interactions. Bioinformatics 22(10):1207–1210

  • Nanni L, Lumini A (2008a) Genetic programming for creating Chou’s pseudo amino acid based features for submitochondria localization. Amino Acids 34(4):653–660

  • Nanni L, Lumini A (2008b) Genetic programming for creating Chou’s pseudoamino acid based features for submitochondria localization. Amino Acids 34(4):653–660

  • Nanni L, Lumini A (2010) A high performance set of descriptors extracted from the amino acid sequence for protein classification. J Theor Biol 266(1):1–10

  • Niu B et al (2006) Predicting protein structural class with AdaBoost learner. Protein Peptide Lett 13:489–492

  • Ojansivu V, Heikkila J (2008) Blur insensitive texture classification using local phase quantization. In: ICISP

  • Qin ZC (2006) ROC analysis for predictions made by probabilistic classifiers. In: Fourth international conference on machine learning and cybernetics, pp 3119–3124

  • Qiu JD et al (2009) Prediction of G-protein-coupled receptor classes based on the concept of Chou’s pseudo amino acid composition: an approach from discrete wavelet transform. Anal Biochem 390:68–73

  • Rahtu E, Salo M, Heikkila J (2005) Affine invariant pattern recognition using multi-scale autoconvolution. IEEE Trans Pattern Anal Machine Intell 27(6):908–918

  • Saigo H et al (2004) Protein homology detection using string alignment kernels. Bioinformatics 20(11):1682–1689

  • Shen H-B, Chou K-C (2007) Gpos-PLoc: an ensemble classifier for predicting subcellular localization of Gram-positive bacterial proteins. Protein Eng Design Select 20:39–46

  • Shi SP et al (2011) Identify submitochondria and subchloroplast locations with pseudo amino acid composition: Approach from the strategy of discrete wavelet transform feature extraction. Biochim Biophys Acta 1813:424–430

  • Tan X, Triggs B (2007) Enhanced local texture feature sets for face recognition under difficult lighting conditions. Analysis and modelling of faces and gestures. In: LNCS, vol 4778, pp 168–182

  • Wen ZN, Wang KL, Li ML, Nie FS, Yang Y (2005) Analyzing functional similarity of protein sequences with discrete wavelet transform. Comput Biol Chem 29:220–228

  • Wolfram S (1984) Cellular automata as models of complexity. Nature 311:419–424

  • Xiao X, Chou KC (2007) Digital coding of amino acids based on hydrophobic index. Protein Peptide Lett 14:871–875

  • Xiao X et al (2005a) An application of gene comparative image for predicting the effect on replication ratio by HBV virus gene missense mutation. J Theor Biol 235:555–565

  • Xiao X et al (2005b) Using cellular automata to generate Image representation for biological sequences. Amino Acids 28:29–35

  • Xiao X, Shao SH, Chou KC (2006a) A probability cellular automaton model for hepatitis B viral infections. Biochem Biophys Res Commun 342:605–610

  • Xiao X et al (2006b) Using cellular automata images and pseudo amino acid composition to predict protein subcellular location. Amino Acids 30:49–54

  • Xiao X, Wang P, Chou KC (2009) GPCR-CA: a cellular automaton image approach for predicting G-protein-coupled receptor functional classes. J Comput Chem 30(9):1414–1423

  • Xiao X, Wang P, Chou KC (2011a) Quat-2L: a web-server for predicting protein quaternary structural attributes. Mol Divers 15:149–155

  • Xiao X, Wang P, Chou KC (2011b) GPCR-2L: predicting G protein-coupled receptors and their types by hybridizing two different modes of pseudo amino acid compositions. Mol Biosyst 7:911–919

  • Yang ZR, Thomson R (2005) Bio-basis function neural network for prediction of protease cleavage sites in proteins. IEEE Trans Neural Netw 16:263–274

  • Zeng YH et al (2009) Using the augmented Chou’s pseudo amino acid composition for predicting protein submitochondria locations based on auto covariance approach. J Theor Biol 259(2):366–372

  • Zhou GP (2011) The disposition of the LZCC protein residues in wenxiang diagram provides new insights into the protein–protein interaction mechanism. J Theor Biol 284:142–148

  • Zhou GP, Deng MH (1984) An extension of Chou’s graphical rules for deriving enzyme kinetic equations to system involving parallel reaction pathways. Biochem J 222:169–176

  • Zhou XB et al (2007) Using Chou’s amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes. J Theor Biol 248:546–551

Acknowledgments

We wish to thank Ojansivu and Heikkila for sharing their LPQ code; Rahtu, Salo and Heikkila for sharing their MSAhist code; Ahonen, Matas, He and Pietikäinen for sharing their LBP-HF code.

Conflict of interest

The authors declare that they have no conflict of interest.

Author information

Corresponding author

Correspondence to Loris Nanni.

About this article

Cite this article

Nanni, L., Brahnam, S. & Lumini, A. Wavelet images and Chou’s pseudo amino acid composition for protein classification. Amino Acids 43, 657–665 (2012). https://doi.org/10.1007/s00726-011-1114-9

  • DOI: https://doi.org/10.1007/s00726-011-1114-9
