Amino Acids, Volume 43, Issue 2, pp 657–665

Wavelet images and Chou’s pseudo amino acid composition for protein classification

Original Article

Abstract

The last decade has seen an explosion in the collection of protein data. To realize the potential offered by this wealth of data, it is important to develop machine systems capable of classifying proteins and extracting features from them. Reliable machine systems for protein classification offer many benefits, including the promise of finding novel drugs and vaccines. In developing our system, we analyze and compare several feature extraction methods for protein classification that compute texture descriptors from a wavelet representation of the protein. We then feed these texture-based representations of the protein into either an AdaBoost ensemble of neural networks or a support vector machine classifier. In addition, we perform experiments that combine our feature extraction methods with a standard method based on Chou's pseudo amino acid composition. Using several datasets, we show that our best approach outperforms standard methods. The MATLAB code of the proposed protein descriptors is available at http://bias.csr.unibo.it/nanni/wave.rar.
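To make the two feature families in the abstract concrete, the following is a minimal Python sketch under stated assumptions; it is not the authors' MATLAB code distributed at the URL above. It assumes a single physicochemical property (the Kyte–Doolittle hydrophobicity scale) to turn the sequence into a numeric signal, builds the "wavelet image" as an undecimated Haar transform over scales 1..`n_scales` (rows are scales, columns are residue positions), and uses the basic 8-neighbour local binary pattern (LBP) histogram as the texture descriptor. The PseAAC part follows the type-1 form of Chou's pseudo amino acid composition, simplified to that one property. The function names and the defaults `lam=5`, `w=0.05`, `n_scales=6` are hypothetical choices for illustration; the paper's own wavelet images, texture descriptors, and parameters may differ.

```python
# Hypothetical sketch, NOT the authors' MATLAB descriptors. Assumptions: one
# property (Kyte-Doolittle hydrophobicity) encodes the sequence, the "wavelet
# image" is an undecimated Haar transform over scales 1..n_scales, and the
# texture descriptor is the basic 8-neighbour LBP histogram.

# Kyte-Doolittle hydrophobicity scale for the 20 standard amino acids
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KD)


def standardized(scale):
    """Zero-mean, unit-variance copy of a property scale over the 20 residues."""
    vals = list(scale.values())
    mean = sum(vals) / len(vals)
    std = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
    return {aa: (v - mean) / std for aa, v in scale.items()}


def pseaac(seq, lam=5, w=0.05):
    """Type-1 PseAAC with a single property: 20 composition + lam order terms."""
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    h = standardized(KD)
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]
    # theta_j: mean squared property difference between residues j apart
    thetas = [
        sum((h[seq[i + j]] - h[seq[i]]) ** 2 for i in range(n - j)) / (n - j)
        for j in range(1, lam + 1)
    ]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


def haar_wavelet_image(seq, n_scales=6):
    """Rows are scales, columns are positions; sequence ends wrap around."""
    x = [KD[aa] for aa in seq]
    n = len(x)
    image = []
    for s in range(1, n_scales + 1):
        row = []
        for t in range(n):
            left = sum(x[(t + k) % n] for k in range(s)) / s
            right = sum(x[(t + s + k) % n] for k in range(s)) / s
            row.append(left - right)  # Haar response, up to a scale factor
        image.append(row)
    return image


def lbp_histogram(image):
    """Normalized 256-bin histogram of basic 3x3 local binary pattern codes."""
    rows, cols = len(image), len(image[0])
    if rows < 3 or cols < 3:
        raise ValueError("image must be at least 3x3")
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                if image[r + dr][c + dc] >= image[r][c]:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [v / total for v in hist]


def protein_features(seq, lam=5, w=0.05, n_scales=6):
    """Concatenate wavelet-image texture features with PseAAC features."""
    seq = seq.upper()
    if any(aa not in KD for aa in seq):
        raise ValueError("only the 20 standard amino acids are supported")
    texture = lbp_histogram(haar_wavelet_image(seq, n_scales))
    return texture + pseaac(seq, lam, w)


features = protein_features("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
print(len(features))  # 281 = 256 LBP bins + 20 + 5 PseAAC terms
```

In a complete system these vectors would be passed to a classifier such as the support vector machine or AdaBoost ensemble mentioned in the abstract (not shown here). Concatenation is only one way to combine the two representations; training a separate classifier on each and fusing their scores is a common alternative.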

Keywords

Protein classification; Machine learning; Ensemble of classifiers; Support vector machines


Copyright information

© Springer-Verlag 2011

Authors and Affiliations

Loris Nanni (1), Sheryl Brahnam (2), Alessandra Lumini (3)

  1. Department of Information Engineering, University of Padua, Padova, Italy
  2. Computer Information Systems, Missouri State University, Springfield, USA
  3. Department of Electronic, Informatics and Systems (DEIS), Università di Bologna, Cesena, Italy
