Wavelet images and Chou’s pseudo amino acid composition for protein classification
The last decade has seen an explosion in the collection of protein data. To realize the potential offered by this wealth of data, it is important to develop machine systems capable of classifying proteins and extracting features from them. Reliable machine systems for protein classification offer many benefits, including the promise of finding novel drugs and vaccines. In developing our system, we analyze and compare several feature extraction methods for protein classification that compute texture descriptors from a wavelet representation of the protein. We then feed these texture-based representations into either an AdaBoost ensemble of neural networks or a support vector machine classifier. In addition, we perform experiments that combine our feature extraction methods with a standard method based on Chou’s pseudo amino acid composition. Using several datasets, we show that our best approach outperforms standard methods. The MATLAB code of the proposed protein descriptors is available at http://bias.csr.unibo.it/nanni/wave.rar.
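The abstract pairs the wavelet texture descriptors with Chou’s pseudo amino acid composition (PseAAC). The sketch below shows what the original (type 1) PseAAC baseline computes. It is a minimal Python/NumPy illustration, not the authors' MATLAB code, and the abstract does not state which PseAAC variant or settings were used. Each physicochemical property is standardized over the 20 amino acids. The correlation between residues k positions apart is their mean squared property difference, averaged along the sequence. The 20 composition frequencies and the λ weighted correlation factors are then jointly normalized. Chou's original formulation used hydrophobicity, hydrophilicity and side-chain mass. Here, though, the property tables are supplied by the caller. The function names and the defaults `lam=3` and `w=0.05` are assumptions chosen for illustration.

```python
import numpy as np

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(prop):
    """Rescale a residue -> value table to zero mean, unit (population) std over the 20 residues."""
    vals = np.array([prop[a] for a in AMINO_ACIDS], dtype=float)
    mu, sd = vals.mean(), vals.std()
    return {a: (prop[a] - mu) / sd for a in AMINO_ACIDS}


def pseaac(seq, properties, lam=3, w=0.05):
    """Type-1 pseudo amino acid composition (sketch; lam and w defaults are assumptions).

    seq        -- protein sequence over the 20 standard one-letter codes
    properties -- list of dicts mapping residue -> raw property value
    lam        -- number of sequence-order correlation factors (lambda)
    w          -- weight given to the correlation factors
    Returns a feature vector of length 20 + lam whose entries sum to 1.
    """
    if any(r not in AMINO_ACIDS for r in seq):
        raise ValueError("sequence contains non-standard residues")
    L = len(seq)
    if L <= lam:
        raise ValueError("sequence must be longer than lambda")
    props = [standardize(p) for p in properties]

    # Normalized occurrence frequency of each amino acid.
    f = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float) / L

    def theta_pair(r1, r2):
        # Mean squared difference of the standardized properties of two residues.
        return np.mean([(p[r2] - p[r1]) ** 2 for p in props])

    # k-th tier correlation factor: average over all residue pairs k positions apart.
    theta = np.array([
        np.mean([theta_pair(seq[i], seq[i + k]) for i in range(L - k)])
        for k in range(1, lam + 1)
    ])

    # Joint normalization of composition and sequence-order terms.
    denom = f.sum() + w * theta.sum()
    return np.concatenate([f / denom, w * theta / denom])
```

As a usage example, `pseaac("MKTAYIAKQR", [hydrophobicity, hydrophilicity, mass])` returns a 23-dimensional vector when `lam=3`. Here `hydrophobicity`, `hydrophilicity` and `mass` are hypothetical residue-to-value dictionaries that you would fill from a published scale, for example one from the AAindex database. A vector of this kind can be concatenated with texture descriptors before training an SVM. For a homopolymer such as `"AAAAA"`, every correlation factor is zero, so all the weight falls on the composition entry for alanine.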
Keywords: Protein classification · Machine learning · Ensemble of classifiers · Support vector machines
Acknowledgments
We wish to thank Ojansivu and Heikkila for sharing their LPQ code; Rahtu, Salo and Heikkila for sharing their MSAhist code; and Ahonen, Matas, He and Pietikäinen for sharing their LBP-HF code.
Conflict of interest
The authors declare that they have no conflict of interest.