Peptidase Detection and Classification Using Enhanced Kernel Methods with Feature Selection

  • Lionel Morgado
  • Carlos Pereira
  • Paula Veríssimo
  • António Dourado
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 93)


The protein sequencing effort of the last decade has produced very large amounts of data about which knowledge is still limited. Retrieving information from these proteins is the next step, and computational techniques are indispensable for it. Although there is not yet a silver-bullet approach to enzyme detection and classification, machine learning formulations such as the state-of-the-art support vector machine (SVM) are among the most reliable options. Here we present a framework specialized in peptidase analysis, namely detection and classification according to the hierarchies defined in the MEROPS database. Feature selection with SVM recursive feature elimination (SVM-RFE) is used to improve the discriminative models and to build computationally more efficient classifiers.
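The SVM-RFE idea mentioned above can be illustrated with a small sketch. It is not the authors' pipeline, which the text does not detail. It assumes a linear SVM without a bias term trained by Pegasos-style stochastic subgradient descent, which stands in for a full SVM solver such as LIBSVM. It also uses made-up toy data in which only feature 0 carries class information, and hypothetical helper names `train_linear_svm` and `svm_rfe`. In each round the SVM is retrained on the surviving features and the feature with the smallest squared weight w_j² is discarded. This is the ranking criterion used by SVM-RFE with a linear kernel.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Train a linear soft-margin SVM (no bias term, an assumption of this
    sketch) with Pegasos-style stochastic subgradient descent.
    X is a list of feature vectors, y a list of labels in {-1, +1}.
    Returns the weight vector w."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Regularisation step: shrink every weight by the same factor.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss step: only examples inside the margin update w.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(X, y, n_keep=1):
    """SVM recursive feature elimination: repeatedly retrain on the
    surviving features and drop the one with the smallest w_j^2.
    Returns (kept feature indices, indices in order of elimination)."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated


# Hypothetical toy data: feature 0 separates the classes,
# features 1 and 2 are small, label-independent noise.
X_toy = [
    [2.0, 0.1, -0.1], [1.5, -0.1, 0.1], [1.0, 0.1, 0.1],
    [-2.0, 0.1, -0.1], [-1.5, -0.1, 0.1], [-1.0, 0.1, 0.1],
]
y_toy = [1, 1, 1, -1, -1, -1]

kept, eliminated = svm_rfe(X_toy, y_toy, n_keep=1)
print("kept:", kept, "eliminated (first to last):", eliminated)
```

On this toy data the two noise features are eliminated first and feature 0 survives. In a real peptidase setting the same loop would run on whatever sequence-derived feature vectors the framework extracts. The text above does not specify those, and a nonlinear kernel would need a different weight-based ranking criterion.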


Keywords: Peptidase Classification, Support Vector Machine, Recursive Feature Elimination, Bioinformatics





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Lionel Morgado (1)
  • Carlos Pereira (1, 2)
  • Paula Veríssimo (3)
  • António Dourado (1)
  1. Center for Informatics and Systems of the University of Coimbra, Polo II, University of Coimbra, Coimbra, Portugal
  2. Instituto Superior de Engenharia de Coimbra, Quinta da Nora, Coimbra, Portugal
  3. Department of Biochemistry and Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
