A Machine Learning Methodology for Enzyme Functional Classification Combining Structural and Protein Sequence Descriptors

  • Afshine Amidi
  • Shervine Amidi
  • Dimitrios Vlachakis
  • Nikos Paragios
  • Evangelia I. Zacharaki
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9656)


The massive expansion of the worldwide Protein Data Bank (PDB) provides new opportunities for computational approaches that can learn from available data and extrapolate that knowledge to new instances. The aim of this work is to apply machine learning to train prediction models on data acquired by costly experimental procedures and to perform enzyme functional classification. Enzymes constitute key pharmacological targets, and knowledge of the chemical reactions they catalyze is very important for the development of potent molecular agents that will either suppress or enhance the function of a given enzyme, thus modulating a pathogenicity, an illness or even the phenotype. Classification is performed on two levels: (i) using structural information as input to a Support Vector Machine (SVM) classifier and (ii) using amino acid sequence alignment with Nearest Neighbor (NN) classification. Fusing the two classifiers increases the classification accuracy, which reaches 93.4% on a large dataset of 39,251 proteins from the PDB. The method is very competitive in terms of accuracy of classification into the 6 enzymatic classes, while its computational cost during prediction is very small.
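The two-level scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract names neither the alignment algorithm nor the fusion rule, so the sketch assumes a Smith-Waterman local alignment score with a simple match/mismatch scheme (no substitution matrix), a nearest-neighbor score per class, and fusion by a weighted average of the two class-score vectors. The SVM class probabilities are taken as given input, all function names and parameter values are hypothetical, and the 6 enzymatic classes are indexed 0 to 5.

```python
from typing import List, Tuple


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    Assumption: a flat match/mismatch scheme stands in for a real
    substitution matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def nn_class_scores(query: str, labelled: List[Tuple[str, int]],
                    n_classes: int = 6) -> List[float]:
    """Per-class score from the best-aligned training sequence of each class.

    Scores are normalized to sum to 1 so they can be mixed with
    probabilities; this normalization is an assumption of the sketch.
    """
    best = [0.0] * n_classes
    for seq, label in labelled:
        best[label] = max(best[label], smith_waterman_score(query, seq))
    total = sum(best)
    if total == 0:
        return [1.0 / n_classes] * n_classes
    return [s / total for s in best]


def fuse(p_svm: List[float], p_nn: List[float],
         w: float = 0.5) -> Tuple[int, List[float]]:
    """Weighted-average fusion (assumed rule); returns (class index, scores)."""
    fused = [w * a + (1.0 - w) * b for a, b in zip(p_svm, p_nn)]
    return max(range(len(fused)), key=fused.__getitem__), fused


if __name__ == "__main__":
    # Toy sequences and SVM probabilities, made up for illustration only.
    train = [("MKVLA", 0), ("GGGGG", 3)]
    p_svm = [0.1, 0.5, 0.1, 0.1, 0.1, 0.1]
    label, scores = fuse(p_svm, nn_class_scores("MKVLA", train))
    print(label, scores)
```

In this toy run the structure-based scores alone would favor class index 1, while the sequence neighbor pulls the fused decision to class index 0, which is the kind of correction a fusion step is meant to allow. The weight `w` would need to be tuned on held-out data.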


Enzyme classification, Protein structure, Amino acid sequence alignment, Multi-class SVM, PDB database



This research was partially supported by European Research Council Grant Diocles (ERC-STG-259112).



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Afshine Amidi (1)
  • Shervine Amidi (1)
  • Dimitrios Vlachakis (2)
  • Nikos Paragios (1, 3)
  • Evangelia I. Zacharaki (1, 3)
  1. Center for Visual Computing, Department of Applied Mathematics, École Centrale de Paris, Châtenay-Malabry, France
  2. Bioinformatics and Medical Informatics Laboratory, Biomedical Research Foundation of the Academy of Athens, Athens, Greece
  3. Equipe GALEN, INRIA Saclay, Île-de-France, Orsay, France
