A Machine Learning Methodology for Enzyme Functional Classification Combining Structural and Protein Sequence Descriptors
The massive expansion of the worldwide Protein Data Bank (PDB) provides new opportunities for computational approaches that learn from available data and generalize that knowledge to new instances. The aim of this work is to apply machine learning to train prediction models on data acquired by costly experimental procedures and to perform enzyme functional classification. Enzymes constitute key pharmacological targets, and knowledge of the chemical reactions they catalyze is very important for the development of potent molecular agents that either suppress or enhance the function of a given enzyme, thereby modulating pathogenicity, disease, or even phenotype. Classification is performed on two levels: (i) using structural information as input to a Support Vector Machine (SVM) classifier and (ii) using amino acid sequence alignment with Nearest Neighbor (NN) classification. Fusing the two classifiers increases the classification accuracy, which reaches 93.4% on a large dataset of 39,251 proteins from the PDB. The method is very competitive in terms of classification accuracy over the 6 enzymatic classes, while its computational cost during prediction is very low.
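The two-level scheme described above can be illustrated with a minimal sketch. The abstract does not state how the two classifiers are fused, so the weighted average of per-class probabilities used here is an assumption, as are the function names `nn_class_scores`, `fuse`, and `predict`, the weight `w`, and the toy sequences. Python's `difflib.SequenceMatcher` serves only as a stand-in for a real sequence alignment score, and the structural SVM probabilities are hypothetical values rather than outputs of a trained model.

```python
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple

N_CLASSES = 6  # the six enzymatic classes mentioned in the abstract


def nn_class_scores(query: str, reference: Sequence[Tuple[str, int]]) -> List[float]:
    """Score each class by the best similarity between the query and any
    reference sequence of that class, then normalize the scores to sum to 1.

    SequenceMatcher.ratio() is a placeholder for an alignment score.
    """
    best = [0.0] * N_CLASSES
    for seq, label in reference:
        sim = SequenceMatcher(None, query, seq).ratio()
        best[label] = max(best[label], sim)
    total = sum(best)
    # Fall back to a uniform distribution when nothing matches at all.
    if total == 0:
        return [1.0 / N_CLASSES] * N_CLASSES
    return [b / total for b in best]


def fuse(p_struct: Sequence[float], p_seq: Sequence[float], w: float = 0.5) -> List[float]:
    """Weighted average of two per-class probability vectors (assumed fusion rule)."""
    if len(p_struct) != len(p_seq):
        raise ValueError("probability vectors must have the same length")
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [w * a + (1.0 - w) * b for a, b in zip(p_struct, p_seq)]


def predict(probabilities: Sequence[float]) -> int:
    """Return the index of the most probable class."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


# Toy reference set: (sequence, class index). Purely illustrative data.
reference = [("MKVLAAGIVG", 0), ("MSTNPKPQRK", 2), ("GAVLIMFWPS", 4)]

# Hypothetical probabilistic outputs of a structure-based SVM for one protein.
p_struct = [0.30, 0.10, 0.35, 0.10, 0.10, 0.05]

p_seq = nn_class_scores("MKVLAAGIVA", reference)
fused = fuse(p_struct, p_seq, w=0.5)
print("structure only:", predict(p_struct), "fused:", predict(fused))
```

In this toy case the structural scores alone are nearly tied between two classes, and the sequence-based scores tip the fused decision toward the class of the closest reference sequence. In practice the weight `w` would presumably be tuned on held-out data.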
Keywords: Enzyme classification; Protein structure; Amino acid sequence alignment; Multi-class SVM; PDB database
This research was partially supported by European Research Council Grant Diocles (ERC-STG-259112).