A Hierarchical and Scalable Strategy for Protein Structural Classification

Mendes, Vinício F.; Monteiro, Cleiton R.; Comarela, Giovanni V.; Silveira, Sabrina A.

doi:10.1007/978-3-030-17938-0_34

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 11465)

Included in the following conference series:

International Work-Conference on Bioinformatics and Biomedical Engineering

Abstract

Protein function prediction is a relevant but challenging task because protein structural data are large and complex. With the growing volume of available biological data, there is a demand for computational methods to annotate this data deluge and help us make sense of it. Here we propose a model and a data-mining-based strategy to perform protein structural classification. We are particularly interested in hierarchical classification schemes. To evaluate the proposed strategy, we conduct three experiments using as input protein structural data from biological databases (CATH, SCOPe and BRENDA). Each dataset is associated with a well-known hierarchical classification scheme (CATH, SCOP and EC number, respectively). We show that our model's accuracy ranges from 86% to 95% when predicting CATH, SCOP and EC number levels. To the best of our knowledge, ours is the first work to reach such high accuracy when dealing with very large data sets.
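To make the notion of a hierarchical classification scheme concrete, the sketch below shows how a dot-separated label such as an EC number or a CATH code can be truncated at each level, and how accuracy could then be reported per level. It is only an illustration and not the model proposed in the paper: the helper names `level_prefixes` and `per_level_accuracy`, the example labels, and the assumption that every label has the same depth are all hypothetical.

```python
def level_prefixes(label: str, sep: str = ".") -> list[str]:
    """Return the label truncated at each level of its hierarchy."""
    parts = label.split(sep)
    return [sep.join(parts[:depth]) for depth in range(1, len(parts) + 1)]


def per_level_accuracy(true_labels: list[str], predicted_labels: list[str]) -> list[float]:
    """Fraction of predictions that match the true label at each level.

    Assumes (for illustration) that all labels have the same depth.
    """
    depth = len(level_prefixes(true_labels[0]))
    scores = []
    for level in range(depth):
        hits = sum(
            level_prefixes(t)[level] == level_prefixes(p)[level]
            for t, p in zip(true_labels, predicted_labels)
        )
        scores.append(hits / len(true_labels))
    return scores


# Illustrative labels only; they are not taken from the paper's datasets.
print(level_prefixes("1.1.1.1"))      # four-level EC-style label
print(level_prefixes("3.40.50.720"))  # four-level CATH-style label
print(per_level_accuracy(["1.1.1.1", "2.7.11.1"], ["1.1.1.2", "2.7.11.1"]))
```

A prediction that is wrong only at the last level still counts as correct at the coarser levels, which is why results for hierarchical schemes are naturally reported level by level.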

Notes

1. http://www.cathdb.info, http://scop.berkeley.edu/, https://www.brenda-enzymes.org.
2. We intend to make the code and details related to the models publicly available upon publication of this manuscript.
3. Using Python's scikit-learn implementation (http://scikit-learn.org).

Funding

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG).

Author information

Authors and Affiliations

Department of Computer Science, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil
Vinício F. Mendes, Cleiton R. Monteiro, Giovanni V. Comarela & Sabrina A. Silveira

Corresponding author

Correspondence to Vinício F. Mendes.

Editor information

Editors and Affiliations

Department of Computer Architecture and Computer Technology Higher Technical School of Information Technology and Telecommunications Engineering, CITIC-UGR, Granada, Spain
Ignacio Rojas
ETSIIT, University of Granada, Granada, Spain
Olga Valenzuela
CITIC-UGR, University of Granada, Granada, Spain
Fernando Rojas
Fundacion Progreso y Salud, Granada, Spain
Francisco Ortuño

About this paper

Cite this paper

Mendes, V.F., Monteiro, C.R., Comarela, G.V., Silveira, S.A. (2019). A Hierarchical and Scalable Strategy for Protein Structural Classification. In: Rojas, I., Valenzuela, O., Rojas, F., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2019. Lecture Notes in Computer Science, vol 11465. Springer, Cham. https://doi.org/10.1007/978-3-030-17938-0_34

DOI: https://doi.org/10.1007/978-3-030-17938-0_34
Published: 13 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17937-3
Online ISBN: 978-3-030-17938-0
eBook Packages: Computer Science, Computer Science (R0)
