A Hierarchical and Scalable Strategy for Protein Structural Classification

  • Conference paper
Bioinformatics and Biomedical Engineering (IWBBIO 2019)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 11465)

Abstract

Protein function prediction is a relevant but challenging task because protein structural data are large and complex. As the volume of available biological data grows, so does the demand for computational methods that annotate this data deluge and help us make sense of it. Here we propose a model and a data-mining-based strategy to perform protein structural classification. We are particularly interested in hierarchical classification schemes. To evaluate the proposed strategy, we conduct three experiments using as input protein structural data from three biological databases (CATH, SCOPe and BRENDA). Each dataset is associated with a well-known hierarchical classification scheme (CATH, SCOP and EC number, respectively). We show that our model's accuracy ranges from 86% to 95% when predicting CATH, SCOP and EC number levels. To the best of our knowledge, ours is the first work to reach such high accuracy when dealing with very large data sets.
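
To make the idea of accuracy "per level" of a hierarchical scheme concrete, the following is a minimal sketch, not the authors' evaluation code. It assumes labels are dot-separated strings (as in four-field EC numbers or CATH codes) and that a prediction counts as correct at depth d when its first d fields match the true label. The function names `label_prefix` and `per_level_accuracy`, the default of four levels, and the example labels are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def label_prefix(label: str, depth: int) -> str:
    """Return the first `depth` dot-separated fields of a hierarchical label."""
    return ".".join(label.split(".")[:depth])


def per_level_accuracy(
    true_labels: Sequence[str],
    predicted_labels: Sequence[str],
    n_levels: int = 4,
) -> List[float]:
    """Compute accuracy at each depth of the hierarchy.

    Assumption (not taken from the paper): a prediction is correct at
    depth d when its first d fields equal those of the true label.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("true and predicted label lists differ in length")
    if not true_labels:
        raise ValueError("at least one label is required")

    accuracies = []
    for depth in range(1, n_levels + 1):
        hits = sum(
            label_prefix(t, depth) == label_prefix(p, depth)
            for t, p in zip(true_labels, predicted_labels)
        )
        accuracies.append(hits / len(true_labels))
    return accuracies


if __name__ == "__main__":
    # Illustrative EC-style labels only; these are not results from the paper.
    true_ec = ["1.1.1.1", "2.7.11.1", "3.4.21.4", "1.1.1.1"]
    pred_ec = ["1.1.1.1", "2.7.10.2", "3.4.21.5", "1.2.1.1"]
    print(per_level_accuracy(true_ec, pred_ec))  # [1.0, 0.75, 0.5, 0.25]
```

Under this prefix-matching definition, accuracy can only stay equal or fall as depth increases, because an error at a shallow level propagates to every deeper level.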

Notes

  1. http://www.cathdb.info, http://scop.berkeley.edu/, https://www.brenda-enzymes.org.

  2. We intend to make the code and details related to the models publicly available upon publication of this manuscript.

  3. Using Python's scikit-learn implementation (http://scikit-learn.org).

Funding

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG).

Author information

Correspondence to Vinício F. Mendes.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Mendes, V.F., Monteiro, C.R., Comarela, G.V., Silveira, S.A. (2019). A Hierarchical and Scalable Strategy for Protein Structural Classification. In: Rojas, I., Valenzuela, O., Rojas, F., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2019. Lecture Notes in Computer Science, vol 11465. Springer, Cham. https://doi.org/10.1007/978-3-030-17938-0_34

  • DOI: https://doi.org/10.1007/978-3-030-17938-0_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-17937-3

  • Online ISBN: 978-3-030-17938-0

  • eBook Packages: Computer Science (R0)
