Semantic Annotation Modelling for Protein Functions Prediction

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11089)

Abstract

Functional protein annotation is a key phase in the analysis of de novo sequenced genomes. Automatic annotation tools often fail to remove wrong annotations that are contradictory or inconsistent in biological terms. In this study, we introduce a semantic model for representing functional annotations, based on the Resource Description Framework (RDF) standard.

We have integrated several databases of protein sequence information and ontologies describing the functional relationships of protein molecules. Using Web Ontology Language (OWL) axioms, RDF storage engines can decide which candidate annotations should be marked as biologically unviable because they fail reality checks on coexistence, subcellular location, and species affiliation [1]. This approach reduces the number of false positives and the time spent curating machine annotations. The presented semantic data model is designed to combine the semantic representation of annotations with examples intended for machine learning.
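The filtering idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use an RDF store or an OWL reasoner: annotations are plain triples, and the two rule tables only imitate the effect of disjointness and taxon-restriction axioms. All names (`Annotation`, `DISJOINT_TERMS`, `TERM_ALLOWED_TAXA`, `filter_candidates`) and the toy axioms themselves are assumptions made for illustration, not biological claims.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """A triple-like statement: protein --relation--> term."""
    protein: str
    relation: str
    term: str


# Toy axioms (hypothetical, for illustration only).
# Pairs of terms that may not be asserted together for one protein,
# in the spirit of an OWL disjointness axiom.
DISJOINT_TERMS = {frozenset({"chloroplast", "mitochondrion"})}
# Terms that are only allowed for proteins from the listed taxa.
TERM_ALLOWED_TAXA = {"photosynthesis": {"Viridiplantae"}}


def is_unviable(candidate, trusted, protein_taxon):
    """Return True if the candidate violates a taxon or disjointness rule."""
    # Species affiliation check.
    allowed = TERM_ALLOWED_TAXA.get(candidate.term)
    if allowed is not None and protein_taxon.get(candidate.protein) not in allowed:
        return True
    # Contradiction check against trusted facts about the same protein.
    for fact in trusted:
        same_slot = (fact.protein == candidate.protein
                     and fact.relation == candidate.relation)
        if same_slot and frozenset({fact.term, candidate.term}) in DISJOINT_TERMS:
            return True
    return False


def filter_candidates(candidates, trusted, protein_taxon):
    """Split candidate annotations into (viable, rejected) lists."""
    viable, rejected = [], []
    for candidate in candidates:
        if is_unviable(candidate, trusted, protein_taxon):
            rejected.append(candidate)
        else:
            viable.append(candidate)
    return viable, rejected
```

In this sketch a candidate is rejected only when it contradicts an already trusted fact or a taxon rule, so the outcome depends entirely on which facts are treated as trusted. A real system would delegate both checks to the reasoner of the RDF storage engine.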

The current work is part of a large-scale project on the functional annotation of plant genomes.

References

  1. Koonin, V., Galperin, Y.: Computational Approaches in Comparative Genomics. Sequence – Evolution – Function (2003)

  2. Poux, S., Arighi, N., et al.: Expert curation in UniProtKB: a case study on dealing with conflicting and erroneous data. Database 2014(1) (2014). https://doi.org/10.1093/database/bau016

  3. The Gene Ontology Consortium: Gene ontology annotations and resources. Nucleic Acids Res. 41(D1), 530–535 (2013). https://doi.org/10.1093/nar/gks1050

  4. Claude, P., Fabrice, G., et al.: THEA: Ontology driven analysis of microarray data. Bioinformatics (2007). https://doi.org/10.1093/bioinformatics/bth295

  5. Xu, D.: Protein databases on the internet. Curr. Protoc. Mol. Biol., Chap. Unit–19.4 (2004). https://doi.org/10.1002/0471142727.mb1904s68

  6. Pundir, S., Martin, J., et al.: UniProt protein knowledgebase. In: Wu, C., Arighi, C., Ross, K. (eds.) Protein Bioinformatics. Methods Mol. Biol., vol. 1558 (2017)

  7. Sigrist, A., Cerutti, L., et al.: PROSITE, a protein domain database for functional characterization and annotation. Nucleic Acids Res. 38(Database issue), 161–166 (2010). https://doi.org/10.1093/nar/gkp885

  8. Gene Ontology Consortium: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32(suppl. 1), D258–D261 (2004)

  9. Apweiler, K., Attwood, A., et al.: The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res. 29(1), 37–40 (2001)

  10. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Heidelberg (2013)

  11. Kaushik, S., Chouhan, U., Dwivedi, A.: Prediction of protein subcellular localization of human protein using j48, random forest and best first tree techniques. J. Adv. Appl. Sci. Res. 1(12) (2017)

  12. W3C Recommendation: OWL Web Ontology Language Reference, 10 February 2004


Acknowledgements

The presented work has been funded by the Bulgarian NSF within the “GloBIG: A Model of Integration of Cloud Framework for Hybrid Massive Parallelism and its Application for Analysis and Automated Semantic Enhancement of Big Heterogeneous Data Collections” project, Contract DN02/9 of 17.12.2016, and by Sofia University SRF within the “Models for semantic integration of biomedical data” project, Contract 80-10-207/26.04.2018.

Author information

Corresponding author

Correspondence to Irena Avdjieva.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Peychev, D., Avdjieva, I. (2018). Semantic Annotation Modelling for Protein Functions Prediction. In: Agre, G., van Genabith, J., Declerck, T. (eds) Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2018. Lecture Notes in Computer Science, vol 11089. Springer, Cham. https://doi.org/10.1007/978-3-319-99344-7_27

  • DOI: https://doi.org/10.1007/978-3-319-99344-7_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99343-0

  • Online ISBN: 978-3-319-99344-7

  • eBook Packages: Computer Science, Computer Science (R0)
