A Text Mining-Based Framework for Constructing an RDF-Compliant Biodiversity Knowledge Repository

Batista-Navarro, Riza; Zerva, Chrysoula; Nguyen, Nhung T. H.; Ananiadou, Sophia

doi:10.1007/978-3-319-55209-5_3

Riza Batista-Navarro¹²,
Chrysoula Zerva¹²,
Nhung T. H. Nguyen¹² &
Sophia Ananiadou¹²

Conference paper
First Online: 08 March 2017

Part of the book series: Communications in Computer and Information Science (CCIS, volume 656)

Abstract

To make the information encapsulated in biodiversity literature more accessible and searchable, we have developed a text mining-based framework for automatically transforming text into a structured knowledge repository. A text mining workflow employing information extraction techniques, i.e., named entity recognition and relation extraction, was implemented in the Argo platform and subsequently applied to biodiversity literature to extract structured information. The resulting annotations were stored in a repository following the emerging Open Annotation standard, thus promoting interoperability with external applications. Accessible as a SPARQL endpoint, the repository facilitates knowledge discovery over a large body of biodiversity literature by retrieving annotations that match user-specified queries. We present some use cases to illustrate the types of queries that the knowledge repository currently accommodates.
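The abstract states that extracted annotations are stored following the Open Annotation standard and retrieved through SPARQL. As a rough illustration only (not the authors' Argo serialisation), the sketch below shows how a single named-entity annotation could be expressed with Open Annotation terms (oa:Annotation, oa:hasBody, oa:hasTarget, oa:SpecificResource, oa:TextPositionSelector) and written out as N-Triples, using only the Python standard library. The example.org IRIs, the category IRIs, the helper names and the example sentence are all hypothetical. The function `annotations_with_body` is a plain-Python stand-in for the SPARQL pattern `SELECT ?a WHERE { ?a oa:hasBody <category> }`, not a SPARQL engine.

```python
import uuid

# Namespaces: the Open Annotation (oa:) vocabulary, RDF, and XML Schema datatypes.
OA = "http://www.w3.org/ns/oa#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_NON_NEG_INT = "http://www.w3.org/2001/XMLSchema#nonNegativeInteger"


def iri(value):
    """Render an IRI as an N-Triples term."""
    return f"<{value}>"


def int_literal(n):
    """Render a character offset as a typed N-Triples literal."""
    return f'"{n}"^^<{XSD_NON_NEG_INT}>'


def entity_annotation(doc_iri, text, start, end, category_iri,
                      base="http://example.org/anno/"):
    """Build Open Annotation triples for one entity mention text[start:end].

    Hypothetical layout: the body is a category IRI (e.g. a taxon type), and
    the target is a SpecificResource selecting the character span in the
    source document via a TextPositionSelector.
    """
    if not 0 <= start < end <= len(text):
        raise ValueError(f"invalid span [{start}, {end}) for text of length {len(text)}")
    anno = f"{base}{uuid.uuid4().hex}"  # fresh, hypothetical annotation IRI
    target = f"{anno}/target"
    selector = f"{anno}/selector"
    triples = [
        (iri(anno), iri(RDF_TYPE), iri(OA + "Annotation")),
        (iri(anno), iri(OA + "hasBody"), iri(category_iri)),
        (iri(anno), iri(OA + "hasTarget"), iri(target)),
        (iri(target), iri(RDF_TYPE), iri(OA + "SpecificResource")),
        (iri(target), iri(OA + "hasSource"), iri(doc_iri)),
        (iri(target), iri(OA + "hasSelector"), iri(selector)),
        (iri(selector), iri(RDF_TYPE), iri(OA + "TextPositionSelector")),
        (iri(selector), iri(OA + "start"), int_literal(start)),
        (iri(selector), iri(OA + "end"), int_literal(end)),
    ]
    return anno, triples


def to_ntriples(triples):
    """Serialise (subject, predicate, object) terms as N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


def annotations_with_body(triples, category_iri):
    """Stand-in for: SELECT ?a WHERE { ?a oa:hasBody <category_iri> }."""
    wanted = (iri(OA + "hasBody"), iri(category_iri))
    return sorted(s for s, p, o in triples if (p, o) == wanted)


if __name__ == "__main__":
    text = "Rafflesia arnoldii was recorded in Sumatra."
    doc = "http://example.org/doc/42"
    taxon = "http://example.org/category/Taxon"
    place = "http://example.org/category/Location"
    _, t1 = entity_annotation(doc, text, 0, 18, taxon)   # span "Rafflesia arnoldii"
    _, t2 = entity_annotation(doc, text, 35, 42, place)  # span "Sumatra"
    store = t1 + t2
    print(to_ntriples(store))
    print(annotations_with_body(store, taxon))
```

Keeping character offsets in a TextPositionSelector ties every category back to an exact span of the source text, which is what lets a query return the evidence for each hit. In a real deployment these triples would be loaded into a triple store and queried through its SPARQL endpoint rather than filtered as Python lists.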

Acknowledgments

We would like to thank Prof. Marilou Nicolas for her valuable input. This work is funded by the British Council [172722806 (COPIOUS)], and is partially supported by the Engineering and Physical Sciences Research Council [EP/1038099/1 (CDT)].

Author information

Authors and Affiliations

School of Computer Science, University of Manchester, Manchester, M13 9PL, UK
Riza Batista-Navarro, Chrysoula Zerva, Nhung T. H. Nguyen & Sophia Ananiadou

Corresponding author

Correspondence to Sophia Ananiadou.

Editor information

Editors and Affiliations

College of Medicine, University of Florida, Gainesville, Florida, USA
Juan Antonio Lossio-Ventura
Faculty of Engineering, Universidad del Pacífico, Jesús María, Lima, Peru
Hugo Alatrista-Salas

About this paper

Cite this paper

Batista-Navarro, R., Zerva, C., Nguyen, N.T.H., Ananiadou, S. (2017). A Text Mining-Based Framework for Constructing an RDF-Compliant Biodiversity Knowledge Repository. In: Lossio-Ventura, J., Alatrista-Salas, H. (eds) Information Management and Big Data. SIMBig 2015, SIMBig 2016. Communications in Computer and Information Science, vol 656. Springer, Cham. https://doi.org/10.1007/978-3-319-55209-5_3

DOI: https://doi.org/10.1007/978-3-319-55209-5_3
Published: 08 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55208-8
Online ISBN: 978-3-319-55209-5
eBook Packages: Computer Science, Computer Science (R0)