A Text Mining Framework for Accelerating the Semantic Curation of Literature

Batista-Navarro, Riza; Hammock, Jennifer; Ulate, William; Ananiadou, Sophia

doi:10.1007/978-3-319-43997-6_44

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9819)

Included in the following conference series:

International Conference on Theory and Practice of Digital Libraries

Abstract

The Biodiversity Heritage Library is the world’s largest digital library of biodiversity literature. It currently contains almost 40 million pages, which can be explored through a search interface based on keyword matching; this approach, however, cannot resolve the ambiguity of search terms. Tools that automatically attach semantic metadata to documents, e.g., biodiversity concept recognisers, help alleviate these issues. Developing such tools in turn depends on gold standard, semantically annotated textual corpora. In the biodiversity domain such corpora are almost non-existent, largely because constructing semantically annotated resources is typically a time-consuming and laborious process. To accelerate the development of a corpus of biodiversity documents, we propose a text mining framework that hastens curation through an iterative feedback loop of (1) manual annotation, and (2) training and application of statistical concept recognition models. Even after only a few iterations, our curators were observed to spend less time and effort on annotation.
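
The feedback loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper trains statistical concept recognition models, whereas this toy substitutes a simple token-lookup "model" so that the example stays self-contained. All function names, the TAXON label and the example sentences are hypothetical, and the curator is simulated by comparing suggestions against known gold labels.

```python
from typing import Dict, List, Tuple

Annotated = List[Tuple[str, str]]  # (token, label) pairs; label is "TAXON" or "O"


def train(annotated_docs: List[Annotated]) -> Dict[str, str]:
    """Toy stand-in for model training: remember the last label seen per token."""
    model: Dict[str, str] = {}
    for doc in annotated_docs:
        for token, label in doc:
            model[token.lower()] = label
    return model


def pre_annotate(model: Dict[str, str], tokens: List[str]) -> Annotated:
    """Apply the current model; unseen tokens default to the outside label "O"."""
    return [(t, model.get(t.lower(), "O")) for t in tokens]


def curate(suggested: Annotated, gold: Annotated) -> Tuple[Annotated, int]:
    """Simulated curator: correct the suggestions and count the fixes needed."""
    fixes = sum(1 for (_, s), (_, g) in zip(suggested, gold) if s != g)
    return list(gold), fixes


def feedback_loop(batches: List[Annotated]) -> List[int]:
    """Alternate (1) manual correction and (2) retraining; return fixes per round."""
    annotated: List[Annotated] = []
    fixes_per_round: List[int] = []
    model: Dict[str, str] = {}
    for gold in batches:
        tokens = [t for t, _ in gold]
        suggested = pre_annotate(model, tokens)   # step 2: apply the model
        corrected, fixes = curate(suggested, gold)  # step 1: manual annotation
        annotated.append(corrected)
        fixes_per_round.append(fixes)
        model = train(annotated)                  # step 2: retrain on all data
    return fixes_per_round


# Hypothetical example data: three small batches of labelled tokens.
example_batches: List[Annotated] = [
    [("Quercus", "TAXON"), ("alba", "TAXON"), ("grows", "O"), ("here", "O")],
    [("Quercus", "TAXON"), ("rubra", "TAXON"), ("grows", "O"), ("there", "O")],
    [("Quercus", "TAXON"), ("alba", "TAXON"), ("and", "O"), ("rubra", "TAXON")],
]

print(feedback_loop(example_batches))  # prints [2, 1, 0]
```

In this toy run the number of corrections drops with each round, which mirrors the qualitative observation in the abstract; it says nothing about the magnitude of the effect reported in the paper.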

Author information

Authors and Affiliations

School of Computer Science, University of Manchester, Manchester, M13 9PL, UK
Riza Batista-Navarro & Sophia Ananiadou
Smithsonian Institute, Washington, D.C., USA
Jennifer Hammock
Missouri Botanical Garden, Missouri, USA
William Ulate

Corresponding author

Correspondence to Riza Batista-Navarro.

Editor information

Editors and Affiliations

Universität Duisburg-Essen, Duisburg, Germany
Norbert Fuhr
Hungarian Academy of Science, Budapest, Hungary
László Kovács
Leibniz Universität Hannover, Hannover, Germany
Thomas Risse
Leibniz Universität Hannover, Hannover, Germany
Wolfgang Nejdl

About this paper

Cite this paper

Batista-Navarro, R., Hammock, J., Ulate, W., Ananiadou, S. (2016). A Text Mining Framework for Accelerating the Semantic Curation of Literature. In: Fuhr, N., Kovács, L., Risse, T., Nejdl, W. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2016. Lecture Notes in Computer Science, vol 9819. Springer, Cham. https://doi.org/10.1007/978-3-319-43997-6_44

DOI: https://doi.org/10.1007/978-3-319-43997-6_44
Published: 10 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43996-9
Online ISBN: 978-3-319-43997-6
eBook Packages: Computer Science, Computer Science (R0)
