GROBID: Combining Automatic Bibliographic Data Recognition and Term Extraction for Scholarship Publications

Lopez, Patrice

doi:10.1007/978-3-642-04346-8_62

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5714)

Included in the following conference series:

International Conference on Theory and Practice of Digital Libraries

Abstract

Based on state-of-the-art machine learning techniques, GROBID (GeneRation Of BIbliographic Data) performs reliable bibliographic data extraction from scholarly articles, combined with multi-level term extraction. These two types of extraction are synergistic and provide complementary descriptions of an article. The tool is intended as a component for enhancing existing and future large repositories of technical and scientific publications.

Author information

Authors and Affiliations

European Patent Office, D-10969, Berlin, Germany
Patrice Lopez

Editor information

Editors and Affiliations

Department of Information Engineering, University of Padua, Via Gradenigo 6/a, 35131, Padova, Italy
Maristella Agosti
Department of Computer Science and Engineering IST, Instituto Superior Técnico, Av. Rovisco Pais, 1049-001, Lisboa, Portugal
José Borbinha
Department of Archives and Library Sciences, Ionian University, 72 Ioannou Theotoki str., 49100, Corfu, Greece
Sarantos Kapidakis
Department of Archives and Library Sciences, Ionian University, 72 Ioannou Theotoki str., 49100, Corfu, Greece
Christos Papatheodorou & Giannis Tsakonas

About this paper

Cite this paper

Lopez, P. (2009). GROBID: Combining Automatic Bibliographic Data Recognition and Term Extraction for Scholarship Publications. In: Agosti, M., Borbinha, J., Kapidakis, S., Papatheodorou, C., Tsakonas, G. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2009. Lecture Notes in Computer Science, vol 5714. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04346-8_62

DOI: https://doi.org/10.1007/978-3-642-04346-8_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04345-1
Online ISBN: 978-3-642-04346-8
eBook Packages: Computer Science, Computer Science (R0)
