GROBID: Combining Automatic Bibliographic Data Recognition and Term Extraction for Scholarship Publications

  • Patrice Lopez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5714)


Based on state-of-the-art machine learning techniques, GROBID (GeneRation Of BIbliographic Data) performs reliable bibliographic data extraction from scholarly articles, combined with multi-level term extraction. These two types of extraction are synergistic and provide complementary descriptions of an article. The tool is intended as a component for enhancing existing and future large repositories of technical and scientific publications.



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Patrice Lopez (1)
  1. European Patent Office, Berlin, Germany
