Knowledge and Information Systems, Volume 34, Issue 1, pp 1–21

From raw publications to Linked Data

  • Tudor Groza
  • Gunnar Aastrand Grimnes
  • Siegfried Handschuh
  • Stefan Decker
Regular Paper

DOI: 10.1007/s10115-011-0473-6

Cite this article as:
Groza, T., Grimnes, G.A., Handschuh, S. et al. Knowl Inf Syst (2013) 34: 1. doi:10.1007/s10115-011-0473-6

Abstract

The continuous development of the Linked Data Web depends on the advancement of the underlying extraction mechanisms. This is of particular interest for the scientific publishing domain, where most data sets are currently created manually. In this article, we present a machine learning pipeline that enables the automatic extraction of heading metadata (e.g., title and authors) from scientific publications. The experimental evaluation shows that our solution handles any type of publication format very well and improves the average extraction performance of the state of the art by around 4%, while also showing increased versatility. Finally, we propose a flexible Linked Data-driven mechanism for both refining and linking the automatically extracted metadata.
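The abstract describes heading metadata (title, authors) being extracted and then published as Linked Data. As a minimal illustration only, the sketch below serializes this article's own heading metadata as N-Triples. It assumes Dublin Core terms (`dcterms:title`, `dcterms:creator`) as the vocabulary and the doi.org resolver URL as the paper's identifier; neither choice is stated in the article, and the helper names `escape_literal` and `to_ntriples` are hypothetical. It does not reproduce the authors' SVM/CRF extraction pipeline or their refinement and linking mechanism.

```python
# Minimal sketch: serialize extracted heading metadata as N-Triples.
# Assumptions (not from the article): Dublin Core terms as the vocabulary,
# the doi.org URL as the paper's URI, and author names kept as plain literals.

DCTERMS = "http://purl.org/dc/terms/"


def escape_literal(text: str) -> str:
    """Escape a string for use inside an N-Triples string literal."""
    # Backslashes must be escaped first so later escapes are not doubled.
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(paper_uri: str, metadata: dict) -> list[str]:
    """Turn a {'title': str, 'authors': [str, ...]} record into N-Triples lines."""
    subject = f"<{paper_uri}>"
    lines = [f'{subject} <{DCTERMS}title> "{escape_literal(metadata["title"])}" .']
    # One dcterms:creator triple per author, in extraction order.
    for author in metadata["authors"]:
        lines.append(f'{subject} <{DCTERMS}creator> "{escape_literal(author)}" .')
    return lines


if __name__ == "__main__":
    record = {
        "title": "From raw publications to Linked Data",
        "authors": ["Tudor Groza", "Gunnar Aastrand Grimnes",
                    "Siegfried Handschuh", "Stefan Decker"],
    }
    for line in to_ntriples("https://doi.org/10.1007/s10115-011-0473-6", record):
        print(line)
```

Keeping author names as literals is the simplest possible output; a linking step of the kind the abstract proposes would instead replace these literals with URIs of existing Linked Data resources.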

Keywords

Metadata extraction · Support vector machines · Conditional random fields · Linked data

Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  • Tudor Groza (1, 2, 3)
  • Gunnar Aastrand Grimnes (4)
  • Siegfried Handschuh (1, 2)
  • Stefan Decker (1, 2)

  1. DERI, National University of Ireland, Galway, Ireland
  2. IDA Business Park, Lower Dangan, Galway, Ireland
  3. School of ITEE, The University of Queensland, Queensland, Australia
  4. DFKI GmbH, Kaiserslautern, Germany