Identifying References to Datasets in Publications

  • Katarina Boland
  • Dominique Ritze
  • Kai Eckert
  • Brigitte Mathiak
Conference paper

DOI: 10.1007/978-3-642-33290-6_17

Volume 7489 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Boland K., Ritze D., Eckert K., Mathiak B. (2012) Identifying References to Datasets in Publications. In: Zaphiris P., Buchanan G., Rasmussen E., Loizides F. (eds) Theory and Practice of Digital Libraries. TPDL 2012. Lecture Notes in Computer Science, vol 7489. Springer, Berlin, Heidelberg

Abstract

Research data and publications are usually stored in separate and structurally distinct information systems. Often, links between these resources are not explicitly available, which complicates the search for previous research. In this paper, we propose a pattern induction method for the detection of study references in full texts. Since these references are not specified in a standardized way and may occur inside a variety of different contexts – i.e., captions, footnotes, or continuous text – our algorithm is required to induce very flexible patterns. To overcome the sparse distribution of training instances, we induce patterns iteratively using a bootstrapping approach. We show that our method achieves promising results for the automatic identification of data references and is a first step towards building an integrated information system.
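The abstract does not spell out the algorithm, so the following is only a generic sketch of bootstrapped pattern induction, not the authors' implementation. It assumes that a pattern is simply the two words on either side of a known study name and that study names are runs of capitalized words. The function names (`induce_patterns`, `apply_patterns`, `bootstrap`), the `window` parameter, and the example sentences are all invented for illustration.

```python
import re

# Assumption: a study name is a run of capitalized words.
NAME_SLOT = r"([A-Z][A-Za-z]*(?:\s+[A-Z][A-Za-z]*)*)"


def induce_patterns(texts, names, window=2):
    """Turn the words around each known name into a regex with a name slot."""
    patterns = set()
    for text in texts:
        for name in names:
            for m in re.finditer(re.escape(name), text):
                left = text[:m.start()].split()[-window:]
                right = text[m.end():].split()[:window]
                if len(left) < window or len(right) < window:
                    continue  # not enough context to form a pattern
                patterns.add(
                    r"\s+".join(map(re.escape, left))
                    + r"\s+" + NAME_SLOT + r"\s+"
                    + r"\s+".join(map(re.escape, right))
                )
    return patterns


def apply_patterns(texts, patterns):
    """Collect every string that fills the name slot of any pattern."""
    found = set()
    for text in texts:
        for pattern in patterns:
            for m in re.finditer(pattern, text):
                found.add(m.group(1))
    return found


def bootstrap(texts, seeds, max_iter=5):
    """Alternate between inducing patterns and harvesting new names."""
    names, patterns = set(seeds), set()
    for _ in range(max_iter):
        patterns |= induce_patterns(texts, names)
        new_names = apply_patterns(texts, patterns) - names
        if not new_names:
            break  # no new names found, so further rounds add nothing
        names |= new_names
    return names, patterns


# Invented example sentences; only the seed "ALLBUS" is known up front.
texts = [
    "We use data from the ALLBUS survey of 2010 in this study.",
    "We use data from the Eurobarometer survey of 2011 as well.",
    "Results based on the Eurobarometer panel are shown in Table 2.",
    "Results based on the SOEP panel are shown below.",
]
names, patterns = bootstrap(texts, {"ALLBUS"})
print(sorted(names))  # → ['ALLBUS', 'Eurobarometer', 'SOEP']
```

In this toy run the seed yields one pattern, that pattern finds a second name, and the second name yields a further pattern that finds a third. This chaining is how a bootstrapping loop can compensate for sparse training instances. A practical system would also need to score or filter patterns, since overly general ones quickly introduce false positives.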

Keywords

  • Digital Libraries
  • Information Extraction
  • Recognition of Dataset References
  • Iterative Pattern Induction
  • Bootstrapping

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Katarina Boland (1)
  • Dominique Ritze (2)
  • Kai Eckert (2)
  • Brigitte Mathiak (1)

  1. GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany
  2. Mannheim University Library, Mannheim, Germany