Disentangling the Structure of Tables in Scientific Literature

  • Nikola Milosevic
  • Cassie Gregson
  • Robert Hernandez
  • Goran Nenadic
Conference paper

DOI: 10.1007/978-3-319-41754-7_14

Part of the Lecture Notes in Computer Science book series (LNCS, volume 9612)
Cite this paper as:
Milosevic N., Gregson C., Hernandez R., Nenadic G. (2016) Disentangling the Structure of Tables in Scientific Literature. In: Métais E., Meziane F., Saraee M., Sugumaran V., Vadera S. (eds) Natural Language Processing and Information Systems. NLDB 2016. Lecture Notes in Computer Science, vol 9612. Springer, Cham

Abstract

Within the scientific literature, tables are commonly used to present factual and statistical information in a compact form that is easy for readers to digest. The ability to “understand” the structure of tables is key for information extraction in many domains. However, the complexity and variety of presentation layouts and value formats make it difficult to automatically extract the roles and relationships of table cells. In this paper, we present a model that structures tables in a machine-readable way and a methodology to automatically disentangle and transform tables into the modelled data structure. The method was tested in the domain of clinical trials: it achieved an F-score of 94.26% for cell function identification and 94.84% for identification of inter-cell relationships.

Keywords

Table mining, Text mining, Data management, Data modelling, Natural language processing

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Nikola Milosevic (1)
  • Cassie Gregson (2)
  • Robert Hernandez (2)
  • Goran Nenadic (1, 3)

  1. School of Computer Science, University of Manchester, Manchester, UK
  2. AstraZeneca Ltd, Cambridge, UK
  3. Health EResearch Centre, Manchester, UK
