An Ontology-Enabled Natural Language Processing Pipeline for Provenance Metadata Extraction from Biomedical Text (Short Paper)

  • Joshua Valdez
  • Michael Rueschman
  • Matthew Kim
  • Susan Redline
  • Satya S. Sahoo
Conference paper

DOI: 10.1007/978-3-319-48472-3_43

Part of the Lecture Notes in Computer Science book series (LNCS, volume 10033)
Cite this paper as:
Valdez J., Rueschman M., Kim M., Redline S., Sahoo S.S. (2016) An Ontology-Enabled Natural Language Processing Pipeline for Provenance Metadata Extraction from Biomedical Text (Short Paper). In: Debruyne C. et al. (eds) On the Move to Meaningful Internet Systems: OTM 2016 Conferences. OTM 2016. Lecture Notes in Computer Science, vol 10033. Springer, Cham

Abstract

Extraction of structured information from biomedical literature is a complex and challenging problem due to the complexity of the biomedical domain and the lack of appropriate natural language processing (NLP) techniques. High-quality domain ontologies model both data and metadata information at a fine level of granularity and can therefore be used to accurately extract structured information from biomedical text. Extraction of provenance metadata, which describes the history or source of information, from published articles is an important task to support scientific reproducibility. Reproducibility of results reported by previous research studies is a foundational component of scientific advancement, as highlighted by the recent US National Institutes of Health initiative called “Principles of Rigor and Reproducibility”. In this paper, we describe an effective approach to extract provenance metadata from published biomedical research literature using an ontology-enabled NLP platform developed as part of the Provenance for Clinical and Healthcare Research (ProvCaRe) project. The ProvCaRe-NLP tool extends the clinical Text Analysis and Knowledge Extraction System (cTAKES) platform using both provenance and biomedical domain ontologies. We demonstrate the effectiveness of the ProvCaRe-NLP tool using a corpus of 20 peer-reviewed publications. The results of our evaluation show that the ProvCaRe-NLP tool has significantly higher recall in extracting provenance metadata than existing NLP pipelines such as MetaMap.
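The abstract describes ontology-enabled named entity recognition and a recall-based evaluation only at a high level. As a rough illustration, and not a description of how ProvCaRe-NLP or cTAKES is actually implemented, the following Python sketch shows greedy longest-match lookup of ontology terms in text, followed by exact-match recall against gold annotations. The lexicon, its class labels, and all function names are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon: surface terms mapped to illustrative ontology class
# labels. These labels are assumptions for illustration, not ProvCaRe classes.
LEXICON: Dict[str, str] = {
    "randomized controlled trial": "StudyDesign",
    "polysomnography": "Instrument",
    "apnea-hypopnea index": "OutcomeMeasure",
    "sample size": "StudySizeDescriptor",
}

Span = Tuple[int, int, str]  # (start token, end token exclusive, class label)


def tokenize(text: str) -> List[str]:
    # Lowercase and keep hyphenated words together as single tokens.
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def build_index(lexicon: Dict[str, str]) -> Tuple[Dict[Tuple[str, ...], str], int]:
    # Key each lexicon entry by its token tuple; remember the longest entry length.
    index = {tuple(tokenize(term)): label for term, label in lexicon.items()}
    max_len = max(len(key) for key in index)
    return index, max_len


def annotate(text: str, lexicon: Dict[str, str]) -> List[Span]:
    # Greedy longest-match lookup: at each position try the longest n-gram first.
    index, max_len = build_index(lexicon)
    tokens = tokenize(text)
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = index.get(tuple(tokens[i:i + n]))
            if label is not None:
                spans.append((i, i + n, label))
                i += n
                break
        else:
            # No lexicon entry starts at this token; move on.
            i += 1
    return spans


def recall(predicted: List[Span], gold: List[Span]) -> float:
    # Recall = fraction of gold spans also produced by the annotator (exact match).
    # By convention this sketch returns 0.0 when there are no gold spans.
    if not gold:
        return 0.0
    hits: Set[Span] = set(predicted) & set(gold)
    return len(hits) / len(gold)
```

For example, annotating "We conducted a randomized controlled trial; sleep was measured by polysomnography." yields the spans `(3, 6, "StudyDesign")` and `(10, 11, "Instrument")`. If a hypothetical gold standard also marked "sleep" as an entity, recall would be 2/3. Trying the longest n-gram first keeps a multi-word term such as "randomized controlled trial" from being split into shorter partial matches.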

Keywords

Ontology-based natural language processing, Provenance metadata, Scientific reproducibility, Named entity recognition

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Joshua Valdez (1)
  • Michael Rueschman (2)
  • Matthew Kim (2)
  • Susan Redline (2)
  • Satya S. Sahoo (1)

  1. Division of Medical Informatics and Electrical Engineering and Computer Science Department, Case Western Reserve University, Cleveland, USA
  2. Departments of Medicine, Brigham and Women’s Hospital and Beth Israel Deaconess Medical Center, Harvard University, Boston, USA
