Abstracts versus Full Texts and Patents: A Quantitative Analysis of Biomedical Entities

  • Bernd Müller
  • Roman Klinger
  • Harsha Gurulingappa
  • Heinz-Theodor Mevissen
  • Martin Hofmann-Apitius
  • Juliane Fluck
  • Christoph M. Friedrich
Conference paper

DOI: 10.1007/978-3-642-13084-7_12

Part of the Lecture Notes in Computer Science book series (LNCS, volume 6107)
Cite this paper as:
Müller B. et al. (2010) Abstracts versus Full Texts and Patents: A Quantitative Analysis of Biomedical Entities. In: Cunningham H., Hanbury A., Rüger S. (eds) Advances in Multidisciplinary Retrieval. IRFC 2010. Lecture Notes in Computer Science, vol 6107. Springer, Berlin, Heidelberg

Abstract

In information retrieval, named entity recognition makes it possible to apply semantic search in domain-specific corpora. Recently, more full-text patents and journal articles have become freely available. As the distribution of information across the different sections is unknown, an analysis of its diversity is of interest.

This paper investigates the density and variety of relevant life science terminologies in Medline abstracts, PubMedCentral journal articles, and patents from the TREC Chemistry Track. For this purpose, named entity recognition for various bio, pharmaceutical, and chemical entity classes was conducted, and the frequencies and distributions in the different text zones were analyzed.

The full texts from PubMedCentral contain more information than their abstracts while including almost all of the content of those abstracts. In the patents from the TREC Chemistry Track, this effect is even more extreme. In particular, the description section includes almost all entities mentioned in a patent and, compared with the claim section, contains at least 79 % of all entities exclusively.
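
The zone comparison summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `zone_overlap`, the toy entity sets, and the reading of "exclusively" as "the share of all distinct entities from both zones that occur only in the second zone" are assumptions made for illustration.

```python
from __future__ import annotations


def zone_overlap(zone_a: set[str], zone_b: set[str]) -> dict[str, float]:
    """Compare the distinct entities found in two text zones of one document.

    a_covered_by_b: fraction of zone_a entities that also occur in zone_b.
    exclusive_to_b: fraction of all distinct entities (both zones) that
                    occur only in zone_b. This definition is an assumption.
    """
    union = zone_a | zone_b
    if not union:
        return {"a_covered_by_b": 0.0, "exclusive_to_b": 0.0}
    covered = len(zone_a & zone_b) / len(zone_a) if zone_a else 0.0
    exclusive = len(zone_b - zone_a) / len(union)
    return {"a_covered_by_b": covered, "exclusive_to_b": exclusive}


# Invented toy data, not taken from the paper.
claims = {"aspirin", "COX-2"}
description = {"aspirin", "COX-2", "ibuprofen", "TNF", "IL6"}

result = zone_overlap(claims, description)
print(result)
```

In this toy case every claim entity also appears in the description, and three of the five distinct entities appear only in the description. The same function could be applied to an abstract and its full text by passing the abstract's entity set as `zone_a`.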

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Bernd Müller (1, 2)
  • Roman Klinger (1)
  • Harsha Gurulingappa (1, 2)
  • Heinz-Theodor Mevissen (1)
  • Martin Hofmann-Apitius (1, 2)
  • Juliane Fluck (1)
  • Christoph M. Friedrich (1)

  1. Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
  2. Bonn-Aachen International Center for Information Technology (B-IT), Bonn, Germany
