Abstracts versus Full Texts and Patents: A Quantitative Analysis of Biomedical Entities

  • Conference paper
Advances in Multidisciplinary Retrieval (IRFC 2010)

Abstract

In information retrieval, named entity recognition makes it possible to apply semantic search to domain-specific corpora. Recently, more full-text patents and journal articles have become freely available. Because the distribution of information among their different sections is unknown, an analysis of this diversity is of interest.

This paper examines the density and variety of relevant life science terminology in Medline abstracts, PubMedCentral journal articles, and patents from the TREC Chemistry Track. For this purpose, named entity recognition was carried out for various biological, pharmaceutical, and chemical entity classes, and the frequencies and distributions of the entities in the different text zones were analyzed.

The full texts from PubMedCentral contain more information than their abstracts while including almost all of the content given in those abstracts. In the patents from the TREC Chemistry Track, the difference is even more extreme: the description section includes almost all entities mentioned in a patent and, compared with the claim section, contains at least 79 % of all entities exclusively.
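
The comparison between text zones described above can be illustrated with plain set operations. The following is a minimal sketch, not the authors' pipeline: the function names `coverage` and `exclusive_share`, the toy entity sets, and in particular the choice of denominator for the exclusive share (the union of both sections) are assumptions made for illustration only.

```python
# Hypothetical illustration: entity sets per text zone, as a named entity
# recognizer might produce them after normalization. The values are toy data.
abstract = {"aspirin", "TP53"}
full_text = {"aspirin", "TP53", "EGFR", "BRCA1"}
description = {"aspirin", "ibuprofen", "TP53", "BRCA1", "EGFR"}
claims = {"aspirin", "TP53", "caffeine"}


def coverage(part, whole):
    """Fraction of the entities in `part` that also occur in `whole`."""
    if not part:
        return 0.0
    return len(part & whole) / len(part)


def exclusive_share(section, other):
    """Fraction of all entities (union of both sections) found only in `section`.

    Assumption: the denominator is the union of the two sections; the source
    does not state how its percentage is normalized.
    """
    union = section | other
    if not union:
        return 0.0
    return len(section - other) / len(union)


# Share of abstract entities that reappear in the full text.
abstract_in_full_text = coverage(abstract, full_text)

# Share of entities that appear in the description but not in the claims.
description_only = exclusive_share(description, claims)

print(f"abstract entities found in full text: {abstract_in_full_text:.2f}")
print(f"entities exclusive to description:    {description_only:.2f}")
```

Working on sets of normalized entity identifiers rather than raw mentions keeps the measure independent of how often a term is repeated within a section; counting mentions instead would call for a multiset such as `collections.Counter`.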

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Müller, B. et al. (2010). Abstracts versus Full Texts and Patents: A Quantitative Analysis of Biomedical Entities. In: Cunningham, H., Hanbury, A., Rüger, S. (eds) Advances in Multidisciplinary Retrieval. IRFC 2010. Lecture Notes in Computer Science, vol 6107. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13084-7_12

  • DOI: https://doi.org/10.1007/978-3-642-13084-7_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13083-0

  • Online ISBN: 978-3-642-13084-7

  • eBook Packages: Computer Science (R0)
