Entity Recognition in Parallel Multi-lingual Biomedical Corpora: The CLEF-ER Laboratory Overview

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNISA, volume 8138)

Abstract

The identification and normalisation of biomedical entities from the scientific literature has a long tradition, and a number of challenges have contributed to the development of reliable solutions. Increasingly, patient records are processed to align their content with other biomedical data resources, but this approach requires analysing documents in different languages across Europe [1,2].

The CLEF-ER challenge has been organized by the Mantra project partners to improve entity recognition (ER) in multilingual documents. Several corpora in different languages, i.e. Medline titles, EMEA documents and patent claims, have been prepared to enable ER in parallel documents. The participants were asked to annotate entity mentions with concept unique identifiers (CUIs) in the documents of their preferred non-English language.

The evaluation determines the number of correctly identified entity mentions against a silver standard (Task A) and the performance measures for the identification of CUIs in the non-English corpora (Task B). The participants could make use of the prepared terminological resources for entity normalisation and of the English silver standard corpora (SSCs) as input for concept candidates in the non-English documents.
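
The abstract does not state which performance measures were used, so the following is only a minimal sketch, assuming exact-match precision, recall and F1 against a silver standard. The function name, the tuple layouts and the toy data (including the CUI strings) are hypothetical and are not taken from the challenge materials.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall_f1(
    predicted: Iterable[Hashable], reference: Iterable[Hashable]
) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 between two annotation sets."""
    pred, ref = set(predicted), set(reference)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Mention-level comparison (assumed layout: document id, start offset, end offset).
# All values below are made-up toy data.
silver_mentions = {("doc1", 0, 8), ("doc1", 15, 24), ("doc2", 3, 10)}
system_mentions = {("doc1", 0, 8), ("doc2", 3, 10), ("doc2", 20, 27)}
mention_scores = precision_recall_f1(system_mentions, silver_mentions)

# Concept-level comparison (assumed layout: document id, CUI).
# The CUI strings are placeholders, not real identifiers from the corpora.
silver_cuis = {("doc1", "C0000001"), ("doc1", "C0000002"), ("doc2", "C0000003")}
system_cuis = {("doc1", "C0000001"), ("doc2", "C0000003")}
cui_scores = precision_recall_f1(system_cuis, silver_cuis)

print("mentions (P, R, F1):", mention_scores)
print("CUIs     (P, R, F1):", cui_scores)
```

Because the reference here is a silver standard rather than a manually curated gold standard, scores computed this way measure agreement with the harmonised annotations, not absolute correctness.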

The participants used different approaches, including translation techniques and word or phrase alignments, in addition to lexical lookup and other text mining techniques. Performance on Tasks A and B was lower for the patent corpus than for Medline titles and EMEA documents. In the patent documents, chemical entities were identified with higher performance, whereas the other two document types cover a higher proportion of medical terms. The number of novel terms provided from all corpora is currently under investigation.

Altogether, the CLEF-ER challenge demonstrates the performances of annotation solutions in different languages against an SSC.

Keywords

  • Semantic Group
  • Patent Document
  • Entity Recognition
  • Parallel Corpus
  • Annotate Corpus

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Roberts, A., Gaizauskas, R., Hepple, M., Davis, N., Demetriou, G., Guo, Y., Kola, J.S., Roberts, I., Setzer, A., Tapuria, A., et al.: The CLEF corpus: semantic annotation of clinical text. In: AMIA Annual Symposium Proceedings, vol. 2007, p. 625. American Medical Informatics Association (2007)

  2. Lussier, Y.A., Shagina, L., Friedman, C.: Automating ICD-9-CM encoding using medical language processing: A feasibility study. In: Proceedings of the AMIA Symposium, p. 1072. American Medical Informatics Association (2000)

  3. Catarci, T., Ferro, N., Forner, P., Hiemstra, D., Karlgren, J., Penas, A., Santucci, G., Womser-Hacker, C.: CLEF 2012: information access evaluation meets multilinguality, multimodality, and visual analytics. ACM SIGIR Forum 46, 29–33 (2012)

  4. Roda, G., Tait, J., Piroi, F., Zenz, V.: CLEF-IP 2009: retrieval experiments in the Intellectual Property domain. In: Peters, C., Di Nunzio, G.M., Kurimo, M., Mandl, T., Mostefa, D., Peñas, A., Roda, G. (eds.) CLEF 2009. LNCS, vol. 6241, pp. 385–409. Springer, Heidelberg (2010)

  5. Krallinger, M., Leitner, F., Rodriguez-Penagos, C., Valencia, A.: Overview of the protein-protein interaction annotation extraction task of BioCreative II. Genome Biology 9(suppl. 2), S4 (2008), http://genomebiology.com/2008/9/S2/S4

  6. Morgan, A., Lu, Z., Wang, X., Cohen, A., Fluck, J., Ruch, P., Divoli, A., Fundel, K., Leaman, R., Hakenberg, J., Sun, C., Liu, H.H., Torres, R., Krauthammer, M., Lau, W., Liu, H., Hsu, C.N., Schuemie, M., Cohen, K.B., Hirschman, L.: Overview of BioCreative II gene normalization. Genome Biology 9(suppl. 2), S3 (2008), http://genomebiology.com/2008/9/S2/S3

  7. Cohen, K.B., Demner-Fushman, D., Ananiadou, S., Pestian, J., Tsujii, J., Webber, B. (eds.): Proceedings of the BioNLP 2009 Workshop. Association for Computational Linguistics, Boulder (2009), http://www.aclweb.org/anthology/W09-13

  8. Rebholz-Schuhmann, D., Yepes, A.J., Mulligen, E.M.V., Kang, N., Kors, J., Milward, D., Corbett, P., Buyko, E., Beisswanger, E., Hahn, U.: CALBC silver standard corpus. Journal of Bioinformatics and Computational Biology 8, 163–179 (2010)

  9. Rebholz-Schuhmann, D., Jimeno-Yepes, A., Li, C., Kafkas, S., Lewin, I., Kang, N., Corbett, P., Milward, D., Buyko, E., Beisswanger, E., Hornbostel, K., Kouznetsov, A., Witte, R., Laurila, J., Baker, C., Kuo, C.J., Clematide, S., Rinaldi, F., Farkas, R., Móra, G., Hara, K., Furlong, L., Rautschka, M., Lara Neves, M., Pascual-Montano, A., Wei, Q., Collier, N., Mahbub Chowdhury, M.F., Lavelli, A., Berlanga, R., Morante, R., Van Asch, V., Daelemans, W., Marina, J., van Mulligen, E., Kors, J., Hahn, U.: Assessment of NER solutions against the first and second CALBC Silver Standard Corpus. J. Biomedical Semantics 2(suppl. 5), S11 (2011)

  10. Hersh, W., Voorhees, E.: TREC genomics special issue overview. Inf. Retr. Boston 12, 1–15 (2009)

  11. Lu, Z.: PubMed and beyond: a survey of web tools for searching biomedical literature. Database (Oxford), 2011:baq036 (2011)

  12. Rebholz-Schuhmann, D., Clematide, S., Rinaldi, F., Kafkas, S., van Mulligen, E.M., Bui, C., Hellrich, J., Lewin, I., Milward, D., Poprat, M., Jimeno-Yepes, A., Hahn, U., Kors, J.A.: Multilingual semantic resources and parallel corpora in the biomedical domain: the CLEF-ER challenge. In: Proceedings CLEF Conference, vol. 2013 (2013)

  13. Bodenreider, O.: The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 32, D267–D270 (2004)

  14. Brown, E.G., Wood, L., Wood, S.: The medical dictionary for regulatory activities (MedDRA). Drug Safety 20(2), 109–117 (1999)

  15. Stearns, M.Q., Price, C., Spackman, K.A., Wang, A.Y.: SNOMED clinical terms: overview of the development process and project status. In: Proceedings of the AMIA Symposium, vol. 662, American Medical Informatics Association (2001)

  16. Smith, B., Ashburner, M., Rosse, C., Bard, J., Bug, W., Ceusters, W., Goldberg, L.J., Eilbeck, K., Ireland, A., Mungall, C.J., Leontis, N., Rocca-Serra, P., Ruttenberg, A., Sansone, S.A., Scheuermann, R.H., Shah, N., Whetzel, P.L., Lewis, S.: The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol. 25, 1251–1255 (2007)

  17. Lewin, I., Kafkas, S., Rebholz-Schuhmann, D.: Centroids: Gold standards with distributional variations. In: Proceedings of the Language Resources Evaluation Conference, Istanbul, Turkey (2012)

  18. Lewin, I., Clematide, S.: Deriving the Mantra Silver Standard. In: Proceedings CLEF Conference, vol. 2013 (2013)

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rebholz-Schuhmann, D. et al. (2013). Entity Recognition in Parallel Multi-lingual Biomedical Corpora: The CLEF-ER Laboratory Overview. In: Forner, P., Müller, H., Paredes, R., Rosso, P., Stein, B. (eds) Information Access Evaluation. Multilinguality, Multimodality, and Visualization. CLEF 2013. Lecture Notes in Computer Science, vol 8138. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40802-1_32

  • DOI: https://doi.org/10.1007/978-3-642-40802-1_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40801-4

  • Online ISBN: 978-3-642-40802-1

  • eBook Packages: Computer Science (R0)