Overview of the ShARe/CLEF eHealth Evaluation Lab 2013

  • Hanna Suominen
  • Sanna Salanterä
  • Sumithra Velupillai
  • Wendy W. Chapman
  • Guergana Savova
  • Noemie Elhadad
  • Sameer Pradhan
  • Brett R. South
  • Danielle L. Mowery
  • Gareth J. F. Jones
  • Johannes Leveling
  • Liadh Kelly
  • Lorraine Goeuriot
  • David Martinez
  • Guido Zuccon
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8138)


Discharge summaries and other free-text reports in healthcare transfer information between working shifts and geographic locations. Patients are likely to have difficulties in understanding their content because of medical jargon, non-standard abbreviations, and ward-specific idioms. This paper reports on an evaluation lab whose aim was to support the continuum of care by developing methods and resources that make clinical reports in English easier for patients to understand and that help them find information related to their condition. The ShARe/CLEF eHealth 2013 lab offered student mentoring and shared tasks: identification and normalisation of disorders (1a and 1b) and normalisation of abbreviations and acronyms (2) in clinical reports with respect to healthcare terminology standards, as well as information retrieval (3) to address questions patients may have when reading clinical reports. The focus on patients' information needs, as opposed to the specialised information needs of physicians and other healthcare workers, was the main feature distinguishing the lab from previous shared tasks. De-identified clinical reports for the three tasks came from US intensive care and originated from the MIMIC II database. The other text documents for Task 3 came from the Internet and originated from the Khresmoi project. Task 1 annotations originated from the ShARe annotations; for Tasks 2 and 3, new annotations, queries, and relevance assessments were created. In total, 64, 56, and 55 people registered their interest in Tasks 1, 2, and 3, respectively. 34 unique teams (3 members per team on average) participated, with 22, 17, 5, and 9 teams in Tasks 1a, 1b, 2, and 3, respectively. The teams came from Australia, China, France, India, Ireland, the Republic of Korea, Spain, the UK, and the USA. Some teams developed and used additional annotations, but this strategy contributed to system performance only in Task 2.
The best systems achieved an F1 score of 0.75 in Task 1a, accuracies of 0.59 and 0.72 in Tasks 1b and 2, and precision at 10 of 0.52 in Task 3. The results demonstrate the substantial community interest and the capability of these systems to make clinical reports easier for patients to understand. The organisers have made the data and tools available for future research and development.
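For readers unfamiliar with the evaluation measures cited above, a minimal sketch of how each is computed follows. This is illustrative only, with hypothetical toy counts; it is not the lab's official evaluation script.

```python
# Illustrative computation of the metrics reported above (toy data,
# not the official ShARe/CLEF eHealth 2013 evaluation scripts).

def f1(tp, fp, fn):
    # F1 = harmonic mean of precision and recall (used in Task 1a).
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def accuracy(correct, total):
    # Fraction of items normalised to the correct standard code
    # (used in Tasks 1b and 2).
    return correct / total

def precision_at_10(relevance_flags):
    # Fraction of relevant documents among the top 10 retrieved
    # (used in Task 3); relevance_flags is a ranked list of 0/1 judgements.
    return sum(relevance_flags[:10]) / 10

# Hypothetical toy examples:
print(round(f1(75, 25, 25), 2))                            # 0.75
print(accuracy(59, 100))                                   # 0.59
print(precision_at_10([1, 1, 0, 1, 0, 1, 0, 1, 0, 0]))     # 0.5
```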


Keywords: Information Retrieval, Evaluation, Medical Informatics, Test-set Generation, Text Classification, Text Segmentation





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Hanna Suominen (1)
  • Sanna Salanterä (2)
  • Sumithra Velupillai (3)
  • Wendy W. Chapman (4)
  • Guergana Savova (5)
  • Noemie Elhadad (6)
  • Sameer Pradhan (5)
  • Brett R. South (7)
  • Danielle L. Mowery (8)
  • Gareth J. F. Jones (9)
  • Johannes Leveling (9)
  • Liadh Kelly (9)
  • Lorraine Goeuriot (9)
  • David Martinez (10)
  • Guido Zuccon (11)
  1. NICTA and The Australian National University, Australia
  2. University of Turku, Finland
  3. DSV, Stockholm University, Sweden
  4. University of California, San Diego, USA
  5. Harvard University, USA
  6. Columbia University, USA
  7. University of Utah, USA
  8. University of Pittsburgh, USA
  9. Dublin City University, Ireland
  10. NICTA and The University of Melbourne, Australia
  11. The Australian e-Health Research Centre, CSIRO, Australia
