Annotating Medical Forms Using UMLS

  • Victor Christen
  • Anika Groß
  • Julian Varghese
  • Martin Dugas
  • Erhard Rahm
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9162)


Medical forms are frequently used to document patient data or to collect relevant data for clinical trials. It is crucial to harmonize medical forms in order to improve interoperability and data integration between medical applications. Here we propose a (semi-)automatic annotation of medical forms with concepts of the Unified Medical Language System (UMLS). Our annotation workflow encompasses a novel semantic blocking, sophisticated match techniques and post-processing steps to select reasonable annotations. We evaluate our methods based on reference mappings between medical forms and UMLS, and further manually validate the recommended annotations.
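The abstract names three workflow stages (blocking, matching, post-processing) and an evaluation against reference mappings. A minimal Python sketch of such a pipeline follows, under loudly stated assumptions: the blocking here is plain shared-token candidate filtering, swapped in for the semantic blocking the paper proposes; matching uses a token Dice coefficient; post-processing keeps at most `top_k` candidates scoring at or above `threshold`; and all concept IDs, names and helper functions (`annotate`, `evaluate`, etc.) are hypothetical placeholders, not UMLS content or the authors' implementation.

```python
import re
from collections import defaultdict

# Hypothetical sketch only: token blocking + Dice matching + top-k selection.
# Concept IDs and names below are placeholders, not real UMLS data.


def tokens(text):
    """Lower-cased alphanumeric tokens of a label or form question."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(concepts):
    """Inverted index token -> concept IDs, used for blocking."""
    index = defaultdict(set)
    for cid, name in concepts.items():
        for tok in tokens(name):
            index[tok].add(cid)
    return index


def dice(a, b):
    """Dice coefficient of two token sets."""
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def annotate(items, concepts, threshold=0.5, top_k=2):
    """Return {item_id: [(concept_id, score), ...]} for each form item."""
    index = build_index(concepts)
    result = {}
    for iid, question in items.items():
        q = tokens(question)
        # Blocking: only compare concepts sharing at least one token.
        candidates = set()
        for tok in q:
            candidates |= index.get(tok, set())
        scored = [(cid, dice(q, tokens(concepts[cid]))) for cid in candidates]
        # Post-processing: drop weak matches, keep the best top_k per item.
        kept = sorted((s for s in scored if s[1] >= threshold),
                      key=lambda s: (-s[1], s[0]))
        result[iid] = kept[:top_k]
    return result


def evaluate(predicted, reference):
    """Precision, recall and F1 of predicted (item, concept) pairs vs a reference set."""
    pred = {(iid, cid) for iid, pairs in predicted.items() for cid, _ in pairs}
    tp = len(pred & reference)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(reference) if reference else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy usage with hypothetical data.
concepts = {"C1": "blood pressure", "C2": "systolic blood pressure", "C3": "body weight"}
items = {"q1": "Systolic blood pressure", "q2": "Weight of patient"}
ann = annotate(items, concepts)
# q1 -> [("C2", 1.0), ("C1", 0.8)]; q2 -> [] (Dice 0.4 with C3 is below 0.5)
print(ann)
print(evaluate(ann, {("q1", "C2"), ("q2", "C3")}))
```

Blocking keeps the number of comparisons proportional to the candidates that share a token rather than to the full concept set, which matters when the target is a vocabulary as large as UMLS; the threshold and `top_k` values here are arbitrary illustration choices.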


Keywords: Semantic annotation · Medical forms · Clinical trials · UMLS



This work is funded by the German Research Foundation (DFG) (grant RA 497/22-1, “ELISA - Evolution of Semantic Annotations”).


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Victor Christen (1)
  • Anika Groß (1)
  • Julian Varghese (2)
  • Martin Dugas (2)
  • Erhard Rahm (1)

  1. Department of Computer Science, Universität Leipzig, Leipzig, Germany
  2. Institute of Medical Informatics, Universität Münster, Münster, Germany
