
Annotating Medical Forms Using UMLS

Part of the Lecture Notes in Computer Science book series (LNBI, volume 9162)


Medical forms are frequently used to document patient data or to collect relevant data for clinical trials. Harmonizing medical forms is crucial for improving interoperability and data integration between medical applications. Here we propose a (semi-)automatic approach to annotating medical forms with concepts of the Unified Medical Language System (UMLS). Our annotation workflow encompasses a novel semantic blocking step, sophisticated match techniques, and post-processing steps to select reasonable annotations. We evaluate our methods against reference mappings between medical forms and UMLS, and additionally validate the recommended annotations manually.
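The abstract names three workflow stages (blocking, matching, selection) and an evaluation against reference mappings, but gives no algorithmic detail. The following is only a minimal sketch of that general shape, not the method of the paper: the token-overlap filter stands in for blocking, Jaccard similarity stands in for the match techniques, and a threshold plus top-k cut stands in for post-processing. All function names, the concept identifiers, the sample questions, and the parameter values are hypothetical and chosen purely for illustration.

```python
from typing import Dict, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 if either set is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def annotate(questions: Dict[str, str], concepts: Dict[str, str],
             threshold: float = 0.3, top_k: int = 1) -> Set[Tuple[str, str]]:
    """Return (question id, concept id) pairs selected as annotations."""
    annotations: Set[Tuple[str, str]] = set()
    for qid, qtext in questions.items():
        q_tokens = tokens(qtext)
        scored = []
        for cid, name in concepts.items():
            c_tokens = tokens(name)
            # Assumed stand-in for blocking: skip pairs sharing no token.
            if not (q_tokens & c_tokens):
                continue
            score = jaccard(q_tokens, c_tokens)
            if score >= threshold:
                scored.append((score, cid))
        # Assumed stand-in for post-processing: keep the top_k best matches.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        for _, cid in scored[:top_k]:
            annotations.add((qid, cid))
    return annotations


def precision_recall(predicted: Set[Tuple[str, str]],
                     reference: Set[Tuple[str, str]]) -> Tuple[float, float]:
    """Precision and recall of predicted pairs against a reference mapping."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical form questions and placeholder concept identifiers
# (not looked up in UMLS).
QUESTIONS = {
    "q1": "Does the patient have diabetes mellitus?",
    "q2": "Body weight in kg",
}
CONCEPTS = {
    "C_DIABETES": "Diabetes Mellitus",
    "C_BODY_WEIGHT": "Body Weight",
    "C_HYPERTENSION": "Hypertensive disease",
}
REFERENCE = {
    ("q1", "C_DIABETES"),
    ("q2", "C_BODY_WEIGHT"),
    ("q2", "C_KILOGRAM"),
}

PREDICTED = annotate(QUESTIONS, CONCEPTS)
PRECISION, RECALL = precision_recall(PREDICTED, REFERENCE)
```

In this toy setup each question receives at most one annotation, so a reference mapping that assigns several concepts to one question lowers recall while precision stays unaffected. Raising `top_k` trades precision for recall in the usual way.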


  • Semantic annotation
  • Medical forms
  • Clinical trials
  • UMLS

  • DOI: 10.1007/978-3-319-21843-4_5
  • Chapter length: 15 pages



This work is funded by the German Research Foundation (DFG) (grant RA 497/22-1, “ELISA - Evolution of Semantic Annotations”).

Corresponding author

Correspondence to Victor Christen.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Christen, V., Groß, A., Varghese, J., Dugas, M., Rahm, E. (2015). Annotating Medical Forms Using UMLS. In: Ashish, N., Ambite, JL. (eds) Data Integration in the Life Sciences. DILS 2015. Lecture Notes in Computer Science, vol 9162. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21842-7

  • Online ISBN: 978-3-319-21843-4

  • eBook Packages: Computer Science, Computer Science (R0)