Semantic Annotation of Medical Documents in CDA Context

  • Diego Monti
  • Maurizio Morisio
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9832)


The goal of this work is to recover semantic and structural information from medical documents in electronic format.

Despite the progressive diffusion of Electronic Health Record systems, much medical information is still available to patients and physicians only as images or plain text, partly for legacy reasons. The difficulty of retrieving such information when it is needed results in high costs for health providers.

In this work we develop the concept of a system designed to convert legacy medical documents into a standard, interoperable format compliant with the Clinical Document Architecture model by means of semantic annotation.
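As a rough illustration of the idea, and not the authors' system, the sketch below annotates free text against a small lexicon and wraps the result in a simplified, CDA-like section. The lexicon, the placeholder concept identifiers, the `example-terminology` code system, and the function names `annotate` and `to_section` are all assumptions made for this example; the output is not a schema-valid CDA document.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical lexicon: surface term -> placeholder concept identifier.
# These identifiers are invented for illustration; they are not real codes.
LEXICON = {
    "hypertension": "CONCEPT-0001",
    "aspirin": "CONCEPT-0002",
}


def annotate(text, lexicon=LEXICON):
    """Return sorted (start, end, surface, concept_id) tuples for lexicon terms in text."""
    annotations = []
    for term, concept_id in lexicon.items():
        # Whole-word, case-insensitive match of each lexicon term.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append(
                (match.start(), match.end(), match.group(0), concept_id)
            )
    return sorted(annotations)


def to_section(title, text, annotations):
    """Build a simplified, CDA-like <section>: narrative text plus one coded entry per annotation."""
    section = ET.Element("section")
    ET.SubElement(section, "title").text = title
    ET.SubElement(section, "text").text = text
    for _start, _end, surface, concept_id in annotations:
        entry = ET.SubElement(section, "entry")
        observation = ET.SubElement(entry, "observation")
        ET.SubElement(
            observation,
            "code",
            {
                "code": concept_id,
                "codeSystem": "example-terminology",  # placeholder, not a real OID
                "displayName": surface,
            },
        )
    return section


if __name__ == "__main__":
    note = "Patient with hypertension, treated with Aspirin."
    section = to_section("Clinical note", note, annotate(note))
    print(ET.tostring(section, encoding="unicode"))
```

The narrative text is kept alongside the coded entries, so the human-readable content survives even when the annotator misses a term.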


Semantic Annotation; Name Entity Recognition; Electronic Health Record System; Medical Document; Clinical Document Architecture
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Dipartimento di Automatica e Informatica, Politecnico di Torino, Turin, Italy
