Automatic Semantic Labeling of Medical Texts with Feature Structures

  • Agnieszka Mykowiecka
  • Małgorzata Marciniak
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6836)


This paper presents the results of testing two approaches to the automatic semantic labeling of medical data. For a chosen domain (diabetic patients’ discharge records), a set of domain-related concepts was identified. The first application is rule-based and produces the annotated resource: it relies on the output of two related rule-based information extraction (IE) systems, post-processed to make the label structures simpler and the annotation boundaries more precise. The second application is a machine learning approach based on conditional random fields (CRF), which uses the results of the first application as training data. Both applications were evaluated by comparison with manually corrected documents.
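The comparison against manually corrected documents can be illustrated with a minimal sketch. It assumes exact-match scoring of labeled spans with precision, recall and F1; the abstract does not state which measures the authors used, and the `Span` type, the `evaluate` function and the example labels are hypothetical names introduced only for this illustration.

```python
from typing import Iterable, NamedTuple, Tuple


class Span(NamedTuple):
    """A labeled annotation: token offsets plus a concept label (hypothetical format)."""
    start: int
    end: int
    label: str


def evaluate(predicted: Iterable[Span], gold: Iterable[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact match of boundaries and label.

    Assumption: a predicted span counts as correct only if its start, end and
    label all agree with a span in the manually corrected (gold) annotation.
    """
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative data only; these labels are not taken from the paper.
gold = [Span(0, 2, "dose"), Span(5, 7, "diet"), Span(9, 10, "hba1c")]
predicted = [Span(0, 2, "dose"), Span(5, 8, "diet")]  # second span has a boundary error

precision, recall, f1 = evaluate(predicted, gold)
```

Exact matching penalizes boundary errors as heavily as wrong labels, which is relevant here because the abstract names annotation boundary precision as one goal of the post-processing step. A more lenient overlap-based score would be an equally reasonable choice.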


Discharge Record, Medical Text, Semantic Label, Annotate Corpus, Label Structure
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Agnieszka Mykowiecka (1)
  • Małgorzata Marciniak (1)
  1. Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
