Symbolic Classification Methods for Patient Discharge Summaries Encoding into ICD

  • Laurent Kevers
  • Julia Medori
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6233)


This paper addresses the semi-automatic encoding of patient discharge summaries into medical classifications such as ICD-9-CM. It focuses on symbolic approaches, which allow unannotated corpora to be processed without any machine learning. The first method is based on the morphological analysis (MA) of medical terms extracted with hand-crafted linguistic resources. The second (ELP) relies on the automatic extraction of variants of ICD-9-CM code labels. Each method was evaluated on a set of 19,692 French discharge summaries from a General Internal Medicine unit. Depending on the number of suggested classes, the MA method reached a maximal F-measure of 28.00 and a maximal recall of 46.13%. For the ELP method, the best F-measure was 29.43 and the maximal recall was 52.74%. Combining both methods raised the best recall to 60.21% and the maximal F-measure to 31.64.
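The abstract reports F-measure and recall as a function of the number of suggested classes. As a rough illustration of that trade-off, the sketch below scores the top-k suggested codes per summary against reference codes. It assumes micro-averaging over all summaries, which the abstract does not specify; the function name `evaluate_top_k`, the document identifiers, and the toy codes are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Set, Tuple


def evaluate_top_k(
    suggestions: Dict[str, List[str]],
    gold: Dict[str, Set[str]],
    k: int,
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F-measure for the top-k codes.

    `suggestions` maps a summary id to a ranked list of suggested codes;
    `gold` maps a summary id to the set of reference codes.
    Micro-averaging is an assumption made for this sketch.
    """
    true_positives = 0
    n_suggested = 0
    n_gold = 0
    for doc_id, gold_codes in gold.items():
        # Keep only the k best-ranked suggestions for this summary.
        top_k = set(suggestions.get(doc_id, [])[:k])
        true_positives += len(top_k & gold_codes)
        n_suggested += len(top_k)
        n_gold += len(gold_codes)

    precision = true_positives / n_suggested if n_suggested else 0.0
    recall = true_positives / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy data (hypothetical): two summaries with ranked code suggestions.
gold = {"d1": {"250.00", "401.9"}, "d2": {"428.0"}}
suggestions = {"d1": ["401.9", "486", "250.00"], "d2": ["486", "428.0"]}

for k in (1, 3):
    p, r, f = evaluate_top_k(suggestions, gold, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

On this toy data, suggesting more codes per summary raises recall while precision grows more slowly, which is why the best F-measure and the best recall are typically obtained at different values of k.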


Keywords (added by machine, not by the authors): Chronic Suppurative Otitis Medium; Linguistic Resource; Encode Task; Symbolic Method; Negative Context





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Laurent Kevers (1)
  • Julia Medori (1)
  1. CENTAL - Université catholique de Louvain (UCL), Louvain-la-Neuve, Belgium
