Diagnosis Code Assignment Support Using Random Indexing of Patient Records – A Qualitative Feasibility Study

  • Aron Henriksson
  • Martin Hassel
  • Maria Kvist
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6747)


The prediction of diagnosis codes is typically based on free-text entries in clinical documents. Previous attempts to tackle this problem range from strictly rule-based systems to various classification algorithms, with varying degrees of success. A novel approach is to build a word space model from a corpus of coded patient records, associating co-occurrences of words and ICD-10 codes. Random Indexing is a computationally efficient implementation of the word space model and may prove an effective means of supporting the assignment of diagnosis codes. In this study, a physician qualitatively evaluates the feasibility of the method on clinical records from two Swedish clinics. In this initial experiment, the assigned codes were found among the top 10 generated suggestions in 20% of the cases, while a partial match in 77% of the cases demonstrates the potential of the method.
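The abstract does not give the model's parameters, so the following is only a minimal sketch of the general idea under stated assumptions. Each word receives a sparse random index vector. Each ICD-10 code accumulates the index vectors of the words it co-occurs with in a coded record, with the whole record assumed to be the context. Candidate codes for a new record are then ranked by cosine similarity. The class name `RandomIndexCoder`, the dimensionality, the number of non-zero entries, the `top_k_hit_rate` helper and the toy records are all hypothetical and are not taken from the paper.

```python
import math
import random
from collections import defaultdict

DIM = 1000      # assumed vector dimensionality (hypothetical)
NONZERO = 10    # assumed number of non-zero (+1/-1) entries per index vector


class RandomIndexCoder:
    """Hypothetical sketch: ICD-10 codes accumulate the random index
    vectors of the words they co-occur with in coded records."""

    def __init__(self, dim=DIM, nonzero=NONZERO, seed=0):
        self.dim = dim
        self.nonzero = nonzero
        self.rng = random.Random(seed)
        self.index = {}                                   # word -> sparse index vector
        self.context = defaultdict(lambda: [0.0] * dim)   # code -> dense context vector

    def _index_vector(self, word):
        # Lazily create a sparse ternary vector: half +1, half -1 at random positions.
        if word not in self.index:
            positions = self.rng.sample(range(self.dim), self.nonzero)
            half = self.nonzero // 2
            self.index[word] = [(p, 1 if i < half else -1) for i, p in enumerate(positions)]
        return self.index[word]

    def train(self, records):
        # records: iterable of (list_of_words, list_of_icd10_codes).
        # Assumption: the whole record is the co-occurrence context.
        for words, codes in records:
            for word in words:
                iv = self._index_vector(word)
                for code in codes:
                    vec = self.context[code]
                    for pos, sign in iv:
                        vec[pos] += sign

    def suggest(self, words, k=10):
        # Query vector = sum of the index vectors of known words in the new record.
        query = [0.0] * self.dim
        for word in words:
            if word in self.index:
                for pos, sign in self.index[word]:
                    query[pos] += sign
        qnorm = math.sqrt(sum(x * x for x in query))
        if qnorm == 0:
            return []
        scored = []
        for code, vec in self.context.items():
            vnorm = math.sqrt(sum(x * x for x in vec))
            if vnorm == 0:
                continue
            dot = sum(a * b for a, b in zip(query, vec))
            scored.append((dot / (qnorm * vnorm), code))
        scored.sort(reverse=True)   # highest cosine similarity first
        return [code for _, code in scored[:k]]


def top_k_hit_rate(coder, test_records, k=10):
    # Assumed exact-match criterion: a hit if any assigned code is among the top k.
    hits = sum(1 for words, codes in test_records
               if set(codes) & set(coder.suggest(words, k)))
    return hits / len(test_records)


if __name__ == "__main__":
    # Invented toy records; real input would be tokenized, coded patient records.
    toy = [
        (["fever", "cough", "pneumonia"], ["J18.9"]),
        (["chest", "pain", "infarction"], ["I21.9"]),
        (["cough", "wheezing", "asthma"], ["J45.9"]),
    ]
    coder = RandomIndexCoder()
    coder.train(toy)
    print(coder.suggest(["cough", "fever"], k=2))
    print(top_k_hit_rate(coder, toy, k=1))
```

Treating a hit as "any assigned code appears among the top k suggestions" is an assumption of this sketch; the abstract's partial-match criterion is not defined here and is not modelled.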


Keywords: ICD-10 Assignment, Random Indexing, Electronic Patient Records, Qualitative Evaluation





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Aron Henriksson (1)
  • Martin Hassel (1)
  • Maria Kvist (1, 2)
  1. Department of Computer and System Sciences (DSV), Stockholm University, Kista, Sweden
  2. Department of Clinical Immunology and Transfusion Medicine, Karolinska University Hospital, Stockholm, Sweden
