Supervised Topic Models for Diagnosis Code Assignment to Discharge Summaries

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2016)

Abstract

Mining medical data has gained significant interest in recent years thanks to advances in data mining and machine learning. In this work, we focus on a challenging problem in medical data mining: automatic assignment of diagnosis codes to discharge summaries, i.e., characterizing a patient’s hospital stay (diseases, symptoms, treatments, etc.) with a set of codes usually derived from the International Classification of Diseases (ICD). We cast the problem as a machine learning task and experiment with several recent approaches based on probabilistic topic models. We demonstrate the effectiveness of these models in terms of high predictive scores and ease of result interpretation. We also show how topic models yield insights into this field and open new research opportunities for possible improvements.

Notes

  1. http://www.who.int/classifications/icd/

  2. https://www.nlm.nih.gov/research/umls/

  3. Source code from http://www.cs.cmu.edu/~chongw/slda/ (sLDA) and https://github.com/myleott/JGibbLabeledLDA/ (labeledLDA).

  4. http://hopital-saintlouis.aphp.fr/

Author information

Corresponding author

Correspondence to Mohamed Dermouche.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Dermouche, M., Velcin, J., Flicoteaux, R., Chevret, S., Taright, N. (2018). Supervised Topic Models for Diagnosis Code Assignment to Discharge Summaries. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9624. Springer, Cham. https://doi.org/10.1007/978-3-319-75487-1_38

  • DOI: https://doi.org/10.1007/978-3-319-75487-1_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-75486-4

  • Online ISBN: 978-3-319-75487-1

  • eBook Packages: Computer Science, Computer Science (R0)
