Interpretable Segmentation of Medical Free-Text Records Based on Word Embeddings

Dobrakowski, Adam Gabriel; Mykowiecka, Agnieszka; Marciniak, Małgorzata; Jaworski, Wojciech; Biecek, Przemysław

doi:10.1007/978-3-030-59491-6_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12117)

Included in the following conference series:

International Symposium on Methodologies for Intelligent Systems

Abstract

Is it true that patients with similar conditions get similar diagnoses? In this paper we present a natural language processing (NLP) method that can be used to validate this claim. We (1) introduce a method for representation of medical visits based on free-text descriptions recorded by doctors, (2) introduce a new method for segmentation of patients’ visits, (3) present an application of the proposed method on a corpus of 100,000 medical visits and (4) show tools for interpretation and exploration of derived knowledge representation. With the proposed method we obtained stable and separated segments of visits which were positively validated against medical diagnoses. We show how the presented algorithm may be used to aid doctors in their practice.
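
The steps listed in the abstract (represent each visit from its free text, group visits into segments, check the segments against diagnoses) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy two-dimensional word vectors, the averaging of word vectors into a visit vector, the k-means grouping, and the Rand index as the agreement measure are stand-ins, not the authors' actual embeddings, segmentation algorithm, or validation procedure.

```python
from itertools import combinations

# Hypothetical 2-d word vectors; a real system would load trained embeddings.
TOY_EMBEDDINGS = {
    "cough": (1.0, 0.1), "fever": (0.9, 0.2), "throat": (0.8, 0.0),
    "knee": (0.0, 1.0), "pain": (0.2, 0.8), "swelling": (0.1, 0.9),
}


def visit_vector(text, embeddings=TOY_EMBEDDINGS):
    """Represent a visit as the mean of the vectors of its known words."""
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    if not vectors:
        raise ValueError("no known words in visit description")
    dim = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dim))


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means, seeded with the first k points so it is deterministic."""
    centers = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two labelings agree (same/different)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Made-up visit descriptions and diagnosis labels, for illustration only.
visits = ["cough fever", "knee pain swelling", "fever throat cough", "pain knee"]
diagnoses = ["respiratory", "orthopedic", "respiratory", "orthopedic"]

segments = kmeans([visit_vector(v) for v in visits], k=2)
agreement = rand_index(segments, diagnoses)
```

On this toy input the two segments coincide with the two diagnosis groups, so `agreement` is 1.0. With real records the agreement would typically be lower, and the segment count `k` would have to be chosen rather than assumed.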

Acknowledgments

This work was financially supported by NCBR Grant POIR.01.01.01-00-0328/17. PBi was supported by NCN Opus grant 2016/21/B/ST6/02176.

Author information

Authors and Affiliations

University of Warsaw, Banacha 2, Warsaw, Poland
Adam Gabriel Dobrakowski, Wojciech Jaworski & Przemysław Biecek
Institute of Computer Science Polish Academy of Sciences, Jana Kazimierza 5, Warsaw, Poland
Agnieszka Mykowiecka & Małgorzata Marciniak

Corresponding author

Correspondence to Adam Gabriel Dobrakowski.

Editor information

Editors and Affiliations

Graz University of Technology, Graz, Austria
Denis Helic
University of Klagenfurt, Klagenfurt, Austria
Gerhard Leitner
Graz University of Technology, Graz, Austria
Martin Stettinger
Graz University of Technology, Graz, Austria
Alexander Felfernig
University of North Carolina at Charlotte, Charlotte, NC, USA
Zbigniew W. Raś

About this paper

Cite this paper

Dobrakowski, A.G., Mykowiecka, A., Marciniak, M., Jaworski, W., Biecek, P. (2020). Interpretable Segmentation of Medical Free-Text Records Based on Word Embeddings. In: Helic, D., Leitner, G., Stettinger, M., Felfernig, A., Raś, Z.W. (eds) Foundations of Intelligent Systems. ISMIS 2020. Lecture Notes in Computer Science, vol 12117. Springer, Cham. https://doi.org/10.1007/978-3-030-59491-6_5

DOI: https://doi.org/10.1007/978-3-030-59491-6_5
Published: 17 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59490-9
Online ISBN: 978-3-030-59491-6
eBook Packages: Computer Science, Computer Science (R0)
