Beyond Topics: Discovering Latent Healthcare Objectives from Event Sequences

Caruana, Adrian; Bandara, Madhushi; Catchpoole, Daniel; Kennedy, Paul J.

doi:10.1007/978-3-030-97546-3_30

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13151)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

A meaningful understanding of clinical protocols and patient pathways helps improve healthcare outcomes. Electronic health records (EHRs) reflect real-world treatment behaviours that are used to enhance healthcare management but present challenges: protocols and pathways are often loosely defined, with elements frequently not recorded in EHRs, which complicates the enhancement. To solve this challenge, healthcare objectives associated with healthcare management activities can be indirectly observed in EHRs as latent topics. Topic models, such as Latent Dirichlet Allocation (LDA), are used to identify latent patterns in EHR data. However, they do not examine the ordered nature of EHR sequences, nor do they appraise individual events in isolation. Our novel approach, the Categorical Sequence Encoder (CaSE), addresses these shortcomings. The sequential nature of EHRs is captured by CaSE’s event-level representations, revealing latent healthcare objectives. In synthetic EHR sequences, CaSE outperforms LDA by up to 37% at identifying healthcare objectives. In the real-world MIMIC-III dataset, CaSE identifies meaningful representations that could critically enhance protocol and pathway development.
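The limitation of order-free topic models noted in the abstract can be illustrated with a minimal sketch. It is not the authors' CaSE method; the event names and both helper functions are hypothetical and exist only for illustration. Two sequences with the same events in a different order produce identical bag-of-events counts (the kind of input a bag-of-words model such as LDA works from), while even a simple order-aware view tells them apart.

```python
from collections import Counter

# Hypothetical event sequences: same events, different order.
seq_a = ["admit", "blood_test", "chemo", "scan", "discharge"]
seq_b = ["admit", "scan", "chemo", "blood_test", "discharge"]


def bag_of_events(seq):
    """Order-free event counts, as seen by a bag-of-words topic model."""
    return Counter(seq)


def bigrams(seq):
    """Adjacent event pairs: a simple order-aware view of the sequence."""
    return list(zip(seq, seq[1:]))


# The count representations match, so order information is lost.
same_bag = bag_of_events(seq_a) == bag_of_events(seq_b)

# The adjacent-pair representations differ, so order is preserved.
same_order = bigrams(seq_a) == bigrams(seq_b)

print(same_bag, same_order)
```

Adjacent pairs are used here only as the simplest order-aware contrast; the abstract attributes CaSE's handling of order to its event-level representations, which this sketch does not attempt to reproduce.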

This work was supported by Cancer Australia in the form of a doctoral research stipend (to A. Caruana).

Notes

1. The sliding window length of 32 is the mean length (\(1/\alpha\)) of treatment groups.
2. Because HDBSCAN is nonlinear, PHC works best when the neighbourhood is small.
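Note 1 can be made concrete with a short sketch of sliding-window extraction. Only the window length of 32 and its relation to \(1/\alpha\) come from the note; the function name, the stride of 1, and the choice to return no windows for sequences shorter than the window are assumptions made for illustration.

```python
# From Note 1: the mean group length is 1/alpha = 32, so alpha = 1/32.
alpha = 1 / 32
window_length = round(1 / alpha)


def sliding_windows(events, length, stride=1):
    """Return every contiguous window of `length` events, stepping by `stride`.

    The stride and the handling of short sequences (no windows are
    returned) are assumptions, not details taken from the paper.
    """
    if length <= 0 or stride <= 0:
        raise ValueError("length and stride must be positive")
    last_start = len(events) - length
    return [events[i:i + length] for i in range(0, last_start + 1, stride)]


# Hypothetical sequence of 40 events.
events = [f"event_{i}" for i in range(40)]
windows = sliding_windows(events, window_length)
print(len(windows), len(windows[0]))
```

With 40 events and a window of 32, a stride of 1 gives 40 - 32 + 1 = 9 windows.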

Author information

Authors and Affiliations

Australian Artificial Intelligence Institute, Faculty of Engineering and IT, University of Technology Sydney, Sydney, Australia
Adrian Caruana, Madhushi Bandara, Daniel Catchpoole & Paul J. Kennedy
Biospecimen Research Services, The Children’s Cancer Research Unit, The Children’s Hospital at Westmead, 2145, Westmead, NSW, Australia
Daniel Catchpoole

Corresponding author

Correspondence to Adrian Caruana.

Editor information

Editors and Affiliations

University of Technology Sydney, Sydney, NSW, Australia
Guodong Long
RMIT University, Melbourne, VIC, Australia
Xinghuo Yu
University of Queensland, Brisbane, QLD, Australia
Sen Wang

About this paper

Cite this paper

Caruana, A., Bandara, M., Catchpoole, D., Kennedy, P.J. (2022). Beyond Topics: Discovering Latent Healthcare Objectives from Event Sequences. In: Long, G., Yu, X., Wang, S. (eds) AI 2021: Advances in Artificial Intelligence. AI 2022. Lecture Notes in Computer Science, vol 13151. Springer, Cham. https://doi.org/10.1007/978-3-030-97546-3_30

DOI: https://doi.org/10.1007/978-3-030-97546-3_30
Published: 19 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-97545-6
Online ISBN: 978-3-030-97546-3
eBook Packages: Computer Science, Computer Science (R0)
