Effective Identification of Similar Patients Through Sequential Matching over ICD Code Embedding
Evidence-based medicine often involves identifying patients with similar conditions, which are commonly captured in ICD (International Classification of Diseases; World Health Organization 2013) code sequences. Because no satisfactory prior solution exists for matching ICD-10 code sequences, this paper presents a method that effectively captures the clinical similarity among routine patients who have multiple comorbidities and complex care needs. Our method leverages recent progress in representation learning of individual ICD-10 codes and explicitly uses the sequential order of codes for matching. Empirical evaluation on a state-wide cancer data collection shows that the proposed method achieves significantly higher matching performance than state-of-the-art methods that ignore the sequential order. It better identifies similar patients across a number of clinical outcomes, including readmission and mortality outlook. Although this paper focuses on ICD-10 diagnosis code sequences, the method can be adapted to work with other codified sequence data.
Keywords: Code embedding, Word2Vec, Sequential matching, Patient similarity matching, Cancer
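The abstract's central idea, matching code sequences through per-code embeddings while respecting code order, can be illustrated with a minimal sketch. The abstract does not specify the alignment algorithm, so the sketch below swaps in dynamic time warping (DTW) over cosine distance purely for illustration; it is not the authors' method. The embedding values, the `TOY_EMBEDDINGS` table, and the function names are all hypothetical; in practice the vectors would be learned from data, for example with Word2Vec.

```python
import math

# Hypothetical 3-dimensional embeddings for a few ICD-10 codes.
# Real embeddings would be learned from patient records, not hand-written.
TOY_EMBEDDINGS = {
    "C50": (0.9, 0.1, 0.0),  # malignant neoplasm of breast
    "C34": (0.8, 0.2, 0.1),  # malignant neoplasm of bronchus and lung
    "I10": (0.1, 0.9, 0.0),  # essential hypertension
    "E11": (0.0, 0.8, 0.3),  # type 2 diabetes mellitus
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def dtw_distance(seq_a, seq_b, emb):
    """Order-aware distance: DTW alignment cost over code embeddings."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i codes of seq_a
    # with the first j codes of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(emb[seq_a[i - 1]], emb[seq_b[j - 1]])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def bag_distance(seq_a, seq_b, emb):
    """Order-insensitive baseline: cosine distance between mean embeddings."""
    def mean(seq):
        dims = len(emb[seq[0]])
        return tuple(sum(emb[c][k] for c in seq) / len(seq) for k in range(dims))
    return cosine_distance(mean(seq_a), mean(seq_b))


patient_a = ["C50", "I10"]  # cancer diagnosis first, hypertension later
patient_b = ["I10", "C50"]  # same codes, opposite order

order_aware = dtw_distance(patient_a, patient_b, TOY_EMBEDDINGS)
order_blind = bag_distance(patient_a, patient_b, TOY_EMBEDDINGS)
```

In this toy case the order-insensitive baseline treats the two patients as essentially identical, while the order-aware distance is clearly positive, which is the kind of distinction the abstract attributes to using sequential order.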
This work is partially supported by the Telstra-Deakin Centre of Excellence (CoE) in Big Data and Machine Learning. Dinh Phung gratefully acknowledges the partial support from the Australian Research Council (ARC).
Compliance with Ethical Standards
Conflict of Interest
The authors have no conflict of interest to declare.
Ethics approval was obtained from the New South Wales Population and Health Services Research Ethics Committee (AU RED Reference: HREC/15/CIPHS/1).
This study is a secondary analysis of routinely collected data; consent was obtained by the original data guarantor.
- 1. World Health Organization: International Classification of Diseases (ICD). http://www.who.int/classifications/icd/en/, 2013.
- 2. World Health Organization: International statistical classification of diseases and related health problems 10th revision. [Online]. Available: http://apps.who.int/classifications/icd10/browse/2010/en, 2010.
- 3. Australian Consortium for Classification Development: ICD-10-AM. [Online]. Available: https://www.accd.net.au/Icd10.aspx, 2017.
- 5. Wang, F., Hu, J., and Sun, J.: Medical prognosis based on patient similarity and expert feedback. In: The 21st International Conference on Pattern Recognition, pp. 1799–1802, IEEE, 2012.
- 6. Choi, E., Schuetz, A., Stewart, W. F., and Sun, J.: Medical concept representation learning from electronic health records and its application on heart failure prediction. arXiv:1602.03686, 2016.
- 7. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS, pp. 3111–3119, 2013.
- 10. Hielscher, T., Spiliopoulou, M., Völzke, H., and Kühn, J.-P.: Using participant similarity for the classification of epidemiological data on hepatic steatosis. In: The 27th International Symposium on Computer-Based Medical Systems, pp. 1–7, IEEE, 2014.
- 11. Le, Q., and Mikolov, T.: Distributed representations of sentences and documents. In: ICML, pp. 1188–1196, 2014.
- 12. Levy, O., Goldberg, Y., and Dagan, I.: Improving distributional similarity with lessons learned from word embeddings. Trans. Assoc. Comput. Linguist. 3:211–225, 2015.
- 13. Grover, A., and Leskovec, J.: node2vec: scalable feature learning for networks. In: KDD, pp. 855–864, ACM, 2016.
- 14. Nguyen, D., Luo, W., Nguyen, T. D., Venkatesh, S., and Phung, D.: Learning graph representation via frequent subgraphs. In: SDM. Accepted, SIAM, 2018.
- 15. Moen, H., Ginter, F., Marsi, E., Peltonen, L.-M., Salakoski, T., and Salanterä, S.: Care episode retrieval: distributional semantic models for information retrieval in the clinical domain. BMC Med. Inform. Decis. Mak. 15(2):1, 2015.
- 17. Choi, E., Bahadori, M. T., Searles, E., Coffey, C., Thompson, M., Bost, J., Tejedor-Sojo, J., and Sun, J.: Multi-layer representation learning for medical concepts. In: KDD, pp. 1495–1504, ACM, 2016.
- 18. Choi, Y., Chiu, C. Y.-I., and Sontag, D.: Learning low-dimensional representations of medical concepts. In: AMIA Summits on Translational Science Proceedings, pp. 41–51, 2016.
- 19. Mikolov, T., Chen, K., Corrado, G., and Dean, J.: Efficient estimation of word representations in vector space. arXiv:1301.3781, 2013.
- 21. Nguyen, D., Luo, W., Phung, D., and Venkatesh, S.: Exceptional contrast set mining: moving beyond the deluge of the obvious. In: Australasian Joint Conference on Artificial Intelligence, pp. 455–468. Springer, Berlin, 2016.
- 25. Maaten, L. V. D., and Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9:2579–2605, 2008.
- 27. Pham, T., Tran, T., Phung, D., and Venkatesh, S.: DeepCare: a deep dynamic memory model for predictive medicine. In: PAKDD, pp. 30–41. Springer, Berlin, 2016.
- 30. Nguyen, D., Nguyen, T. D., Luo, W., and Venkatesh, S.: Trans2vec: learning transaction embedding via items and frequent itemsets. In: PAKDD. Accepted. Springer, Berlin, 2018.