Abstract
Rare diseases affect 350 million patients worldwide, yet they are often diagnosed late or misdiagnosed. Detecting rare diseases poses two main challenges: extreme data imbalance and the difficulty of finding appropriate features. In this paper, we propose to address these problems by using semi-supervised generative adversarial networks (GANs) to handle the data imbalance and recurrent neural networks (RNNs) to model patient sequences directly. We evaluated the approach on detecting patients with a particular rare disease, exocrine pancreatic insufficiency (EPI). The dataset includes 1.8 million patients, 29,149 of whom are positive, drawn from a large longitudinal study using 7 years of medical claims. Our model achieved a PR-AUC of 0.56 and outperformed benchmark models in terms of precision and recall.
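The two technical ingredients named above, a semi-supervised GAN objective and PR-AUC, can be illustrated with a minimal Python sketch (standard library only). It assumes the K+1-class semi-supervised GAN formulation of Salimans et al. (2016); the abstract does not state that the authors use exactly this objective. Here K = 2 real classes (EPI-negative, EPI-positive), and the logit of the extra "generated" class is fixed at 0. The per-class logits are assumed to come from an RNN discriminator run over each patient's claim sequence, which is not shown. PR-AUC is computed as step-wise average precision, which can differ slightly from other PR-AUC estimators. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ssgan_discriminator_losses(
    labeled: List[List[float]],
    labels: List[int],
    unlabeled: List[List[float]],
    fake: List[List[float]],
) -> Tuple[float, float]:
    """Supervised and unsupervised discriminator losses (K+1-class SSGAN sketch).

    Each logit vector has K entries, one per real class. The fake class has an
    implicit logit of 0, so D(x) = Z / (Z + 1) with Z = sum_k exp(l_k).
    """
    # Supervised term: cross-entropy over the K real classes, -log softmax(l)[y].
    sup = sum(logsumexp(l) - l[y] for l, y in zip(labeled, labels)) / len(labels)
    # Unlabeled real data: -log D(x) = softplus(lse) - lse.
    real = sum(softplus(logsumexp(l)) - logsumexp(l) for l in unlabeled) / len(unlabeled)
    # Generated data: -log(1 - D(G(z))) = softplus(lse).
    gen = sum(softplus(logsumexp(l)) for l in fake) / len(fake)
    return sup, real + gen


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise PR-AUC: mean precision at each positive's rank.

    Assumes no tied scores and at least one positive label.
    """
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this cutoff
    return total / n_pos


if __name__ == "__main__":
    zeros = [[0.0, 0.0]]
    print(ssgan_discriminator_losses(zeros, [1], zeros, zeros))
    # Positives at ranks 1 and 3: (1/1 + 2/3) / 2 = 5/6
    print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

For context on the reported 0.56: a classifier that ranks patients at random has an expected PR-AUC close to the positive rate, here roughly 29,149 / 1,800,000 ≈ 0.016, which is why PR-AUC is a common choice of metric under this degree of imbalance.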
Keywords
- Rare Disease Detection
- Sequence Data Modeling
- Long Short-Term Memory
- Generative Adversarial Networks
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Yu, K., Wang, Y., Cai, Y. (2020). Modelling Patient Sequences for Rare Disease Detection with Semi-supervised Generative Adversarial Nets. In: Lemaire, V., Malinowski, S., Bagnall, A., Bondu, A., Guyet, T., Tavenard, R. (eds) Advanced Analytics and Learning on Temporal Data. AALTD 2019. Lecture Notes in Computer Science, vol 11986. Springer, Cham. https://doi.org/10.1007/978-3-030-39098-3_11
DOI: https://doi.org/10.1007/978-3-030-39098-3_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39097-6
Online ISBN: 978-3-030-39098-3
eBook Packages: Computer Science, Computer Science (R0)
Published in cooperation with
http://www.ecmlpkdd.org/