Combining Model-Based and Model-Free Reinforcement Learning Policies for More Efficient Sepsis Treatment

Liu, Xiangyu; Yu, Chao; Huang, Qikai; Wang, Luhao; Wu, Jianfeng; Guan, Xiangdong

doi:10.1007/978-3-030-91415-8_10

Xiangyu Liu ORCID: orcid.org/0000-0003-3255-1467¹²,
Chao Yu ORCID: orcid.org/0000-0002-4371-3663¹²,
Qikai Huang¹³,
Luhao Wang¹⁴,
Jianfeng Wu¹⁴ &
Xiangdong Guan¹⁴

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 13064)

Included in the following conference series:

International Symposium on Bioinformatics Research and Applications

Abstract

Sepsis is the main cause of mortality in intensive care units (ICUs), but the optimal treatment strategy remains unclear. Managing sepsis treatment is challenging because individual patients respond differently to the same treatment, which creates a pressing need for personalized treatment strategies. Reinforcement learning (RL) has been widely used to learn optimal strategies for sepsis treatment, especially for the administration of intravenous fluids and vasopressors. RL approaches fall broadly into two categories: model-based and model-free. It has been shown that model-based approaches, provided the environment model is estimated accurately, are more sample efficient than model-free approaches, but achieve inferior asymptotic performance. In this paper, we propose a policy mixture framework that draws on the strengths of both model-based and model-free RL to achieve more efficient personalized sepsis treatment. We demonstrate that the policy derived from our framework outperforms the policies prescribed by physicians, model-based-only methods, and model-free-only methods.
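The abstract does not say how the two policies are mixed. As a purely illustrative sketch, one simple way to combine a model-based and a model-free policy over a discrete action set is a convex combination of their action probabilities. The function name `mix_policies`, the `weight` parameter, and the three-action example are all assumptions made for illustration and are not taken from the paper.

```python
def mix_policies(p_model_based, p_model_free, weight):
    """Blend two action distributions for a single state.

    Hypothetical helper: `weight` is the share given to the model-based
    policy; the remaining (1 - weight) goes to the model-free policy.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(p_model_based) != len(p_model_free):
        raise ValueError("both policies must cover the same actions")

    # Convex combination, action by action.
    mixed = [
        weight * p_mb + (1.0 - weight) * p_mf
        for p_mb, p_mf in zip(p_model_based, p_model_free)
    ]

    # Renormalize to guard against rounding drift in the inputs.
    total = sum(mixed)
    return [p / total for p in mixed]


# Made-up probabilities over three treatment actions (assumption).
p_mb = [0.7, 0.2, 0.1]   # model-based policy
p_mf = [0.1, 0.3, 0.6]   # model-free policy
mixed = mix_policies(p_mb, p_mf, weight=0.25)
```

With a fixed `weight`, the mixture leans on whichever policy is trusted more. A scheme that shifts weight from the model-based to the model-free policy as more data become available would be one way to reflect the trade-off between sample efficiency and asymptotic performance described above, though whether the paper does this is not stated here.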

Author information

Authors and Affiliations

Sun Yat-sen University, Guangzhou, 510275, China
Xiangyu Liu & Chao Yu
Department of Orthopedics, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, 2800 Gongwei Road, Pudong, Shanghai, 201399, China
Qikai Huang
The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, 510080, China
Luhao Wang, Jianfeng Wu & Xiangdong Guan

Corresponding author

Correspondence to Chao Yu.

Editor information

Editors and Affiliations

Shenzhen Institutes of Advanced Technology, Shenzhen, China
Yanjie Wei
Central South University, Changsha, China
Min Li
Georgia State University, Atlanta, GA, USA
Pavel Skums
Georgia State University, Atlanta, GA, USA
Zhipeng Cai

About this paper

Cite this paper

Liu, X., Yu, C., Huang, Q., Wang, L., Wu, J., Guan, X. (2021). Combining Model-Based and Model-Free Reinforcement Learning Policies for More Efficient Sepsis Treatment. In: Wei, Y., Li, M., Skums, P., Cai, Z. (eds) Bioinformatics Research and Applications. ISBRA 2021. Lecture Notes in Computer Science, vol 13064. Springer, Cham. https://doi.org/10.1007/978-3-030-91415-8_10

DOI: https://doi.org/10.1007/978-3-030-91415-8_10
Published: 18 November 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91414-1
Online ISBN: 978-3-030-91415-8
eBook Packages: Computer Science, Computer Science (R0)
