Abstract
Sepsis is the main cause of mortality in intensive care units (ICUs), yet the optimal treatment strategy remains unclear. Managing sepsis is challenging because individual patients respond differently to treatment, which creates a pressing need for personalized treatment strategies. Reinforcement learning (RL) has been widely used to learn optimal strategies for sepsis treatment, especially for the administration of intravenous fluids and vasopressors. RL approaches fall broadly into two categories: model-based and model-free. Model-based approaches, provided the environment model is estimated accurately, have been shown to be more sample efficient than model-free approaches, but they achieve inferior asymptotic performance. In this paper, we propose a policy mixture framework that draws on the strengths of both model-based and model-free RL to achieve more efficient personalized sepsis treatment. We demonstrate that the policy derived from our framework outperforms the policies prescribed by physicians as well as those learned by purely model-based and purely model-free methods.
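The abstract does not say how the two policies are mixed, so the following is only a minimal sketch of one generic way to combine them, not the authors' method. It assumes both policies output a probability distribution over the same discrete set of treatment actions for a given patient state, and it blends them with a convex combination. The function name `mix_policies`, the `weight` parameter, and the four-action example are all hypothetical.

```python
def mix_policies(pi_model_based, pi_model_free, weight):
    """Blend two action distributions for a single state.

    ASSUMPTION: a convex combination is used purely for illustration;
    the paper's actual mixture rule is not given in the abstract.
    `weight` is the share given to the model-based policy.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(pi_model_based) != len(pi_model_free):
        raise ValueError("both policies must cover the same action set")

    # Weighted sum of the two probabilities for each action.
    mixed = [
        weight * p_mb + (1.0 - weight) * p_mf
        for p_mb, p_mf in zip(pi_model_based, pi_model_free)
    ]

    # Renormalize to guard against inputs that do not sum exactly to 1.
    total = sum(mixed)
    if total <= 0.0:
        raise ValueError("mixed distribution has no probability mass")
    return [p / total for p in mixed]


# Hypothetical example: four discrete dose levels for one patient state.
pi_mb = [0.7, 0.1, 0.1, 0.1]  # made-up model-based action probabilities
pi_mf = [0.1, 0.1, 0.1, 0.7]  # made-up model-free action probabilities
mixed = mix_policies(pi_mb, pi_mf, weight=0.5)
```

In a sketch like this, a weight near 1 leans on the model-based policy, which the abstract describes as more sample efficient, while a weight near 0 leans on the model-free policy, which it describes as better asymptotically. How such a weight would be chosen or adapted is left open here.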
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, X., Yu, C., Huang, Q., Wang, L., Wu, J., Guan, X. (2021). Combining Model-Based and Model-Free Reinforcement Learning Policies for More Efficient Sepsis Treatment. In: Wei, Y., Li, M., Skums, P., Cai, Z. (eds) Bioinformatics Research and Applications. ISBRA 2021. Lecture Notes in Computer Science, vol 13064. Springer, Cham. https://doi.org/10.1007/978-3-030-91415-8_10
DOI: https://doi.org/10.1007/978-3-030-91415-8_10
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91414-1
Online ISBN: 978-3-030-91415-8
eBook Packages: Computer Science (R0)