
Combining Model-Based and Model-Free Reinforcement Learning Policies for More Efficient Sepsis Treatment

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2021)

Abstract

Sepsis is the main cause of mortality in intensive care units (ICUs), yet the optimal treatment strategy remains unclear. Managing sepsis is challenging because individual patients respond differently to treatment, creating a pressing need for personalized treatment strategies. Reinforcement learning (RL) has been widely used to learn optimal strategies for sepsis treatment, especially for the administration of intravenous fluids and vasopressors. RL approaches generally fall into two categories: model-based and model-free. It has been shown that model-based approaches, provided the environment model can be estimated accurately, are more sample efficient than model-free approaches, but typically achieve inferior asymptotic performance. In this paper, we propose a policy mixture framework that combines the strengths of model-based and model-free RL to achieve more efficient personalized sepsis treatment. We demonstrate that the policy derived from our framework outperforms policies prescribed by physicians, as well as purely model-based and purely model-free approaches.
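
To make the framework concrete, the sketch below illustrates one simple way a policy mixture could blend a model-based action-value estimate with a model-free one. The mixed_policy function, the mixing weight alpha, and the 25-action discretization (a 5 x 5 grid of intravenous-fluid and vasopressor dose bins, as is common in this line of work) are illustrative assumptions for exposition only, not the authors' actual algorithm.

    # Hypothetical sketch: blend model-based and model-free action values
    # with a convex mixing weight, then act greedily on the mixture.
    # All names and constants here are illustrative assumptions.
    import numpy as np

    N_ACTIONS = 25  # e.g., a 5 x 5 grid of IV-fluid and vasopressor dose bins

    def mixed_policy(q_model_based: np.ndarray,
                     q_model_free: np.ndarray,
                     alpha: float = 0.5) -> int:
        """Return the action maximizing alpha*Q_mb + (1 - alpha)*Q_mf."""
        assert q_model_based.shape == q_model_free.shape == (N_ACTIONS,)
        q_mix = alpha * q_model_based + (1.0 - alpha) * q_model_free
        return int(np.argmax(q_mix))

    # Example usage with random stand-ins for the two value estimates.
    rng = np.random.default_rng(0)
    action = mixed_policy(rng.normal(size=N_ACTIONS),
                          rng.normal(size=N_ACTIONS),
                          alpha=0.3)
    print(f"Selected treatment action index: {action}")

In such a scheme, alpha could be tuned or scheduled so that the sample-efficient model-based estimate dominates when data are scarce, while the model-free estimate takes over as it approaches its better asymptotic performance, which is the intuition behind mixing the two policy types.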

Author information

Corresponding author

Correspondence to Chao Yu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, X., Yu, C., Huang, Q., Wang, L., Wu, J., Guan, X. (2021). Combining Model-Based and Model-Free Reinforcement Learning Policies for More Efficient Sepsis Treatment. In: Wei, Y., Li, M., Skums, P., Cai, Z. (eds) Bioinformatics Research and Applications. ISBRA 2021. Lecture Notes in Computer Science, vol 13064. Springer, Cham. https://doi.org/10.1007/978-3-030-91415-8_10

  • DOI: https://doi.org/10.1007/978-3-030-91415-8_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91414-1

  • Online ISBN: 978-3-030-91415-8

  • eBook Packages: Computer Science (R0)
