Abstract
The first paper on contextual bandits was written by Michael Woodroofe in 1979 (Journal of the American Statistical Association, 74(368), 799–806, 1979), but the term “contextual bandits” was coined only recently, in 2008, by Langford and Zhang (Advances in Neural Information Processing Systems, pages 817–824, 2008). Woodroofe’s motivating application was clinical trials, whereas modern interest in the problem has been driven largely by applications on the internet, such as online ad placement and online news article recommendation. We have now come full circle, because contextual bandits provide a natural framework for sequential decision making in mobile health. We survey the contextual bandits literature with a focus on the modifications needed to adapt existing approaches to the mobile health setting. We discuss specific challenges in this direction, such as: good initialization of the learning algorithm, finding interpretable policies, assessing the usefulness of tailoring variables, computational considerations, robustness to failure of assumptions, and dealing with variables that are costly to acquire or missing.
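To fix ideas, the basic contextual bandit loop surveyed in this chapter can be illustrated with a minimal simulation. The sketch below is purely illustrative and is not any one algorithm from the literature verbatim: it uses epsilon-greedy exploration over per-arm linear reward models (the chapter discusses many alternatives, e.g. epoch-greedy, UCB-style, and Thompson sampling methods), and the environment, dimensions, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T, eps = 3, 2, 2000, 0.1          # context dim, arms, rounds, exploration rate

# Hidden per-arm weight vectors the learner must discover (simulation only).
true_theta = rng.normal(size=(K, d))

# Per-arm ridge-regression sufficient statistics: A_k = X_k'X_k + I, b_k = X_k'r_k.
A = np.stack([np.eye(d) for _ in range(K)])
b = np.zeros((K, d))

cum_reward = 0.0
for t in range(T):
    x = rng.normal(size=d)              # observed context (e.g. tailoring variables)
    theta_hat = np.stack([np.linalg.solve(A[k], b[k]) for k in range(K)])
    if rng.random() < eps:
        a = int(rng.integers(K))        # explore: uniformly random arm
    else:
        a = int(np.argmax(theta_hat @ x))   # exploit: best arm under current estimates
    r = float(true_theta[a] @ x) + rng.normal(scale=0.1)  # noisy linear reward
    A[a] += np.outer(x, x)              # update only the chosen arm's statistics
    b[a] += r * x
    cum_reward += r

print(round(cum_reward, 1))
```

With enough rounds, the per-arm estimates `theta_hat` approach the true weights and the learned policy outperforms uniformly random arm selection; the exploration rate `eps` controls the usual exploration–exploitation trade-off.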
References
John Gittins, Kevin Glazebrook, and Richard Weber. Multi-armed bandit allocation indices. John Wiley & Sons, 2011.
Michael Woodroofe. A one-armed bandit problem with a concomitant variable. Journal of the American Statistical Association, 74(368):799–806, 1979.
Chih-Chun Wang, Sanjeev R. Kulkarni, and H. Vincent Poor. Bandit problems with side observations. IEEE Transactions on Automatic Control, 50(3):338–355, 2005.
Chih-Chun Wang, Sanjeev R. Kulkarni, and H. Vincent Poor. Arbitrary side observations in bandit problems. Advances in Applied Mathematics, 34(4):903–938, 2005.
Alexander Goldenshluger and Assaf Zeevi. A note on performance limitations in bandit problems with side information. IEEE Transactions on Information Theory, 57(3):1707–1713, 2011.
Naoki Abe and Philip M. Long. Associative reinforcement learning using linear probabilistic concepts. In Proceedings of the Sixteenth International Conference on Machine Learning, pages 3–11, 1999.
Leslie P. Kaelbling. Associative reinforcement learning: A generate and test algorithm. Machine Learning, 15(3):299–319, 1994.
Leslie P. Kaelbling. Associative reinforcement learning: Functions in k-DNF. Machine Learning, 15(3):279–298, 1994.
Naoki Abe, Alan W. Biermann, and Philip M. Long. Reinforcement learning with immediate rewards and linear hypotheses. Algorithmica, 37(4):263–293, 2003.
Alexander L. Strehl, Chris Mesterharm, Michael L. Littman, and Haym Hirsh. Experience-efficient learning in associative bandit problems. In Proceedings of the 23rd international conference on Machine learning, pages 889–896. ACM, 2006.
Murray K. Clayton. Covariate models for Bernoulli bandits. Sequential Analysis, 8(4):405–426, 1989.
Jyotirmoy Sarkar. One-armed bandit problems with covariates. The Annals of Statistics, pages 1978–2002, 1991.
Yuhong Yang and Dan Zhu. Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates. The Annals of Statistics, 30(1):100–121, 2002.
Philippe Rigollet and Assaf Zeevi. Nonparametric bandits with covariates. In Adam Tauman Kalai and Mehryar Mohri, editors, Proceedings of the 23rd Conference on Learning Theory, pages 54–66, 2010.
John Langford and Tong Zhang. The epoch-greedy algorithm for multi-armed bandits with side information. In Advances in neural information processing systems, pages 817–824, 2008.
Naoki Abe and Atsuyoshi Nakamura. Learning to optimally schedule internet banner advertisements. In Proceedings of the Sixteenth International Conference on Machine Learning, pages 12–21. Morgan Kaufmann Publishers Inc., 1999.
Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, pages 661–670. ACM, 2010.
Inbal Nahum-Shani, Shawna N. Smith, Bonnie J. Spring, Linda M. Collins, Katie Witkiewitz, Ambuj Tewari, and Susan A. Murphy. Just-in-time adaptive interventions (JITAIs) in mobile health: Key components and design principles for ongoing health behavior support. Annals of Behavioral Medicine, 2016. Accepted subject to revisions.
Yevgeny Seldin, Peter Auer, John S. Shawe-Taylor, Ronald Ortner, and François Laviolette. PAC-Bayesian analysis of contextual bandits. In Advances in Neural Information Processing Systems, pages 1683–1691, 2011.
Aleksandrs Slivkins. Contextual bandits with similarity information. The Journal of Machine Learning Research, 15(1):2533–2568, 2014.
Rajeev Agrawal and Demosthenis Teneketzis. Certainty equivalence control with forcing: revisited. Systems & control letters, 13(5):405–412, 1989.
Alexander Goldenshluger and Assaf Zeevi. A linear response bandit problem. Stochastic Systems, 3(1):230–261, 2013.
Alexander Goldenshluger and Assaf Zeevi. Woodroofe’s one-armed bandit problem revisited. The Annals of Applied Probability, 19(4):1603–1633, 2009.
Hamsa Bastani and Mohsen Bayati. Online decision-making with high-dimensional covariates. Available at SSRN 2661896, 2015.
Alekh Agarwal, Miroslav Dudík, Satyen Kale, John Langford, and Robert E. Schapire. Contextual bandit learning with predictable rewards. In International Conference on Artificial Intelligence and Statistics, pages 19–26, 2012.
Vianney Perchet and Philippe Rigollet. The multi-armed bandit problem with covariates. The Annals of Statistics, 41(2):693–721, 2013.
Wei Qian and Yuhong Yang. Randomized allocation with arm elimination in a bandit problem with covariates. Electronic Journal of Statistics, 10(1):242–270, 2016.
Miroslav Dudík, Daniel Hsu, Satyen Kale, Nikos Karampatziakis, John Langford, Lev Reyzin, and Tong Zhang. Efficient optimal learning for contextual bandits. In Proceedings of the Twenty-Seventh Annual Conference on Uncertainty in Artificial Intelligence, pages 169–178. AUAI Press, 2011.
Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, and Robert Schapire. Taming the monster: A fast and simple algorithm for contextual bandits. In Proceedings of the 31st International Conference on Machine Learning, pages 1638–1646, 2014.
Consumer Health Information Corporation. Motivating patients to use smartphone health apps, 2011. URL: http://www.consumer-health.com/motivating-patients-to-use-smartphone-health-apps/, accessed: June 30, 2016.
Huitian Lei. An Online Actor Critic Algorithm and a Statistical Decision Procedure for Personalizing Intervention. PhD thesis, University of Michigan, 2016.
Wei Chu, Lihong Li, Lev Reyzin, and Robert E. Schapire. Contextual bandits with linear payoff functions. In International Conference on Artificial Intelligence and Statistics, pages 208–214, 2011.
Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. The Journal of Machine Learning Research, 3:397–422, 2003.
Philip M. Long. On-line evaluation and prediction using linear functions. In Proceedings of the tenth annual conference on Computational learning theory, pages 21–31. ACM, 1997.
Sarah Filippi, Olivier Cappe, Aurélien Garivier, and Csaba Szepesvári. Parametric bandits: The generalized linear case. In Advances in Neural Information Processing Systems, pages 586–594, 2010.
Michal Valko, Nathan Korda, Rémi Munos, Ilias Flaounas, and Nello Cristianini. Finite-time analysis of kernelised contextual bandits. In Uncertainty in Artificial Intelligence, page 654, 2013.
Tyler Lu, Dávid Pál, and Martin Pál. Contextual multi-armed bandits. In International Conference on Artificial Intelligence and Statistics, pages 485–492, 2010.
Cem Tekin and Mihaela van der Schaar. RELEAF: An algorithm for learning and exploiting relevance. IEEE Journal of Selected Topics in Signal Processing, 9(4):716–727, June 2015.
Daniel Russo and Benjamin Van Roy. Learning to optimize via posterior sampling. Mathematics of Operations Research, 39(4):1221–1243, 2014.
Steven L. Scott. A modern Bayesian look at the multi-armed bandit. Applied Stochastic Models in Business and Industry, 26(6):639–658, 2010.
Shipra Agrawal and Navin Goyal. Thompson sampling for contextual bandits with linear payoffs. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 127–135, 2013.
Benedict C. May, Nathan Korda, Anthony Lee, and David S. Leslie. Optimistic Bayesian sampling in contextual-bandit problems. The Journal of Machine Learning Research, 13(1):2069–2106, 2012.
Saul Shiffman. Dynamic influences on smoking relapse process. Journal of Personality, 73(6):1715–1748, 2005.
Peter Auer, Nicolo Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.
Jean-Yves Audibert and Sébastien Bubeck. Minimax policies for adversarial and stochastic bandits. In Proceedings of the 22nd Annual Conference on Learning Theory, 2009.
Jacob Abernethy, Chansoo Lee, and Ambuj Tewari. Fighting bandits with a new kind of smoothness. In Advances in Neural Information Processing Systems 28, pages 2188–2196, 2015.
Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, and Robert E. Schapire. Contextual bandit algorithms with supervised learning guarantees. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, volume 15 of JMLR Workshop and Conference Proceedings, pages 19–26, 2011.
Predrag Klasnja, Eric B. Hekler, Saul Shiffman, Audrey Boruvka, Daniel Almirall, Ambuj Tewari, and Susan A. Murphy. Microrandomized trials: An experimental design for developing just-in-time adaptive interventions. Health Psychology, 34(Suppl):1220–1228, Dec 2015.
John Langford, Alexander Strehl, and Jennifer Wortman. Exploration scavenging. In Proceedings of the 25th international conference on Machine learning, pages 528–535. ACM, 2008.
Alex Strehl, John Langford, Lihong Li, and Sham M. Kakade. Learning from logged implicit exploration data. In Advances in Neural Information Processing Systems, pages 2217–2225, 2010.
Lihong Li, Wei Chu, John Langford, and Xuanhui Wang. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. In Proceedings of the fourth ACM international conference on Web search and data mining, pages 297–306. ACM, 2011.
Lihong Li, Wei Chu, John Langford, Taesup Moon, and Xuanhui Wang. An unbiased offline evaluation of contextual bandit algorithms with generalized linear models. In Proceedings of the Workshop on On-line Trading of Exploration and Exploitation 2, July 2, 2011, Bellevue, Washington, USA, volume 26 of JMLR Workshop and Conference Proceedings, pages 19–36, 2012.
Miroslav Dudík, John Langford, and Lihong Li. Doubly robust policy evaluation and learning. In Proceedings of the 28th International Conference on Machine Learning, pages 1097–1104, 2011.
Min Qian and Susan A. Murphy. Performance guarantees for individualized treatment rules. Annals of Statistics, 39(2):1180, 2011.
Yingqi Zhao, Donglin Zeng, A. John Rush, and Michael R. Kosorok. Estimating individualized treatment rules using outcome weighted learning. Journal of the American Statistical Association, 107(499):1106–1118, 2012.
Baqun Zhang, Anastasios A. Tsiatis, Eric B. Laber, and Marie Davidian. A robust method for estimating optimal treatment regimes. Biometrics, 68(4):1010–1018, 2012.
Baqun Zhang, Anastasios A. Tsiatis, Marie Davidian, Min Zhang, and Eric Laber. Estimating optimal treatment regimes from a classification perspective. Stat, 1(1):103–114, 2012.
Amir Sani, Alessandro Lazaric, and Rémi Munos. Risk-aversion in multi-armed bandits. In Advances in Neural Information Processing Systems, pages 3275–3283, 2012.
Sattar Vakili and Qing Zhao. Mean-variance and value at risk in multi-armed bandit problems. In 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 1330–1335. IEEE, 2015.
Acknowledgements
This work was supported by awards R01 AA023187 and R01 HL125440 from the National Institutes of Health, and CAREER award IIS-1452099 from the National Science Foundation.
Copyright information
© 2017 Springer International Publishing AG
Cite this chapter
Tewari, A., Murphy, S.A. (2017). From Ads to Interventions: Contextual Bandits in Mobile Health. In: Rehg, J., Murphy, S., Kumar, S. (eds) Mobile Health. Springer, Cham. https://doi.org/10.1007/978-3-319-51394-2_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51393-5
Online ISBN: 978-3-319-51394-2