Abstract
Personalization has become a focal point of modern revenue management. However, minimal data are often available to make suggestions appropriately tailored to each customer. This has led many products to use reinforcement learning-based algorithms that explore sets of offerings to find the suggestions that best improve conversion and revenue. Arguably the most popular of these algorithms are built on the multi-arm bandit framework, which has shown great success across a variety of use cases. A general multi-arm bandit algorithm adaptively trades off exploring available but under-observed recommendations against exploiting the currently best-known offering. While much success has been achieved with these relatively understandable procedures, much of the airline industry is losing out on better personalized offers by ignoring the context of the transaction, as is the case in the traditional multi-arm bandit setup. Here, we explore a popular exploration heuristic, Thompson sampling, and note implementation details for multi-arm and contextual bandit variants. While the contextual bandit requires greater computational and technical complexity to include contextual features in the decision process, we illustrate the value it brings by the improvement in overall expected
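As a rough illustration of the explore/exploit trade-off described in the abstract, the following is a minimal sketch of Thompson sampling for a non-contextual multi-arm bandit with binary (convert / no convert) rewards. It is not taken from the article: the Beta(1, 1) prior, the function name `thompson_sampling`, and the simulated conversion rates are all assumptions made for the example.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling.

    true_rates: hypothetical per-offer conversion probabilities, used only
                to simulate customer responses (unknown to the algorithm).
    Returns per-arm counts of successes and failures.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible conversion rate per arm from its posterior,
        # Beta(1 + successes, 1 + failures), assuming a uniform Beta(1, 1) prior.
        draws = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_arms)
        ]
        # Offer the arm whose sampled rate is highest. Arms with few
        # observations have wide posteriors, so they still win occasionally.
        arm = max(range(n_arms), key=lambda a: draws[a])

        # Simulated customer response to the chosen offer.
        converted = rng.random() < true_rates[arm]
        if converted:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return successes, failures


if __name__ == "__main__":
    s, f = thompson_sampling([0.05, 0.10, 0.60], n_rounds=2000, seed=1)
    print("pulls per arm:", [a + b for a, b in zip(s, f)])
```

Because every arm is scored by a random posterior draw rather than by its point estimate, exploration fades on its own as evidence accumulates. A contextual variant would replace the per-arm Beta posterior with a posterior over the coefficients of a model that takes transaction features as input.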
Cite this article
Byrd, M., Darrow, R. A note on the advantage of context in Thompson sampling. J Revenue Pricing Manag 20, 316–321 (2021). https://doi.org/10.1057/s41272-021-00314-1
DOI: https://doi.org/10.1057/s41272-021-00314-1