Abstract
Contextual multi-armed bandit algorithms are widely used for online decision-making. However, traditional methods assume linear rewards and low-dimensional contextual information, leading to high regret and poor online efficiency in real-world applications. In this paper, we propose a novel framework, interconnected neural-linear UCB (InlUCB), that interleaves two learning processes: an offline representation-learning part, which converts the original contextual information into low-dimensional latent features via a non-linear transformation, and an online exploration part, which updates a linear layer using the upper confidence bound (UCB). Together, these two processes yield an effective and efficient strategy for online decision-making problems with non-linear rewards and high-dimensional contexts. We derive a general expression for the finite-time cumulative regret bound of InlUCB, and give a tighter regret bound under certain assumptions on the neural network. We test InlUCB against state-of-the-art bandit methods on synthetic and real-world datasets with non-linear rewards and high-dimensional contexts. Results demonstrate that InlUCB significantly improves cumulative regret and online efficiency.
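The interleaved design described above can be sketched roughly as follows. This is a minimal hypothetical illustration, not the paper's implementation: a fixed random ReLU layer stands in for the offline-learned representation, and a per-arm ridge-regression estimator with a UCB bonus plays the role of the online linear layer. All names, dimensions, and the exploration constant `alpha` are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the offline-learned representation: a fixed random
# ReLU layer mapping high-dimensional contexts (d = 50) to latent
# features (k = 8). In the actual framework this would be a trained
# neural network, periodically refit offline.
d, k, n_arms, alpha = 50, 8, 4, 1.0
W = rng.normal(size=(k, d)) / np.sqrt(d)

def phi(x):
    """Non-linear transformation of the raw context into latent features."""
    return np.maximum(W @ x, 0.0)

# One LinUCB-style estimator per arm on the latent features:
# A = I + sum(z z^T), b = sum(r * z), theta = A^{-1} b.
A = [np.eye(k) for _ in range(n_arms)]
b = [np.zeros(k) for _ in range(n_arms)]

def select_arm(x):
    """Pick the arm with the highest upper confidence bound."""
    z = phi(x)
    ucbs = []
    for a in range(n_arms):
        A_inv = np.linalg.inv(A[a])
        theta = A_inv @ b[a]
        ucbs.append(theta @ z + alpha * np.sqrt(z @ A_inv @ z))
    return int(np.argmax(ucbs)), z

def update(a, z, r):
    """Rank-one update of arm a's statistics with observed reward r."""
    A[a] += np.outer(z, z)
    b[a] += r * z

# Tiny simulated run: rewards are non-linear in the raw context but
# approximately linear in the latent features of the chosen arm.
theta_true = rng.normal(size=(n_arms, k))
for t in range(200):
    x = rng.normal(size=d)
    a, z = select_arm(x)
    r = theta_true[a] @ z + 0.1 * rng.normal()
    update(a, z, r)
```

Keeping the online step linear is what makes the UCB bonus cheap to compute and amenable to finite-time regret analysis, while the non-linear representation absorbs the high-dimensional, non-linear structure of the context.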
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Chen, Y., Xie, M., Liu, J., Zhao, K. (2022). Interconnected Neural Linear Contextual Bandits with UCB Exploration. In: Gama, J., Li, T., Yu, Y., Chen, E., Zheng, Y., Teng, F. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2022. Lecture Notes in Computer Science(), vol 13280. Springer, Cham. https://doi.org/10.1007/978-3-031-05933-9_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05932-2
Online ISBN: 978-3-031-05933-9
eBook Packages: Computer Science (R0)