Abstract
Contextual multi-armed bandit algorithms are widely used for online decision-making. However, traditional methods assume linear rewards and low-dimensional contextual information, leading to high regret and poor online efficiency in real-world applications. In this paper, we propose a novel framework, interconnected neural-linear UCB (InlUCB), that interleaves two learning processes: an offline representation-learning part, which converts the original contextual information into low-dimensional latent features via a non-linear transformation, and an online exploration part, which updates a linear layer using the upper confidence bound (UCB). Together, these two processes yield an effective and efficient strategy for online decision-making problems with non-linear rewards and high-dimensional contexts. We derive a general expression for the finite-time cumulative regret bound of InlUCB, and give a tighter regret bound under certain assumptions on the neural network. We test InlUCB against state-of-the-art bandit methods on synthetic and real-world datasets with non-linear rewards and high-dimensional contexts. Results demonstrate that InlUCB significantly improves cumulative regret and online efficiency.
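The interleaved design described above can be sketched roughly as follows. This is a minimal hypothetical illustration, not the paper's implementation: a fixed random ReLU layer stands in for the offline-learned representation, and a per-arm ridge-regression estimator with a UCB bonus plays the role of the online linear layer. All names, dimensions, and the exploration constant `alpha` are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the offline-learned representation: a fixed random
# ReLU layer mapping high-dimensional contexts (d = 50) to latent
# features (k = 8). In the actual framework this would be a trained
# neural network, periodically refit offline.
d, k, n_arms, alpha = 50, 8, 4, 1.0
W = rng.normal(size=(k, d)) / np.sqrt(d)

def phi(x):
    """Non-linear transformation of the raw context into latent features."""
    return np.maximum(W @ x, 0.0)

# One LinUCB-style estimator per arm on the latent features:
# A = I + sum(z z^T), b = sum(r * z), theta = A^{-1} b.
A = [np.eye(k) for _ in range(n_arms)]
b = [np.zeros(k) for _ in range(n_arms)]

def select_arm(x):
    """Pick the arm with the highest upper confidence bound."""
    z = phi(x)
    ucbs = []
    for a in range(n_arms):
        A_inv = np.linalg.inv(A[a])
        theta = A_inv @ b[a]
        ucbs.append(theta @ z + alpha * np.sqrt(z @ A_inv @ z))
    return int(np.argmax(ucbs)), z

def update(a, z, r):
    """Rank-one update of arm a's statistics with observed reward r."""
    A[a] += np.outer(z, z)
    b[a] += r * z

# Tiny simulated run: rewards are non-linear in the raw context but
# approximately linear in the latent features of the chosen arm.
theta_true = rng.normal(size=(n_arms, k))
for t in range(200):
    x = rng.normal(size=d)
    a, z = select_arm(x)
    r = theta_true[a] @ z + 0.1 * rng.normal()
    update(a, z, r)
```

Keeping the online step linear is what makes the UCB bonus cheap to compute and amenable to finite-time regret analysis, while the non-linear representation absorbs the high-dimensional, non-linear structure of the context.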
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Chen, Y., Xie, M., Liu, J., Zhao, K. (2022). Interconnected Neural Linear Contextual Bandits with UCB Exploration. In: Gama, J., Li, T., Yu, Y., Chen, E., Zheng, Y., Teng, F. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2022. Lecture Notes in Computer Science(), vol 13280. Springer, Cham. https://doi.org/10.1007/978-3-031-05933-9_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-05932-2
Online ISBN: 978-3-031-05933-9
eBook Packages: Computer Science (R0)