Counterfactual contextual bandit for recommendation under delayed feedback

  • Original Article
Neural Computing and Applications

Abstract

Recommender systems have considerable practical value because they relieve users of the burden of choosing from a huge amount of information. Existing recommender systems, however, usually suffer from selection bias because they ignore samples whose feedback is delayed. To alleviate this problem, we model recommendation as a batch contextual bandit problem and propose a counterfactual reward estimation approach. First, we formalize the counterfactual question as “would the user have been interested in the recommended item if the delayed feedback had arrived before the collection time point?” This counterfactual reward is estimated in a survival analysis framework by exploiting the causal generation process of user feedback on batch data. Second, based on the estimated counterfactual rewards, the policy of the batch contextual bandit is updated for online recommendation in the next episode. Third, the online recommendation generates new batch data for further counterfactual reward estimation. These three steps are repeated until the optimal policy is learned. We also prove theoretically that the learned bandit policy has a sub-linear regret bound. In experiments on synthetic and Criteo datasets, our method achieved a 4% improvement in average reward over the baseline methods, demonstrating the efficacy of our approach.
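The three-step loop described in the abstract can be illustrated with a minimal sketch. The abstract does not give the estimator or the policy, so everything below is an assumption made for illustration and not the authors' method: delays are assumed to be exponential with a known rate `delay_rate`, the counterfactual reward of a sample with no feedback yet is taken as the posterior probability of an eventual positive under that survival model, and a tabular epsilon-greedy policy stands in for the batch contextual bandit. The names `counterfactual_reward` and `run_episodes` are hypothetical.

```python
import math
import random


def counterfactual_reward(p_hat, delay_rate, elapsed):
    """Probability that a sample with no feedback yet is a delayed positive.

    Assumes an exponential delay, so the survival function is
    S(elapsed) = exp(-delay_rate * elapsed), and applies Bayes' rule:
    P(positive | nothing observed) = p*S / (1 - p + p*S).
    """
    survival = math.exp(-delay_rate * elapsed)
    return p_hat * survival / (1.0 - p_hat + p_hat * survival)


def run_episodes(true_p, delay_rate=1.0, window=1.0, batch_size=500,
                 episodes=20, epsilon=0.1, seed=0):
    """Simulate the loop on synthetic data; true_p[context][arm] is assumed."""
    rng = random.Random(seed)
    n_ctx, n_arms = len(true_p), len(true_p[0])
    reward_sum = [[0.0] * n_arms for _ in range(n_ctx)]
    pulls = [[0] * n_arms for _ in range(n_ctx)]
    p_hat = [[0.5] * n_arms for _ in range(n_ctx)]  # initial reward estimates

    for _ in range(episodes):
        # Online recommendation with the current policy generates a batch.
        batch = []
        for _ in range(batch_size):
            ctx = rng.randrange(n_ctx)
            if rng.random() < epsilon:
                arm = rng.randrange(n_arms)
            else:
                arm = max(range(n_arms), key=lambda a: p_hat[ctx][a])
            shown_at = rng.uniform(0.0, window)
            converts = rng.random() < true_p[ctx][arm]
            delay = rng.expovariate(delay_rate)
            # Feedback is visible only if it arrives before collection time.
            observed = converts and shown_at + delay <= window
            batch.append((ctx, arm, window - shown_at, observed))

        # Counterfactual reward estimation for samples without feedback.
        for ctx, arm, elapsed, observed in batch:
            if observed:
                reward = 1.0
            else:
                reward = counterfactual_reward(p_hat[ctx][arm], delay_rate, elapsed)
            reward_sum[ctx][arm] += reward
            pulls[ctx][arm] += 1

        # Policy update used for the next episode.
        for c in range(n_ctx):
            for a in range(n_arms):
                if pulls[c][a]:
                    p_hat[c][a] = reward_sum[c][a] / pulls[c][a]
    return p_hat


if __name__ == "__main__":
    estimates = run_episodes([[0.2, 0.6], [0.7, 0.1]])
    for ctx, row in enumerate(estimates):
        print(ctx, [round(v, 3) for v in row])
```

Treating unobserved samples as soft positives rather than as negatives is what counters the selection bias in this sketch: a sample shown shortly before collection keeps most of its prior probability of being positive, while one that has stayed silent for a long time is discounted toward zero.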

Data availability

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

This research was supported in part by the National Science and Technology Major Project (2021ZD0111500), the National Science Fund for Excellent Young Scholars (62122022), the Natural Science Foundation of China (62206064), and the major key project of PCL (PCL2021A12).

Author information

Contributions

Ruichu Cai: Conceptualization, Methodology, Investigation, Supervision, Project administration, Funding acquisition; Ruming Lu: Methodology, Software, Writing - Original Draft, Visualization; Wei Chen: Methodology, Formal analysis, Writing - Review & Editing, Funding acquisition; Zhifeng Hao: Investigation, Supervision, Funding acquisition.

Corresponding author

Correspondence to Wei Chen.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Ethical approval

Not applicable.

Informed consent

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cai, R., Lu, R., Chen, W. et al. Counterfactual contextual bandit for recommendation under delayed feedback. Neural Comput & Applic (2024). https://doi.org/10.1007/s00521-024-09800-0

  • DOI: https://doi.org/10.1007/s00521-024-09800-0
