
CDARL: a contrastive discriminator-augmented reinforcement learning framework for sequential recommendations

  • Regular Paper
  • Journal: Knowledge and Information Systems

Abstract

Sequential recommendations play a crucial role in many real-world applications. Owing to their sequential nature, reinforcement learning has been employed to iteratively produce recommendations from an observed stream of user behavior. In this setting, a recommendation agent interacts with the environment (users) by sequentially recommending items (actions) to maximize users' long-term cumulative reward. However, most reinforcement learning-based recommendation models focus only on extrinsic rewards derived from user feedback, which leads to sub-optimal policies when user-item interactions are sparse and fails to capture dynamic rewards that reflect users' evolving preferences. As a remedy, we propose a dynamic intrinsic reward signal integrated into a contrastive discriminator-augmented reinforcement learning framework. Concretely, our framework contains two modules: (1) a contrastive learning module that learns representations of item sequences; and (2) an intrinsic reward learning function that imitates users' internal dynamics. We then combine the static extrinsic reward with the dynamic intrinsic reward to train a sequential recommender system based on double Q-learning. We integrate our framework with five representative sequential recommendation models, augmenting each with two output layers: a supervised layer trained with a cross-entropy loss for ranking, and a second layer for reinforcement learning. Experimental results on two real-world datasets demonstrate that the proposed framework outperforms several sequential recommendation baselines as well as baselines that explore with intrinsic rewards.
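To make the training recipe in the abstract concrete, the following is a minimal PyTorch sketch, not the authors' released code, of a sequence encoder with the two output layers, a reward that combines the static extrinsic and dynamic intrinsic signals, and a double Q-learning target. The GRU encoder, the weighting coefficient alpha, and all names (TwoHeadRecommender, cdarl_loss) are our own illustrative assumptions.

```python
# Minimal sketch, assuming PyTorch: a sequence encoder with two heads
# (supervised ranking + Q-values), a combined extrinsic/intrinsic reward,
# and a double Q-learning update. All names and hyperparameters here are
# illustrative assumptions, not the paper's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadRecommender(nn.Module):
    def __init__(self, num_items, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(num_items, hidden)
        # Any of the five base sequential recommenders could replace this GRU.
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.rank_head = nn.Linear(hidden, num_items)  # supervised (cross-entropy) layer
        self.q_head = nn.Linear(hidden, num_items)     # reinforcement-learning layer

    def forward(self, seq):
        _, h = self.encoder(self.embed(seq))
        state = h.squeeze(0)  # (batch, hidden) representation of the item sequence
        return self.rank_head(state), self.q_head(state)

def cdarl_loss(model, target_model, seq, next_seq, action, r_ext, r_int,
               alpha=0.5, gamma=0.9):
    logits, q = model(seq)
    # (1) Supervised ranking loss on the observed next item.
    ce = F.cross_entropy(logits, action)
    # (2) Static extrinsic reward combined with the dynamic intrinsic reward.
    reward = r_ext + alpha * r_int
    # (3) Double Q-learning: the online network selects the next action,
    #     the target network evaluates it.
    with torch.no_grad():
        _, q_next_online = model(next_seq)
        _, q_next_target = target_model(next_seq)
        best = q_next_online.argmax(dim=1, keepdim=True)
        td_target = reward + gamma * q_next_target.gather(1, best).squeeze(1)
    td_error = F.mse_loss(q.gather(1, action.unsqueeze(1)).squeeze(1), td_target)
    return ce + td_error
```

In the paper's setup, r_int would come from the learned intrinsic reward function, and the target network would be refreshed periodically, as is standard for double Q-learning.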


Availability of data and materials

The datasets can be found at https://recsys.acm.org/recsys15/challenge/ (RecSys Challenge 2015) and https://www.kaggle.com/retailrocket/ecommerce-dataset (RetailRocket).
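As a convenience, here is a hedged sketch of one way to group the RetailRocket event log into per-visitor item sequences for sequential recommendation; the column names assume the events.csv layout published on Kaggle and are not prescribed by the paper.

```python
# Sketch: build per-user item sequences from the RetailRocket log, assuming
# the Kaggle events.csv columns (timestamp, visitorid, event, itemid).
import pandas as pd

events = pd.read_csv("events.csv").sort_values("timestamp")
# One chronologically ordered sequence of interacted items per visitor.
sequences = events.groupby("visitorid")["itemid"].apply(list)
print(sequences.head())
```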


Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (No. 61977002), the German Federal Ministry of Education and Research (BMBF) under Grant No. 01IS18036A, and the State Key Laboratory of Software Development Environment of China (No. SKLSDE-2022ZX-14). The authors take full responsibility for the content of this work. We thank the anonymous reviewers for their insightful comments and suggestions on this paper.

Funding

This research was partially funded by the National Natural Science Foundation of China (No. 61977002), the German Federal Ministry of Education and Research (BMBF) under Grant No. 01IS18036A, and the State Key Laboratory of Software Development Environment of China (No. SKLSDE-2022ZX-14).

Author information

Corresponding author

Correspondence to Zhuang Liu.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this article.

Code availability

We train all models on a single NVIDIA GeForce GTX 1080 Ti GPU.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Liu, Z., Ma, Y., Hildebrandt, M. et al. CDARL: a contrastive discriminator-augmented reinforcement learning framework for sequential recommendations. Knowl Inf Syst 64, 2239–2265 (2022). https://doi.org/10.1007/s10115-022-01711-7
