
A knowledge distilled attention-based latent information extraction network for sequential user behavior

Published in: Multimedia Tools and Applications

Abstract

When modeling user-item interaction sequences to extract sequential patterns, current recommender systems face two issues: (a) long-distance dependencies and (b) high levels of noise. In addition, the complexity of current recommendation model architectures significantly increases computation time, so these models cannot meet the fast-response requirement of application scenarios such as online advertising. To address these issues, we propose a Knowledge Distilled Attention-based Latent Information Extraction Network for Sequential user behavior (KD-ALIENS). In this architecture, user and item attributes and histories are used to model the latent information from high-order feature interactions together with the user's sequential historical behavior. To handle long-distance dependencies and noise, we adopt the self-attention mechanism to learn the sequential patterns between items in a user-item interaction history. To meet the fast-response requirement, model compression and acceleration are realized by: (a) a knowledge-distilled teacher-student design, in which the complex teacher module extracts a user's general preference from high-order feature interactions and the sequential patterns of long history sequences; and (b) a sampling method that draws on both the relatively long-term and short-term item histories. Experiments on two real-world datasets demonstrate considerable improvements in click-through rate (CTR) prediction accuracy over strong baseline models, and show the effectiveness of the student model's compression and acceleration.



Funding

This work is supported by the National Key Research and Development Program of China (2018YFB1403501).

Author information

Corresponding author: Haihong E.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Abbreviations

The following abbreviations are used in this paper:

Table 11 Abbreviations used in this paper


About this article


Cite this article

Huang, R., McIntyre, S., Song, M. et al. A knowledge distilled attention-based latent information extraction network for sequential user behavior. Multimed Tools Appl 82, 1017–1043 (2023). https://doi.org/10.1007/s11042-022-12513-y

