
Multimodal data fusion framework based on autoencoders for top-N recommender systems

Applied Intelligence

Abstract

In this paper, we present a novel multimodal framework for video recommendation based on deep learning. Unlike most common solutions, we formulate video recommendation by simultaneously exploiting two data modalities: (i) the visual modality (i.e., the image sequence) and (ii) the textual modality, which, in conjunction with the audio stream, constitute the elementary data of a video document. More specifically, our framework first describes the textual data using the bag-of-words and TF-IDF models, and fuses those features with deep convolutional descriptors extracted from the visual data. As a result, we obtain a multimodal descriptor for each video document, from which we construct a low-dimensional sparse representation using autoencoders. For the recommendation task itself, we extend a sparse linear method with side information (SSLIM) by taking into account the previously computed sparse representations of the video descriptors. This allows us to produce a ranking of the top-N videos most relevant to the user. Note that our framework is flexible, i.e., one may use other types of modalities, autoencoders, and fusion architectures. Experimental results obtained on three real datasets (MovieLens-1M, MovieLens-10M and Vine), containing 3,320, 8,400 and 18,576 videos, respectively, show that our framework can improve recommendation results by up to 60.6% when compared to a single-modality recommendation model, and by up to 31% when compared to the state-of-the-art methods used as baselines in our study. These results demonstrate the effectiveness of our framework and highlight the usefulness of multimodal information in recommender systems.
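The first stage described in the abstract, building a multimodal descriptor from TF-IDF text features and visual descriptors, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the helper names (`tfidf_vectors`, `l2_normalise`, `fuse`), the toy token lists, and the three-dimensional "visual" vectors that stand in for deep convolutional descriptors. Fusion is shown as plain concatenation of normalised vectors, which is only one possible choice and is not claimed to be the architecture used in the paper. The autoencoder and SSLIM stages are not covered.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per tokenised document)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = max(len(doc), 1)  # guard against empty documents
        vectors.append([counts[tok] / length * idf[tok] for tok in vocab])
    return vocab, vectors


def l2_normalise(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(text_vec, visual_vec):
    """Illustrative fusion: concatenate the L2-normalised modality vectors."""
    return l2_normalise(text_vec) + l2_normalise(visual_vec)


# Hypothetical toy data: tokenised descriptions and made-up visual descriptors.
texts = [
    ["space", "robot", "war"],
    ["robot", "love", "story"],
    ["love", "story", "paris"],
]
visual = [[0.9, 0.1, 0.0], [0.4, 0.5, 0.1], [0.0, 0.2, 0.8]]

vocab, text_vecs = tfidf_vectors(texts)
multimodal = [fuse(t, v) for t, v in zip(text_vecs, visual)]
```

Normalising each modality before concatenation keeps one modality from dominating the fused descriptor purely because of its scale; whether that matters in practice depends on the downstream model.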


Notes

  1. https://github.com/felipela2015/MultimodalRecommender.

  2. Here, we used the cosine function to assess how similar the items are. Other functions may be tested in the future.

  3. https://www.themoviedb.org/
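As a minimal sketch of the cosine function mentioned in note 2, the following compares item descriptors and returns the most similar items. The function names (`cosine`, `most_similar`) and the toy descriptors are hypothetical and serve only to illustrate the measure; they are not taken from the authors' code.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(item, descriptors, n):
    """Return the n items most similar to `item`, as (item_id, score) pairs."""
    scores = [
        (other, cosine(descriptors[item], vec))
        for other, vec in descriptors.items()
        if other != item
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:n]


# Hypothetical item descriptors.
descriptors = {
    "a": [1.0, 0.0, 1.0],
    "b": [1.0, 0.1, 0.9],
    "c": [0.0, 1.0, 0.0],
}
top = most_similar("a", descriptors, 2)
```

Because cosine similarity depends only on the angle between vectors, it is insensitive to descriptor magnitude, which is one reason it is a common default for comparing items.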


Acknowledgements

The authors gratefully acknowledge the support of CNPq under Procs. 307510/2017-4, 313163/2014-6, 431458/2016-2 and 309291/2017-8, FAPEMIG under Procs. PPM-00542-15 and APQ-03445-16, FAPEMIG-PRONEX-MASWeb (Models, Algorithms and Systems for the Web) under Proc. APQ-01400-14, CEFET-MG, and CAPES.

Author information

Corresponding author

Correspondence to Felipe L. A. Conceição.

About this article

Cite this article

Conceição, F.L.A., Pádua, F.L.C., Lacerda, A. et al. Multimodal data fusion framework based on autoencoders for top-N recommender systems. Appl Intell 49, 3267–3282 (2019). https://doi.org/10.1007/s10489-019-01430-7

  • DOI: https://doi.org/10.1007/s10489-019-01430-7
