Multimodal data fusion framework based on autoencoders for top-N recommender systems

Conceição, Felipe L. A.; Pádua, Flávio L. C.; Lacerda, Anisio; Machado, Adriano C.; Dalip, Daniel H.

doi:10.1007/s10489-019-01430-7

Published: 30 March 2019

Applied Intelligence, Volume 49, pages 3267–3282 (2019)

Felipe L. A. Conceição¹,
Flávio L. C. Pádua¹,
Anisio Lacerda¹,
Adriano C. Machado² &
Daniel H. Dalip¹ (ORCID: orcid.org/0000-0002-8532-7701)

Abstract

In this paper, we present a novel multimodal framework for video recommendation based on deep learning. Unlike most common solutions, we formulate video recommendations by simultaneously exploiting two data modalities: (i) the visual modality (i.e., the image sequence) and (ii) the textual modality, which, together with the audio stream, constitute the elementary data of a video document. More specifically, our framework first describes the textual data using the bag-of-words and TF-IDF models, and then fuses those features with deep convolutional descriptors extracted from the visual data. As a result, we obtain a multimodal descriptor for each video document, from which we construct a low-dimensional sparse representation by using autoencoders. For the recommendation task itself, we extend a sparse linear method with side information (SSLIM) by taking into account the previously computed sparse representations of the video descriptors. By doing this, we are able to produce a ranking of the top-N most relevant videos for the user. Note that our framework is flexible, i.e., one may use other types of modalities, autoencoders, and fusion architectures. Experimental results obtained on three real datasets (MovieLens-1M, MovieLens-10M and Vine), containing 3,320, 8,400 and 18,576 videos, respectively, show that our framework can improve the recommendation results by up to 60.6% when compared to a single-modality recommendation model, and by up to 31% when compared to the state-of-the-art methods used as baselines in our study. These results demonstrate the effectiveness of our framework and highlight the usefulness of multimodal information in recommender systems.
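As an illustration of the first stage described in the abstract, the following minimal sketch computes TF-IDF vectors for tokenized video descriptions and fuses each one with a visual descriptor. It is not the authors' implementation: the abstract does not state the fusion operator, so concatenation of L2-normalized vectors is an assumption, as are the smoothed IDF formula, the function names `tfidf_vectors`, `l2_normalize` and `fuse`, and the toy data. The visual descriptors are hard-coded stand-ins for features that would come from a convolutional network, and the autoencoder and SSLIM stages are omitted.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per tokenized document)."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumed variant) so that a term occurring in every
    # document still receives a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(text_vec, visual_vec):
    """Early fusion by concatenation; normalizing first keeps one modality
    from dominating the other purely because of its scale (an assumption)."""
    return l2_normalize(text_vec) + l2_normalize(visual_vec)


# Hypothetical toy data: two tokenized descriptions and two made-up
# 4-dimensional visual descriptors.
docs = [["space", "robot", "war"], ["robot", "love", "story"]]
visual = [[0.2, 0.9, 0.1, 0.4], [0.7, 0.1, 0.3, 0.5]]

vocab, text_vecs = tfidf_vectors(docs)
multimodal = [fuse(t, v) for t, v in zip(text_vecs, visual)]
print(len(vocab), len(multimodal[0]))
```

In this sketch, each multimodal descriptor has one entry per vocabulary term followed by the visual dimensions; in the framework described above, such a descriptor would then be compressed by an autoencoder before being used as side information.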

Notes

https://github.com/felipela2015/MultimodalRecommender
Here, we used the cosine function to assess how similar the items are. Other functions may be tested in the future.
https://www.themoviedb.org/
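
The cosine function mentioned in the second note can be sketched as follows. This is a generic illustration of cosine similarity between item descriptors, not code from the authors' repository; the function names `cosine_similarity` and `most_similar`, the zero-vector convention, and the item vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id, items, n):
    """Return the ids of the n items most similar to items[query_id]."""
    query = items[query_id]
    candidates = [item_id for item_id in items if item_id != query_id]
    candidates.sort(key=lambda i: cosine_similarity(query, items[i]), reverse=True)
    return candidates[:n]


# Hypothetical item descriptors.
items = {
    "a": [1.0, 0.0, 1.0],
    "b": [0.9, 0.1, 0.8],
    "c": [0.0, 1.0, 0.0],
}
print(most_similar("a", items, 2))
```

Because cosine similarity depends only on the direction of the vectors, two descriptors that differ only in magnitude are treated as identical, which is one reason other similarity functions might behave differently.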

Acknowledgements

The authors would like to thank CNPq for its support under Procs. 307510/2017-4, 313163/2014-6, 431458/2016-2 and 309291/2017-8; FAPEMIG under Procs. PPM-00542-15 and APQ-03445-16; FAPEMIG-PRONEX-MASWeb (Models, Algorithms and Systems for the Web) under Proc. APQ-01400-14; CEFET-MG; and CAPES.

Author information

Authors and Affiliations

Department of Computing, CEFET-MG, Belo Horizonte, MG, Brazil
Felipe L. A. Conceição, Flávio L. C. Pádua, Anisio Lacerda & Daniel H. Dalip
Department of Computer Science, UFMG, Belo Horizonte, MG, Brazil
Adriano C. Machado

Corresponding author

Correspondence to Felipe L. A. Conceição.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Conceição, F.L.A., Pádua, F.L.C., Lacerda, A. et al. Multimodal data fusion framework based on autoencoders for top-N recommender systems. Appl Intell 49, 3267–3282 (2019). https://doi.org/10.1007/s10489-019-01430-7

Issue Date: 15 September 2019
DOI: https://doi.org/10.1007/s10489-019-01430-7
