Multimedia Recommender Systems: Algorithms and Challenges

Deldjoo, Yashar; Schedl, Markus; Hidasi, Balázs; Wei, Yinwei; He, Xiangnan

doi:10.1007/978-1-0716-2197-4_25

Abstract

This chapter surveys state-of-the-art research on multimedia recommender systems (MMRS), focusing on methods that integrate multimedia content as side information into various recommendation models. An MMRS uses the multimedia features to recommend either (1) the media items from which the features were derived, or (2) non-media items, using features obtained from a proxy multimedia representation of the item (e.g., images of clothes). We first outline the key considerations and challenges that must be taken into account when developing an MMRS. We then discuss the most popular multimedia content processing approaches for producing item representations that can serve as side information in an MMRS. Finally, we discuss recent state-of-the-art MMRS algorithms, which we classify and present as classical hybrid models (e.g., VBPR), neural approaches, and graph-based approaches. Throughout the chapter, we mention use cases of MMRSs in recommender systems research across several domains and product types, such as food, fashion, music, and videos. We hope this chapter provides fresh insights into the nexus of multimedia and recommender systems, which could be exploited to broaden the frontier of the field.
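As a rough illustration of what "multimedia content as side information" can mean in a classical hybrid model, the sketch below scores a user-item pair as the sum of a collaborative term and a content term, in the spirit of VBPR. It is a minimal sketch under stated assumptions, not the chapter's implementation: the function names, the tiny dimensions, and all parameter values are hypothetical and hand-picked rather than learned.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, vector) for row in matrix]


def hybrid_score(item_bias, user_latent, item_latent,
                 user_visual, embedding, visual_bias, features):
    """Collaborative term plus content term for one (user, item) pair.

    features  : content feature vector of the item (e.g. an image descriptor)
    embedding : matrix projecting the features into a low-dimensional space
    """
    projected = matvec(embedding, features)
    return (item_bias
            + dot(user_latent, item_latent)   # collaborative part
            + dot(user_visual, projected)     # user preference over content
            + dot(visual_bias, features))     # global content bias


# Toy parameters (illustrative assumptions, not learned from data).
user_latent = [1.0, 0.0]
user_visual = [0.5]
embedding = [[1.0, 0.0, 1.0]]   # projects 3-d features to 1-d
visual_bias = [0.0, 0.1, 0.0]

items = {
    "dress": {"bias": 0.0, "latent": [0.2, 0.3], "features": [1.0, 0.0, 1.0]},
    "shoes": {"bias": 0.1, "latent": [0.4, 0.1], "features": [0.0, 1.0, 0.0]},
}

scores = {
    name: hybrid_score(item["bias"], user_latent, item["latent"],
                       user_visual, embedding, visual_bias, item["features"])
    for name, item in items.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['dress', 'shoes']
```

In this toy setting the content term dominates: "dress" ranks first because its features align with the user's content preference, even though its collaborative score is lower than that of "shoes". Because the content term depends only on item features, it can still contribute for an item with little or no interaction history.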

Notes

1. In the latter case, the individual models are combined to form an ensemble model.
2. For images, segmentation is carried out in the image/pixel space, a process known as spatial segmentation.
3. For an overview of these advances, see the multimedia recommender systems tutorial presented by He et al. at the ACM ICMR 2018 conference (http://staff.ustc.edu.cn/~hexn/icmr18-recsys.pdf).
4. https://code.google.com/archive/p/word2vec/.
5. https://world.taobao.com/.

Google Scholar
J.R.R. Uijlings, K.E.A. van de Sande, T. Gevers, A.W.M. Smeulders, Selective search for object recognition. Int. J. Comput. Vis. 104(2), 154–171 (2013)
Article Google Scholar
A. Vall, M. Dorfer, H. Eghbal-zadeh, M. Schedl, K. Burjorjee, G. Widmer, Feature-combination hybrid recommender systems for automated music playlist continuation. User Model. User-Adap. Interac. J. Personaliz. Res. 29, 527–572 (2019)
Article Google Scholar
R. van den Berg, T.N. Kipf, M. Welling, Graph convolutional matrix completion, in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2018)
Google Scholar
H. Wang, C. Schmid, Action recognition with improved trajectories, in IEEE International Conference on Computer Vision, ICCV 2013, Sydney, December 1–8, 2013 (IEEE Computer Society, Washington, DC, 2013), pp. 3551–3558
Google Scholar
X. Wang, Y. Wang, Improving content-based and hybrid music recommendation using deep learning, in Proceedings of the ACM International Conference on Multimedia, MM ’14, Orlando, FL, November 03–07, 2014 (ACM, New York, 2014), pp. 627–636
Google Scholar
F. Wang, M. Jiang, C. Qian, S. Yang, C. Li, H. Zhang, X. Wang, X. Tang, Residual attention network for image classification, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 3156–3164
Google Scholar
S. Wang, Y. Wang, J. Tang, K. Shu, S. Ranganath, H. Liu, What your images reveal: Exploiting visual contents for point-of-interest recommendation, in Proceedings of the 26th International Conference on World Wide Web (2017), pp. 391–400
Google Scholar
J. Wang, P. Huang, H. Zhao, Z. Zhang, B. Zhao, D.L. Lee, Billion-scale commodity embedding for e-commerce recommendation in alibaba, in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2018), pp 839–848
Google Scholar
X. Wang, X. He, M. Wang, F. Feng, T.-S. Chua, Neural graph collaborative filtering, in Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (2019), pp. 165–174
Google Scholar
Y. Wei, Z. Cheng, X. Yu, Z. Zhao, L. Zhu, L. Nie, Personalized hashtag recommendation for micro-videos, in Proceedings of the 27th ACM International Conference on Multimedia (2019), pp. 1446–1454
Google Scholar
Y. Wei, X. Wang, L. Nie, X. He, R. Hong, T.-S. Chua, MMGCN: Multi-modal graph convolution network for personalized recommendation of micro-video, in Proceedings of the 27th ACM International Conference on Multimedia (2019), pp. 1437–1445
Google Scholar
Y. Wei, X. Wang, L. Nie, X. He, T.-S. Chua, Graph-refined convolutional network for multimedia recommendation with implicit feedback, in Proceedings of the 28th ACM International Conference on Multimedia (2020), pp. 3541–3549
Google Scholar
J. Wen, J. She, X. Li, H. Mao, Visual background recommendation for dance performances using deep matrix factorization. TOMCCAP 14(1), 11:1–11:19 (2018)
Google Scholar
Z. Wu, S. Jiang, Q. Huang, Friend recommendation according to appearances on photos, in ed. by W. Gao, Y. Rui, A. Hanjalic, C. Xu, E.G. Steinbach, A. El-Saddik, M.X. Zhou, Proceedings of the 17th International Conference on Multimedia 2009, Vancouver, British Columbia, October 19–24, 2009 (ACM, New York, 2009), pp. 987–988
Google Scholar
C.-C. Wu, T. Mei, W.H. Hsu, Y. Rui, Learning to personalize trending image search suggestion, in Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval (ACM, New York, 2014), pp. 727–736
Google Scholar
J. Wu, X. He, X. Wang, Q. Wang, W. Chen, J. Lian, X. Xie, Y. Zhang, Graph convolution machine for context-aware recommender system. Preprint arXiv:2001.11402 (2020)
Google Scholar
Z. Xing, M. Parandehgheibi, F. Xiao, N. Kulkarni, C. Pouliot, Content-based recommendation for podcast audio-items using natural language processing techniques, in 2016 IEEE International Conference on Big Data (Big Data) (IEEE, Piscataway, 2016), pp. 2378–2383
Book Google Scholar
R. Yan, M. Lapata, X. Li, Tweet recommendation with graph co-ranking, in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2012), pp. 516–525
Google Scholar
L. Yang, C.-K. Hsieh, H. Yang, J.P. Pollak, N. Dell, S. Belongie, C. Cole, D. Estrin, Yum-me: a personalized nutrient-based meal recommender system. ACM Trans. Inf. Syst. 36(1), 7 (2017)
Google Scholar
R. Ying, R. He, K. Chen, P. Eksombatchai, W.L. Hamilton, J. Leskovec, Graph convolutional neural networks for web-scale recommender systems, in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2018), pp. 974–983
Google Scholar
K. Yoshii, M. Goto, K. Komatani, T. Ogata, H.G. Okuno, IEEE Trans. Audio Speech Language Proc. 16(2), 435–447 (2008)
Article Google Scholar
W. Yu, H. Zhang, X. He, X. Chen, L. Xiong, Z. Qin, Aesthetic-based clothing recommendation, in Proceedings of the 2018 World Wide Web Conference (2018), pp. 649–658
Google Scholar
J. Zhang, X. Shi, S. Zhao, I. King, Star-gcn: Stacked and reconstructed graph convolutional networks for recommender systems, in The 28th International Joint Conference on Artificial Intelligence (2019), pp. 4264–4270
Google Scholar
S. Zhang, L. Yao, A. Sun, Y. Tay, Deep learning based recommender system: a survey and new perspectives. ACM Comput. Surv. 52(1), 1–38 (2019)
Article Google Scholar
X. Zhao, H. Luan, J. Cai, J. Yuan, X. Chen, Z. Li, Personalized video recommendation based on viewing history with the study on youtube, in Proceedings of the 4th International Conference on Internet Multimedia Computing and Service (ACM, New York, 2012), pp. 161–165
Google Scholar
B. Zhou, À. Lapedriza, J. Xiao, A. Torralba, A. Oliva, Learning deep features for scene recognition using places database, in Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8–13 2014, Montreal, Quebec (2014), pp. 487–495
Google Scholar
Q. Zhu, M.-L. Shyu, H. Wang, Videotopic: Content-based video recommendation using a topic model, in 2013 IEEE International Symposium on Multimedia (ISM) (IEEE, Piscataway, 2013), pp. 219–222
Google Scholar
U. Zoelzer, Digital Audio Signal Processing (Wiley, Hoboken, 2008)
Book Google Scholar

Download references

Acknowledgements

We would like to thank Jan Schlüter for providing image examples.

Author information

Authors and Affiliations

Polytechnic University of Bari, Bari, Italy
Yashar Deldjoo
Johannes Kepler University Linz, Institute of Computational Perception, Multimedia Mining and Search Group, Linz, Austria
Markus Schedl
LIT AI Lab, Human centered AI Group, Linz, Austria
Markus Schedl
Gravity R&D, Budapest, Hungary
Balázs Hidasi
National University of Singapore, Singapore, Singapore
Yinwei Wei
University of Science and Technology, Hefei, China
Xiangnan He

Corresponding author

Correspondence to Yashar Deldjoo.

Editor information

Editors and Affiliations

Faculty of Computer Science, Free University of Bozen-Bolzano, Bozen-Bolzano, Italy
Francesco Ricci
Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
Lior Rokach
Software and Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
Bracha Shapira

About this chapter

Cite this chapter

Deldjoo, Y., Schedl, M., Hidasi, B., Wei, Y., He, X. (2022). Multimedia Recommender Systems: Algorithms and Challenges. In: Ricci, F., Rokach, L., Shapira, B. (eds) Recommender Systems Handbook. Springer, New York, NY. https://doi.org/10.1007/978-1-0716-2197-4_25

DOI: https://doi.org/10.1007/978-1-0716-2197-4_25
Published: 22 November 2021
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-0716-2196-7
Online ISBN: 978-1-0716-2197-4
eBook Packages: Computer Science, Computer Science (R0)
