Multimedia Recommender Systems: Algorithms and Challenges

Chapter in Recommender Systems Handbook

Abstract

This chapter surveys state-of-the-art research on multimedia recommender systems (MMRS), focusing on methods that integrate multimedia content as side information into various recommendation models. An MMRS uses the multimedia features to recommend either (1) the media items from which the features were derived, or (2) non-media items, using features obtained from a proxy multimedia representation of the item (e.g., images of clothes). We first outline the key considerations and challenges in developing an MMRS. We then discuss the most popular multimedia content processing approaches for producing item representations that can serve as side information in an MMRS. Finally, we discuss recent state-of-the-art MMRS algorithms, which we classify as classical hybrid models (e.g., VBPR), neural approaches, and graph-based approaches. Throughout the chapter, we mention use cases of MMRSs in recommender systems research across domains and product types such as food, fashion, music, and video. We hope this chapter provides fresh insights into the nexus of multimedia and recommender systems that can be exploited to broaden the frontier of the field.
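
As a rough illustration of how multimedia features can enter a recommendation model as side information, the sketch below scores a user-item pair in the style of a VBPR-like hybrid model: a collaborative term (latent user and item factors) is added to a content term computed from the item's multimedia feature vector after a linear projection. All function names, parameter names, and toy numbers here are illustrative assumptions rather than material from the chapter, and the parameters would normally be learned (for example with a pairwise ranking loss), which this sketch does not cover.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def project(E: Sequence[Sequence[float]], f: Sequence[float]) -> List[float]:
    """Map a raw multimedia feature vector f into a low-dimensional space via matrix E."""
    return [dot(row, f) for row in E]


def hybrid_score(
    user_bias: float,
    item_bias: float,
    gamma_u: Sequence[float],      # collaborative latent factors of the user
    gamma_i: Sequence[float],      # collaborative latent factors of the item
    theta_u: Sequence[float],      # user's preference over projected content dimensions
    E: Sequence[Sequence[float]],  # projection matrix for content features (hypothetical values)
    f_i: Sequence[float],          # item's multimedia features, e.g. a CNN embedding of its image
    beta_content: Sequence[float], # global bias over raw content features
    alpha: float = 0.0,            # global offset
) -> float:
    """Preference score = offset + biases + collaborative term + content terms."""
    collaborative = dot(gamma_u, gamma_i)
    content = dot(theta_u, project(E, f_i))
    content_bias = dot(beta_content, f_i)
    return alpha + user_bias + item_bias + collaborative + content + content_bias


# Toy example with 2 latent dimensions and a 3-dimensional content feature vector.
E = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
f_item = [1.0, 2.0, 0.0]

warm = hybrid_score(0.0, 0.25, [1.0, 0.0], [0.5, 2.0], [0.5, 0.25], E, f_item, [0.0, 0.0, 1.0])

# A new item with no interaction history: its collaborative factors and bias are zero,
# so the score comes entirely from the multimedia content.
cold = hybrid_score(0.0, 0.0, [1.0, 0.0], [0.0, 0.0], [0.5, 0.25], E, f_item, [0.0, 0.0, 1.0])
```

The second call hints at why content features matter for item cold start: even with zero collaborative signal, the item still receives a non-trivial score from its multimedia representation.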


Notes

  1. In the latter case, the individual models are combined to form an ensemble model.

  2. For images, segmentation is carried out in the image/pixel space, a process known as spatial segmentation.

  3. For an overview of these advances, see the multimedia recommender systems tutorial presented by He et al. at ACM ICMR 2018 (http://staff.ustc.edu.cn/~hexn/icmr18-recsys.pdf).

  4. https://code.google.com/archive/p/word2vec/.

  5. https://world.taobao.com/.

References

  1. H. Abdollahpouri, R. Burke, B. Mobasher, Managing popularity bias in recommender systems with personalized re-ranking, in The Thirty-Second International Flairs Conference (2019)

    Google Scholar 

  2. T. Alashkar, S. Jiang, S. Wang, Y. Fu, Examples-rules guided deep neural network for makeup recommendation, in AAAI (2017), pp. 941–947

    Google Scholar 

  3. M. Albanese, A. d’Acierno, V. Moscato, F. Persia, A. Picariello, A multimedia recommender system. ACM Trans. Int. Technol. 13(1), 1–32 (2013)

    Article  Google Scholar 

  4. I. Andjelkovic, D. Parra, J. O’Donovan, Moodplay: interactive music recommendation based on artists’ mood similarity. Int. J. Human-Comput. Stud. 121, 142–159 (2019)

    Article  Google Scholar 

  5. V.W. Anelli, Y. Deldjoo, T.D. Noia, D. Malitesta, F.A. Merra, A study of defensive methods to protect visual recommendation against adversarial manipulation of images, in The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR, vol. 21 (2021), pp. 11–15

    Google Scholar 

  6. K. Balog, F. Radlinski, Measuring recommendation explanation quality: The conflicting goals of explanations, in Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval SIGIR, Virtual, (ACM, New York, 2020), pp. 329–338

    Google Scholar 

  7. S. Baluja, R. Seth, D. Sivakumar, Y. Jing, J. Yagnik, S. Kumar, D. Ravichandran, M. Aly, Video suggestion and discovery for youtube: taking random walks through the view graph, in Proceedings of the 17th International Conference on World Wide Web (2008), pp. 895–904

    Google Scholar 

  8. I. Bartolini, V. Moscato, R.G. Pensa, A. Penta, A. Picariello, C. Sansone, M.L. Sapino, Recommending multimedia objects in cultural heritage applications, in International Conference on Image Analysis and Processing (Springer, Berlin, 2013), pp. 257–267

    Google Scholar 

  9. H. Bay, T. Tuytelaars, L.V. Gool, Surf: Speeded up robust features, in European Conference on Computer Vision (Springer, Berlin, 2006), pp. 404–417

    Google Scholar 

  10. S. Bourke, K. McCarthy, B. Smyth, The social camera: A case-study in contextual image recommendation, in Proceedings of the 16th International Conference on Intelligent User Interfaces (ACM, New York, 2011), pp. 13–22

    Google Scholar 

  11. S. Boutemedjet, D. Ziou, A graphical model for context-aware visual content recommendation. IEEE Trans. Multimedia 10(1), 52–62 (2008)

    Article  Google Scholar 

  12. J. Bu, S. Tan, C. Chen, C. Wang, H. Wu, L. Zhang, X. He, Music recommendation by unified hypergraph: combining social media information and music content, in Proceedings of the 18th International Conference on Multimedia 2010, Firenze, Italy, October 25–29, 2010. (ACM, New York, 2010), pp. 391–400

    Google Scholar 

  13. L. Canini, S. Benini, R. Leonardi, Affective recommendation of movies based on selected connotative features. IEEE Trans. Circ. Syst. Video Technol. 23(4), 636–647 (2013)

    Article  Google Scholar 

  14. L. Cella, S. Cereda, M. Quadrana, P. Cremonesi, Deriving item features relevance from past user interactions, in Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization, UMAP 2017, Bratislava, Slovakia, July 09–12, 2017 (ACM, New York, 2017), pp. 275–279

    Google Scholar 

  15. B. Chen, J. Wang, Q. Huang, T. Mei, Personalized video recommendation through tripartite graph propagation, in Proceedings of the 20th ACM International Conference on Multimedia (2012), pp. 1133–1136

    Google Scholar 

  16. C.-M. Chen, M.-F. Tsai, J.-Y. Liu, Y.-H. Yang, Using emotional context from article for contextual music recommendation, in Proceedings of the 21st ACM International Conference on Multimedia, MM ’13, New York, NY (ACM, New York, 2013), pages 649–652

    Google Scholar 

  17. J. Chen, H. Zhang, X. He, L. Nie, W. Liu, T.-S. Chua, Attentive collaborative filtering: Multimedia recommendation with item- and component-level attention, in SIGIR (ACM, New York, 2017), pp. 335–344

    Google Scholar 

  18. J. Chen, H. Zhang, X. He, L. Nie, W. Liu, T.-S. Chua, Attentive collaborative filtering: Multimedia recommendation with item-and component-level attention, in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, (2017), pp. 335–344

    Google Scholar 

  19. X. Chen, P. Zhao, J. Xu, Z. Li, L. Zhao, Y. Liu, V.S. Sheng, Z. Cui, Exploiting visual contents in posters and still frames for movie recommendation. IEEE Access 6, 68874–68881 (2018)

    Article  Google Scholar 

  20. X. Chen, H. Chen, H. Xu, Y. Zhang, Y. Cao, Z. Qin, H. Zha, Personalized fashion recommendation with visual explanations based on multimodal attention network: Towards visually explainable recommendation, in Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (2019), pp. 765–774

    Google Scholar 

  21. C.-W. Chen, L. Yang, H. Wen, R. Jones, V. Radosavljevic, H. Bouchard, Podrecs: Workshop on podcast recommendations, in Fourteenth ACM Conference on Recommender Systems, RecSys ’20, New York, NY (Association for Computing Machinery, New York, 2020), pp. 621–622

    Book  Google Scholar 

  22. H.-Y. Chi, C.-C. Chen, W.-H. Cheng, M.-S. Chen, Ubishop: commercial item recommendation using visual part-based object representation. Multimedia Tools Appl. 75(23), 16093–16115 (2016)

    Article  Google Scholar 

  23. W.-T. Chu, Y.-L. Tsai, A hybrid recommendation system considering visual information for predicting favorite restaurants. World Wide Web 20(6), 1313–1331 (2017)

    Article  Google Scholar 

  24. K.-Y. Chung, Effect of facial makeup style recommendation on visual sensibility. Multimedia Tools Appl. 71(2), 843–853 (2014)

    Article  Google Scholar 

  25. R. Cohen, O. Sar Shalom, D. Jannach, A. Amir, A black-box attack model for visually-aware recommender systems, in Proceedings of the 14th ACM International Conference on Web Search and Data Mining (2021), pp. 94–102

    Google Scholar 

  26. B. Cui, A.K.H. Tung, C. Zhang, Z. Zhao, Multiple feature fusion for social media applications, in Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (ACM, New York, 2010), pp. 435–446

    Google Scholar 

  27. P. Cui, Z. Wang, Z. Su, What videos are similar with you?: Learning a common attributed representation for video recommendation, in Proceedings of the 22nd ACM International Conference on Multimedia (ACM, New York, 2014), pp. 597–606

    Google Scholar 

  28. J. Dai, Y. Li, K. He, J. Sun, R-FCN: Object detection via region-based fully convolutional networks, in Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5–10, 2016, Barcelona (2016), pp. 379–387

    Google Scholar 

  29. N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005. CVPR 2005, vol. 1 (IEEE, Piscataway, 2005), pp. 886–893

    Google Scholar 

  30. Y. Deldjoo, M. Elahi, P. Cremonesi, Using visual features and latent factors for movie recommendation, in CBRecSys@ RecSys CEUR-WS (2016)

    Google Scholar 

  31. Y. Deldjoo, M. Elahi, P. Cremonesi, F. Garzotto, P. Piazzolla, M. Quadrana, Content-based video recommendation system based on stylistic visual features. J. Data Semantics 5(2), 99–113 (2016)

    Article  Google Scholar 

  32. Y. Deldjoo, C. Frà, M. Valla, A. Paladini, D. Anghileri, M. Anil Tuncil, F. Garzotta, P. Cremonesi, et al., Enhancing children’s experience with recommendation systems, in Workshop on Children and Recommender Systems (KidRec’17)-11th ACM Conference of Recommender Systems (2017), pp. N–A

    Google Scholar 

  33. Y. Deldjoo, M. Elahi, M. Quadrana, P. Cremonesi, Using visual features based on MPEG-7 and deep learning for movie recommendation. Int. J. Multim. Inf. Retr. 7(4), 207–219 (2018)

    Article  Google Scholar 

  34. Y. Deldjoo, M. Gabriel Constantin, H. Eghbal-Zadeh, B. Ionescu, M. Schedl, P. Cremonesi, Audio-visual encoding of multimedia content for enhancing movie recommendations, in Proceedings of the 12th ACM Conference on Recommender Systems, RecSys 2018, Vancouver, BC, October 2–7, 2018 (ACM, New York, 2018), pp. 455–459

    Google Scholar 

  35. Y. Deldjoo, M. Gabriel Constantin, B. Ionescu, M. Schedl, P. Cremonesi, MMTF-14k: A multifaceted movie trailer feature dataset for recommendation and retrieval, in Proceedings of the 9th ACM Multimedia Systems Conference (ACM, New York, 2018), pp. 450–455

    Google Scholar 

  36. Y. Deldjoo, M. Schedl, P. Cremonesi, G. Pasi, Content-based multimedia recommendation systems: Definition and application domains, in Proceedings of the 9th Italian Information Retrieval Workshop, Rome, May, 28–30, 2018. CEUR Workshop Proceedings, vol. 2140. CEUR-WS.org (2018)

    Google Scholar 

  37. Y. Deldjoo, M. Schedl, B. Hidasi, P. Knees, Multimedia recommender systems, in ed. by S. Pera, M.D. Ekstrand, X. Amatriain, J. O’Donovan,Proceedings of the 12th ACM Conference on Recommender Systems, RecSys 2018, Vancouver, BC, October 2–7, 2018 (ACM, New York, 2018), pp. 537–538

    Google Scholar 

  38. Y. Deldjoo, M. Ferrari Dacrema, M. Gabriel Constantin, H. Eghbal-zadeh, S. Cereda, M. Schedl, B. Ionescu, P. Cremonesi, Movie genome: alleviating new item cold start in movie recommendation. User Model. User Adapt. Interact. 29(2), 291–343 (2019)

    Article  Google Scholar 

  39. Y. Deldjoo, M. Schedl, Retrieving relevant and diverse movie clips using the MFVCD-7K multifaceted video clip dataset, in 2019 International Conference on Content-Based Multimedia Indexing, CBMI 2019, Dublin, September 4–6, 2019 (IEEE, Piscataway, 2019), pp. 1–4

    Google Scholar 

  40. Y. Deldjoo, V.W. Anelli, H. Zamani, A. Bellogin, T. Di Noia, A flexible framework for evaluating user and item fairness in recommender systems. User Model. User-Adap. Interac., 1–55 (2021)

    Google Scholar 

  41. Y. Deldjoo, M. Schedl, P. Cremonesi, G. Pasi, Recommender systems leveraging multimedia content. ACM Comput. Surv. 53(5), 38 (2020)

    Google Scholar 

  42. Y. Deldjoo, A. Bellogin, T. Di Noia, Explaining recommender systems fairness and accuracy through the lens of data characteristics. Inf. Proc. Manag. 58, 102662 (2021)

    Article  Google Scholar 

  43. Y. Deldjoo, T. Di Noia, D. Malitesta, F. Antonio Merra, A study on the relative importance of convolutional neural networks in visually-aware recommender systems, in CVPRW-CVFAD 2021 :The 4th CVPR Workshop on Computer Vision for Fashion, Art, and Design. CVPR Proceedings (2021)

    Google Scholar 

  44. Y. Deldjoo, T. Di Noia, F. Antonio Merra, A survey on adversarial recommender systems: from attack/defense strategies to generative adversarial networks. ACM Comput. Surveys 54(2), 1–38 (2021)

    Article  Google Scholar 

  45. Y. Deldjoo, M. Schedl, P. Knees, Content-Driven Music Recommendation: Evolution, State of the Art, and Challenges. Preprint arXiv: 2107.11803 (2021)

    Google Scholar 

  46. Y. Deldjoo, J.R. Trippas, H. Zamani, Towards multi-modal conversational information seeking, in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (2021)

    Google Scholar 

  47. Z. Deng, J. Sang, C. Xu, Personalized video recommendation based on cross-platform user modeling, in 2013 IEEE International Conference on Multimedia and Expo (ICME) (IEEE, Piscataway, 2013), pp. 1–6

    Google Scholar 

  48. J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional transformers for language understanding. Preprint arXiv:1810.04805 (2018)

    Google Scholar 

  49. T. Di Noia, D. Malitesta, F. Antonio Merra, Taamr: Targeted adversarial attack against multimedia recommender systems, in 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W) (IEEE, Piscataway, 2020), pp. 1–8

    Google Scholar 

  50. X. Du, X. Wang, X. He, Z. Li, J. Tang, T.-S. Chua, How to learn item representation for cold-start multimedia recommendation? in Proceedings of the 28th ACM International Conference on Multimedia (2020), pp. 3469–3477

    Google Scholar 

  51. X. Du, H. Yin, L. Chen, Y. Wang, Y. Yang, X. Zhou, Personalized video recommendation using rich contents from videos. IEEE Trans. Knowl. Data Eng. 32(3), 492–505 (2020)

    Article  Google Scholar 

  52. M.D. Ekstrand, J.T. Riedl, J.A. Konstan, et al., Collaborative filtering recommender systems. Foundations Trends® Human–Comput. Int. 4(2), 81–173 (2011)

    Google Scholar 

  53. M. Elahi, Y. Deldjoo, F. Bakhshandegan Moghaddam, L. Cella, S. Cereda, P. Cremonesi, Exploring the semantic gap for movie recommendations, in Proceedings of the Eleventh ACM Conference on Recommender Systems (ACM, New York, 2017), pp. 326–330

    Book  Google Scholar 

  54. D. Elsweiler, C. Trattner, M. Harvey, Exploiting food choice biases for healthier recipe recommendation, in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM, New York, 2017), pp. 575–584

    Google Scholar 

  55. J.A. Fails, M.S. Pera, N. Kucirkova, F. Garzotto, International and interdisciplinary perspectives on children & recommender systems (kidrec), in Proceedings of the 17th ACM Conference on Interaction Design and Children (2018), pp. 705–712

    Google Scholar 

  56. A. Farseev, L. Nie, M. Akbari, T.-S. Chua, Harvesting multiple sources for user profile learning: A big data study, in Proceedings of the 5th ACM on International Conference on Multimedia Retrieval (ACM, New York, 2015), pp. 235–242

    Google Scholar 

  57. A. Farseev, I. Samborskii, A. Filchenkov, T.-S. Chua, Cross-domain recommendation via clustering on multi-layer graphs, in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information (ACM, New York, 2017), pp. 195–204

    Google Scholar 

  58. C. Feichtenhofer, A. Pinz, A. Zisserman, Convolutional two-stream network fusion for video action recognition, in 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, June 27–30, 2016 (IEEE Computer Society, Washington, DC, 2016), pp. 1933–1941

    Google Scholar 

  59. G. Friedrich, M. Zanker, A taxonomy for generating explanations in recommender systems. AI Mag. 32(3), 90–98 (2011)

    Google Scholar 

  60. X. Gao, F. Feng, X. He, H. Huang, X. Guan, C. Feng, Z. Ming, T.-S. Chua, Hierarchical attention network for visually-aware food recommendation. IEEE Trans. Multim. 22(6), 1647–1659 (2020)

    Article  Google Scholar 

  61. X. Geng, H. Zhang, J. Bian, T.-S. Chua, Learning image and user features for recommendation in social networks, in 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, December 7–13, 2015 (2015), pp. 4274–4282

    Google Scholar 

  62. R.B. Girshick, Fast R-CNN, in 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, December 7–13, 2015 (IEEE Computer Society, Washington, DC, 2015), pp. 1440–1448

    Google Scholar 

  63. R.B. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, June 23–28, 2014 (IEEE Computer Society, Washington, DC, 2014), pp. 580–587

    Google Scholar 

  64. X. Gu, L. Shou, P. Peng, K. Chen, S. Wu, G. Chen, iGlasses: A novel recommendation system for best-fit glasses, in Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM, New York, 2016), pp. 1109–1112

    Google Scholar 

  65. S.C. Guntuku, S. Roy, L. Weisi, Personality modeling based image recommendation, in International Conference on Multimedia Modeling (Springer, Berlin, 2015), pp. 171–182

    Google Scholar 

  66. T.H. Haveliwala, Topic-sensitive pagerank: a context-sensitive ranking algorithm for web search. IEEE Trans. Knowl. Data Eng. 15(4), 784–796 (2003)

    Article  Google Scholar 

  67. K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, June 27–30, 2016 (IEEE Computer Society, Washington, DC, 2016), pp. 770–778

    Google Scholar 

  68. R. He, J. McAuley, Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering, in Proceedings of the 25th International Conference on World Wide Web (2016), pp. 507–517

    Google Scholar 

  69. R. He, J. McAuley, VBPR: visual bayesian personalized ranking from implicit feedback, in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona (2016), pp. 144–150

    Google Scholar 

  70. X. He, M. Gao, M.-Y. Kan, D. Wang, Birank: towards ranking on bipartite graphs. IEEE Trans. Knowl. Data Eng. 29(1), 57–71 (2016)

    Article  Google Scholar 

  71. X. He, H. Zhang, T.-S. Chua, Recommendation technologies for multimedia content, in ICMR (2018), p. 8

    Google Scholar 

  72. M. Hou, L. Wu, E. Chen, Z. Li, V.W. Zheng, Q. Liu, Explainable fashion recommendation: A semantic attribute region guided approach, in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, August 10–16, 2019. ijcai.org (2019), pp. 4681–4688

    Google Scholar 

  73. G. Huang, Z. Liu, L. van der Maaten, K.Q. Weinberger, Densely connected convolutional networks, in 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, July 21–26, 2017 (IEEE Computer Society, Washington, DC, 2017), pp. 2261–2269

    Google Scholar 

  74. H. Jiang, W. Wang, Y. Wei, Z. Gao, Y. Wang, L. Nie, What aspect do you like: Multi-scale time-aware user interest modeling for micro-video recommendation, in Proceedings of the 28th ACM International Conference on Multimedia (2020), pp. 3487–3495

    Google Scholar 

  75. M. Kaminskas, F. Ricci, M. Schedl, Location-aware music recommendation using auto-tagging and hybrid matching, in Proceedings of the 7th ACM Conference on Recommender Systems (ACM, New York, 2013), pp. 17–24

    Google Scholar 

  76. W.-C. Kang, C. Fang, Z. Wang, J.J. McAuley, Visually-aware fashion recommendation and design with generative image models, in ICDM (IEEE Computer Society, Washington, DC, 2017), pp. 207–216

    Google Scholar 

  77. A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, F.-F. Li, Large-scale video classification with convolutional neural networks, in 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, June 23–28, 2014 (IEEE Computer Society, Washington, DC, 2014), pp. 1725–1732

    Google Scholar 

  78. R. Kaur, S. Kautish, Multimodal sentiment analysis: A survey and comparison. Int. J. Serv. Sci. Manag. Eng. Technol. 10(2), 38–58 (2019)

    Google Scholar 

  79. P. Knees, M. Schedl, Music Similarity and Retrieval: An Introduction to Audio- and Web-based Strategies. The Information Retrieval Series, vol. 36 (Springer, Berlin, 2016)

    Google Scholar 

  80. A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems (2012), pp. 1097–1105

    Google Scholar 

  81. C.-T. Li, M.-K. Shan, Emotion-based impressionism slideshow with automatic music accompaniment, in Proceedings of the 15th ACM International Conference on Multimedia, MM ’07, New York, NY, (ACM, New York, 2007), pp. 839–842

    Google Scholar 

  82. J. Li, K. Lu, Z. Huang, H.T. Shen, Two birds one stone: On both cold-start and long-tail recommendation, in Proceedings of the 25th ACM International Conference on Multimedia (2017), pp. 898–906

    Google Scholar 

  83. Y. Li, M. Liu, J. Yin, C. Cui, X.-S. Xu, L. Nie, Routing micro-videos via a temporal graph-guided recommendation system, in Proceedings of the 27th ACM International Conference on Multimedia (2019), pp. 1464–1472

    Google Scholar 

  84. X. Li, X. Wang, X. He, L. Chen, J. Xiao, T.-S. Chua, Hierarchical fashion graph network for personalized outfit recommendation. CoRR abs/2005.12566 (2020)

    Google Scholar 

  85. X. Li, X. Wang, X. He, L. Chen, J. Xiao, T.-S. Chua, Hierarchical fashion graph network for personalized outfit recommendation, in Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (2020)

    Google Scholar 

  86. D. Liang, M. Zhan, D.P.W. Ellis, Content-aware collaborative music recommendation using pre-trained neural networks, in ed. by M. Müller, F. Wiering, Proceedings of the 16th International Society for Music Information Retrieval Conference, ISMIR 2015, Málaga, October 26–30, 2015 (2015), pp. 295–301

    Google Scholar 

  87. L. Liao, L. Le Hong, Z. Zhang, M. Huang, T.S. Chua, MMConv: an environment for multimodal conversational search across multiple domains, in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (2021)

    Google Scholar 

  88. Z. Lin, G. Ding, J. Wang, Image annotation based on recommendation model, in Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval (ACM, New York, 2011), pp. 1097–1098

    Google Scholar 

  89. Y. Lin, M. Moosaei, H. Yang, Outfitnet: Fashion outfit recommendation with attention-based multiple instance learning, in WWW ’20: The Web Conference 2020, Taipei, April 20–24, 2020 (ACM/IW3C2, New York/Geneva, 2020), pp. 77–87

    Google Scholar 

  90. J. Liu, Z. Li, J. Tang, Y. Jiang, H. Lu, Personalized geo-specific tag recommendation for photos on social websites. IEEE Trans. Multim. 16(3), 588–600 (2014)

    Article  Google Scholar 

  91. W. Liu, A. Rabinovich, A.C. Berg, Parsenet: Looking wider to see better. CoRR abs/1506.04579 (2015)

    Google Scholar 

  92. W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S.E. Reed, C.-Y. Fu, A.C. Berg, SSD: Single shot multibox detector, in Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, October 11–14, 2016, Proceedings, Part I. Lecture Notes in Computer Science, vol. 9905 (Springer, Berlin, 2016), pp. 21–37

    Google Scholar 

  93. Q. Liu, S. Wu, L. Wang, Deepstyle: Learning user preferences for visual recommendation, in SIGIR (ACM, New York, 2017), pp. 841–844

    Google Scholar 

  94. L. Liu, J. Chen, P. Fieguth, G. Zhao, R. Chellappa, M. Pietikäinen, From bow to CNN: Two decades of texture representation for texture classification. Int. J. Comput. Vision 127(1), 74–109 (2019)

    Article  Google Scholar 

  95. B. Logan, Mel frequency cepstral coefficients for music modeling, in ISMIR 2000, 1st International Symposium on Music Information Retrieval, Plymouth, Massachusetts, October 23–25, 2000, Proceedings (2000)

    Google Scholar 

  96. D.G. Lowe, Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60(2), 91–110 (2004)

    Article  Google Scholar 

  97. H. Luo, J. Fan, D.A. Keim, S. Satoh, Personalized news video recommendation, in International Conference on Multimedia Modeling (Springer, Berlin, 2009), pp. 459–471

    Google Scholar 

  98. J. Ma, G. Li, M. Zhong, X. Zhao, L. Zhu, X. Li, LGA: latent genre aware micro-video recommendation on social media. Multim. Tools Appl. 77(3), 2991–3008 (2018)

    Article  Google Scholar 

  99. J. McAuley, C. Targett, Q. Shi, A. Van Den Hengel, Image-based recommendations on styles and substitutes, in Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM, New York, 2015), pp. 43–52

    Google Scholar 

  100. B. McFee, G.R.G. Lanckriet, The natural language of playlists, in ed. by A. Klapuri, C. Leider, Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011, Miami, Florida, October 24–28, 2011 (University of Miami, Coral Gables, 2011), pp. 537–542

    Google Scholar 

  101. T. Mei, B. Yang, X.-S. Hua, L. Yang, S.-Q. Yang, S. Li, Videoreach: An online video recommendation system, in Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM, New York, 2007), pp. 767–768

    Google Scholar 

  102. T. Mei, B. Yang, X.-S. Hua, S. Li, Contextual video recommendation by multimodal relevance and user feedback. ACM Trans. Inf. Syst. 29(2), 10 (2011)

    Google Scholar 

  103. P. Melville, R.J. Mooney, R. Nagarajan, Content-boosted collaborative filtering for improved recommendations, in Eighteenth National Conference on Artificial Intelligence, (American Association for Artificial Intelligence, Menlo Park, 2002), pp. 187–192

    Google Scholar 

  104. L. Meng, F. Feng, X. He, X. Gao, T.-S. Chua, Heterogeneous fusion of semantic and collaborative information for visually-aware food recommendation, in MM ’20: The 28th ACM International Conference on Multimedia, Virtual Event/Seattle, WA, October 12–16, 2020 (ACM, New York, 2020), pp. 3460–3468

    Google Scholar 

  105. T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, in 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, May 2–4, 2013 (2013)

    Google Scholar 

  106. M. Müller, Fundamentals of Music Processing: Audio, Analysis, Algorithms, Applications (Springer, Berlin, 2015)

    Book  Google Scholar 

  107. X. Ning, G. Karypis, SLIM: sparse linear methods for top-n recommender systems, in D.J. Cook, J. Pei, W. Wang, O.R. Zaïane, X. Wu, 11th IEEE International Conference on Data Mining, ICDM 2011, Vancouver, BC, December 11–14, 2011 (IEEE Computer Society, Washington, DC, 2011), pp. 497–506

    Google Scholar 

  108. X. Ning, G. Karypis, Sparse linear methods with side information for top-n recommendations, in Proceedings of the Sixth ACM Conference on Recommender Systems, RecSys ’12 New York, NY, (Association for Computing Machinery, New York, 2012), pp. 155–162

    Book  Google Scholar 

  109. W. Niu, J. Caverlee, H. Lu, Neural personalized ranking for image recommendation, in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (ACM, New York, 2018), pp. 423–431

    Google Scholar 

  110. T. Ojala, M. Pietikainen, T. Maenpaa, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Analy. Mach. Intell. 24(7), 971–987 (2002)

    Article  MATH  Google Scholar 

  111. A.v.d. Oord, S. Dieleman, B. Schrauwen, Deep content-based music recommendation, in ed. by C. Burges, L. Bottou, M. Welling, Z. Ghahramani, K. Weinberger, Advances in Neural Information Processing Systems 26 (NIPS) (Curran Associates, Lake Tahoe, NV, 2013), pp. 2643–2651

    Google Scholar 

  112. S. Oramas, V.C. Ostuni, T. Di Noia, X. Serra, E. Di Sciascio, Sound and music recommendation with knowledge graphs. ACM Trans. Intell. Syst. Technol. 8(2), 21:1–21:21 (2016)

    Google Scholar 

  113. S. Oramas, O. Nieto, M. Sordo, X. Serra, A deep multimodal approach for cold-start music recommendation, in Proceedings of the 2Nd Workshop on Deep Learning for Recommender Systems, DLRS 2017, New York, NY (ACM, New York, 2017), pp. 32–37

    Book  Google Scholar 

  114. L. Page, S. Brin, R. Motwani, T. Winograd, The pagerank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab (1999)

    Google Scholar 

  115. O.M. Parkhi, A. Vedaldi, A. Zisserman, Deep face recognition, in Proceedings of the British Machine Vision Conference 2015, BMVC 2015, Swansea, September 7–10, 2015 (BMVA Press, Swansea, 2015), pp. 41.1–41.12

    Google Scholar 

  116. B. Perozzi, R. Al-Rfou, S. Skiena, Deepwalk: Online learning of social representations, in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2014), pp. 701–710

    Google Scholar 

  117. L. Peska, H. Trojanova, Towards recommender systems for police photo lineup. Preprint arXiv:1707.01389 (2017)

    Google Scholar 

  118. J. Pons, X. Serra, musicnn: Pre-trained convolutional neural networks for music audio tagging. Preprint arXiv:1909.06654 (2019)

    Google Scholar 

  119. L.R. Rabiner, B.-H. Juang, Fundamentals of Speech Recognition. Prentice Hall Signal Processing Series (Prentice Hall, Hoboken, 1993)

    Google Scholar 

  120. D. Ramachandram, G.W. Taylor, Deep multimodal learning: a survey on recent advances and trends. IEEE Signal Proc. Mag. 34(6), 96–108 (2017)

    Article  Google Scholar 

  121. Y.S. Rawat, M.S. Kankanhalli, Clicksmart: a context-aware viewpoint recommendation system for mobile photography. IEEE Trans. Circuits Syst. Video Techn. 27(1), 149–158 (2017)

  122. S. Ren, K. He, R.B. Girshick, J. Sun, Faster R-CNN: towards real-time object detection with region proposal networks, in Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7–12, 2015, Montreal, Quebec (2015), pp. 91–99

  123. S. Rendle, Factorization machines with libFM. ACM Trans. Intell. Syst. Technol. 3(3), 57 (2012)

  124. M.T. Ribeiro, S. Singh, C. Guestrin, Why should I trust you?, in Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD) (ACM, New York, 2016), pp. 1135–1144

  125. S. Roy, S.C. Guntuku, Latent factor representations for cold-start video recommendation, in Proceedings of the 10th ACM Conference on Recommender Systems (ACM, New York, 2016), pp. 99–106

  126. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M.S. Bernstein, A.C. Berg, F.-F. Li, ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)

  127. P. Sánchez, A. Bellogín, On the effects of aggregation strategies for different groups of users in venue recommendation. Inf. Proc. Manag. 58(5), 102609 (2021)

  128. M. Schedl, H. Zamani, C.-W. Chen, Y. Deldjoo, M. Elahi, Current challenges and visions in music recommender systems research. Int. J. Multim. Inf. Retr. 7(2), 95–116 (2018)

  129. K. Seyerlehner, G. Widmer, T. Pohle, Fusing block-level features for music similarity estimation, in Proceedings of the 13th International Conference on Digital Audio Effects (DAFx-10), Graz, September 6–10 (2010)

  130. Y. Shi, M. Larson, A. Hanjalic, Collaborative filtering beyond the user-item matrix: A survey of the state of the art and future challenges. ACM Comput. Surv. 47(1), 3 (2014)

  131. K. Simonyan, A. Zisserman, Two-stream convolutional networks for action recognition in videos, in Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8–13 2014, Montreal, Quebec (2014), pp. 568–576

  132. K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, in ed. by Y. Bengio, Y. LeCun, 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, May 7–9, 2015, Conference Track Proceedings (2015)

  133. J. Smith, D. Weeks, M. Jacob, J. Freeman, B. Magerko, Towards a hybrid recommendation system for a sound library, in Joint Proceedings of the ACM IUI 2019 Workshops co-located with the 24th ACM Conference on Intelligent User Interfaces (ACM IUI 2019), Los Angeles, March 20, 2019 (2019)

  134. J. Smith, D. Weeks, M. Jacob, J. Freeman, B. Magerko, Towards a hybrid recommendation system for a sound library, in IUI Workshops (2019)

  135. J. Song, Y. Yang, Z. Huang, H.T. Shen, J. Luo, Effective multiple feature hashing for large-scale near-duplicate video retrieval. IEEE Trans. Multim. 15(8), 1997–2008 (2013)

  136. G.-L. Sun, Z.-Q. Cheng, X. Wu, Q. Peng, Personalized clothing recommendation combining user social circle and fashion style consistency. Multim. Tools Appl. 77(14), 17731–17754 (2018)

  137. J. Tang, X. Du, X. He, F. Yuan, Q. Tian, T.-S. Chua, Adversarial training towards robust multimedia recommender system. IEEE Trans. Knowl. Data Eng. 32, 855–867 (2019)

  138. Z. Tao, Y. Wei, X. Wang, X. He, X. Huang, T.-S. Chua, MGAT: Multimodal graph attention network for recommendation. Inf. Proc. Manag. 57(5), 102277 (2020)

  139. I. Tautkute, A. Możejko, W. Stokowiec, T. Trzciński, Ł. Brocki, K. Marasek, What looks good with my sofa: Multimodal search engine for interior design, in ed. by M. Ganzha, L. Maciaszek, M. Paprzycki, Proceedings of the 2017 Federated Conference on Computer Science and Information Systems. Annals of Computer Science and Information Systems, vol. 11 (IEEE, Piscataway, 2017), pp. 1275–1282

  140. J.R.R. Uijlings, K.E.A. van de Sande, T. Gevers, A.W.M. Smeulders, Selective search for object recognition. Int. J. Comput. Vis. 104(2), 154–171 (2013)

  141. A. Vall, M. Dorfer, H. Eghbal-zadeh, M. Schedl, K. Burjorjee, G. Widmer, Feature-combination hybrid recommender systems for automated music playlist continuation. User Model. User-Adap. Interac. J. Personaliz. Res. 29, 527–572 (2019)

  142. R. van den Berg, T.N. Kipf, M. Welling, Graph convolutional matrix completion, in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2018)

  143. H. Wang, C. Schmid, Action recognition with improved trajectories, in IEEE International Conference on Computer Vision, ICCV 2013, Sydney, December 1–8, 2013 (IEEE Computer Society, Washington, DC, 2013), pp. 3551–3558

  144. X. Wang, Y. Wang, Improving content-based and hybrid music recommendation using deep learning, in Proceedings of the ACM International Conference on Multimedia, MM ’14, Orlando, FL, November 03–07, 2014 (ACM, New York, 2014), pp. 627–636

  145. F. Wang, M. Jiang, C. Qian, S. Yang, C. Li, H. Zhang, X. Wang, X. Tang, Residual attention network for image classification, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 3156–3164

  146. S. Wang, Y. Wang, J. Tang, K. Shu, S. Ranganath, H. Liu, What your images reveal: Exploiting visual contents for point-of-interest recommendation, in Proceedings of the 26th International Conference on World Wide Web (2017), pp. 391–400

  147. J. Wang, P. Huang, H. Zhao, Z. Zhang, B. Zhao, D.L. Lee, Billion-scale commodity embedding for e-commerce recommendation in Alibaba, in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2018), pp. 839–848

  148. X. Wang, X. He, M. Wang, F. Feng, T.-S. Chua, Neural graph collaborative filtering, in Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (2019), pp. 165–174

  149. Y. Wei, Z. Cheng, X. Yu, Z. Zhao, L. Zhu, L. Nie, Personalized hashtag recommendation for micro-videos, in Proceedings of the 27th ACM International Conference on Multimedia (2019), pp. 1446–1454

  150. Y. Wei, X. Wang, L. Nie, X. He, R. Hong, T.-S. Chua, MMGCN: Multi-modal graph convolution network for personalized recommendation of micro-video, in Proceedings of the 27th ACM International Conference on Multimedia (2019), pp. 1437–1445

  151. Y. Wei, X. Wang, L. Nie, X. He, T.-S. Chua, Graph-refined convolutional network for multimedia recommendation with implicit feedback, in Proceedings of the 28th ACM International Conference on Multimedia (2020), pp. 3541–3549

  152. J. Wen, J. She, X. Li, H. Mao, Visual background recommendation for dance performances using deep matrix factorization. TOMCCAP 14(1), 11:1–11:19 (2018)

  153. Z. Wu, S. Jiang, Q. Huang, Friend recommendation according to appearances on photos, in ed. by W. Gao, Y. Rui, A. Hanjalic, C. Xu, E.G. Steinbach, A. El-Saddik, M.X. Zhou, Proceedings of the 17th International Conference on Multimedia 2009, Vancouver, British Columbia, October 19–24, 2009 (ACM, New York, 2009), pp. 987–988

  154. C.-C. Wu, T. Mei, W.H. Hsu, Y. Rui, Learning to personalize trending image search suggestion, in Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval (ACM, New York, 2014), pp. 727–736

  155. J. Wu, X. He, X. Wang, Q. Wang, W. Chen, J. Lian, X. Xie, Y. Zhang, Graph convolution machine for context-aware recommender system. Preprint arXiv:2001.11402 (2020)

  156. Z. Xing, M. Parandehgheibi, F. Xiao, N. Kulkarni, C. Pouliot, Content-based recommendation for podcast audio-items using natural language processing techniques, in 2016 IEEE International Conference on Big Data (Big Data) (IEEE, Piscataway, 2016), pp. 2378–2383

  157. R. Yan, M. Lapata, X. Li, Tweet recommendation with graph co-ranking, in Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2012), pp. 516–525

  158. L. Yang, C.-K. Hsieh, H. Yang, J.P. Pollak, N. Dell, S. Belongie, C. Cole, D. Estrin, Yum-me: a personalized nutrient-based meal recommender system. ACM Trans. Inf. Syst. 36(1), 7 (2017)

  159. R. Ying, R. He, K. Chen, P. Eksombatchai, W.L. Hamilton, J. Leskovec, Graph convolutional neural networks for web-scale recommender systems, in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2018), pp. 974–983

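  160. K. Yoshii, M. Goto, K. Komatani, T. Ogata, H.G. Okuno, An efficient hybrid music recommender system using an incrementally trainable probabilistic generative model. IEEE Trans. Audio Speech Language Proc. 16(2), 435–447 (2008)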

  161. W. Yu, H. Zhang, X. He, X. Chen, L. Xiong, Z. Qin, Aesthetic-based clothing recommendation, in Proceedings of the 2018 World Wide Web Conference (2018), pp. 649–658

  162. J. Zhang, X. Shi, S. Zhao, I. King, Star-gcn: Stacked and reconstructed graph convolutional networks for recommender systems, in The 28th International Joint Conference on Artificial Intelligence (2019), pp. 4264–4270

  163. S. Zhang, L. Yao, A. Sun, Y. Tay, Deep learning based recommender system: a survey and new perspectives. ACM Comput. Surv. 52(1), 1–38 (2019)

  164. X. Zhao, H. Luan, J. Cai, J. Yuan, X. Chen, Z. Li, Personalized video recommendation based on viewing history with the study on YouTube, in Proceedings of the 4th International Conference on Internet Multimedia Computing and Service (ACM, New York, 2012), pp. 161–165

  165. B. Zhou, À. Lapedriza, J. Xiao, A. Torralba, A. Oliva, Learning deep features for scene recognition using places database, in Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8–13 2014, Montreal, Quebec (2014), pp. 487–495

  166. Q. Zhu, M.-L. Shyu, H. Wang, Videotopic: Content-based video recommendation using a topic model, in 2013 IEEE International Symposium on Multimedia (ISM) (IEEE, Piscataway, 2013), pp. 219–222

  167. U. Zoelzer, Digital Audio Signal Processing (Wiley, Hoboken, 2008)

Acknowledgements

We would like to thank Jan Schlüter for providing image examples.

Author information

Corresponding author

Correspondence to Yashar Deldjoo.

Copyright information

© 2022 Springer Science+Business Media, LLC, part of Springer Nature

About this chapter

Cite this chapter

Deldjoo, Y., Schedl, M., Hidasi, B., Wei, Y., He, X. (2022). Multimedia Recommender Systems: Algorithms and Challenges. In: Ricci, F., Rokach, L., Shapira, B. (eds) Recommender Systems Handbook. Springer, New York, NY. https://doi.org/10.1007/978-1-0716-2197-4_25

  • DOI: https://doi.org/10.1007/978-1-0716-2197-4_25

  • Published:

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-0716-2196-7

  • Online ISBN: 978-1-0716-2197-4

  • eBook Packages: Computer Science, Computer Science (R0)
