Using visual features based on MPEG-7 and deep learning for movie recommendation

  • Yashar Deldjoo
  • Mehdi Elahi (corresponding author)
  • Massimo Quadrana
  • Paolo Cremonesi
Regular Paper


Item features play an important role in movie recommender systems, where recommendations can be generated from users' explicit or implicit preferences on traditional features (attributes) such as tag, genre, and cast. Typically, movie features are human-generated, either editorially (e.g., genre and cast) or by leveraging the wisdom of the crowd (e.g., tag), and as such, they are prone to noise and expensive to collect. Moreover, these features are often rare or absent for new items, making it difficult or even impossible to provide good-quality recommendations. In this paper, we show that users’ preferences on movies can be described equally well, or even better, in terms of mise-en-scène features, i.e., the visual aspects of a movie that characterize design, aesthetics and style (e.g., colors, textures). We use both MPEG-7 visual descriptors and Deep Learning hidden layers as examples of mise-en-scène features that can visually describe movies. These features can be computed automatically from any video file, offering flexibility in handling new items, avoiding the need for costly and error-prone human-based tagging, and providing good scalability. We have conducted a set of experiments on a large catalog of roughly 4,000 movies. Results show that recommendations based on mise-en-scène features consistently outperform those based on traditional metadata attributes (e.g., genre and tag).


Multimedia recommendation, Video analysis, Deep learning, Cold start, Visual features



This work is supported by Telecom Italia S.p.A., Open Innovation Department, Joint Open Lab S-Cube, Milan. The work has also been supported by the Amazon AWS Cloud Credits for Research program.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. Politecnico di Milano, Milan, Italy
  2. Free University of Bozen-Bolzano, Bolzano, Italy
