Abstract
The increasing popularity of digital music and the growing ubiquity of network connectivity have promoted the expansion of online music sharing platforms (e.g., YouTube, Spotify). In this paper, we focus on the challenging problem of connotation-aware music retrieval with visual inputs. The goal is to capture the implicit feelings or emotions expressed beyond the explicit content of music and images, and to retrieve music pieces relevant to the connotation implicitly conveyed by the visual inputs. Solving this problem poses two critical challenges: (1) it is difficult to accurately identify the implicit connotation of both images and music pieces; (2) it is non-trivial to establish correct connotative associations across different data modalities. To address these challenges, we present a novel classic-enriched connotation-aware music retrieval (CCMR) system that identifies music matching the connotation of visual inputs. We evaluate CCMR on a real-world dataset. Results show that CCMR outperforms state-of-the-art baselines in retrieving music pieces that are highly relevant to the connotation of the visual inputs.
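The abstract does not say how CCMR scores candidate music pieces. As a generic illustration of the final retrieval step only, the following is a minimal sketch, assuming that images and music pieces have already been mapped into one shared connotation embedding space by some encoder (hypothetical here, not CCMR's actual model). Candidates are then ranked by cosine similarity to the image's vector. The function names `cosine` and `rank_music` and the toy two-dimensional vectors are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_music(image_vec: Sequence[float],
               music_vecs: Dict[str, Sequence[float]],
               k: int = 3) -> List[str]:
    """Return the ids of the k music pieces closest to the image in the
    (hypothetical) shared connotation space, most similar first."""
    scored = sorted(music_vecs.items(),
                    key=lambda item: cosine(image_vec, item[1]),
                    reverse=True)
    return [music_id for music_id, _ in scored[:k]]


if __name__ == "__main__":
    # Toy vectors in a hypothetical 2-D (valence, arousal) space.
    image = [0.9, 0.1]  # calm, pleasant scene
    library = {
        "calm_piano": [0.8, 0.2],
        "angry_metal": [-0.7, 0.9],
        "upbeat_pop": [0.7, 0.8],
    }
    print(rank_music(image, library, k=2))  # ['calm_piano', 'upbeat_pop']
```

A plain similarity ranking keeps the sketch independent of how the embeddings are learned. In a real system, the hard parts named in the abstract (identifying implicit connotation and aligning it across modalities) would live in the encoders, not in this scoring step.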
Notes
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This research is supported in part by the National Science Foundation under Grant Nos. CHE-2105005, IIS-2008228, CNS-1845639, and CNS-1831669, and by the Army Research Office under Grant W911NF-17-1-0409. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Office or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Shang, L., Zhang, D., Shen, J. et al. CCMR: A Classic-enriched Connotation-aware Music Retrieval System on Social Media with Visual Inputs. Soc. Netw. Anal. Min. 11, 119 (2021). https://doi.org/10.1007/s13278-021-00821-4