CCMR: A Classic-enriched Connotation-aware Music Retrieval System on Social Media with Visual Inputs

  • Original Article
  • Social Network Analysis and Mining

Abstract

The increasing popularity of digital music and the growing ubiquity of network connectivity have promoted the expansion of online music sharing platforms (e.g., YouTube, Spotify). In this paper, we focus on the challenging problem of connotation-aware music retrieval with visual inputs. The goal is to explore the implicit feelings or emotions expressed beyond the explicit content of music and images, and to retrieve music pieces relevant to the connotation implicitly conveyed in the visual inputs. Two critical challenges arise in solving this problem: (1) it is difficult to accurately identify the implicit connotation of both images and music pieces; (2) it is non-trivial to establish the correct connotative association across different data modalities. To address these challenges, we present a novel classic-enriched connotation-aware music retrieval (CCMR) system that identifies connotation-aware music for visual inputs. We evaluate the proposed CCMR system on a real-world dataset. Results show that CCMR outperforms state-of-the-art baselines in retrieving music pieces that are highly relevant to the connotation of the visual inputs.

Notes

  1. https://www.ibm.com/watson/services/tone-analyzer/.

  2. https://developer.spotify.com.

  3. The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

  4. https://unsplash.com/developers.

  5. https://www.youtube.com.

  6. https://www.poetryfoundation.org.

  7. https://www.mturk.com.

Acknowledgements

This research is supported in part by the National Science Foundation under Grant Nos. CHE-2105005, IIS-2008228, CNS-1845639, and CNS-1831669, and by the Army Research Office under Grant W911NF-17-1-0409. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Office or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

Author information

Corresponding author

Correspondence to Dong Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shang, L., Zhang, D., Shen, J. et al. CCMR: A Classic-enriched Connotation-aware Music Retrieval System on Social Media with Visual Inputs. Soc. Netw. Anal. Min. 11, 119 (2021). https://doi.org/10.1007/s13278-021-00821-4

  • DOI: https://doi.org/10.1007/s13278-021-00821-4
