Multimedia Tools and Applications, Volume 78, Issue 1, pp 161–176

Music auto-tagging based on the unified latent semantic modeling

  • Xi Shao
  • Zhiyong Cheng
  • Mohan S. Kankanhalli


We propose a music auto-tagging approach based on latent space modeling of both music context and music content. First, we apply latent semantic analysis to music tags using Sparse Nonnegative Matrix Factorization (SNMF). Then, the semantics of music content are learned by decomposing the content over a pre-trained dictionary, for which an adaptive dictionary learning algorithm is proposed. Finally, the two latent spaces are associated through a subspace mapping algorithm. Experimental results on the CAL500 dataset with 5-fold cross-validation show that the proposed approach outperforms state-of-the-art auto-tagging systems.
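The three steps in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the L1-penalized multiplicative-update NMF, the ISTA sparse coder over a fixed dictionary, and the ridge-regression mapping are assumptions standing in for the paper's SNMF, adaptive dictionary learning, and subspace mapping, and the function names (`sparse_nmf`, `sparse_code`, `fit_mapping`, `predict_tags`) are hypothetical.

```python
import numpy as np


def sparse_nmf(Y, k, lam=0.05, iters=500, seed=0):
    """Factor a nonnegative song-by-tag matrix Y ~= A @ B.

    Minimizes ||Y - A B||_F^2 + lam * sum(A) with multiplicative updates,
    an L1-penalized NMF assumed here as a stand-in for the paper's SNMF.
    A (songs x k) holds each song's latent-topic weights; B (k x tags)
    describes each latent topic in terms of tags.
    """
    rng = np.random.default_rng(seed)
    eps = 1e-9
    A = rng.random((Y.shape[0], k)) + 0.1
    B = rng.random((k, Y.shape[1])) + 0.1
    for _ in range(iters):
        A *= (Y @ B.T) / (A @ B @ B.T + lam + eps)
        B *= (A.T @ Y) / (A.T @ A @ B + eps)
        # Heuristic: rescale so rows of B have unit norm (A @ B is unchanged),
        # which keeps the L1 penalty on A from being bypassed by inflating B.
        norms = np.linalg.norm(B, axis=1) + eps
        B /= norms[:, None]
        A *= norms[None, :]
    return A, B


def sparse_code(X, D, mu=0.05, iters=300):
    """Sparse codes S with X ~= S @ D via ISTA on 0.5||X - S D||^2 + mu||S||_1.

    D (atoms x features) plays the role of the pre-trained dictionary; the
    paper's adaptive dictionary learning step is not reproduced here.
    """
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the gradient
    S = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(iters):
        G = S - ((S @ D - X) @ D.T) / L  # gradient step on the squared error
        S = np.sign(G) * np.maximum(np.abs(G) - mu / L, 0.0)  # soft threshold
    return S


def fit_mapping(S, A, alpha=1e-2):
    """Ridge-regression map M from content codes S to tag latent factors A.

    A linear map is an assumption; the abstract only says the two latent
    spaces are associated by a subspace mapping algorithm.
    """
    n_atoms = S.shape[1]
    return np.linalg.solve(S.T @ S + alpha * np.eye(n_atoms), S.T @ A)


def predict_tags(X_new, D, M, B, mu=0.05):
    """Score every tag for new songs: content -> sparse code -> latent -> tags."""
    S_new = sparse_code(X_new, D, mu)
    return np.maximum(S_new @ M @ B, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n_songs, n_feats, n_tags = 60, 20, 15
    X = rng.random((n_songs, n_feats))                # toy content features
    Y = (rng.random((n_songs, n_tags)) > 0.7) * 1.0   # toy binary tag matrix
    # Hypothetical "pre-trained" dictionary: 30 unit-norm training frames.
    D = X[:30] / np.linalg.norm(X[:30], axis=1, keepdims=True)
    A, B = sparse_nmf(Y[:50], k=5)                    # train on 50 songs
    M = fit_mapping(sparse_code(X[:50], D), A)
    scores = predict_tags(X[50:], D, M, B)            # tag the 10 held-out songs
    top3 = np.argsort(-scores, axis=1)[:, :3]         # 3 best tags per song
    print(top3)
```

Ranking each row of the score matrix gives a song's top tags. The toy data above only demonstrates the data flow; reproducing the paper's CAL500 comparison would require the real audio features, tag annotations, and the authors' actual SNMF and mapping algorithms.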


Keywords: Music tag · Latent semantic analysis · Music recommendation



This work was supported by the National Natural Science Foundation of China under Grant Nos. 60902065 and 61401227, and by the Beijing Natural Science Foundation (No. 4152053).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Xi Shao (1, 2), corresponding author
  • Zhiyong Cheng (2)
  • Mohan S. Kankanhalli (2)

  1. College of Communication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, China
  2. School of Computing, National University of Singapore, Singapore, Singapore
