Machine learning for music genre: multifaceted review and experimentation with audioset

  • Jaime Ramírez
  • M. Julia Flores

Abstract

Music genre classification is one of the sub-disciplines of music information retrieval (MIR) that has been gaining popularity among researchers, mainly because of its still open challenges. Although research has been prolific in terms of the number of published works, the topic still suffers from a problem in its foundations: there is no clear and formal definition of what genre is. Music categorizations are vague and unclear, suffering from human subjectivity and lack of agreement. In its first part, this paper offers a survey that aims to cover the many different aspects of the matter. Its main goal is to give the reader an overview of the history and the current state of the art, exploring the techniques and datasets used to date, as well as identifying current challenges, such as the ambiguity of genre definitions or the introduction of human-centric approaches. The paper pays special attention to new trends in machine learning applied to the music annotation problem. Finally, we also include a music genre classification experiment that compares different machine learning models using Audioset.
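
As a purely illustrative sketch of the kind of experiment described above (not the authors' exact setup), the Python snippet below trains a small feed-forward neural network, one of the model families named in the keywords, on 128-dimensional audio embeddings such as those distributed with Audioset. The layer sizes, the number of genre classes, and the randomly generated placeholder data are all assumptions.

    # Hypothetical sketch: feed-forward genre classifier on Audioset-style
    # 128-dimensional audio embeddings. Placeholder data only; this is not
    # the authors' experimental configuration.
    import numpy as np
    from tensorflow import keras

    NUM_GENRES = 10      # assumed number of genre classes
    EMBEDDING_DIM = 128  # Audioset provides 128-d per-second audio embeddings

    # Random stand-ins for real Audioset embeddings and genre labels.
    X = np.random.rand(1000, EMBEDDING_DIM).astype("float32")
    y = np.random.randint(0, NUM_GENRES, size=1000)

    model = keras.Sequential([
        keras.layers.Input(shape=(EMBEDDING_DIM,)),
        keras.layers.Dense(256, activation="relu"),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(NUM_GENRES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2)

Swapping the final Dense layer (and the loss) for other scikit-learn or Keras classifiers gives the kind of model comparison the experiment performs.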

Keywords

Machine learning · Datasets · Music information retrieval · Classification algorithms · Music · Feed-forward neural networks

Acknowledgements

This work has been partially funded by FEDER funds and the Spanish Government (MICINN) through projects SBPLY/17/180501/000493 and TIN2016-77902-C3-1-P.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Computing Systems Department, UCLM, Albacete, Spain
