Musical Genre Recognition Based on Deep Descriptors of Harmony, Instrumentation, and Segments

  • Conference paper
  • In: Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART 2023)

Abstract

Deep learning has recently established itself as the method of choice for almost all classification tasks in music information retrieval. However, despite very good classification performance, it sometimes brings disadvantages, including long training times and high energy costs, low interpretability of classification models, and an increased risk of overfitting when applied to small training sets, due to the very large number of trainable parameters. In this paper, we investigate a combination of deep and shallow algorithms for the recognition of musical genres using a transfer learning approach. We train deep classification models once to predict harmonic, instrumental, and segment properties from datasets with the respective annotations. Their predictions on another dataset with annotated genres then serve as features for shallow classification methods. These can be retrained repeatedly for different categories, and are particularly useful when training sets are small, as in a real-world scenario in which listeners define musical categories by selecting only a few prototype tracks. The experiments show the potential of the proposed approach for genre recognition. In particular, when combined with evolutionary feature selection, which identifies the most relevant deep feature dimensions, classification errors became significantly lower in almost all cases, compared to a baseline based on MFCCs and to results reported in previous work.


Notes

  1. https://github.com/napulen/AugmentedNet, accessed on 31.01.2023.

  2. http://www.seyerlehner.info/joomla/index.php/datasets, accessed on 31.01.2023.


Acknowledgement

The authors gratefully acknowledge the computing time provided on the Linux HPC cluster at Technical University Dortmund (LiDO3), partially funded in the course of the Large-Scale Equipment Initiative by the German Research Foundation (DFG) as project 271512359.

Author information


Corresponding author

Correspondence to Igor Vatolkin.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Vatolkin, I., Gotham, M., López, N.N., Ostermann, F. (2023). Musical Genre Recognition Based on Deep Descriptors of Harmony, Instrumentation, and Segments. In: Johnson, C., Rodríguez-Fernández, N., Rebelo, S.M. (eds) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2023. Lecture Notes in Computer Science, vol 13988. Springer, Cham. https://doi.org/10.1007/978-3-031-29956-8_27


  • DOI: https://doi.org/10.1007/978-3-031-29956-8_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-29955-1

  • Online ISBN: 978-3-031-29956-8

  • eBook Packages: Computer Science, Computer Science (R0)
