Musical Genre Recognition Based on Deep Descriptors of Harmony, Instrumentation, and Segments

  • Conference paper
  • In: Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART 2023)

Abstract

Deep learning has recently established itself as the method of choice for almost all classification tasks in music information retrieval. However, despite very good classification performance, it sometimes brings disadvantages, including long training times and high energy costs, low interpretability of classification models, and an increased risk of overfitting when applied to small training sets, due to the very large number of trainable parameters. In this paper, we investigate a combination of deep and shallow algorithms for the recognition of musical genres using a transfer learning approach. We train deep classification models once to predict harmonic, instrumental, and segment properties from datasets with the respective annotations. Their predictions on another dataset with annotated genres then serve as features for shallow classification methods. These can be retrained repeatedly for different categories, and are particularly useful when training sets are small, as in a real-world scenario in which listeners define musical categories by selecting only a few prototype tracks. The experiments show the potential of the proposed approach for genre recognition. In particular, when combined with evolutionary feature selection, which identifies the most relevant deep feature dimensions, classification errors became significantly lower in almost all cases, compared to a baseline based on MFCCs and to results reported in previous work.


Notes

  1. https://github.com/napulen/AugmentedNet, accessed on 31.01.2023.

  2. http://www.seyerlehner.info/joomla/index.php/datasets, accessed on 31.01.2023.


Acknowledgement

The authors gratefully acknowledge the computing time provided on the Linux HPC cluster at Technical University Dortmund (LiDO3), partially funded in the course of the Large-Scale Equipment Initiative by the German Research Foundation (DFG) as project 271512359.

Author information


Corresponding author

Correspondence to Igor Vatolkin.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Vatolkin, I., Gotham, M., López, N.N., Ostermann, F. (2023). Musical Genre Recognition Based on Deep Descriptors of Harmony, Instrumentation, and Segments. In: Johnson, C., Rodríguez-Fernández, N., Rebelo, S.M. (eds) Artificial Intelligence in Music, Sound, Art and Design. EvoMUSART 2023. Lecture Notes in Computer Science, vol 13988. Springer, Cham. https://doi.org/10.1007/978-3-031-29956-8_27


  • DOI: https://doi.org/10.1007/978-3-031-29956-8_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-29955-1

  • Online ISBN: 978-3-031-29956-8

  • eBook Packages: Computer Science, Computer Science (R0)
