
Music Genre Classification Using Masked Conditional Neural Networks

  • Conference paper
  • Neural Information Processing (ICONIP 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10635)


Abstract

The ConditionaL Neural Networks (CLNN) and the Masked ConditionaL Neural Networks (MCLNN) exploit the nature of multi-dimensional temporal signals. The CLNN captures the conditional temporal influence between the frames in a window, and the mask in the MCLNN enforces a systematic sparseness over the network links that follows a filterbank-like pattern. The mask induces the network to learn time-frequency representations in frequency bands, which allows the network to sustain frequency shifts. Additionally, the mask in the MCLNN automates the exploration of a range of feature combinations, a task usually performed through an exhaustive manual search. We evaluated the MCLNN on the Ballroom and Homburg music genre datasets. The MCLNN achieved accuracies competitive with state-of-the-art handcrafted approaches as well as with models based on Convolutional Neural Networks.
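The abstract describes two mechanisms: a conditional layer that combines the frames of a window, and a binary mask that restricts each hidden unit to a band of input features. The sketch below illustrates both under stated assumptions; it is not the authors' implementation. We assume (hypothetically) that a window holds 2n+1 frames, that each frame offset has its own weight matrix, that one shared mask is applied elementwise to every matrix, that each hidden unit sees `bandwidth` adjacent features with consecutive bands shifted by `bandwidth - overlap` and wrapping around the feature axis, and that the activation is a sigmoid. The names `band_mask`, `masked_conditional_step` and `masked_conditional_layer` are illustrative only.

```python
import math

# Hypothetical sketch: the mask layout (cyclic bands), window handling and
# sigmoid activation are assumptions, not the MCLNN reference code.


def band_mask(num_features, num_hidden, bandwidth, overlap):
    """Filterbank-like binary mask of shape (num_features, num_hidden).

    Hidden unit j is linked to `bandwidth` adjacent input features starting
    at j * (bandwidth - overlap), wrapping around the feature axis.
    All other links are masked out (set to 0).
    """
    shift = bandwidth - overlap
    if shift <= 0:
        raise ValueError("overlap must be smaller than bandwidth")
    mask = [[0] * num_hidden for _ in range(num_features)]
    for j in range(num_hidden):
        start = (j * shift) % num_features
        for a in range(bandwidth):
            mask[(start + a) % num_features][j] = 1
    return mask


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def masked_conditional_step(window, weights, mask, bias):
    """Hidden activations for one window of 2n+1 frames.

    window:  2n+1 frames, each a list of num_features values
    weights: 2n+1 matrices (num_features x num_hidden), one per frame offset
    mask:    num_features x num_hidden binary mask shared by all offsets
    bias:    num_hidden values
    """
    out = []
    for j in range(len(bias)):
        z = bias[j]
        # Sum the contribution of every frame in the window, each through
        # its own weight matrix, with masked links multiplied by zero.
        for frame, w in zip(window, weights):
            for i, x in enumerate(frame):
                z += x * w[i][j] * mask[i][j]
        out.append(sigmoid(z))
    return out


def masked_conditional_layer(frames, weights, mask, bias):
    """Slide the layer over a segment; the output has 2n fewer frames."""
    k = len(weights)  # window size 2n+1
    return [
        masked_conditional_step(frames[t:t + k], weights, mask, bias)
        for t in range(len(frames) - k + 1)
    ]
```

For example, `band_mask(6, 3, 3, 1)` connects hidden unit 0 to features 0 to 2, unit 1 to features 2 to 4, and unit 2 to features 4, 5 and 0. Because masked links are multiplied by zero, they contribute nothing to a unit's input; varying `bandwidth` and `overlap` changes which groups of adjacent features each unit combines, which is one way to read the abstract's claim that the mask explores feature combinations automatically.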



Acknowledgments

This work is funded by the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 608014 (CAPACITIE).

Author information

Correspondence to Fady Medhat.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Medhat, F., Chesmore, D., Robinson, J. (2017). Music Genre Classification Using Masked Conditional Neural Networks. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10635. Springer, Cham. https://doi.org/10.1007/978-3-319-70096-0_49


  • DOI: https://doi.org/10.1007/978-3-319-70096-0_49


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70095-3

  • Online ISBN: 978-3-319-70096-0

  • eBook Packages: Computer Science, Computer Science (R0)
