Abstract
The ConditionaL Neural Network (CLNN) and the Masked ConditionaL Neural Network (MCLNN) exploit the nature of multi-dimensional temporal signals. The CLNN captures the conditional temporal influence between the frames in a window, and the mask in the MCLNN enforces a systematic sparseness that follows a filterbank-like pattern over the network links. The mask induces the network to learn time-frequency representations in bands, which makes the network tolerant of frequency shifts. Additionally, the mask automates the exploration of a range of feature combinations, a task usually done through an exhaustive manual search. We evaluated the MCLNN on the Ballroom and Homburg music genre datasets. The MCLNN achieved accuracies competitive with state-of-the-art handcrafted approaches as well as with models based on Convolutional Neural Networks.
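The two ideas in the abstract, a hidden layer conditioned on a window of frames and a band-like binary mask over its links, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the `bandwidth` and `overlap` parameters, the clipping of bands at the last feature, and the ReLU activation are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def band_mask(n_features, n_hidden, bandwidth, overlap):
    """Hypothetical filterbank-like binary mask of shape (n_features, n_hidden).

    Column j keeps a contiguous band of `bandwidth` input features; successive
    bands are shifted by (bandwidth - overlap). Bands are clipped at the last
    feature (an assumption made for this sketch).
    """
    step = bandwidth - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than bandwidth")
    mask = np.zeros((n_features, n_hidden), dtype=np.float32)
    for j in range(n_hidden):
        start = (j * step) % n_features
        stop = min(start + bandwidth, n_features)
        mask[start:stop, j] = 1.0
    return mask


def masked_window_forward(window, weights, bias, mask):
    """Hidden activations for one window of frames.

    window:  (d, n_features) consecutive frames
    weights: (d, n_features, n_hidden), one weight matrix per frame position
    bias:    (n_hidden,)
    mask:    (n_features, n_hidden), shared across frame positions
    """
    masked = weights * mask  # broadcasting zeroes the links outside each band
    pre_activation = np.einsum("uf,ufh->h", window, masked) + bias
    return np.maximum(pre_activation, 0.0)  # ReLU, assumed for this sketch


# Usage: 3 frames of 6 frequency bins feeding 3 hidden units.
mask = band_mask(n_features=6, n_hidden=3, bandwidth=3, overlap=1)
window = np.ones((3, 6))
weights = np.ones((3, 6, 3))
bias = np.zeros(3)
hidden = masked_window_forward(window, weights, bias, mask)
```

Because each hidden unit only sees one band of neighbouring frequency bins, a small shift of energy between adjacent bins stays within the same unit's receptive band, which is the intuition behind the shift tolerance described above.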
Acknowledgments
This work is funded by the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 608014 (CAPACITIE).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Medhat, F., Chesmore, D., Robinson, J. (2017). Music Genre Classification Using Masked Conditional Neural Networks. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10635. Springer, Cham. https://doi.org/10.1007/978-3-319-70096-0_49
DOI: https://doi.org/10.1007/978-3-319-70096-0_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70095-3
Online ISBN: 978-3-319-70096-0
eBook Packages: Computer Science, Computer Science (R0)