Abstract
In recent years, deep learning has received intense attention owing to its great success in image recognition, and a trend of adopting deep learning techniques has formed across various information processing fields, including music information retrieval (MIR). In this paper, we conduct a comprehensive study of music audio classification with improved convolutional neural networks (CNNs). To the best of our knowledge, this is the first work to apply Densely Connected Convolutional Networks (DenseNet) to music audio tagging, and DenseNet is demonstrated to perform better than the Residual Network (ResNet). Additionally, two data augmentation approaches, time overlapping and pitch shifting, are proposed to address the scarcity of labelled data in MIR. Moreover, stacking-based ensemble learning with an SVM meta-classifier is employed. We believe that the proposed combination of DenseNet's strong representations and data augmentation can be adapted to other audio processing tasks.
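The two augmentation techniques named above can be illustrated with a minimal NumPy sketch; the function names and parameters here are illustrative assumptions, since the abstract does not give implementation details. Time overlapping crops a waveform into overlapping segments so each clip yields several training examples, and pitch shifting resamples the signal to raise or lower its pitch (this naive variant also changes duration, so the result is cropped or zero-padded back to the original length).

```python
import numpy as np

def overlap_segments(signal, seg_len, hop):
    """Time overlapping: split a 1-D waveform into overlapping segments.
    Consecutive segments share (seg_len - hop) samples."""
    starts = range(0, len(signal) - seg_len + 1, hop)
    return np.stack([signal[s:s + seg_len] for s in starts])

def pitch_shift(signal, semitones):
    """Naive pitch shift via resampling with linear interpolation.
    A shift of +n semitones resamples by a factor of 2**(n/12)."""
    rate = 2.0 ** (semitones / 12.0)
    idx = np.arange(0, len(signal), rate)
    shifted = np.interp(idx, np.arange(len(signal)), signal)
    # Resampling changes length; crop or zero-pad back to the original.
    out = np.zeros(len(signal))
    n = min(len(signal), len(shifted))
    out[:n] = shifted[:n]
    return out

# 1 second of a 440 Hz sine at a 16 kHz sampling rate
x = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
segs = overlap_segments(x, seg_len=4000, hop=2000)  # 50% overlap -> (7, 4000)
up = pitch_shift(x, semitones=2)                    # two semitones higher
print(segs.shape, up.shape)
```

In a pipeline like the one described, each augmented segment would be converted to a spectrogram and fed to the DenseNet, with the per-model predictions combined by the SVM-based stacking ensemble.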
Acknowledgement
This work was supported by Ping An Technology (Shenzhen) Co., Ltd, China.
© 2019 Springer Nature Switzerland AG
Cite this paper
Bian, W., Wang, J., Zhuang, B., Yang, J., Wang, S., Xiao, J. (2019). Audio-Based Music Classification with DenseNet and Data Augmentation. In: Nayak, A., Sharma, A. (eds) PRICAI 2019: Trends in Artificial Intelligence. Lecture Notes in Computer Science, vol. 11672. Springer, Cham. https://doi.org/10.1007/978-3-030-29894-4_5
Print ISBN: 978-3-030-29893-7
Online ISBN: 978-3-030-29894-4