Classification of Music Genres Based on Mel-Frequency Cepstrum Coefficients Using Deep Learning Models

Preetham, Manoj; Panga, Jemimah Beulah; Andrew, J.; Raimond, Kumudha; Dang, Hien

doi:10.1007/978-981-19-2177-3_83

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 905)

Abstract

Genre classification is a vital task today because the number of songs produced keeps increasing. On average, around 60,000 tracks are uploaded to Spotify per day, so classifying these tracks by genre is an important task for every music streaming service and platform. Because of their high classification performance, neural network models such as the convolutional neural network (CNN), multi-layer perceptron (MLP), and long short-term memory network (LSTM) are used in this work to automatically classify music into its genres based on Mel-frequency cepstrum coefficients (MFCCs), instead of entering the genre manually. We evaluated the models on the GTZAN dataset and provide a comparative analysis of the classification efficiency of the deep learning models. Our proposed CNN model achieved a classification accuracy of 70.42%, which is higher than human accuracy and higher than that of the other deep learning models.
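
The abstract does not describe how the MFCC features are computed. As a rough illustration only, the sketch below derives MFCCs for a single audio frame using the textbook pipeline (windowing, power spectrum, triangular mel filterbank, logarithm, DCT-II). The sample rate, frame length, Hamming window, 20 filters, and 13 coefficients are assumptions chosen for the example, not values taken from the chapter, and the function names are hypothetical. A real system would more likely use an audio library with a fast Fourier transform instead of the naive DFT shown here.

```python
import math

def hz_to_mel(f):
    # Common mel-scale formula (one of several conventions in use).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum of a Hamming-windowed frame (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2.0 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spectrum.append((re * re + im * im) / n)
    return spectrum

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies: each filter spans three consecutive edges.
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_freqs:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    """MFCCs of one frame: log mel-band energies followed by a DCT-II."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Small floor avoids log(0) for silent frames or empty bands.
    log_energies = [math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10)
                    for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for c in range(n_coeffs)]

# Example: one 256-sample frame of a 440 Hz tone at an assumed 8 kHz sample rate.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * i / sample_rate) for i in range(256)]
coeffs = mfcc_frame(tone, sample_rate)
print(len(coeffs))
```

Computing such a vector for every frame of a track yields a coefficients-by-time matrix, which is the kind of input that a CNN, MLP, or LSTM classifier like those named in the abstract could consume.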

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Karunya Institute of Technology and Sciences, Coimbatore, India
Manoj Preetham, Jemimah Beulah Panga & Kumudha Raimond
Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
J. Andrew
Department of Computer Science, University of Massachusetts Boston, Boston, MA, USA
Hien Dang
Faculty of Computer Science and Engineering, Thuyloi University, Hanoi, Vietnam
Hien Dang

Corresponding author

Correspondence to J. Andrew.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India
J. Dinesh Peter
Department of Computer Science, Creighton University, Omaha, NE, USA
Steven Lawrence Fernandes
Civil and Environmental Engineering, University of Pittsburgh, Pittsburgh, PA, USA
Amir H. Alavi

About this paper

Cite this paper

Preetham, M., Panga, J.B., Andrew, J., Raimond, K., Dang, H. (2022). Classification of Music Genres Based on Mel-Frequency Cepstrum Coefficients Using Deep Learning Models. In: Peter, J.D., Fernandes, S.L., Alavi, A.H. (eds) Disruptive Technologies for Big Data and Cloud Applications. Lecture Notes in Electrical Engineering, vol 905. Springer, Singapore. https://doi.org/10.1007/978-981-19-2177-3_83

DOI: https://doi.org/10.1007/978-981-19-2177-3_83
Published: 02 August 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2176-6
Online ISBN: 978-981-19-2177-3
eBook Packages: Intelligent Technologies and Robotics (R0)
