An effective analysis of deep learning based approaches for audio based feature extraction and its visualization

Dhiraj; Biswas, Rohit; Ghattamaraju, Nischay

doi:10.1007/s11042-018-6706-x

Published: 13 October 2018

Multimedia Tools and Applications, Volume 78, pages 23949–23972 (2019)

Abstract

Visualizations help decipher latent patterns in music and give a deeper understanding of a song’s characteristics. This paper offers a critical analysis of how effective various state-of-the-art deep neural networks are at visualizing music. Several implementations of autoencoders and genre classifiers have been explored for extracting meaningful features from audio tracks. Novel techniques have been devised to map these audio features to the parameters that drive the visualizations. These methods are designed so that the visualizations respond to the music while also providing a distinct visual experience for each song.
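The abstract does not describe the authors' actual mapping from audio features to visual parameters. As a purely illustrative sketch, assume each frame of a track has already been reduced to a small feature vector (for example, taken from an autoencoder bottleneck or a genre classifier's penultimate layer). One simple way to drive a visualization is then to normalize each feature dimension across the track and map the result linearly onto visual controls. The function names `minmax_normalize`, `to_visual_params`, and `visualize_track`, as well as the `hue_deg`, `brightness`, and `scale` parameters, are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def minmax_normalize(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each feature dimension to [0, 1] across all frames of one track."""
    if not frames:
        return []
    dims = len(frames[0])
    lows = [min(f[d] for f in frames) for d in range(dims)]
    highs = [max(f[d] for f in frames) for d in range(dims)]
    out = []
    for f in frames:
        row = []
        for d in range(dims):
            span = highs[d] - lows[d]
            # A constant dimension carries no variation; map it to the midpoint.
            row.append((f[d] - lows[d]) / span if span > 0 else 0.5)
        out.append(row)
    return out


def to_visual_params(frame: Sequence[float]) -> Dict[str, float]:
    """Hypothetical linear mapping of the first three normalized features to visual controls."""
    return {
        "hue_deg": 360.0 * frame[0],         # feature 0 -> colour hue in degrees
        "brightness": 0.2 + 0.8 * frame[1],  # feature 1 -> brightness, never fully dark
        "scale": 0.5 + 1.5 * frame[2],       # feature 2 -> relative object size
    }


def visualize_track(features: Sequence[Sequence[float]]) -> List[Dict[str, float]]:
    """Turn per-frame feature vectors of one track into per-frame visual parameters."""
    return [to_visual_params(f) for f in minmax_normalize(features)]


if __name__ == "__main__":
    # Toy 3-dimensional feature vectors for four frames of one track (made-up numbers).
    demo = [[0.1, 2.0, -1.0], [0.4, 3.0, 0.0], [0.7, 4.0, 1.0], [1.0, 5.0, 3.0]]
    for params in visualize_track(demo):
        print(params)
```

Normalizing per track makes every song use the full range of each visual control, so the output follows changes within the music. The trade-off is that absolute differences between songs are discarded. A real system would more likely learn or hand-tune a richer mapping.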

Author information

Authors and Affiliations

CSIR-Central Electronics Engineering Research Institute (CEERI), Pilani, India
Dhiraj
Birla Institute of Technology and Science, Pilani, India
Rohit Biswas & Nischay Ghattamaraju

Corresponding author

Correspondence to Dhiraj.

About this article

Cite this article

Dhiraj, Biswas, R. & Ghattamaraju, N. An effective analysis of deep learning based approaches for audio based feature extraction and its visualization. Multimed Tools Appl 78, 23949–23972 (2019). https://doi.org/10.1007/s11042-018-6706-x

Received: 16 February 2018
Revised: 05 August 2018
Accepted: 18 September 2018
Published: 13 October 2018
Issue Date: 15 September 2019
DOI: https://doi.org/10.1007/s11042-018-6706-x