Multi-channel CNN-Based Rāga Recognition in Carnatic Music Using Sequential Aggregation Strategy

  • Published in: Circuits, Systems, and Signal Processing

Abstract

A vital aspect of Indian classical music (ICM) is rāga, which serves as the melodic framework for compositions and improvisations in both traditions of classical music. In this work, we propose a CNN-based sliding-window analysis on the mel-spectrogram and modgdgram for rāga recognition in Carnatic music. An important contribution of the work is that the proposed method requires neither pitch extraction nor metadata for the estimation of rāga. The CNN learns the representation of rāga from patterns in the mel-spectrogram/modgdgram during training through a sliding-window analysis. We train and test the network on the sliced mel-spectrogram/modgdgram of the original audio, while the final inference is performed on the audio as a whole. The performance is evaluated on 15 rāgas from the CompMusic dataset. Two fusion paradigms, namely multi-channel and multi-modal frameworks, have been implemented to assess the potential of the two feature representations. Of the two approaches, the multi-modal architecture reports a macro-F1 measure of 0.72, which is on par with the performance of the baseline sequence classification model. The performance is also compared with that of a transfer learning approach.
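The sliding-window analysis and whole-recording inference described above can be sketched as follows. This is a minimal NumPy illustration; the window size, hop, and posterior-averaging rule are illustrative assumptions rather than the paper's exact settings:

```python
import numpy as np

def slice_windows(spec, win=128, hop=64):
    """Slice a (n_mels, n_frames) mel-spectrogram/modgdgram into
    fixed-size windows along the time axis (sliding-window analysis)."""
    starts = range(0, spec.shape[1] - win + 1, hop)
    return np.stack([spec[:, s:s + win] for s in starts])

def aggregate(window_probs):
    """Sequential aggregation: average window-level class posteriors
    (e.g. CNN softmax outputs) into one recording-level prediction."""
    probs = np.mean(window_probs, axis=0)
    return int(np.argmax(probs)), probs

# Example: an 80-band spectrogram with 320 frames yields 4 windows.
windows = slice_windows(np.random.rand(80, 320))
print(windows.shape)  # (4, 80, 128)
```

The CNN would be trained on the individual windows; at test time, each window's posterior is computed and `aggregate` fuses them into a single rāga label for the whole recording.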


Data Availability Statement

The dataset employed in the proposed work is available in the CompMusic repository.

Notes

  1. A rasika, in Carnatic music terminology, is a person who has some knowledge of Carnatic music and can appreciate it.

  2. Twelve notes are not sufficient for an accurate or faithful representation of Carnatic music; the seasoned musician can easily identify many more than 12 musical entities in an octave, and they are collectively called “melodic atoms”.

  3. Swara is a Sanskrit word that simultaneously connotes a breath, a vowel, and the sound of a musical note corresponding to its name.

  4. Just as a measuring scale has numbers at fixed positions so that distances between them can be read off, the swara scale has fixed positions for the seven swaras mentioned above.

  5. Arohana is the ascending scale of notes, and Avarohana is the descending scale of notes.

  6. Convolution of signals in the time domain is reflected as a summation in the group delay domain.

  7. The resonance (anti-resonance) peaks of a signal, due to complex-conjugate pole (zero) pairs, are better resolved in the group delay domain than in the spectral domain.
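The additivity property in note 6 can be checked numerically: the group delay of a cascade (time-domain convolution) of two filters equals the sum of their individual group delays. A minimal sketch, with arbitrary FIR coefficients and FFT size chosen purely for illustration:

```python
import numpy as np

def group_delay(h, nfft=512):
    """Group delay tau(w) = -d(arg H)/dw, computed without phase
    unwrapping as Re[ FFT(n*h[n]) / FFT(h[n]) ]."""
    h = np.asarray(h, dtype=float)
    n = np.arange(len(h))
    H = np.fft.rfft(h, nfft)
    return np.real(np.fft.rfft(n * h, nfft) / H)

h1 = np.array([1.0, 0.5])      # first FIR filter
h2 = np.array([1.0, -0.3])     # second FIR filter
cascade = np.convolve(h1, h2)  # convolution in the time domain ...
# ... appears as a summation in the group delay domain:
assert np.allclose(group_delay(cascade), group_delay(h1) + group_delay(h2))
```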


Funding

The authors declare that the proposed work received no specific grant from any agency.

Author information

Corresponding author

Correspondence to Rajeev Rajan.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Modgdgram is the visual representation of the modified group delay function, with time and frequency along the horizontal and vertical axes, respectively.
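One frame of such a modgdgram could be computed along the lines below, following the standard modified group delay formulation tau_m(w) = sign(t) * |t|^alpha with t(w) = (X_R*Y_R + X_I*Y_I) / S(w)^(2*gamma), where Y is the transform of n*x[n] and S is a cepstrally smoothed magnitude spectrum. The values of alpha, gamma, and the lifter length here are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def modgd_frame(x, nfft=1024, alpha=0.4, gamma=0.9, lifter=8):
    """Modified group delay function of one signal frame (sketch)."""
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x, nfft)
    Y = np.fft.rfft(n * x, nfft)
    # Cepstrally smoothed magnitude spectrum S(w): keep only the
    # low-quefrency part of the cepstrum of the log-magnitude spectrum.
    cep = np.fft.irfft(np.log(np.abs(X) + 1e-12), nfft)
    cep[lifter:nfft - lifter] = 0.0
    S = np.exp(np.fft.rfft(cep, nfft).real)
    t = (X.real * Y.real + X.imag * Y.imag) / S ** (2 * gamma)
    return np.sign(t) * np.abs(t) ** alpha
```

Stacking `modgd_frame` outputs over successive hops of the audio, with frames on the horizontal axis and frequency bins on the vertical, yields the modgdgram image fed to the CNN.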

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Rajan, R., Sivan, S. Multi-channel CNN-Based Rāga Recognition in Carnatic Music Using Sequential Aggregation Strategy. Circuits Syst Signal Process 42, 4072–4095 (2023). https://doi.org/10.1007/s00034-023-02301-w
