Multi-channel CNN-Based Rāga Recognition in Carnatic Music Using Sequential Aggregation Strategy

  • Published in: Circuits, Systems, and Signal Processing

Abstract

A vital aspect of Indian classical music (ICM) is rāga, which serves as the melodic framework for compositions and improvisations in both traditions of classical music. In this work, we propose a CNN-based sliding-window analysis on the mel-spectrogram and modgdgram for rāga recognition in Carnatic music. An important contribution of the work is that the proposed method requires neither pitch extraction nor metadata for the estimation of rāga. The CNN learns the representation of rāga from patterns in the mel-spectrogram/modgdgram during training through a sliding-window analysis. We train and test the network on the sliced mel-spectrogram/modgdgram of the original audio, while the final inference is performed on the audio as a whole. The performance is evaluated on 15 rāgas from the CompMusic dataset. Two fusion paradigms, namely multi-channel and multi-modal frameworks, have been implemented to assess the potential of the two feature representations. Of the two approaches, the multi-modal architecture reports a macro-F1 measure of 0.72, which is on par with the performance of the baseline sequence classification model. The performance is also compared with that of a transfer learning approach.
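The sliding-window analysis and whole-recording inference described above can be sketched as follows. This is a minimal NumPy illustration; the window size, hop, and posterior-averaging rule are illustrative assumptions rather than the paper's exact settings:

```python
import numpy as np

def slice_windows(spec, win=128, hop=64):
    """Slice a (n_mels, n_frames) mel-spectrogram/modgdgram into
    fixed-size windows along the time axis (sliding-window analysis)."""
    starts = range(0, spec.shape[1] - win + 1, hop)
    return np.stack([spec[:, s:s + win] for s in starts])

def aggregate(window_probs):
    """Sequential aggregation: average window-level class posteriors
    (e.g. CNN softmax outputs) into one recording-level prediction."""
    probs = np.mean(window_probs, axis=0)
    return int(np.argmax(probs)), probs

# Example: an 80-band spectrogram with 320 frames yields 4 windows.
windows = slice_windows(np.random.rand(80, 320))
print(windows.shape)  # (4, 80, 128)
```

The CNN would be trained on the individual windows; at test time, each window's posterior is computed and `aggregate` fuses them into a single rāga label for the whole recording.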


Data Availability Statement

The dataset employed in the proposed work is available in the CompMusic repository.

Notes

  1. A rasika, in Carnatic music terminology, is a person who has some knowledge of Carnatic music and can appreciate it.

  2. Twelve notes are not sufficient for an accurate or faithful representation of Carnatic music; the seasoned musician can easily identify many more than 12 musical entities in an octave, and they are collectively called “melodic atoms”.

  3. Swara is a Sanskrit word that simultaneously connotes a breath, a vowel, and the sound of a musical note corresponding to its name.

  4. Just as a measuring scale has numbers at fixed positions so that distances between them can be read off, the swara scale has fixed positions for the seven swaras mentioned above.

  5. Arohana is the ascending scale of notes, and Avarohana is the descending scale of notes.

  6. Convolution of signals in the time domain is reflected as a summation in the group delay domain.

  7. The resonance (anti-resonance) peaks of a signal, due to complex-conjugate pole (zero) pairs, are better resolved in the group delay domain than in the spectral domain.
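The additivity property in note 6 can be checked numerically: the group delay of a cascade (time-domain convolution) of two filters equals the sum of their individual group delays. A minimal sketch, with arbitrary FIR coefficients and FFT size chosen purely for illustration:

```python
import numpy as np

def group_delay(h, nfft=512):
    """Group delay tau(w) = -d(arg H)/dw, computed without phase
    unwrapping as Re[ FFT(n*h[n]) / FFT(h[n]) ]."""
    h = np.asarray(h, dtype=float)
    n = np.arange(len(h))
    H = np.fft.rfft(h, nfft)
    return np.real(np.fft.rfft(n * h, nfft) / H)

h1 = np.array([1.0, 0.5])      # first FIR filter
h2 = np.array([1.0, -0.3])     # second FIR filter
cascade = np.convolve(h1, h2)  # convolution in the time domain ...
# ... appears as a summation in the group delay domain:
assert np.allclose(group_delay(cascade), group_delay(h1) + group_delay(h2))
```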


Funding

The authors declare that the proposed work received no specific grant from any agency.

Author information

Corresponding author

Correspondence to Rajeev Rajan.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Modgdgram is the visual representation of the modified group delay function, with time and frequency along the horizontal and vertical axes, respectively.
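One frame of such a modgdgram could be computed along the lines below, following the standard modified group delay formulation tau_m(w) = sign(t) * |t|^alpha with t(w) = (X_R*Y_R + X_I*Y_I) / S(w)^(2*gamma), where Y is the transform of n*x[n] and S is a cepstrally smoothed magnitude spectrum. The values of alpha, gamma, and the lifter length here are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def modgd_frame(x, nfft=1024, alpha=0.4, gamma=0.9, lifter=8):
    """Modified group delay function of one signal frame (sketch)."""
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x, nfft)
    Y = np.fft.rfft(n * x, nfft)
    # Cepstrally smoothed magnitude spectrum S(w): keep only the
    # low-quefrency part of the cepstrum of the log-magnitude spectrum.
    cep = np.fft.irfft(np.log(np.abs(X) + 1e-12), nfft)
    cep[lifter:nfft - lifter] = 0.0
    S = np.exp(np.fft.rfft(cep, nfft).real)
    t = (X.real * Y.real + X.imag * Y.imag) / S ** (2 * gamma)
    return np.sign(t) * np.abs(t) ** alpha
```

Stacking `modgd_frame` outputs over successive hops of the audio, with frames on the horizontal axis and frequency bins on the vertical, yields the modgdgram image fed to the CNN.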

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Rajan, R., Sivan, S. Multi-channel CNN-Based Rāga Recognition in Carnatic Music Using Sequential Aggregation Strategy. Circuits Syst Signal Process 42, 4072–4095 (2023). https://doi.org/10.1007/s00034-023-02301-w
