
Automatic note transcription system for Hindustani classical music

International Journal of Speech Technology

Abstract

In Hindustani classical music, notes and their variations play an important role in evoking the aesthetic qualities of a rāga. Detecting notes is therefore essential for characterizing a rāga, but the task is challenging because of improvisations and ornamentations. In this work, the melody contour is extracted from the music file using a salience-based predominant melody extraction method. Notes are first determined by optimizing the tolerance band and the note duration. The output of this initial note transcription system consists of the notes, their durations, and their boundaries (onset and offset instants). To improve the accuracy of the initial transcription, the melody contour is divided into melodic segments, which are grouped into four broad categories based on the duration and transition characteristics of the initially transcribed notes. Different features and classification models are explored for classifying the melodic segments into desired and undesired categories. Finally, we propose two metrics to measure the performance of the proposed transcription system.
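The paper itself gives no code; the following is a minimal, hypothetical sketch (Python/NumPy) of the initial note-transcription idea described above: a frame-level pitch contour, expressed in cents relative to the tonic, is snapped to the nearest semitone whenever it stays within a tolerance band for a minimum duration. The function name and the fixed tol_cents and min_dur_s values are illustrative assumptions; in the paper these parameters are optimized, and the contour comes from a salience-based predominant melody extractor.

```python
import numpy as np

def quantize_contour(f0_hz, tonic_hz, hop_s, tol_cents=50.0, min_dur_s=0.06):
    """Map a melody contour to the nearest semitone relative to the tonic,
    keeping only runs that stay inside the tolerance band for at least
    min_dur_s seconds. Returns (semitone_index, onset_s, offset_s) triples.
    Illustrative sketch only; not the authors' exact algorithm."""
    f0_hz = np.asarray(f0_hz, dtype=float)
    voiced = f0_hz > 0                        # unvoiced frames carry f0 <= 0
    cents = np.full(f0_hz.shape, np.nan)
    cents[voiced] = 1200.0 * np.log2(f0_hz[voiced] / tonic_hz)

    nearest = np.round(cents / 100.0)         # nearest equal-tempered note
    in_band = np.zeros(f0_hz.shape, dtype=bool)
    in_band[voiced] = (
        np.abs(cents[voiced] - 100.0 * nearest[voiced]) <= tol_cents
    )

    notes, start = [], None
    for i in range(len(f0_hz) + 1):
        extend = (i < len(f0_hz) and in_band[i]
                  and (start is None or nearest[i] == nearest[start]))
        if extend:
            start = i if start is None else start
            continue
        if start is not None:                 # a note run just ended
            dur_s = (i - start) * hop_s
            if dur_s >= min_dur_s:            # prune spuriously short notes
                notes.append((int(nearest[start]), start * hop_s, i * hop_s))
            start = None
        if i < len(f0_hz) and in_band[i]:     # note change: open a new run
            start = i
    return notes
```

Any predominant-melody extractor that yields per-frame f0 values at a fixed hop can feed this function; the returned onset and offset instants correspond to the note boundaries mentioned in the abstract, and the pruned or out-of-band regions are where the segment-level classification stage would operate.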



Author information


Corresponding author

Correspondence to Prasenjit Dhara.


About this article


Cite this article

Dhara, P., Rao, K.S. Automatic note transcription system for Hindustani classical music. Int J Speech Technol 21, 987–1003 (2018). https://doi.org/10.1007/s10772-018-9554-1
