International Journal of Speech Technology, Volume 21, Issue 4, pp 987–1003

Automatic note transcription system for Hindustani classical music

  • Prasenjit Dhara
  • K. Sreenivasa Rao


In Hindustani classical music, notes and their variations play an important role in evoking the aesthetic qualities of a rãga. Detecting notes is therefore important for identifying the characteristics of a rãga, but the task is challenging because of the presence of improvisations and ornamentations. In this work, the melody contour is extracted from the music file using a salience-based predominant melody extraction method. Initially, notes are determined by optimizing the tolerance band and the duration of notes. The output of the initial note transcription system consists of the notes, their durations, and their boundaries (onset and offset instants). To improve the accuracy of the initial note transcription, the melody contour is divided into melodic segments, which are grouped into four broad categories based on the duration and transition characteristics of the initially transcribed notes. Different features and classification models are explored for classifying the melodic segments into desired and undesired categories. Further, we propose two metrics to measure the performance of the proposed transcription system.
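The initial transcription step described above (a tolerance band around each note plus a duration criterion) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `transcribe_notes`, the equal-tempered 100-cent grid relative to the tonic, and the default values of `tol_cents` and `min_dur_s` are all assumptions chosen for illustration, whereas the paper tunes these quantities by optimization.

```python
import math

def transcribe_notes(pitch_hz, tonic_hz, hop_s, tol_cents=35.0, min_dur_s=0.08):
    """Turn a frame-wise pitch contour into (note, onset_s, offset_s) tuples.

    `note` is the number of semitones above the tonic. The 12-tone
    equal-tempered grid and both thresholds are illustrative assumptions.
    """
    labels = []
    for f in pitch_hz:
        if f <= 0:
            labels.append(None)  # unvoiced frame
            continue
        cents = 1200.0 * math.log2(f / tonic_hz)
        nearest = round(cents / 100.0)
        # Keep the frame only if it lies inside the tolerance band of the note.
        inside = abs(cents - 100.0 * nearest) <= tol_cents
        labels.append(nearest if inside else None)

    notes = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run when the label changes or the contour ends.
        if i == len(labels) or labels[i] != labels[start]:
            duration = (i - start) * hop_s
            if labels[start] is not None and duration >= min_dur_s:
                notes.append((labels[start], start * hop_s, i * hop_s))
            start = i
    return notes
```

Frames outside every tolerance band, and runs shorter than the duration threshold, are simply left unlabelled here; these are the kinds of regions (transitions and ornamentations) that the segment classification stage of the paper is meant to handle.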


Hindustani classical music; Heterophonic music; Note transcription; Ornamentation; Melody contour; Melody segmentation



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, India
