Circuits, Systems, and Signal Processing, Volume 37, Issue 7, pp 2911–2933

Predominant Melody Extraction from Vocal Polyphonic Music Signal by Time-Domain Adaptive Filtering-Based Method

  • M. Gurunath Reddy
  • K. Sreenivasa Rao


This paper proposes a time-domain adaptive filtering-based melody extraction method. The method works in multiple stages to extract the vocal melody (the singer’s fundamental frequency) from vocal polyphonic music signals. First, the vocal and non-vocal regions of the music signal are identified from the strength of excitation of the source signal. The vocal regions are then segmented into a sequence of notes by detecting note onsets in the frequency representation of the composite signal. Finally, the melody contour in each vocal note segment is obtained by adaptive zero-frequency filtering in the time domain. The performance of the proposed method is compared with that of the current state-of-the-art melody extraction method in terms of voicing recall rate, voicing false alarm rate, raw pitch accuracy, and overall accuracy.
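The four evaluation measures named above can be illustrated with a short sketch. It is not the authors' evaluation code. It assumes the commonly used frame-level definitions, with a pitch tolerance of 50 cents and 0 Hz marking an unvoiced frame; the function name `melody_metrics` and its parameters are hypothetical. As a simplification, a frame counts toward raw pitch accuracy only when the estimate is also voiced.

```python
import math


def melody_metrics(ref_hz, est_hz, tol_cents=50.0):
    """Frame-level melody metrics (illustrative sketch).

    ref_hz, est_hz: per-frame F0 values in Hz; 0 marks an unvoiced frame
    (assumed convention). tol_cents: allowed pitch deviation (assumed 50).
    """
    if len(ref_hz) != len(est_hz):
        raise ValueError("reference and estimate must have the same length")

    n_voiced = n_unvoiced = 0
    recalled = false_alarms = pitch_ok = overall_ok = 0

    for ref, est in zip(ref_hz, est_hz):
        ref_voiced, est_voiced = ref > 0, est > 0
        if ref_voiced:
            n_voiced += 1
            if est_voiced:
                recalled += 1
                # Pitch deviation in cents between estimate and reference.
                if abs(1200.0 * math.log2(est / ref)) <= tol_cents:
                    pitch_ok += 1
                    overall_ok += 1
        else:
            n_unvoiced += 1
            if est_voiced:
                false_alarms += 1
            else:
                overall_ok += 1  # correctly labelled as unvoiced

    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "voicing_recall": ratio(recalled, n_voiced),
        "voicing_false_alarm": ratio(false_alarms, n_unvoiced),
        "raw_pitch_accuracy": ratio(pitch_ok, n_voiced),
        "overall_accuracy": ratio(overall_ok, n_voiced + n_unvoiced),
    }


# Toy example: five frames of reference and estimated F0 (Hz).
reference = [0.0, 220.0, 220.0, 440.0, 0.0]
estimate = [0.0, 220.0, 233.08, 0.0, 110.0]
print(melody_metrics(reference, estimate))
```

In the toy example, the third frame is voiced in both tracks but about a semitone off, so it counts toward voicing recall and not toward raw pitch accuracy.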


Polyphonic, Vocals, Music signal, Melody, Zero-frequency filter, Onsets, Strength of excitation, Note boundaries, Melody contour



The present work was carried out under the project entitled “Scientific Approach to Networking and Designing of Heritage Interfaces (SANDHI)”, sponsored by the Ministry of Human Resource Development (MHRD), Govt. of India (project reference IIT/SRIC/R/ITA/2014/40, dated March 24, 2014). We would like to thank Google (Google PhD Fellowship) and the Department of Information Technology (DIT), Govt. of India, for financial support. We would also like to thank Prof. Pallab Das Gupta (Dept. of Computer Science and Engineering, IIT Kharagpur), Prof. Priyadarshi Patnaik (Dept. of Humanities, IIT Kharagpur), and Ms. Gowri (professional Hindustani music vocalist) for providing us with theoretical insight into Hindustani music.



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Indian Institute of Technology Kharagpur, Kharagpur, India
