Predominant Melody Extraction from Vocal Polyphonic Music Signal by Time-Domain Adaptive Filtering-Based Method
Abstract
In this paper, a time-domain adaptive filtering-based melody extraction method is proposed. The proposed method works in multiple stages to extract the vocal melody (the singer’s fundamental frequency) from vocal polyphonic music signals. The vocal and non-vocal regions of the music signal are identified by the strength of excitation of the source signal. The vocal regions are further segmented into a sequence of notes by detecting their onsets in the frequency representation of the composite signal. The melody contour in each vocal note segment is obtained by adaptive zero-frequency filtering in the time domain. The performance of the proposed melody extraction method is compared with that of the current state-of-the-art melody extraction method in terms of voicing recall rate, voicing false alarm rate, raw pitch accuracy, and overall accuracy.
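The four evaluation measures named in the abstract are not defined on this page. The following is a minimal sketch, assuming the frame-wise definitions commonly used in melody extraction evaluation, a 50-cent pitch tolerance, and the convention that an F0 value of 0 marks an unvoiced frame. The function name `melody_metrics` and its parameters are hypothetical and are not taken from the paper.

```python
import math


def melody_metrics(ref_f0, est_f0, tol_cents=50.0):
    """Frame-wise melody evaluation (assumed conventions, not the paper's code).

    ref_f0, est_f0: equal-length sequences of F0 values in Hz, one per frame.
    A value of 0 (or less) marks the frame as unvoiced.
    """
    if len(ref_f0) != len(est_f0):
        raise ValueError("reference and estimate must have the same length")

    n_voiced = n_unvoiced = 0
    hits = false_alarms = pitch_ok = overall_ok = 0

    for ref, est in zip(ref_f0, est_f0):
        ref_voiced = ref > 0
        est_voiced = est > 0
        if ref_voiced:
            n_voiced += 1
            if est_voiced:
                hits += 1  # voicing correctly detected
                # Pitch deviation in cents between estimate and reference.
                if abs(1200.0 * math.log2(est / ref)) <= tol_cents:
                    pitch_ok += 1
                    overall_ok += 1
        else:
            n_unvoiced += 1
            if est_voiced:
                false_alarms += 1  # melody reported where there is none
            else:
                overall_ok += 1

    def ratio(num, den):
        # Undefined when the denominator is empty (e.g. no voiced frames).
        return num / den if den else float("nan")

    return {
        "voicing_recall": ratio(hits, n_voiced),
        "voicing_false_alarm": ratio(false_alarms, n_unvoiced),
        "raw_pitch_accuracy": ratio(pitch_ok, n_voiced),
        "overall_accuracy": ratio(overall_ok, n_voiced + n_unvoiced),
    }


# Toy example: six frames, the last four voiced at 220 Hz in the reference.
reference = [0, 0, 220.0, 220.0, 220.0, 220.0]
estimate = [0, 110.0, 220.0, 225.0, 440.0, 0]
print(melody_metrics(reference, estimate))
```

In this sketch an octave error (440 Hz against 220 Hz) counts as a voicing hit but not as a correct pitch, so it lowers raw pitch accuracy and overall accuracy without affecting voicing recall.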
Keywords
Polyphonic, Vocals, Music signal, Melody, Zero-frequency filter, Onsets, Strength of excitation, Note boundaries, Melody contour
Acknowledgements
The present work was carried out under the project entitled “Scientific Approach to Networking and Designing of Heritage Interfaces (SANDHI)”, sponsored by the Ministry of Human Resource Development (MHRD), Govt. of India (project reference IIT/SRIC/R/ITA/2014/40, dated March 24, 2014). We would like to thank Google (Google PhD Fellowship) and the Department of Information Technology (DIT), Govt. of India, for financial support. We would also like to thank Prof. Pallab Das Gupta (Dept. of Computer Science and Engineering, IIT Kharagpur), Prof. Priyadarshi Patnaik (Dept. of Humanities, IIT Kharagpur), and Ms. Gowri (professional Hindustani music vocalist) for providing us with deeper theoretical insight into Hindustani music.