International Journal of Speech Technology, Volume 20, Issue 1, pp 185–204

Melody extraction from music using modified group delay functions

Article

Abstract

This paper discusses modified group delay based algorithms for estimating melodic pitch sequences from heterophonic/polyphonic music. Two variants of the modified group delay function are proposed: (a) system based, MODGD (Direct), and (b) source based, MODGD (Source). In (a), the standard modified group delay function (MODGDF) is used to estimate the prominent melodic pitch (\(f_0\)), which appears as a low-frequency formant in the MODGDF spectrum. In (b), the power spectrum of the signal is first flattened to emphasise the source. The flattened power spectrum behaves like a sinusoid in noise, with the frequency of the sinusoid related to the pitch frequency. The modified group delay function of this signal produces peaks at \(T_0\), \(2T_0, \ldots ,\) where \(T_0=\frac{1}{f_0}\). Continuity constraints are imposed across frames in a dynamic programming framework to reduce octave errors. Sudden changes in pitch are accommodated by changing the frame size dynamically using a multi-resolution framework. The proposed systems were evaluated on four datasets: ADC-2004, LabROSA, MIREX-2008 and a Carnatic music dataset. The results demonstrate the potential of group delay based methods for melody extraction.
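The continuity constraint mentioned in the abstract can be illustrated with a small dynamic programming sketch. This is not the authors' implementation: the function name `track_pitch`, the per-candidate salience values, and the transition cost (absolute pitch jump in octaves, scaled by `jump_penalty`) are all assumptions chosen for illustration. Each frame is assumed to supply at least one candidate with a positive frequency.

```python
import math


def track_pitch(candidates, jump_penalty=1.0):
    """Pick one pitch per frame by dynamic programming (hypothetical sketch).

    candidates: list of frames; each frame is a non-empty list of
                (f0_hz, salience) pairs. Higher salience means a stronger peak.
    jump_penalty: assumed cost per octave of pitch change between frames.
    Returns the list of selected f0 values, one per frame.
    """
    if not candidates:
        return []

    # score[t][j]: best cumulative score of any path ending at candidate j of frame t
    score = [[sal for _, sal in candidates[0]]]
    # back[t][j]: index of the predecessor candidate in frame t-1 on that best path
    back = [[-1] * len(candidates[0])]

    for t in range(1, len(candidates)):
        row_score, row_back = [], []
        for f0, sal in candidates[t]:
            best_prev, best_val = 0, -math.inf
            for i, (prev_f0, _) in enumerate(candidates[t - 1]):
                # Penalise large jumps on a log-frequency (octave) scale
                val = score[t - 1][i] - jump_penalty * abs(math.log2(f0 / prev_f0))
                if val > best_val:
                    best_prev, best_val = i, val
            row_score.append(best_val + sal)
            row_back.append(best_prev)
        score.append(row_score)
        back.append(row_back)

    # Backtrack from the best final candidate
    j = max(range(len(score[-1])), key=lambda k: score[-1][k])
    path = []
    for t in range(len(candidates) - 1, -1, -1):
        path.append(candidates[t][j][0])
        j = back[t][j]
    path.reverse()
    return path


# The middle frame's strongest peak sits an octave above its neighbours.
frames = [
    [(220.0, 1.0), (440.0, 0.9)],
    [(220.0, 0.8), (440.0, 1.0)],
    [(220.0, 1.0), (440.0, 0.7)],
]
smoothed = track_pitch(frames, jump_penalty=1.0)
framewise = track_pitch(frames, jump_penalty=0.0)
```

With the penalty set to zero the path follows the strongest peak in every frame, so the octave jump in the middle frame survives; with a penalty of one unit per octave the smoother path through 220 Hz wins. In practice the penalty weight would need tuning against annotated data.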

Keywords

Group delay, Modified group delay-system, Modified group delay-source, Pitch extraction for music

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Department of Computer Science and Engineering, Indian Institute of Technology, Madras, Chennai, India
  2. Center for Computer Research in Music and Acoustics, Stanford University, Stanford, USA
