
Context-Aware Features for Singing Voice Detection in Polyphonic Music

  • Conference paper
Adaptive Multimedia Retrieval. Large-Scale Multimedia Retrieval and Evaluation (AMR 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7836)


Abstract

The effectiveness of audio content analysis for music retrieval may be enhanced by the use of available metadata. In the present work, observed differences in singing style and instrumentation across genres are used to adapt acoustic features for the singing voice detection task. Timbral descriptors traditionally used to discriminate singing voice from accompanying instruments are complemented by new features representing the temporal dynamics of source pitch and timbre. A method to isolate the dominant source spectrum serves to increase the robustness of the extracted features in the context of polyphonic audio. Besides demonstrating the effectiveness of combining static and dynamic features, experiments on a culturally diverse music database clearly indicate the value of adapting feature sets to genre-specific acoustic characteristics. Thus, commonly available metadata such as genre can be useful in the front end of a music information retrieval (MIR) system.
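The abstract contrasts static timbral descriptors with features that capture temporal dynamics. The paper's exact feature definitions are not given on this page, so the following is only a generic sketch, assuming NumPy: the spectral centroid stands in for a static timbral descriptor, and a windowed standard deviation of frame-to-frame differences stands in for a dynamic feature. The function names, frame sizes, window length, and choice of descriptor are illustrative assumptions, not the authors' method, and the sketch works on the full mixture spectrum rather than on an isolated dominant-source spectrum.

```python
import numpy as np


def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split a mono signal into Hann-windowed frames, shape (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hanning(frame_len)


def spectral_centroid(frames: np.ndarray, sr: float) -> np.ndarray:
    """Static descriptor (stand-in): per-frame spectral centroid in Hz."""
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    energy = mag.sum(axis=1)
    # Guard against division by zero on silent frames
    return (mag @ freqs) / np.maximum(energy, 1e-12)


def dynamic_feature(track: np.ndarray, win: int) -> np.ndarray:
    """Dynamic descriptor (stand-in): spread of frame-to-frame change.

    For each frame, returns the standard deviation of the first differences
    of `track` over a centred window of about `win` frames (truncated at the edges).
    """
    delta = np.diff(track, prepend=track[0])
    half = win // 2
    out = np.empty(len(track), dtype=float)
    for i in range(len(track)):
        lo, hi = max(0, i - half), min(len(track), i + half + 1)
        out[i] = delta[lo:hi].std()
    return out


# Hypothetical usage on synthetic tones: a steady 440 Hz sine versus one
# with a 6 Hz, +/-50 Hz vibrato (instantaneous frequency 440 + 50*cos(2*pi*6*t)).
sr, frame_len, hop = 16000, 1024, 256
t = np.arange(sr) / sr
steady = np.sin(2 * np.pi * 440 * t)
vibrato = np.sin(2 * np.pi * 440 * t + (50 / 6) * np.sin(2 * np.pi * 6 * t))

for name, x in [("steady", steady), ("vibrato", vibrato)]:
    centroid = spectral_centroid(frame_signal(x, frame_len, hop), sr)
    # Static and dynamic columns side by side, one row per frame
    feats = np.column_stack([centroid, dynamic_feature(centroid, win=11)])
    print(name, feats.shape, feats[:, 1].mean())
```

On the vibrato tone the dynamic column is far larger than on the steady tone, which is the kind of contrast a dynamic feature is meant to expose. The same helper could equally be applied to a predominant-pitch track (for example in cents) instead of the centroid.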




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rao, V., Gupta, C., Rao, P. (2013). Context-Aware Features for Singing Voice Detection in Polyphonic Music. In: Detyniecki, M., García-Serrano, A., Nürnberger, A., Stober, S. (eds) Adaptive Multimedia Retrieval. Large-Scale Multimedia Retrieval and Evaluation. AMR 2011. Lecture Notes in Computer Science, vol 7836. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37425-8_4


  • DOI: https://doi.org/10.1007/978-3-642-37425-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37424-1

  • Online ISBN: 978-3-642-37425-8

  • eBook Packages: Computer Science, Computer Science (R0)
