Comparative study of singing voice detection methods

You, Shingchern D.; Wu, Yi-Chung; Peng, Shih-Hsien

doi:10.1007/s11042-015-2894-9


Published: 29 August 2015

Volume 75, pages 15509–15524, (2016)

Multimedia Tools and Applications


Abstract

Detecting singing segments within a soundtrack is an important and useful technique in music signal processing and retrieval. In this paper, we study the accuracy of detecting singing segments using an HMM (Hidden Markov Model) classifier with various features, including MFCC (Mel Frequency Cepstral Coefficients), LPCC (Linear Predictive Cepstral Coefficients), and LPC (Linear Prediction Coefficients). Simulation results show that detecting singing segments within a soundtrack is more difficult than detecting them among pure-instrument segments. In addition, combining MFCC and LPCC yields higher accuracy. The bootstrapping technique provides only a limited improvement in accuracy when detecting all singing segments in a soundtrack. For completeness, we also conduct an experiment showing that the time needed to perform music identification can be reduced by more than 40% if the singing-voice detection mechanism is incorporated into the identification process.
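The abstract names three frame-level features and an HMM classifier but gives no feature orders, window settings, or HMM topology. As a minimal sketch under those gaps, the Python code below computes LPC with the Levinson-Durbin recursion, converts LPC to LPCC with the standard cepstral recursion, concatenates an externally computed MFCC vector with the LPCC vector, and labels a segment as singing when a "vocal" Gaussian HMM assigns it a higher forward-algorithm log-likelihood than a "non-vocal" one. All names (`lpc`, `lpcc`, `combined_features`, `GaussianHMM`, `is_singing`), the default order of 12, and the two-model decision rule are illustrative assumptions, not the authors' implementation; MFCC extraction and HMM training (for example, Baum-Welch) are not shown.

```python
import math
from typing import List, Sequence

# Illustrative sketch only: names and defaults are hypothetical, not from the paper.

def lpc(frame: Sequence[float], order: int) -> List[float]:
    """LPC a[1..order] via Levinson-Durbin, predicting x[n] by sum_k a[k] * x[n-k]."""
    n = len(frame)
    # Autocorrelation r[0..order] of the (already windowed) frame.
    r = [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predictable frame: keep remaining zeros
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coeff.
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def lpcc(a: Sequence[float], n_ceps: int) -> List[float]:
    """Cepstrum c[1..n_ceps] of the all-pole model 1 / (1 - sum_k a[k] z^-k); gain term omitted."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for m in range(1, n_ceps + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c[1:]

def combined_features(mfcc: Sequence[float], frame: Sequence[float],
                      order: int = 12, n_ceps: int = 12) -> List[float]:
    """Hypothetical MFCC+LPCC vector; the MFCCs are assumed to be computed elsewhere."""
    return list(mfcc) + lpcc(lpc(frame, order), n_ceps)

def _log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf

def _logsumexp(vals: Sequence[float]) -> float:
    m = max(vals)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in vals))

class GaussianHMM:
    """HMM with diagonal-covariance Gaussian emissions; parameters assumed already trained."""

    def __init__(self, start, trans, means, variances):
        self.log_start = [_log(p) for p in start]
        self.log_trans = [[_log(p) for p in row] for row in trans]
        self.means = means
        self.variances = variances

    def _log_emit(self, s: int, x: Sequence[float]) -> float:
        return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
                   for xi, mu, v in zip(x, self.means[s], self.variances[s]))

    def log_likelihood(self, frames: Sequence[Sequence[float]]) -> float:
        """log P(frames | model) via the forward algorithm in the log domain."""
        n = len(self.log_start)
        alpha = [self.log_start[s] + self._log_emit(s, frames[0]) for s in range(n)]
        for x in frames[1:]:
            alpha = [_logsumexp([alpha[i] + self.log_trans[i][j] for i in range(n)])
                     + self._log_emit(j, x) for j in range(n)]
        return _logsumexp(alpha)

def is_singing(frames, vocal: GaussianHMM, non_vocal: GaussianHMM) -> bool:
    """Assumed decision rule: singing if the vocal model explains the segment better."""
    return vocal.log_likelihood(frames) > non_vocal.log_likelihood(frames)
```

Because LPCC is derived directly from the LPC coefficients, adding it to an MFCC front end costs only a short recursion per frame. Simple concatenation is just one way to combine the features; the abstract does not state how the paper fused them.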


Similar content being viewed by others

A comprehensive survey on automatic speech recognition using neural networks

Article 15 August 2023

Heart Sound Classification Using Deep Learning Techniques Based on Log-mel Spectrogram

Article 20 August 2022

Speech Emotion Recognition: A Comprehensive Survey

Article 08 March 2023


Acknowledgments

This work was supported in part by National Science Council (NSC) and Ministry of Science and Technology (MOST) of Taiwan through Grants NSC 101-2221-E-027-127 and MOST 103-2221-E-027-092.

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei, 106, Taiwan
Shingchern D. You, Yi-Chung Wu & Shih-Hsien Peng


Corresponding author

Correspondence to Shingchern D. You.


About this article

Cite this article

You, S.D., Wu, YC. & Peng, SH. Comparative study of singing voice detection methods. Multimed Tools Appl 75, 15509–15524 (2016). https://doi.org/10.1007/s11042-015-2894-9


Received: 30 November 2014
Revised: 27 June 2015
Accepted: 17 August 2015
Published: 29 August 2015
Issue Date: December 2016
DOI: https://doi.org/10.1007/s11042-015-2894-9
