Voice Activity Detection Using Wavelet-Based Multiresolution Spectrum and Support Vector Machines and Audio Mixing Algorithm

Xue, Wei; Du, Sidan; Fang, Chengzhi; Ye, Yingxian

doi:10.1007/11754336_8

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3979)

Included in the following conference series:

European Conference on Computer Vision

913 Accesses
1 Citation

Abstract

This paper presents a Voice Activity Detection (VAD) algorithm and an efficient speech mixing algorithm for multimedia conferencing. As audio features, the proposed VAD uses MFCC computed from a wavelet-based multiresolution spectrum together with two classical audio parameters. It prejudges silence by detecting a multi-gate zero-crossing ratio and classifies noise and voice with Support Vector Machines (SVM). The new speech mixing algorithm, used in the Multipoint Control Unit (MCU) of a conference, takes the short-time power of each audio stream as the mixing weight vector and is designed for parallel processing. Experiments show that the proposed VAD algorithm achieves better overall performance at all SNRs than the G.729b VAD and other VADs, that the output of the new speech mixing algorithm has excellent perceptual quality, that its computational delay is small enough for real-time transmission, and that the MCU computation is lower than that based on the G.729b VAD.
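
The abstract gives no formulas, so the following is only a minimal sketch of two ideas it names: a zero-crossing ratio per frame and a mix weighted by short-time power. The function names, the definition of power as mean squared amplitude, and the normalisation of the weights so that they sum to one are all assumptions made for illustration. The paper's multi-gate thresholds, MFCC features, and SVM classifier are not reproduced here.

```python
from typing import List, Sequence


def zero_crossing_ratio(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def short_time_power(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (assumed definition)."""
    if not frame:
        return 0.0
    return sum(x * x for x in frame) / len(frame)


def mix_frames(frames: Sequence[Sequence[float]]) -> List[float]:
    """Mix equal-length frames, one per stream.

    Each stream is weighted by its share of the total short-time power.
    Normalising the weights to sum to one is an assumption, not a detail
    taken from the paper.
    """
    if not frames:
        return []
    length = len(frames[0])
    if any(len(f) != length for f in frames):
        raise ValueError("all frames must have the same length")
    powers = [short_time_power(f) for f in frames]
    total = sum(powers)
    if total == 0.0:
        # Every stream is silent, so the mix is silent too.
        return [0.0] * length
    weights = [p / total for p in powers]
    return [
        sum(w * f[i] for w, f in zip(weights, frames))
        for i in range(length)
    ]
```

With this weighting, a silent stream contributes nothing to the mix, and each frame's weights depend only on that frame, so frames could in principle be mixed independently of one another. Whether this matches the parallel design mentioned in the abstract is not something the abstract confirms.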

Author information

Authors and Affiliations

Department of Electronics Science and Engineering, Nanjing University, Nanjing, 210093, P.R. China
Wei Xue, Sidan Du, Chengzhi Fang & Yingxian Ye

Editor information

Editors and Affiliations

Beckman Institute, University of Illinois at Urbana-Champaign, USA
Thomas S. Huang
Intelligent Systems Lab Amsterdam, University of Amsterdam, The Netherlands
Nicu Sebe
LIACS Media Lab, Leiden University, Netherlands
Michael S. Lew
Department of Computer Science, Rutgers University, 08854, Piscataway, NJ, USA
Vladimir Pavlović
Naval Postgraduate School, USA
Mathias Kölsch
School of Computing, University of Leeds, LS2 9JT, UK
Aphrodite Galata
Delphi Corporation, USA
Branislav Kisačanin

About this paper

Cite this paper

Xue, W., Du, S., Fang, C., Ye, Y. (2006). Voice Activity Detection Using Wavelet-Based Multiresolution Spectrum and Support Vector Machines and Audio Mixing Algorithm. In: Huang, T.S., et al. Computer Vision in Human-Computer Interaction. ECCV 2006. Lecture Notes in Computer Science, vol 3979. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11754336_8

DOI: https://doi.org/10.1007/11754336_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34202-1
Online ISBN: 978-3-540-34203-8
eBook Packages: Computer Science, Computer Science (R0)
