Abstract
This paper presents a Voice Activity Detection (VAD) algorithm and an efficient speech mixing algorithm for multimedia conferencing. The proposed VAD uses, as audio features, the MFCCs of a wavelet-based multiresolution spectrum together with two classical audio parameters; it prejudges silence by detecting a multi-gate zero-crossing ratio, and classifies noise and voice with a Support Vector Machine (SVM). The new speech mixing algorithm, used in the Multipoint Control Unit (MCU) of a conference, takes the short-time power of each audio stream as the mixing weight vector and is designed for parallel processing. Experiments show that the proposed VAD achieves better overall performance across all SNRs than the G.729b VAD and other VADs, that the output audio of the new mixing algorithm has excellent perceptual quality, that its computational delay is small enough to meet the needs of real-time transmission, and that the MCU computation is lower than that of an MCU based on the G.729b VAD.
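The power-weighted mixing idea can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so the code below assumes that short-time power is the mean squared amplitude of a frame and that each stream's weight is its share of the total power; the function names `short_time_power` and `mix_frames` are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def short_time_power(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (assumed definition of short-time power)."""
    if not frame:
        return 0.0
    return sum(x * x for x in frame) / len(frame)


def mix_frames(frames: Sequence[Sequence[float]], eps: float = 1e-12) -> List[float]:
    """Mix equal-length frames, one per stream.

    Each stream is weighted by its share of the total short-time power
    (an assumed normalization; the paper may use a different one).
    """
    if not frames:
        return []
    n = len(frames[0])
    if any(len(f) != n for f in frames):
        raise ValueError("all frames must have the same length")

    powers = [short_time_power(f) for f in frames]
    total = sum(powers)
    if total < eps:
        # Every stream is effectively silent: emit silence.
        return [0.0] * n

    weights = [p / total for p in powers]
    # Weighted sum, computed sample by sample across the streams.
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(n)]


if __name__ == "__main__":
    talker = [0.5, -0.5, 0.5, -0.5]
    silent = [0.0, 0.0, 0.0, 0.0]
    print(mix_frames([talker, silent]))
```

Because the assumed weights are non-negative and sum to one, each output sample is a convex combination of the input samples and cannot exceed the largest input amplitude, so this variant needs no separate clipping step. Each frame's power is computed independently of the others, which is the part that lends itself to parallel processing.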
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xue, W., Du, S., Fang, C., Ye, Y. (2006). Voice Activity Detection Using Wavelet-Based Multiresolution Spectrum and Support Vector Machines and Audio Mixing Algorithm. In: Huang, T.S., et al. Computer Vision in Human-Computer Interaction. ECCV 2006. Lecture Notes in Computer Science, vol 3979. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11754336_8
DOI: https://doi.org/10.1007/11754336_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34202-1
Online ISBN: 978-3-540-34203-8
eBook Packages: Computer Science (R0)