Voice Activity Detection Using Wavelet-Based Multiresolution Spectrum and Support Vector Machines and Audio Mixing Algorithm

  • Wei Xue
  • Sidan Du
  • Chengzhi Fang
  • Yingxian Ye
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3979)


This paper presents a Voice Activity Detection (VAD) algorithm and an efficient speech mixing algorithm for multimedia conferencing. The proposed VAD uses MFCCs of a wavelet-based multiresolution spectrum, together with two classical audio parameters, as audio features; it prejudges silence by detecting a multi-gate zero-crossing ratio and classifies noise and voice with Support Vector Machines (SVM). The new speech mixing algorithm, used in the Multipoint Control Unit (MCU) of a conference, uses the short-time power of each audio stream as the mixing weight vector and is designed for parallel processing. Experiments show that the proposed VAD algorithm achieves better overall performance at all SNRs than the G.729b VAD and other VADs, that the output audio of the new mixing algorithm has excellent perceptual quality, that its computational delay is small enough for real-time transmission, and that the MCU computation is lower than with the G.729b VAD.
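The idea of using short-time power as a mixing weight can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so the code below assumes one plausible reading: each stream's weight is its share of the total short-time power in the current frame. The function names `short_time_power` and `mix_frames`, the `eps` silence threshold, and the sample frames are hypothetical and are not taken from the paper.

```python
def short_time_power(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def mix_frames(frames, eps=1e-12):
    """Mix equal-length frames, one per audio stream.

    Assumption (not from the paper): the weight of each stream is its
    short-time power divided by the total power of all streams.
    """
    powers = [short_time_power(f) for f in frames]
    total = sum(powers)
    if total < eps:
        # Every stream is silent in this frame, so output silence.
        return [0.0] * len(frames[0])
    weights = [p / total for p in powers]
    # Weighted sum of the streams, sample by sample.
    return [
        sum(w * f[n] for w, f in zip(weights, frames))
        for n in range(len(frames[0]))
    ]


# Two toy streams: one loud talker and one quiet background stream.
loud = [0.5, -0.5, 0.5, -0.5]
quiet = [0.1, -0.1, 0.1, -0.1]
mixed = mix_frames([loud, quiet])
```

Because the weights in this sketch are non-negative and sum to one, the mixed sample can never exceed the largest input amplitude, so no separate clipping step is needed. Each frame is also processed independently of the others, which is the kind of structure that lends itself to parallel processing.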



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Wei Xue (1)
  • Sidan Du (1)
  • Chengzhi Fang (1)
  • Yingxian Ye (1)
  1. Department of Electronics Science and Engineering, Nanjing University, Nanjing, P.R. China
