International Journal of Speech Technology, Volume 8, Issue 2, pp 133–146

A Robust, Real-Time Voice Activity Detection Algorithm for Embedded Mobile Devices

Abstract

When an Automatic Speech Recognition (ASR) system is applied in noisy environments, Voice Activity Detection (VAD) is crucial to the performance of the overall system. Employing VAD for ASR on embedded mobile systems minimizes physical distractions and makes the system more convenient to use. Conventional VAD algorithms are either of high complexity, which makes them unsuitable for embedded mobile devices, or of low robustness, which holds back their application in noisy mobile environments. In this paper, we propose a robust VAD algorithm specifically designed for ASR on embedded mobile devices. The architecture of the proposed algorithm is based on a two-level decision-making strategy, in which a lower, feature-based level interacts with subsequent decision logic based on a finite-state machine. Multiple discriminating features are employed in the lower level to improve the robustness of the VAD. The two-level decision strategy allows different features to be used in different states and reduces the cost of the algorithm, which makes the proposed algorithm suitable for embedded mobile devices. The evaluation experiments show that the proposed VAD algorithm is robust and contributes to the overall performance gain of the ASR system in various acoustic environments.
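
The two-level idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not name the features, states, or thresholds, so everything below is an assumption chosen for illustration. The features (log energy and zero-crossing rate), the four states (SILENCE, ONSET, SPEECH, HANGOVER), and all threshold values are hypothetical. The sketch only shows how a finite-state machine can decide which feature to compute in which state, so that the costlier check runs only while an onset is being confirmed.

```python
import math
from enum import Enum


class State(Enum):
    # Hypothetical states; the abstract does not list the actual ones.
    SILENCE = 0
    ONSET = 1
    SPEECH = 2
    HANGOVER = 3


def log_energy(frame):
    """Frame energy in dB (mean squared amplitude, floored to avoid log(0))."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_sq + 1e-12)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)


def run_vad(frames, energy_db=-30.0, zcr_max=0.35,
            onset_frames=2, hangover_frames=3):
    """Return one boolean per frame (True = treated as speech).

    All thresholds are illustrative assumptions, not values from the paper.
    """
    state = State.SILENCE
    count = 0
    labels = []
    for frame in frames:
        # Lower level: the cheap energy feature is computed in every state.
        loud = log_energy(frame) > energy_db

        # Upper level: the state machine decides what else to compute.
        if state is State.SILENCE:
            if loud:
                state, count = State.ONSET, 1
        elif state is State.ONSET:
            # The second feature is only evaluated while confirming an onset.
            if loud and zero_crossing_rate(frame) < zcr_max:
                count += 1
                if count >= onset_frames:
                    state = State.SPEECH
            else:
                state = State.SILENCE
        elif state is State.SPEECH:
            if not loud:
                state, count = State.HANGOVER, 1
        elif state is State.HANGOVER:
            if loud:
                state = State.SPEECH
            else:
                count += 1
                if count > hangover_frames:
                    state = State.SILENCE

        labels.append(state in (State.SPEECH, State.HANGOVER))
    return labels
```

Calling `run_vad` on a list of fixed-length sample frames (for example, 160 samples each) yields one speech/non-speech flag per frame. The hangover state keeps the flag raised for a few quiet frames after speech ends, which is a common way to avoid clipping word endings; whether the proposed algorithm does the same is not stated in the abstract.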

Keywords

voice activity detection; noisy; robust; real-time; embedded; automatic speech recognition

Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  • Bian Wu (1)
  • Xiaolin Ren (2)
  • Chongqing Liu (3)
  • Yaxin Zhang (4)

  1. Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University, Shanghai, People's Republic of China
  2. Motorola Labs China Research Center, W. Shanghai, People's Republic of China
  3. Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University, Shanghai, People's Republic of China
  4. Motorola Labs China Research Center, W. Shanghai, People's Republic of China