Improved Likelihood Ratio Test Detector Using a Jointly Gaussian Probability Distribution Function

  • O. Pernía
  • J. M. Górriz
  • J. Ramírez
  • C. G. Puntonet
  • I. Turias
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4528)

Abstract

Currently, the accuracy of speech processing systems is strongly affected by acoustic noise. This is a serious obstacle to meeting the demands of modern applications, and these systems therefore often need a noise reduction algorithm working in combination with a precise voice activity detector (VAD). This paper presents a new VAD for improving speech detection robustness in noisy environments and the performance of speech recognition systems. The algorithm defines an optimum likelihood ratio test (LRT) involving multiple and correlated observations (MO). The decision rule so defined yields significant improvements in speech/non-speech discrimination accuracy over existing VAD methods, with optimal performance when just a single observation is processed. In the MO scenario the algorithm has an inherent delay which, for several applications including robust speech recognition, does not represent a serious implementation obstacle. An analysis of the methodology for pairwise observation dependence shows the improved robustness of the proposed approach by means of a clear reduction in the classification error as the number of observations increases. The proposed strategy is also compared with different VAD methods, including the G.729, AMR and AFE standards as well as recently reported algorithms, and shows a sustained advantage in speech/non-speech detection accuracy and speech recognition performance.
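The abstract describes an LRT over a window of correlated observations with pairwise dependence. The following is a minimal sketch of that idea under illustrative assumptions, not the authors' published detector: each of the 2m + 1 observations centred on the current frame is treated as a real, zero-mean Gaussian variable with variance λ_N (noise_var) under H0 (non-speech) and λ_N + λ_S (noise_var + speech_var) under H1 (speech), only adjacent observations are correlated, and both hypotheses share the same correlation coefficient rho. The joint covariance is then symmetric tridiagonal, so log Λ(x) = log N(x; 0, Σ1) − log N(x; 0, Σ0) can be evaluated with an LDL^T recursion instead of a full matrix inverse, and speech is declared when log Λ(x) exceeds a threshold eta. All function names and parameters here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def tridiag_cov(n: int, var: float, rho: float) -> Tuple[List[float], List[float]]:
    """Diagonal and off-diagonal of a pairwise (first-order) covariance matrix.

    Illustrative assumption: every observation has variance `var` and only
    adjacent observations are correlated, with correlation coefficient `rho`.
    |rho| < 0.5 keeps the matrix positive definite for any window length.
    """
    if not abs(rho) < 0.5:
        raise ValueError("|rho| must be below 0.5 to guarantee positive definiteness")
    return [var] * n, [rho * var] * (n - 1)


def gaussian_loglik(x: Sequence[float], diag: Sequence[float], off: Sequence[float]) -> float:
    """log N(x; 0, Sigma) for a symmetric tridiagonal Sigma (x must be non-empty).

    Uses the factorisation Sigma = L D L^T, so that log det Sigma = sum(log d_k)
    and x^T Sigma^{-1} x = sum(z_k^2 / d_k), where L z = x.
    """
    n = len(x)
    d = diag[0]      # first pivot D[0]
    z = x[0]         # first entry of z = L^{-1} x
    quad = z * z / d
    logdet = math.log(d)
    for k in range(1, n):
        ell = off[k - 1] / d            # L[k, k-1], uses the previous pivot
        d = diag[k] - ell * off[k - 1]  # next pivot D[k]
        z = x[k] - ell * z              # forward substitution for L z = x
        quad += z * z / d
        logdet += math.log(d)
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + quad)


def mo_lrt(x: Sequence[float], noise_var: float, speech_var: float, rho: float) -> float:
    """log p(x | H1) - log p(x | H0) for a window of pairwise-correlated observations.

    Hypothetical model: H0 has per-observation variance noise_var, H1 has
    noise_var + speech_var, and both hypotheses share the same rho.
    """
    n = len(x)
    h0 = tridiag_cov(n, noise_var, rho)
    h1 = tridiag_cov(n, noise_var + speech_var, rho)
    return gaussian_loglik(x, *h1) - gaussian_loglik(x, *h0)


def vad_decision(x: Sequence[float], noise_var: float, speech_var: float,
                 rho: float, eta: float = 0.0) -> bool:
    """Declare speech (True) when the log-likelihood ratio exceeds eta."""
    return mo_lrt(x, noise_var, speech_var, rho) > eta


if __name__ == "__main__":
    window = [0.9, -1.4, 2.1, 1.7, -0.6]  # 2m + 1 = 5 observations, m = 2
    print(vad_decision(window, noise_var=1.0, speech_var=2.0, rho=0.3))
```

With rho = 0 the joint log-likelihood ratio reduces to the sum of per-observation ratios, i.e. an independent-observation MO-LRT; a nonzero rho changes both the quadratic term and the log-determinant. Because the window is centred on the current frame, each decision waits for m future frames, which is the inherent delay mentioned in the abstract.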

References

  1. Benyassine, A., Shlomot, E., Su, H., Massaloux, D., Lamblin, C., Petit, J.: ITU-T Recommendation G.729 Annex B: A silence compression scheme for use with G.729 optimized for V.70 digital simultaneous voice and data applications. IEEE Communications Magazine 35(9), 64–73 (1997)
  2. ITU: A silence compression scheme for G.729 optimized for terminals conforming to recommendation V.70. ITU-T Recommendation G.729 Annex B (1996)
  3. ETSI: Voice activity detector (VAD) for Adaptive Multi-Rate (AMR) speech traffic channels. ETSI EN 301 708 Recommendation (1999)
  4. ETSI: Speech processing, transmission and quality aspects (STQ); distributed speech recognition; advanced front-end feature extraction algorithm; compression algorithms. ETSI ES 201 108 Recommendation (2002)
  5. Bouquin-Jeannes, R.L., Faucon, G.: Study of a voice activity detector and its influence on a noise reduction system. Speech Communication 16, 245–254 (1995)
  6. Sohn, J., Kim, N.S., Sung, W.: A statistical model-based voice activity detection. IEEE Signal Processing Letters 6(1), 1–3 (1999)
  7. Cho, Y.D., Al-Naimi, K., Kondoz, A.: Improved voice activity detection based on a smoothed statistical likelihood ratio. In: Proc. of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 2, pp. 737–740 (2001)
  8. Ramírez, J., Segura, J.C., Benítez, C., García, L., Rubio, A.: Statistical voice activity detection using a multiple observation likelihood ratio test. IEEE Signal Processing Letters 12(10), 689–692 (2005)
  9. Górriz, J.M., Ramírez, J., Puntonet, C.G., Segura, J.C.: An effective cluster-based model for robust speech detection and speech recognition in noisy environments. Journal of the Acoustical Society of America 120(1), 470–481 (2006)
  10. Górriz, J.M., Ramírez, J., Segura, J.C., Puntonet, C.G.: An improved MO-LRT VAD based on a bispectra Gaussian model. Electronics Letters 41(15), 877–879 (2005)
  11. Moreno, A., Lindberg, B., Draxler, C., Richard, G., Choukri, K., Euler, S., Allen, J.: SpeechDat-Car: A Large Speech Database for Automotive Environments. In: Proceedings of the II LREC Conference (2000)
  12. Akhiezer, N.I.: The Classical Moment Problem. Oliver and Boyd, Edinburgh (1965)
  13. Yamani, H.A., Abdelmonem, M.S.: The analytic inversion of any finite symmetric tridiagonal matrix. J. Phys. A: Math. Gen. 30, 2889–2893 (1997)

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • O. Pernía, J. M. Górriz, J. Ramírez, C. G. Puntonet and I. Turias
    E.T.S.I.I., Universidad de Granada, C/ Periodista Daniel Saucedo, 18071 Granada, Spain
