Voice Activity Detection Using Generalized Gamma Distribution

  • George Almpanidis
  • Constantine Kotropoulos
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3955)


In this work, we model speech samples with a two-sided generalized Gamma distribution and evaluate its effectiveness for voice activity detection. Using a computationally inexpensive maximum likelihood approach, we employ the Bayesian Information Criterion to identify phoneme boundaries in noisy speech.
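As a rough illustration of the idea in the abstract, the sketch below evaluates a two-sided generalized Gamma log-density and uses a BIC difference to test whether a frame is better described by one model or by two models split at a candidate boundary. It is not the authors' implementation. The parameter names (eta, beta, gamma), the function names, and the penalty weight are assumptions introduced here. To keep the maximum likelihood step in closed form, the shape is fixed to the Laplacian special case (eta = gamma = 1), which is a simplification of the full generalized Gamma fit.

```python
import math
import random


def gg_logpdf(x, eta, beta, gamma):
    """Log-density of a two-sided generalized Gamma distribution:
    f(x) = gamma * beta**eta / (2 * Gamma(eta))
           * |x|**(eta*gamma - 1) * exp(-beta * |x|**gamma)
    """
    ax = max(abs(x), 1e-12)  # avoid log(0) for samples that are exactly zero
    return (math.log(gamma) + eta * math.log(beta) - math.log(2.0)
            - math.lgamma(eta)
            + (eta * gamma - 1.0) * math.log(ax)
            - beta * ax ** gamma)


def laplace_loglik(samples):
    """Maximized log-likelihood with the shape fixed to eta = gamma = 1.

    Assumption: only the scale parameter beta is estimated, which has the
    closed-form ML estimate N / sum(|x|). Requires at least one nonzero sample.
    """
    beta = len(samples) / sum(abs(s) for s in samples)
    return sum(gg_logpdf(s, 1.0, beta, 1.0) for s in samples)


def delta_bic(samples, split, penalty=1.0):
    """BIC gain of modeling samples[:split] and samples[split:] separately.

    A positive value favors placing a boundary at `split`.
    """
    n = len(samples)
    gain = (laplace_loglik(samples[:split]) + laplace_loglik(samples[split:])
            - laplace_loglik(samples))
    extra_params = 1  # two scale parameters instead of one
    return gain - 0.5 * penalty * extra_params * math.log(n)


def laplace_samples(rng, scale, count):
    """Synthetic zero-mean Laplacian samples (hypothetical test data)."""
    return [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / scale)
            for _ in range(count)]


rng = random.Random(0)
# Low-level "noise" followed by a louder "speech-like" segment.
frame = laplace_samples(rng, 0.1, 400) + laplace_samples(rng, 1.0, 400)
print("boundary favored:", delta_bic(frame, 400) > 0)
```

In the log-density, setting eta = gamma = 1 gives a Laplacian and setting eta = 0.5, gamma = 2 gives a Gaussian, which is why the generalized Gamma family is a flexible choice for speech samples. A full fit would estimate the shape parameters as well and count them in the BIC penalty.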


Keywords: Discrete Cosine Transformation; Speech Signal; False Alarm Rate; Minimum Description Length; Clean Speech





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • George Almpanidis (1)
  • Constantine Kotropoulos (1)
  1. Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
