Robust Endpoint Detection

  • Qi (Peter) Li
Part of the Signals and Communication Technology book series (SCT)


Endpoint detection, which separates speech from silence (non-speech) segments before further processing, is often the first step in speech signal processing. The topic has been studied for several decades. However, as wireless communications and VoIP phones become increasingly popular, communication channels carry more background and system noise. This noise challenges existing algorithms, so new, robust algorithms are needed.
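For reference, the simplest endpoint detectors compare short-time frame energy against a threshold set some margin above an estimated noise floor. The sketch below is such a baseline. It is not the robust algorithm this chapter develops.

Several choices in it are hypothetical and made only for illustration:

- the function names;
- the 160-sample frames with an 80-sample hop (20 ms and 10 ms at 8 kHz);
- the 10 dB margin;
- the assumption that the first frames contain only background noise.

```python
import math
from typing import List, Optional, Sequence, Tuple


def frame_energies_db(samples: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Short-time log energy (dB) of each full frame; a tiny floor avoids log10(0)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies


def first_run_start(flags: Sequence[bool], min_run: int) -> Optional[int]:
    """Index of the first frame that begins min_run consecutive True flags, or None."""
    run = 0
    for i, flag in enumerate(flags):
        run = run + 1 if flag else 0
        if run >= min_run:
            return i - min_run + 1
    return None


def detect_endpoints(
    samples: Sequence[float],
    frame_len: int = 160,        # hypothetical: 20 ms at 8 kHz
    hop: int = 80,               # hypothetical: 10 ms at 8 kHz
    noise_frames: int = 10,      # assumption: leading frames are background noise only
    margin_db: float = 10.0,     # hypothetical threshold margin above the noise floor
    min_speech_frames: int = 3,  # consecutive active frames needed to accept speech
) -> Optional[Tuple[int, int]]:
    """Baseline energy-threshold endpoint detector (illustrative, not robust).

    Returns (begin_sample, end_sample) of the detected speech region, or None.
    """
    energies = frame_energies_db(samples, frame_len, hop)
    if len(energies) <= noise_frames:
        return None  # too short to estimate a noise floor and still find speech
    # Noise floor estimated once, from the leading frames.
    noise_floor = sum(energies[:noise_frames]) / noise_frames
    active = [e > noise_floor + margin_db for e in energies]
    # Begin point: first run of min_speech_frames active frames.
    begin = first_run_start(active, min_speech_frames)
    if begin is None:
        return None
    # End point: the same search on the reversed flags finds the last run.
    rev = first_run_start(active[::-1], min_speech_frames)
    end = len(active) - 1 - rev
    return begin * hop, end * hop + frame_len
```

Requiring several consecutive above-threshold frames is a crude guard against isolated noise spikes. The fixed margin over a noise floor estimated only once is the weakness here. This kind of detector tends to fail when background noise rises or changes over time, which is the problem described above.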







Copyright information

© Springer-Verlag Berlin Heidelberg  2012

Authors and Affiliations

  1. Li Creative Technologies (LcT), Inc., Florham Park, USA
