Abstract
Endpoint detection, which separates speech from silence so that only the speech is passed on for further processing, is often the first step in speech signal processing. The topic has been studied for several decades. However, as wireless communications and VoIP phones become more popular, communication channels are increasingly affected by background and system noise, which challenges existing algorithms. New, robust algorithms are therefore needed.
Keywords
- Energy Normalization
- Automatic Speech Recognition
- Speaker Recognition
- Voice Over Internet Protocol
- Word Error Rate
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Li, Q. (2012). Robust Endpoint Detection. In: Speaker Authentication. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23731-7_5
DOI: https://doi.org/10.1007/978-3-642-23731-7_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23730-0
Online ISBN: 978-3-642-23731-7
eBook Packages: Engineering, Engineering (R0)