Speech detection in non-stationary noise based on the 1/f process

Wang, Fan; Zheng, Fang; Wu, Wenhu

doi:10.1007/BF02949828

Correspondence
Published: January 2002

Volume 17, pages 83–89, (2002)
Journal of Computer Science and Technology


Abstract

This paper proposes an effective and robust active speech detection method, based on the 1/f process, for signals in non-stationary noisy environments. The Gaussian 1/f process, a fractal-based mathematical model for statistically self-similar random processes, is used to model both the speech and the background noise. An optimal Bayesian two-class classifier is developed to discriminate between them using their wavelet coefficients, which have Karhunen-Loève-type properties. Multiple templates are trained for the speech signal, and the parameters of the background noise are adapted dynamically at runtime, so that the variation of both the speech and the noise is modelled. In our experiments, the method is tested on 10 minutes of speech mixed with different types of noise at SNRs ranging from 20 dB down to 5 dB. A detection accuracy of over 90% is achieved when the average SNR is about 10 dB.
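
The classification idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every concrete choice in it is an assumption made for illustration: a Haar wavelet, four decomposition levels, independent zero-mean Gaussian coefficients whose variance at level m (m = 1 is the finest scale) is sigma2 * 2**(gamma * m), a maximum over speech templates, and an exponential-smoothing update for the noise variance. The function names and all parameter values are hypothetical.

```python
import math


def haar_dwt(frame, levels):
    """Haar wavelet detail coefficients; details[0] is the finest scale."""
    approx = list(frame)
    details = []
    for _ in range(levels):
        half = len(approx) // 2
        pairs = [(approx[2 * i], approx[2 * i + 1]) for i in range(half)]
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return details


def log_likelihood(details, sigma2, gamma):
    """Log-likelihood under an assumed Gaussian 1/f model.

    Assumption: coefficients are independent and zero-mean, with
    variance sigma2 * 2**(gamma * m) at level m (coarser = larger).
    """
    total = 0.0
    for m, coeffs in enumerate(details, start=1):
        var = sigma2 * 2.0 ** (gamma * m)
        for c in coeffs:
            total -= 0.5 * (math.log(2 * math.pi * var) + c * c / var)
    return total


def is_speech(frame, speech_templates, noise_model,
              levels=4, log_prior_ratio=0.0):
    """Two-class Bayes decision: best speech template versus noise."""
    details = haar_dwt(frame, levels)
    speech = max(log_likelihood(details, s2, g) for s2, g in speech_templates)
    noise = log_likelihood(details, *noise_model)
    return speech + log_prior_ratio > noise


def adapt_noise(noise_model, frame, levels=4, rate=0.1):
    """Hypothetical runtime update: smooth sigma2 towards the estimate
    obtained from a frame that was classified as noise (gamma fixed)."""
    sigma2, gamma = noise_model
    details = haar_dwt(frame, levels)
    scaled = [c * c / 2.0 ** (gamma * m)
              for m, coeffs in enumerate(details, start=1) for c in coeffs]
    estimate = sum(scaled) / len(scaled)
    return ((1 - rate) * sigma2 + rate * estimate, gamma)


# Illustrative (sigma2, gamma) values; not taken from the paper.
speech_templates = [(1.0, 1.0), (0.5, 1.5)]
noise_model = (0.01, 0.0)

loud = [(-1.0) ** i for i in range(64)]          # high-energy test frame
quiet = [0.05 * (-1.0) ** i for i in range(64)]  # low-energy test frame

loud_is_speech = is_speech(loud, speech_templates, noise_model)
quiet_is_speech = is_speech(quiet, speech_templates, noise_model)
if not quiet_is_speech:
    adapted_noise = adapt_noise(noise_model, quiet)
```

Taking the maximum over templates stands in for the "multiple templates" mentioned above, and updating only sigma2 keeps the adaptation step simple; a fuller treatment would also re-estimate gamma.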

Author information

Authors and Affiliations

Center of Speech Technology, State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, P.R. China
Fan Wang, Fang Zheng & Wenhu Wu

Corresponding author

Correspondence to Fan Wang.

Additional information

WANG Fan was born in 1974. He received his B.S. degree in computer science and technology from the Department of Computer Science and Technology, Tsinghua University, in 1998. He is currently a Ph.D. candidate and research assistant, majoring in computer applications. His current research interests focus on robust speech recognition and understanding. In 2000, he received the Excellent Student Paper Award at the 2000 International Symposium on Chinese Spoken Language Processing (ISCSLP'2000). He is an ACM member and the chair of the Tsinghua ACM Student Chapter.

ZHENG Fang is currently an associate professor at Tsinghua University and director of the Center of Speech Technology, State Key Laboratory of Intelligent Technology and Systems. He received his B.S., M.S. and Ph.D. degrees in computer science and technology from Tsinghua University in 1990, 1992 and 1997, respectively. He has been working on speech recognition and understanding at the Department of Computer Science and Technology, Tsinghua University, since 1988. He has published over 80 technical papers on topics including acoustic and language modeling, isolated and continuous speech recognition, keyword spotting, dictation, and language understanding. He is an IEEE member and a member of the Editorial Committee of the Journal of Chinese Information Processing.

WU Wenhu received his B.S. degree in automation from Tsinghua University in 1961. Since then, he has been with Tsinghua University, where he is currently a full professor in the Department of Computer Science and Technology. His major research interests include speech recognition and language understanding, speech synthesis, and digital processing of speech signals. As a principal or key participant, he has taken part in many important national projects and '863' Hi-Tech projects, and has received several awards.

About this article

Cite this article

Wang, F., Zheng, F. & Wu, W. Speech detection in non-stationary noise based on the 1/f process. J. Comput. Sci. & Technol. 17, 83–89 (2002). https://doi.org/10.1007/BF02949828

Received: 04 December 2000
Revised: 18 June 2001
Issue Date: January 2002
DOI: https://doi.org/10.1007/BF02949828
