Abstract
In this paper, an effective and robust active speech detection method based on the 1/f process is proposed for signals in non-stationary noisy environments. The Gaussian 1/f process, a fractal-based mathematical model for statistically self-similar random processes, is selected to model both the speech and the background noise. An optimal Bayesian two-class classifier is developed to discriminate between them using the wavelet coefficients of the 1/f processes, which have Karhunen-Loeve-type properties. Multiple templates are trained for the speech signal, and the parameters of the background noise are adapted dynamically at runtime, so that the variation of both the speech and the noise is modeled. In our experiments, 10 minutes of speech with different types of noise ranging from 20 dB to 5 dB is tested using the new detection method. A detection accuracy of over 90% is achieved when the average SNR is about 10 dB.
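The classification idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Haar wavelet, treats the wavelet coefficients of each class as independent zero-mean Gaussians whose variance at level j (1 = finest) is sigma2 * 2**(gamma * j), and combines multiple speech templates by taking the best-scoring one. The function names, the parameter values in the usage note, and the max-over-templates rule are all hypothetical choices made for illustration. Runtime adaptation of the noise parameters is not shown.

```python
import math


def haar_dwt(frame, levels):
    """Orthonormal Haar transform: return detail coefficients per level, finest first."""
    approx = list(frame)
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return details


def log_likelihood(details, sigma2, gamma):
    """Gaussian log-likelihood under the assumed per-level variance model.

    Assumption: coefficients are independent and zero-mean, with variance
    sigma2 * 2**(gamma * j) at level j (j = 1 is the finest scale).
    """
    total = 0.0
    for j, coeffs in enumerate(details, start=1):
        var = sigma2 * 2.0 ** (gamma * j)
        for c in coeffs:
            total -= 0.5 * (math.log(2.0 * math.pi * var) + c * c / var)
    return total


def is_speech(frame, speech_templates, noise_params, levels=4, log_prior_ratio=0.0):
    """Two-class Bayesian decision: speech if the log posterior ratio is positive.

    speech_templates is a list of (sigma2, gamma) pairs; taking the maximum
    over templates is an illustrative assumption, not the paper's stated rule.
    """
    details = haar_dwt(frame, levels)
    speech_ll = max(log_likelihood(details, s2, g) for s2, g in speech_templates)
    noise_ll = log_likelihood(details, *noise_params)
    return speech_ll - noise_ll + log_prior_ratio > 0.0
```

For example, with hypothetical speech templates `[(1.0, 1.0), (0.5, 1.5)]` and noise parameters `(0.01, 0.0)`, a 64-sample frame of low-amplitude alternating samples is labeled noise, while a large-amplitude slowly varying sinusoid is labeled speech. In practice the parameters would be estimated from training data rather than set by hand.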
Additional information
WANG Fan was born in 1974. He received his B.S. degree in computer science and technology from the Department of Computer Science and Technology, Tsinghua University, in 1998. He is currently a Ph.D. candidate and research assistant, majoring in computer applications. His current research interests focus on robust speech recognition and understanding. In 2000, he received the Excellent Student Paper Award at the 2000 International Symposium on Chinese Spoken Language Processing (ISCSLP'2000). He is an ACM member and the chair of the Tsinghua ACM Student Chapter.
ZHENG Fang is currently an associate professor at Tsinghua University and director of the Center of Speech Technology, State Key Laboratory of Intelligent Technology and Systems. He received his B.S., M.S. and Ph.D. degrees in computer science and technology from Tsinghua University in 1990, 1992 and 1997, respectively. He has been working on speech recognition and understanding at the Department of Computer Science and Technology, Tsinghua University, since 1988. He has published over 80 technical papers on topics including acoustic/language modeling, isolated/continuous speech recognition, keyword spotting, dictation, and language understanding. He is an IEEE member and a member of the Editorial Committee of the Journal of Chinese Information Processing.
WU Wenhu received his B.S. degree in automation from Tsinghua University in 1961. Since then, he has been with Tsinghua University, where he is currently a full professor in the Department of Computer Science and Technology. His major research interests include speech recognition and language understanding, speech synthesis, and digital processing of speech signals. As a principal or key investigator, he has taken part in many important state projects, including the '863' Hi-Tech projects, and has received several awards.
About this article
Cite this article
Wang, F., Zheng, F. & Wu, W. Speech detection in non-stationary noise based on the 1/f process. J. Comput. Sci. & Technol. 17, 83–89 (2002). https://doi.org/10.1007/BF02949828
DOI: https://doi.org/10.1007/BF02949828