Speech detection in non-stationary noise based on the 1/f process

  • Correspondence
Journal of Computer Science and Technology

Abstract

In this paper, an effective and robust active speech detection method based on the 1/f process is proposed for signals in non-stationary noisy environments. The Gaussian 1/f process, a fractal-based mathematical model for statistically self-similar random processes, is used to model both the speech and the background noise. An optimal Bayesian two-class classifier is developed to discriminate between them using their wavelet coefficients, which have Karhunen-Loève-type properties for 1/f processes. Multiple templates are trained for the speech signal, and the parameters of the background noise are adapted dynamically at runtime, so that the variation of both the speech and the noise is modeled. In our experiments, the method is tested on 10 minutes of speech mixed with different types of noise, ranging from 20 dB to 5 dB. A detection accuracy of over 90% is achieved when the average SNR is about 10 dB.
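The classification idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Haar wavelet, a single speech template, fixed noise parameters, and the common 1/f wavelet-domain model in which coefficients are treated as independent zero-mean Gaussians whose variance scales geometrically with the decomposition level. The function names (`haar_dwt`, `log_likelihood`, `is_speech`) and all parameter values are hypothetical, and the multiple templates and runtime noise adaptation described above are omitted.

```python
import math

def haar_dwt(frame, levels):
    """Orthonormal Haar transform; returns detail coefficients per level (level 1 = finest)."""
    approx = list(frame)
    details = []
    for _ in range(levels):
        half = len(approx) // 2
        detail = [(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        details.append(detail)
    return details

def log_likelihood(details, sigma2, gamma):
    """Gaussian log-likelihood with assumed variance sigma2 * 2**(gamma * j) at level j."""
    total = 0.0
    for j, coeffs in enumerate(details, start=1):
        var = sigma2 * 2.0 ** (gamma * j)
        for c in coeffs:
            total += -0.5 * (math.log(2.0 * math.pi * var) + c * c / var)
    return total

def is_speech(frame, speech_params, noise_params, levels=4, log_prior_ratio=0.0):
    """Two-class Bayes decision: True if the speech model is more probable than the noise model."""
    details = haar_dwt(frame, levels)
    ll_speech = log_likelihood(details, *speech_params)
    ll_noise = log_likelihood(details, *noise_params)
    return ll_speech - ll_noise + log_prior_ratio > 0.0

# Hypothetical (sigma2, gamma) pairs; real values would be estimated from training data.
SPEECH = (1.0, 1.0)
NOISE = (0.01, 0.0)

loud_frame = [1.0, -1.0] * 8     # high-energy 16-sample frame
quiet_frame = [0.01, -0.01] * 8  # low-energy 16-sample frame
print(is_speech(loud_frame, SPEECH, NOISE), is_speech(quiet_frame, SPEECH, NOISE))
```

In this sketch the frame length must be divisible by 2 raised to the number of levels. A fuller version would evaluate several speech templates and re-estimate the noise parameters from frames classified as noise.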

Author information

Corresponding author

Correspondence to Fan Wang.

Additional information

WANG Fan was born in 1974. He received his B.S. degree in computer science and technology from the Department of Computer Science and Technology, Tsinghua University, in 1998. He is currently a Ph.D. candidate and research assistant, majoring in computer applications. His current research interests focus on robust speech recognition and understanding. In 2000, he received the Excellent Student Paper Award at the 2000 International Symposium on Chinese Spoken Language Processing (ISCSLP'2000). He is an ACM member and the chair of the Tsinghua ACM Student Chapter.

ZHENG Fang is currently an associate professor at Tsinghua University. He is the director of the Center of Speech Technology, State Key Laboratory of Intelligent Technology and Systems. He received his B.S., M.S. and Ph.D. degrees in computer science and technology from Tsinghua University in 1990, 1992 and 1997, respectively. He has been working on speech recognition and understanding at the Department of Computer Science and Technology, Tsinghua University, since 1988. He has published over 80 technical papers on acoustic/language modeling, isolated/continuous speech recognition, keyword spotting, dictation, language understanding and so on. He is an IEEE member and a member of the Editorial Committee of the Journal of Chinese Information Processing.

WU Wenhu received his B.S. degree in automation from Tsinghua University in 1961. Since then, he has been with Tsinghua University, where he is currently a full professor in the Department of Computer Science and Technology. His major research interests include speech recognition and language understanding, speech synthesis, digital processing of speech signals, and so on. As a principal or key participant, he has taken part in many important state projects and the '863' Hi-Tech projects, and has received several awards.

About this article

Cite this article

Wang, F., Zheng, F. & Wu, W. Speech detection in non-stationary noise based on the 1/f process. J. Comput. Sci. & Technol. 17, 83–89 (2002). https://doi.org/10.1007/BF02949828

  • DOI: https://doi.org/10.1007/BF02949828
