Abstract
This chapter addresses noise-robust speech processing for communication terminals that serve as front-ends to digital networks. Studying the limitations of auditory perception, in particular how masking reduces the information rate of the speech signal, makes it possible to improve the efficiency of (1) speaker and speech recognition and (2) wide-band speech coding. In the first case, speech enhancement techniques derived from spectral subtraction are used not only for noise reduction, which is often unreliable, but also for detecting missing features, that is, features that are masked by noise or otherwise unreliable. We show that this detection technique can be combined with missing-feature compensation in the statistical models (Gaussian mixture models (GMMs) and hidden Markov models (HMMs)) to improve recognition results. In the second case, spectral subtraction is used to design an integrated speech enhancement/coding system that incorporates masking of both ambient noise and quantization noise. The advantage of the method presented in this chapter over previous approaches is that perceptual enhancement and coding, usually implemented as a cascade of two separate systems, are combined. This reduces the computational load while controlling the bit rate and maintaining acceptable speech intelligibility and quality.
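The recognition side of this idea can be illustrated with a minimal sketch. It is not the chapter's implementation: the function names, the over-subtraction factor `alpha`, the spectral floor `beta`, and the rule "a band is reliable when the subtracted power stays above the floor" are all assumptions chosen for illustration. The sketch applies power spectral subtraction per frequency band, derives a reliability mask from it, and scores a feature vector against a diagonal-covariance GMM by marginalizing over (here, simply ignoring) the bands flagged as missing.

```python
import numpy as np

def spectral_subtraction(noisy_power, noise_power, alpha=2.0, beta=0.01):
    """Power spectral subtraction with over-subtraction and a spectral floor.

    alpha and beta are illustrative defaults, not values from the chapter.
    """
    subtracted = noisy_power - alpha * noise_power
    floor = beta * noise_power
    return np.maximum(subtracted, floor)

def reliability_mask(noisy_power, noise_power, alpha=2.0, beta=0.01):
    """Assumed detection rule: a band is reliable if subtraction stays above the floor."""
    return (noisy_power - alpha * noise_power) > beta * noise_power

def gmm_loglik_marginal(x, mask, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM,
    computed over the reliable dimensions only.

    For diagonal covariances, integrating out a missing dimension is
    equivalent to dropping it from each component density.
    """
    xr = x[mask]
    mu = means[:, mask]
    var = variances[:, mask]
    # Per-component log density restricted to the reliable dimensions.
    log_comp = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (xr - mu) ** 2 / var, axis=1)
    log_weighted = np.log(weights) + log_comp
    # Log-sum-exp over components for numerical stability.
    m = np.max(log_weighted)
    return m + np.log(np.sum(np.exp(log_weighted - m)))

if __name__ == "__main__":
    noisy = np.array([10.0, 1.5, 4.0])   # hypothetical band powers of one frame
    noise = np.array([1.0, 1.0, 1.0])    # hypothetical noise power estimate
    enhanced = spectral_subtraction(noisy, noise, alpha=2.0, beta=0.1)
    mask = reliability_mask(noisy, noise, alpha=2.0, beta=0.1)
    print(enhanced, mask)

    # Toy one-component GMM over three features, scored on reliable bands only.
    weights = np.array([1.0])
    means = np.zeros((1, 3))
    variances = np.ones((1, 3))
    print(gmm_loglik_marginal(np.log(enhanced), mask, weights, means, variances))
```

In this toy frame the middle band falls below the floor after subtraction, so it is flagged as missing and excluded from the likelihood rather than being trusted at its floored value. A real system would compute the mask per frame on filter-bank features and apply the same marginalization inside the HMM or GMM state densities.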
Copyright information
© 2003 Kluwer Academic Publishers
About this chapter
Cite this chapter
Drygajlo, A. (2003). Speech Coding and Recognition in Noisy Environments for Communication Terminals. In: Tasič, J.F., Najim, M., Ansorge, M. (eds) Intelligent Integrated Media Communication Techniques. Springer, Boston, MA. https://doi.org/10.1007/0-306-48718-7_10
DOI: https://doi.org/10.1007/0-306-48718-7_10
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4020-7552-0
Online ISBN: 978-0-306-48718-7
eBook Packages: Springer Book Archive