Reconstructing Noise-Corrupted Spectrographic Components for Robust Speech Recognition

Raj, Bhiksha; Singh, Rita

doi:10.1007/978-3-642-21317-5_6

Abstract

An effective solution to missing-feature problems is to impute the missing components from the reliable components and prior knowledge about the distribution of the data. In this chapter we describe various imputation methods, including those that consider correlation across time and those that do not, and present an experimental evaluation of these techniques. We demonstrate that imputing missing spectrographic components before computing cepstral features can be superior to techniques that attempt to perform recognition directly in the domain of the incomplete data, owing to the superior performance obtained with cepstral features.
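
The core idea in the abstract, filling in unreliable components from the reliable ones plus a prior on the data, can be illustrated with a minimal sketch. It assumes a single multivariate Gaussian prior over one spectral vector and fills the unreliable entries with their conditional mean. This is an illustrative simplification, not the chapter's own algorithms, which are not shown in this preview; the function name `impute_missing` and the toy mean and covariance are hypothetical.

```python
import numpy as np

def impute_missing(x, reliable, mean, cov):
    """Fill unreliable entries of x with their conditional mean under an
    assumed Gaussian prior N(mean, cov), given the reliable entries."""
    x = np.asarray(x, dtype=float)
    reliable = np.asarray(reliable, dtype=bool)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    missing = ~reliable

    if not missing.any():      # nothing to impute
        return x.copy()
    if not reliable.any():     # no evidence at all: fall back to the prior mean
        return mean.copy()

    # Partition the prior into reliable (r) and missing (m) blocks.
    cov_rr = cov[np.ix_(reliable, reliable)]
    cov_mr = cov[np.ix_(missing, reliable)]

    # E[x_m | x_r] = mu_m + C_mr C_rr^{-1} (x_r - mu_r)
    out = x.copy()
    out[missing] = mean[missing] + cov_mr @ np.linalg.solve(
        cov_rr, x[reliable] - mean[reliable]
    )
    return out

# Toy example (hypothetical numbers): two correlated components,
# the second one marked unreliable.
prior_mean = np.array([0.0, 0.0])
prior_cov = np.array([[1.0, 0.5],
                      [0.5, 1.0]])
observed = np.array([2.0, np.nan])
mask = np.array([True, False])
print(impute_missing(observed, mask, prior_mean, prior_cov))
```

In this toy case the missing entry is pulled toward the reliable one in proportion to their prior correlation. A method that exploits correlation across time would, under the same assumption, apply the same conditioning to a vector built from neighbouring frames rather than a single frame.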

Author information

Authors and Affiliations

Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, 15213, USA
Bhiksha Raj & Rita Singh

Corresponding author

Correspondence to Bhiksha Raj.

Editor information

Editors and Affiliations

Institute of Communication Acoustics, Ruhr-Universität Bochum, Universitätsstrasse 150, Bochum, 44801, Germany
Dorothea Kolossa
Dept. of Communications Engineering, University of Paderborn, Warburger Strasse 100, Paderborn, 33098, Germany
Reinhold Häb-Umbach

About this chapter

Cite this chapter

Raj, B., Singh, R. (2011). Reconstructing Noise-Corrupted Spectrographic Components for Robust Speech Recognition. In: Kolossa, D., Häb-Umbach, R. (eds) Robust Speech Recognition of Uncertain or Missing Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21317-5_6

DOI: https://doi.org/10.1007/978-3-642-21317-5_6
Published: 23 June 2011
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21316-8
Online ISBN: 978-3-642-21317-5
eBook Packages: Engineering, Engineering (R0)