Reconstructing Noise-Corrupted Spectrographic Components for Robust Speech Recognition

  • Chapter
  • First Online:
Robust Speech Recognition of Uncertain or Missing Data

Abstract

An effective solution to missing-feature problems is to impute the missing components from the reliable components and prior knowledge about the distribution of the data. In this chapter we describe various imputation methods, including those that consider correlation across time and those that do not, and present an experimental evaluation of the techniques. We demonstrate that imputing missing spectrographic components prior to cepstral feature computation can in fact be superior to techniques that attempt to perform computation directly in the domain with the incomplete data, because of the superior performance obtained with cepstral features.
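To make the idea of imputing missing components from reliable ones concrete, here is a minimal sketch. It is not the chapter's method: it assumes, purely for illustration, that a log-spectral vector follows a single Gaussian prior with known mean and covariance, and it fills the unreliable components with their conditional mean given the reliable ones. The function name `impute_missing` and all of its parameters are hypothetical.

```python
import numpy as np

def impute_missing(x, reliable, mean, cov):
    """Fill unreliable entries of x with their conditional mean.

    Assumption (illustrative only): x ~ N(mean, cov), and `reliable`
    is a boolean mask marking the components that can be trusted.
    """
    x = np.asarray(x, dtype=float)
    reliable = np.asarray(reliable, dtype=bool)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    missing = ~reliable

    if not missing.any():
        return x.copy()       # nothing to impute
    if not reliable.any():
        return mean.copy()    # no evidence: fall back to the prior mean

    # Partition the prior into reliable (r) and missing (m) blocks.
    cov_rr = cov[np.ix_(reliable, reliable)]
    cov_mr = cov[np.ix_(missing, reliable)]

    # E[x_m | x_r] = mu_m + C_mr C_rr^{-1} (x_r - mu_r)
    filled = x.copy()
    filled[missing] = mean[missing] + cov_mr @ np.linalg.solve(
        cov_rr, x[reliable] - mean[reliable]
    )
    return filled

# Two correlated components; the second one is marked unreliable.
prior_mean = np.array([0.0, 0.0])
prior_cov = np.array([[1.0, 0.8], [0.8, 1.0]])
observed = np.array([2.0, np.nan])
mask = np.array([True, False])
print(impute_missing(observed, mask, prior_mean, prior_cov))
```

In this toy case the missing component is pulled toward the reliable one in proportion to their prior correlation. The reconstructed vector could then be passed to an ordinary cepstral front end, which is the pipeline the abstract argues for.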

Author information

Corresponding author

Correspondence to Bhiksha Raj.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Raj, B., Singh, R. (2011). Reconstructing Noise-Corrupted Spectrographic Components for Robust Speech Recognition. In: Kolossa, D., Häb-Umbach, R. (eds) Robust Speech Recognition of Uncertain or Missing Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21317-5_6

  • DOI: https://doi.org/10.1007/978-3-642-21317-5_6

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21316-8

  • Online ISBN: 978-3-642-21317-5

  • eBook Packages: Engineering, Engineering (R0)
