Abstract
Unexpected words attract the listener's attention. They are information-rich, and getting them right is important for human communication. In automatic speech recognition (ASR), words that are not in the machine's expected lexicon are typically replaced by acoustically similar but nevertheless wrong words. The article discusses the reasons for this undesirable behavior, describes some known examples of how human speech perception deals with unexpected words and what they imply, and proposes an alternative ASR architecture that could alleviate some of the problems with unexpected acoustic inputs. Some published experimental results obtained with this alternative architecture are given.
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hermansky, H. (2011). Dealing with Unexpected Words in Automatic Recognition of Speech. In: Habernal, I., Matoušek, V. (eds) Text, Speech and Dialogue. TSD 2011. Lecture Notes in Computer Science, vol 6836. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23538-2_1
DOI: https://doi.org/10.1007/978-3-642-23538-2_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23537-5
Online ISBN: 978-3-642-23538-2
eBook Packages: Computer Science, Computer Science (R0)