Abstract
Automatic Speech Recognition (ASR) and Understanding (ASU) systems rely heavily on machine learning techniques to map spoken utterances into words and meanings. The statistical methods employed, however, deviate greatly from the processes involved in human language acquisition in a number of key aspects. Although ASR and ASU have recently reached a level of accuracy that is sufficient for some practical applications, severe limitations remain, due for example to the amount of training data required and the poor generalization of the resulting models. In our opinion, a paradigm shift is needed: speech technology should address some of the challenges that humans face when learning a first language and that current ASR and ASU methods ignore. In this paper, we point out some of the aspects that could lead to more robust and flexible models, and we describe some of the research that we and other researchers have performed in this area.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Salvi, G. (2013). Biologically Inspired Methods for Automatic Speech Understanding. In: Chella, A., Pirrone, R., Sorbello, R., Jóhannsdóttir, K. (eds) Biologically Inspired Cognitive Architectures 2012. Advances in Intelligent Systems and Computing, vol 196. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34274-5_49
DOI: https://doi.org/10.1007/978-3-642-34274-5_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34273-8
Online ISBN: 978-3-642-34274-5
eBook Packages: Engineering, Engineering (R0)