Abstract
This paper presents a system that recognizes a limited vocabulary of spoken words in a speaker-independent manner and requires only minimal hardware support for acoustic preprocessing. Unlike other approaches to word-level recognition, it uses a compression algorithm to reduce the information content of the speech signals before presenting them as inputs to a standard three-layer backpropagation network. The network is trained on the utterances of the speakers in the training set and is then used to recognize words spoken by unknown speakers. Recognition rates of up to 91% were obtained for unknown speakers of the same sex as the training speakers, and up to 72% for a mix of male and female speakers. Since training is fast and the system is very cost-effective, the approach is practical for a variety of applications.
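The pipeline described in the abstract (compress each utterance to a fixed-size input, train a three-layer backpropagation network on known speakers, test on unknown speakers) can be illustrated with a small, self-contained sketch. The abstract does not specify the compression algorithm, so `compress` here is a hypothetical stand-in that averages spectral frames within a fixed number of time segments. The synthetic utterances, the per-speaker gain used to mimic speaker variation, the layer sizes and the learning rate are all illustrative assumptions, not the authors' setup or data.

```python
import math
import random

# Hypothetical word patterns: (dominant band in first half, dominant band in second half).
PATTERNS = [(0, 2), (1, 3), (3, 1)]
N_BANDS = 4


def compress(frames, n_segments):
    """Stand-in compression (assumption): average the spectral frames inside each
    of n_segments equal time slices, so every utterance becomes a vector of
    n_segments * n_bands values regardless of its duration."""
    n_frames, n_bands = len(frames), len(frames[0])
    out = []
    for s in range(n_segments):
        start = s * n_frames // n_segments
        end = max((s + 1) * n_frames // n_segments, start + 1)
        segment = frames[start:end]
        for b in range(n_bands):
            out.append(sum(f[b] for f in segment) / len(segment))
    return out


def make_utterance(word, speaker_gain, rng):
    """Synthetic 'utterance': noisy band energies whose dominant band changes
    halfway through; speaker_gain crudely models speaker variation."""
    n_frames = rng.randint(20, 40)
    first, second = PATTERNS[word]
    frames = []
    for t in range(n_frames):
        band = first if t < n_frames // 2 else second
        frames.append([speaker_gain * (1.0 if b == band else 0.0) + rng.uniform(-0.2, 0.2)
                       for b in range(N_BANDS)])
    return frames


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ThreeLayerNet:
    """Input, hidden and output layers of sigmoid units trained by plain
    backpropagation on squared error (each weight row ends with a bias weight)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1] + [1.0]
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, y

    def train_step(self, x, target, lr=0.5):
        xb, hb, y = self.forward(x)
        d_out = [(t - o) * o * (1 - o) for t, o in zip(target, y)]
        # Hidden deltas use the pre-update output weights (bias unit excluded).
        d_hid = [hb[j] * (1 - hb[j]) * sum(d * self.w2[k][j] for k, d in enumerate(d_out))
                 for j in range(len(hb) - 1)]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] += lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] += lr * d_hid[j] * xb[i]

    def predict(self, x):
        y = self.forward(x)[2]
        return max(range(len(y)), key=y.__getitem__)


def train_and_evaluate(seed=0, n_segments=4, epochs=150):
    rng = random.Random(seed)
    # Speakers 0-4 are "known" (training set); speakers 5-7 are "unknown" (test set).
    gains = [rng.uniform(0.7, 1.3) for _ in range(8)]

    def dataset(speakers, reps):
        return [(compress(make_utterance(word, gains[spk], rng), n_segments), word)
                for spk in speakers for word in range(len(PATTERNS)) for _ in range(reps)]

    train, test = dataset(range(5), 4), dataset(range(5, 8), 4)
    net = ThreeLayerNet(n_segments * N_BANDS, 8, len(PATTERNS), seed=seed)
    for _ in range(epochs):
        rng.shuffle(train)
        for x, word in train:
            net.train_step(x, [1.0 if k == word else 0.0 for k in range(len(PATTERNS))])
    return sum(net.predict(x) == word for x, word in test) / len(test)


if __name__ == "__main__":
    print(f"accuracy on unseen speakers: {train_and_evaluate():.2f}")
```

The fixed-length output of `compress` is what lets utterances of different durations feed the same input layer; on this deliberately easy synthetic data the accuracy says nothing about the rates reported in the paper.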
Copyright information
© 1993 Springer-Verlag/Wien
About this paper
Cite this paper
Freisleben, B., Bohn, CA. (1993). Speaker-Independent Word Recognition with Backpropagation Networks. In: Albrecht, R.F., Reeves, C.R., Steele, N.C. (eds) Artificial Neural Nets and Genetic Algorithms. Springer, Vienna. https://doi.org/10.1007/978-3-7091-7533-0_36
DOI: https://doi.org/10.1007/978-3-7091-7533-0_36
Publisher Name: Springer, Vienna
Print ISBN: 978-3-211-82459-7
Online ISBN: 978-3-7091-7533-0
eBook Packages: Springer Book Archive