Connectionist Large Vocabulary Speech Recognition

Waibel, Alex

doi:10.1007/978-3-642-76626-8_28

Part of the book series: NATO ASI Series ((NATO ASI F,volume 75))

Abstract

In this paper, the problem of large vocabulary word recognition is addressed from a connectionist perspective. The problem is of both practical interest and scientific importance, since a workable solution must integrate pattern recognition with sequential, symbolic constraints. We have developed two large vocabulary word recognition systems based on different speech recognition philosophies. One system exploits the power of neural networks to perform accurate classification; the other exploits their power to produce good non-linear function approximation and signal prediction. We present each system’s operation and evaluate its performance. Both achieved respectable recognition scores in excess of 90% correct for vocabularies of up to 5000 words. We suggest further avenues for improving each system and, in the process, discuss the relative strengths of the two approaches.

We gratefully acknowledge IEICE for permission to reprint this paper. It originally appeared in the journal of the IEICE, Vol. J73-D-II, No. 8, pp. 1122–1131 (Aug. 1990), by the same author.

Author information

Authors and Affiliations

School of Computer Science, Carnegie-Mellon University, Pittsburgh, PA, 15213, USA
Alex Waibel

Editor information

Editors and Affiliations

Dipartimento di Automatica e Informatica, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Torino, Italy
Pietro Laface
School of Computer Science, 3480 University St., Montreal, Quebec, H3A 2A7, Canada
Renato De Mori

About this paper

Cite this paper

Waibel, A. (1992). Connectionist Large Vocabulary Speech Recognition. In: Laface, P., De Mori, R. (eds) Speech Recognition and Understanding. NATO ASI Series, vol 75. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-76626-8_28

DOI: https://doi.org/10.1007/978-3-642-76626-8_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-76628-2
Online ISBN: 978-3-642-76626-8
eBook Packages: Springer Book Archive
