Abstract
This paper addresses the problem of large vocabulary word recognition from a connectionist perspective. The problem is of both practical interest and scientific importance, since a workable solution must integrate pattern recognition with sequential, symbolic constraints. We have developed two large vocabulary word recognition systems based on different speech recognition philosophies. One system exploits the power of neural networks to perform accurate classification; the other exploits their power to perform good non-linear function approximation and signal prediction. We present each system’s operation and evaluate its performance. Both achieved respectable recognition scores in excess of 90% correct for vocabularies of up to 5000 words. We suggest avenues for further improving each system and, in the process, discuss the relative strengths of the two approaches.
We gratefully acknowledge IEICE for permission to reprint this paper. It originally appeared in the Journal of the IEICE, Vol. J73-D-II, No. 8, pp. 1122–1131 (Aug. 1990), by the same author.
Copyright information
© 1992 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Waibel, A. (1992). Connectionist Large Vocabulary Speech Recognition. In: Laface, P., De Mori, R. (eds) Speech Recognition and Understanding. NATO ASI Series, vol 75. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-76626-8_28
DOI: https://doi.org/10.1007/978-3-642-76626-8_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-76628-2
Online ISBN: 978-3-642-76626-8
eBook Packages: Springer Book Archive