Connectionist Large Vocabulary Speech Recognition

  • Conference paper
Speech Recognition and Understanding

Part of the book series: NATO ASI Series (NATO ASI F, volume 75)

Abstract

In this paper, the problem of large vocabulary word recognition is addressed from a connectionist perspective. The problem is not only of practical interest but also of scientific importance, since a workable solution must integrate pattern recognition with sequential, symbolic constraints. We have developed two large vocabulary word recognition systems based on different speech recognition philosophies. One system exploits the power of neural networks to perform accurate classification; the other exploits their power to produce good non-linear function approximation and signal prediction. We present each system’s operation and evaluate its performance. Both achieved respectable recognition scores in excess of 90% correct for vocabularies of up to 5000 words. We suggest further avenues towards improving each system and in the process discuss the relative strengths of each approach.

We gratefully acknowledge IEICE for permission to reprint this paper. It originally appeared in the journal of the IEICE, Vol. J73-D-II, No. 8, pp. 1122–1131 (Aug. 1990), by the same author.

Copyright information

© 1992 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Waibel, A. (1992). Connectionist Large Vocabulary Speech Recognition. In: Laface, P., De Mori, R. (eds) Speech Recognition and Understanding. NATO ASI Series, vol 75. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-76626-8_28

  • DOI: https://doi.org/10.1007/978-3-642-76626-8_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-76628-2

  • Online ISBN: 978-3-642-76626-8

  • eBook Packages: Springer Book Archive
