Phoneme discrimination using connectionist networks

  • Raymond L. Watrous
Part of the Lecture Notes in Computer Science book series (LNCS, volume 661)


The goal of this research was to evaluate the potential of connectionist networks for speech recognition. The research demonstrated that solutions to a representative set of phoneme discrimination problems could be obtained for a single male speaker. There are many unanswered questions about how these network models and methods might be extended to recognition of a complete set of phonemes, spoken by different talkers in continuous speech. Nevertheless, it may be concluded that connectionist networks are, in principle, sufficient models for acoustic phonetic speech recognition.


Keywords: Target Function; Hide Unit; Output Unit; Stop Consonant; Closure Duration
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 1993

Authors and Affiliations

  • Raymond L. Watrous (Siemens Corporate Research, Princeton)
