Representation and Recognition of Temporal Patterns

  • Robert F. Port

Abstract

How can a nervous system represent for itself the temporal relations of patterns that it knows? In order to label auditory patterns, the nervous system must store early portions in order to identify the whole. Both linguists and engineer-scientists have a similar need to record spoken words. This paper reviews three basic models for handling the information-collection problem that supports pattern recognition, whether by scientists or others. Many of these techniques have been implemented in connectionist networks. In linguistic models for words, there are only ordered symbols, i.e. either phonemic segments or words. In engineering and speech science, time windows are built that store the entire signal and allow parametric description of time. But such windows are not plausible for nervous systems. A third alternative is a memory in the form of a dynamic system. These models are driven through a trajectory in state space by the input signals. Thus, the recognition process for familiar patterns produces a distinct trajectory through state space for each learned pattern. Among the advantages of such a system are that (1) it tends to recognize patterns despite changes in the rate of presentation, and (2) the system can be run continuously yet will respond as quickly as possible at appropriate times. Evidence is reviewed about human auditory memory for complex tone sequences. The data suggest that human auditory memory exhibits many similarities to the dynamic model.

Keywords

Hidden Markov Model · Temporal Pattern · Speech Recognition · Acoustical Society · Speech Perception
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer Science+Business Media Dordrecht 1992

Authors and Affiliations

  • Robert F. Port (1)
  1. Department of Linguistics, Department of Computer Science, Indiana University, Bloomington, USA