Abstract
How can a nervous system represent for itself the temporal relations of patterns that it knows? To label an auditory pattern, the nervous system must store the early portions of the pattern until the whole can be identified. Linguists and engineer-scientists have a similar need when they record spoken words. This paper reviews three basic models for handling the information-collection problem that supports pattern recognition, whether by scientists or others. Many of these techniques have been implemented in connectionist networks. In linguistic models of words, there are only ordered symbols, i.e., either phonemic segments or words. In engineering and speech science, time windows are built that store the entire signal and allow parametric description of time. But such windows are not plausible for nervous systems. A third alternative is a memory in the form of a dynamic system. Such a system is driven through a trajectory in state space by the input signals. Thus, the recognition process for familiar patterns produces a distinct trajectory through state space for each learned pattern. Among the advantages of such a system are that (1) it tends to recognize patterns despite changes in the rate of presentation, and (2) it can be run continuously yet will respond as quickly as possible at appropriate times. Evidence about human auditory memory for complex tone sequences is reviewed. The data suggest that human auditory memory exhibits many similarities to the dynamic model.
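The idea of a memory that is driven through state space by its input can be illustrated with a minimal sketch. The code below is not the chapter's model: the two-unit state, the hand-picked weights `W` and `U`, the `tanh` update, and the symbols `"A"` and `"B"` are all assumptions chosen only to show that the same inputs presented in a different order leave the system in a different state.

```python
import math

# Hypothetical recurrent weights (state -> state), chosen by hand for illustration.
W = [[0.5, -0.3],
     [0.3, 0.5]]

# Hypothetical input weights: each input symbol drives one state unit.
U = {"A": [1.0, 0.0],
     "B": [0.0, 1.0]}


def step(state, symbol):
    """Advance the state one time step under the given input symbol."""
    drive = U[symbol]
    return [
        math.tanh(sum(W[i][j] * state[j] for j in range(2)) + drive[i])
        for i in range(2)
    ]


def trajectory(sequence):
    """Return the list of states visited while the sequence is presented."""
    state = [0.0, 0.0]          # resting state before any input arrives
    path = [state]
    for symbol in sequence:
        state = step(state, symbol)
        path.append(state)
    return path


if __name__ == "__main__":
    for seq in ("AB", "BA"):
        final = trajectory(seq)[-1]
        print(seq, [round(x, 3) for x in final])
```

Because each new state depends on the previous state as well as the current input, the final state carries information about order without any stored time window. The sketch makes no attempt to show learning or tolerance to presentation rate.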
The author is grateful to Svën Anderson for important contributions to the work described here. He is also grateful to Charles Watson, Gary R. Kidd, Michael Gasser, Jungyul Suh and John W. R. Merrill for helpful discussion of these ideas. This research was supported in part by the Air Force Office of Scientific Research, Grant 870089, and by the National Science Foundation, Grants DCR-8505635 and DCR-8518725.
© 1992 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Port, R.F. (1992). Representation and Recognition of Temporal Patterns. In: Sharkey, N. (ed.) Connectionist Natural Language Processing. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-2624-3_15
DOI: https://doi.org/10.1007/978-94-011-2624-3_15
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-5160-6
Online ISBN: 978-94-011-2624-3
eBook Packages: Springer Book Archive