Scene Analysis

Auditory Computation

Part of the book series: Springer Handbook of Auditory Research (SHAR, volume 6)

Abstract

One person talks to another in a crowded, noisy room; a soloist performs a concerto with an orchestra; a car screeches to a halt in the street outside: in each of these situations, the auditory system is faced with the problem of separating several different sources of sound from the complex, composite signal that reaches the ears.


Copyright information

© 1996 Springer-Verlag New York, Inc.

About this chapter

Cite this chapter

Mellinger, D.K., Mont-Reynaud, B.M. (1996). Scene Analysis. In: Hawkins, H.L., McMullen, T.A., Popper, A.N., Fay, R.R. (eds) Auditory Computation. Springer Handbook of Auditory Research, vol 6. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-4070-9_7

  • DOI: https://doi.org/10.1007/978-1-4612-4070-9_7

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4612-8487-1

  • Online ISBN: 978-1-4612-4070-9

  • eBook Packages: Springer Book Archive
