Timbre Recognition and Sound Source Identification

  • Trevor R. Agus
  • Clara Suied
  • Daniel Pressnitzer
Chapter
Part of the Springer Handbook of Auditory Research book series (SHAR, volume 69)

Abstract

The ability to recognize many sounds in everyday soundscapes is a useful and impressive feature of auditory perception in which timbre likely plays a key role. This chapter discusses what is known of timbre in the context of sound source recognition. It first surveys the methodologies that have been used to characterize a listener’s ability to recognize sounds and then examines the types of acoustic cues that could underlie the behavioral findings. In some studies, listeners were directly asked to recognize familiar sounds or versions of them that were truncated, filtered, or distorted by other resynthesis methods that preserved some cues but not others. In other studies, listeners were exposed to novel sounds, and the build-up of cues over time or the learning of new cues was tracked. The evidence currently available raises an interesting debate that can be articulated around two qualitatively different hypotheses: Are sounds recognized through distinctive features unique to each sound category (but of which there would need to be many to cover all recognized categories) or rather, are sounds recognized through a relatively small number of perceptual dimensions in which different sounds have their own recognizable position?

Keywords

Acoustic cues; Auditory memory; Auditory sketching; Perceptual learning; Psychomechanics; Resynthesis; Reverse correlation; Textures; Timbre

Notes

Acknowledgements

DP was supported by the ANR grants ANR-10-LABX-0087 and ANR-10-IDEX-0001-02, and by the European Research Council (ERC ADAM No. 295603).

Compliance with Ethics Requirements

Trevor Agus declares that he has no conflict of interest.

Clara Suied declares that she has no conflict of interest.

Daniel Pressnitzer declares that he has no conflict of interest.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Trevor R. Agus (1), corresponding author
  • Clara Suied (2)
  • Daniel Pressnitzer (3)

  1. School of Arts, English and Languages, Queen’s University Belfast, Belfast, UK
  2. Département de Neurosciences et Sciences Cognitives, Institut de recherche biomédicale des armées, Brétigny-sur-Orge, France
  3. Laboratoire des Systèmes Perceptifs, Département d’études Cognitives, Paris Science & Lettres – PSL University, Centre national de la recherche scientifique, Paris, France
