Prolegomena to Music Semantics


We argue that a formal semantics for music can be developed, although it will be based on very different principles from linguistic semantics and will yield less precise inferences. Our framework has the following tenets: (i) Music cognition is continuous with normal auditory cognition. (ii) In both cases, the semantic content derived from an auditory percept can be identified with the set of inferences it licenses on its causal sources, analyzed in appropriately abstract ways (e.g. as ‘voices’ in some Western music). (iii) What is special about music semantics is that it aggregates inferences based on normal auditory cognition with further inferences drawn on the basis of the behavior of voices in tonal pitch space (through more or less stable positions, for instance). (iv) This makes it possible to define an inferential semantics but also a truth-conditional semantics for music. In particular, a voice undergoing a musical movement m is true of an object undergoing a series of events e just in case there is a certain structure-preserving map between m and e. (v) Aspects of musical syntax (notably Lerdahl and Jackendoff’s ‘time-span reductions’) might be derivable on semantic grounds from an event mereology (‘partology’), which also explains some cases in which tree structures are inadequate for music (overlap, ellipsis). (vi) Intentions and emotions may be attributed at several levels (the source, the musical narrator, the musician), and we speculate on possible explanations of the special relation between music and emotions.
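Tenet (iv) can be illustrated with a toy formalization (a deliberately crude sketch of our own, not the paper's definition): represent a musical movement and an event sequence as numeric series, and let 'structure preservation' reduce to preserving the direction of change at each step.

```python
# Illustrative sketch (a hypothetical formalization, not the paper's own):
# a musical movement m counts as "true of" an event sequence e if the
# direction of change (rising/falling/steady) is preserved step by step.

def directions(seq):
    """Sign of change between successive values: +1, -1, or 0."""
    return [(b > a) - (b < a) for a, b in zip(seq, seq[1:])]

def true_of(movement, events):
    """A crude structure-preserving map: same length, same contour."""
    return len(movement) == len(events) and directions(movement) == directions(events)

# A rising melodic line (MIDI pitches) is true of a source gaining energy...
print(true_of([60, 62, 64, 67], [0.1, 0.3, 0.5, 0.9]))
# ...but not of one losing energy.
print(true_of([60, 62, 64, 67], [0.9, 0.5, 0.3, 0.1]))
```

Richer versions would preserve more structure (relative magnitudes, stability in tonal pitch space), but the ordering-and-contour case already yields non-trivial truth conditions.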



  1.

    This notion of semantics corresponds to what Koelsch 2012 calls ‘extra-musical meaning’.

  2.

    For a recent critical discussion of some ‘no semantics’ views, see Berg Larsen 2017 (p. 28), who cites (and seeks to refute) the following opinion by Kivy 1990: “in the long run syntax without semantics must completely defeat linguistic interpretation. And although musical meaning may exist as a theory, it does not exist as a reality of listening”.

  3.

    The term ‘virtual source’ is due to Bregman, e.g. Bregman 1994. See also Nudds 2007 for an analysis of auditory cognition in terms of source perception.

  4.

    Peirce’s tripartition includes icons, indices and symbols. Indices are representations “whose relation to their objects consists in a correspondence in fact” (Atkin 2013; Peirce 1868). By contrast, icons involve a ‘likeness’ between the representations and their objects. Whether musical signs in our analysis should also be treated as icons depends on how ‘likeness’ is defined; we come back to this point in Section 7.2. See also Koelsch 2011, 2012 for a discussion of Peirce’s tripartition in a musical context.

  5.

    With the addition of tonal inferences, we move further away from an analysis in terms of mere Peircian indices.

  6.

    While the notion of ‘energy’ should be further explicated, we can rely at this point on an intuitive notion of folk psychology, according to which objects are taken to have different levels of energy depending on their movements and more generally on their behavior.

  7.

    Our analysis builds on further insights that have been developed in the literature on music cognition, and which might also find a place in the present framework. One influential line of inquiry takes various semantic inferences to be based on the attribution of animacy and intentions to some musical elements such as pitches, chords, and motives (Lerdahl 2001; Maus 1988; Monahan 2013). A second but related line takes important semantic inferences to be triggered by sound properties found in animal signals and/or in human speech (Cook 2007; Cross and Woodruff 2008; Blumstein et al. 2012; Bowling et al. 2010; Huron 2015; Ilie and Thompson 2006, and Juslin and Laukka 2003). Both directions are compatible with Bregman’s general enterprise, and our source-based semantics makes important use of their insights, but in the general case it does not require that the virtual sources should be animate. A third line of investigation takes music to trigger inferences about movement (Clarke 2001; Eitan and Granot 2006; Larson 2012; Saslaw 1996) – which is compatible with the analysis of musical meaning as a ‘journey through tonal pitch space’. Our source-based semantics allows the virtual sources to move in space, but it allows for many other types of events as well. Relatedly, a fourth line of investigation takes certain properties of music – e.g. the ‘final ritard’ that signals the end of a piece – to imitate properties of forces and friction in the natural world (Desain and Honing 1996; Honing 2003; Larson 2012). Within our framework, these are particular ways of triggering semantic inferences, but there are many others as well.

  8.

    In Heider and Simmel’s animations, the interpretation involves attributions of agency and intentions – for instance a triangle may appear to have a destructive behavior. But further and more basic properties can be attributed to abstract shapes as well. As an example, Kominsky et al. 2017 showed subjects abstract animations involving several pairs of dots. In each pair, a moving dot collided at speed s into another dot at a standstill, which then started to move at speed s’. They showed that subjects were quicker to spot pairs in which the ratio s/s’ was 3/1 than pairs in which it was 1/3, and suggested a reason: a ratio of 3/1 is consistent with causal laws of elastic collision, whereas a ratio of 1/3 is not (an example in the ‘violation’ condition can be seen in [AV00]). In this case, subjects seem to take the dots to be indicative of events that obey certain physical laws of the external world.
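The asymmetry has a simple mechanical rationale, which can be made explicit (a simplified sketch under assumptions of our own: equal masses, one dimension, struck dot initially at rest): energy conservation caps the struck dot's outgoing speed at the incoming dot's speed, so a ratio of 3/1 is attainable while 1/3 would require kinetic energy to be created.

```python
# Simplified sketch (our assumptions: equal masses, 1-D collision, struck
# dot initially at rest). Conservation of kinetic energy then requires the
# struck dot's outgoing speed not to exceed the incoming speed.

def physically_consistent(s_in, s_out):
    """Can a passive collision send the struck dot off at s_out?
    Kinetic energy after cannot exceed kinetic energy before."""
    return 0 < s_out <= s_in

print(physically_consistent(3, 1))  # ratio s/s' = 3/1: consistent
print(physically_consistent(1, 3))  # ratio s/s' = 1/3: energy would be created
```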

  9.

    Retrieved online on January 7, 2018. Dynamics were re-established by A. Bonetto on the basis of the orchestral score.

  10.

    Score retrieved online on January 8, 2018 at …,_Op.30_(Strauss,_Richard).

  11.

    Huron 2016 investigates the cognitive principles (primarily from Auditory Scene Analysis) behind voice leading principles. In his words (from the Introduction), “voice leading is a codified practice that helps musicians craft simultaneous musical lines so that each line retains its perceptual independence for an enculturated listener”.

  12.

    A similar distinction is needed for non-musical sounds: we may perceive a car as approaching within a background of road-related noises that might not be as distinct. Similarly, an animal’s call may be perceived in a background of other noises, such as the rain falling or the wind blowing. See footnote 35 for a reference to Leonard Bernstein’s discussion of semantic inferences that are arguably licensed by the string accompaniment in Charles Ives’s Unanswered Question.

  13.

    Unsurprisingly, in his Carnival of the Animals, Saint-Saëns uses the clarinet to represent a cuckoo [AV05] and the flutes to represent an aviary [AV06]. But some semantic effects are more subtle, as in Saint-Saëns’s use of flutes in the melody intended to evoke an aquarium [AV07]. Presumably the smooth and continuous sound produced by the flute helps evoke the movement of a marine animal; the less continuous sound of a piano would be less apt to do so.

  14.

    As noted by R. Casati (p.c.), the effect is strengthened by the grace note (it might contribute to the understanding of the rebound).

  15.

    Here and throughout, we follow standard musical convention in notating the contrabass part one octave higher than it sounds.

  16.

    If we add a crude crescendo instead, and a final accent, the ending sounds more intentional, as if the source gradually gained stamina as it approached its goal, and signaled its success with a triumphant spike of energy [AV16]. An intentional, triumphant effect is often produced by fortissimo endings, e.g. at the end of Beethoven’s Symphony No. 8 [AV17].

  17.

    This is a sufficiently important inference that some animals apparently evolved mechanisms – specifically, laryngeal descent – to lower their vocal-tract resonant frequencies so as to exaggerate their perceived size (Fitch and Reby 2001).

  18.

    In Charles Ives’s Unanswered Question, the repetition of the trumpet motive lends itself to a dialogical interpretation: a question is repeated several times in near-identical form, and answers are increasingly frustrating. We revisit this example in Section 9.3.

  19.

    See Eitan and Granot 2006 for more specific methods designed to test the relation between music and movement.

  20.

    We should note that while tonal inferences can only be understood by reference to the formal properties of tonal pitch space, they might well be grounded in some properties of normal auditory cognition, for instance in animal signals, human voices, or more general inferences relating consonance/dissonance to properties of the source; we briefly discuss some possibilities at the end of this section.

  21.

    We set aside the case of auditory illusions with a contradictory content.

  22.

    We sometimes contrast ‘real world events’ with ‘musical events’, but in all cases our world events are possibilia.

  23.

    See for instance Larson (1995) and Schlenker (2010) for handbook summaries of the analysis of linguistic meaning as truth conditions.

  24.

    The paradigm was not fully minimal, in the sense that further aspects of the sign tended to be modified as well. For more controlled paradigms, see Schlenker, to appear.

  25.

    Abstract animations that were designed to complement musical pieces would be particularly interesting to investigate in this connection. A nice example is offered by Mary Ellen Bute’s Tarantella [AV31], an abstract animation that was conceived in conjunction with piano music by Edwin Gerschefski. One could explore in future work the ways in which the music and the visual animation converge on a single semantic effect or not.

  26.

    See for instance Sportiche et al. 2013 for a textbook introduction.

  27.

    An essential issue for future research will be to determine to what extent the details of musical meaning affect musical structure. One could adhere to the remarks made above by taking them as a simple reconstruction of Lerdahl and Jackendoff’s Gestalt-based views. On this deflationary view, Gestalt principles of grouping arise from an attempt to recover the structure of the actual events that caused an auditory percept, and no reference to fictional sources is needed. But it could also be that the details of our semantics affect grouping structure. As an example, one could imagine that in a sequence > < (diminuendo followed by a crescendo), one will be more tempted to put a group boundary after > if it is realized with a strong rallentando, as this suggests that the source is dying out, and that the following crescendo (<) corresponds to a very different event and possibly to a different source. Such issues have yet to be investigated.

  28.

    As an example, “in the span covering measure 2, the V6 is chosen over the V43, and proceeds for consideration in the span covering measures 1–2; here it is less stable than the opening I, so it does not proceed to the next larger span; and so forth. As a result of this procedure, a particular time-span level produces a particular reductional level (the sequence of heads of the time-spans at that level).” (Lerdahl and Jackendoff 1983, p. 120)
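The selection procedure can be sketched algorithmically (a minimal sketch under simplifying assumptions of our own: time-spans form a binary tree, and a single numeric 'stability' score stands in for Lerdahl and Jackendoff's preference rules): at each span, the more stable of the two sub-spans' heads is selected and passed up to the next larger span.

```python
# Minimal sketch of head selection in a time-span reduction (our own
# simplification: nested binary spans; a numeric score plays the role of
# the stability conditions).

def head(span, stability):
    """Return the most stable event in a nested span; each reductional
    level is the sequence of heads of the time-spans at that level."""
    if isinstance(span, str):          # a single event is its own head
        return span
    left, right = span
    h_left, h_right = head(left, stability), head(right, stability)
    return h_left if stability[h_left] >= stability[h_right] else h_right

# Toy example: the opening I is more stable than V6, which beats V43.
stability = {"I": 3, "V6": 2, "V43": 1}
piece = (("I", "V43"), ("V6", "V43"))
print(head(piece, stability))
```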

  29.

    Pictures extracted from an animation (endlessreference), retrieved online on January 13, 2018.

  30.

    Thanks to A. Bonetto for suggesting that we consider a version with triplets.

  31.

    As E. Chemla and J. Katz (p.c.) note, further examples would be needed to ensure that the effect is due to a new note rather than to a new contour (since in (42)a the contour of the focused triplet is flat, just like that of its antecedent, whereas in (42)b,c the contour is not flat).

  32.

    The sound examples were produced as follows: Bonetto produced (42)c on an electronic piano. (42)a,b were produced from the recorded version of (42)c via manipulations (with the software GarageBand).

  33.

    As an example, consider a piece ending with a crescendo, which may often be interpreted as an intentional signal that a goal has been reached. If one artificially modifies a MIDI version of Mahler’s Frère Jacques so as to add a crescendo by the horn on the very last part of the last note of the piece (a D) [AV35], the result is in remarkably bad taste, but easily interpretable: the horn player is triumphantly indicating that the end of the piece has finally been reached. In this case, the voices all finish diminuendo, so that this final crescendo can’t coherently be attributed to the virtual sources. Nor is it natural to think that the narrator intends this final crescendo, which contradicts the musical intention that can be inferred from the diminuendo of the last bars (the procession is moving away, or at least its sound is gradually dying out). Thus one can only attribute this triumphant outburst to the musician – which also explains why the effect is in such bad taste.

  34.

    In one of his Young People’s Concerts devoted to Charles Ives, Leonard Bernstein (1967) discusses The Unanswered Question in insightful terms [AV36] (he also adds a meta-musical reinterpretation of Ives’s Unanswered Question, replacing the ‘question of existence’ with the question ‘Whither music?’; this is of no relevance here).

  35.

    This section solely seeks to establish the main distinctions as they relate to music semantics; we do not do justice to the vast literature on emotions in music and in art in general (see Juslin and Sloboda 2010 for a collection of survey articles on music and emotion).

  36.

    Bowling et al. 2012 further claim that interval size is correlated with affect in language and in music: “in both Tamil and English speech negative/subdued affect is characterized by relatively small prosodic intervals, whereas positive/excited affect is characterized by relatively large prosodic intervals”; similarly, in both Carnatic and Western music melodic intervals “are generally larger in melodies associated with positive/excited emotion, and smaller in melodies associated with negative/subdued emotion”.

  37.

    Due to the role of dissonance in some phenomena of the natural world, it is not always clear whether certain tonal/atonal effects should be attributed to normal auditory cognition or to the interaction of the voices with tonal pitch space. It does not follow from this that the latter component can be eliminated, since the details of tonal pitch space are not given by normal auditory cognition, and may be culturally determined as well.

  38.

    Aucouturier et al. 2016 also manipulated their subjects’ voice in real time with similar means (e.g. addition of a vibrato). Spectacularly, they showed that subjects hearing their own manipulated voice through earphones mostly fail to detect something abnormal, but that the manipulation nonetheless affects their own emotional state – as if they monitored it by way of voice cues.

  39.

    Aucouturier et al.’s vibrato manipulation is illustrated in (i)b (“vibrato was sinusoidal with a depth of 15 cents and frequency of 8.5 Hz. Inflection had an initial pitch shift of +120 cents and a duration of 150 ms.” (p. 4)).

    (i) Effect of voice manipulation on the perception of emotions (from Aucouturier et al. 2016)

    a. Natural male voice [French]

    b. ‘Afraid’ manipulation: it ‘operates on pitch using both vibrato and inflection’

  40.

    As mentioned above, Cook 2007 and Bowling et al. 2010, 2012 seek to derive some semantic differences between minor and major chords from normal auditory cognition, and thus some of these effects might conceivably fall under the category of ‘normal auditory cognition’.

  41.

    As Arthur Bonetto notes, this piano reduction is a compromise between the Hal Leonard version and the MIDI file we modified, though closer to Hermann’s original: note values are from the first source, one half step higher (to simplify our analysis and transformations); but the sixteenth note triplets and richer chords are from the second source.

  42.

    In greater detail, the transformations were as follows:

    (i) From (50)a to (50)b:
    Bar 1: F# > G.
    Bar 2: F# > G; B > Bb.
    Bars 3–4/6–7: F > G; Gb > G; B > Bb.
    Bar 5: C > D; B > Bb; Ab > G; Eb > D.

    (ii) From (50)a to (50)c: same as (i), but the boxed F > G in (i) becomes F > F# instead.
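For concreteness, the substitutions in (i) can be encoded as a simple mapping (an illustrative sketch; the bar keys and note-name strings are our own encoding of the footnote's list):

```python
# Illustrative encoding of the substitutions in (i), from (50)a to (50)b.
# Bars are keyed by number; each maps original pitch classes to replacements.
substitutions_b = {
    1: {"F#": "G"},
    2: {"F#": "G", "B": "Bb"},
    3: {"F": "G", "Gb": "G", "B": "Bb"},   # likewise bars 4, 6, 7
    5: {"C": "D", "B": "Bb", "Ab": "G", "Eb": "D"},
}

def transform(bar_number, notes, subs):
    """Apply a bar's pitch substitutions, leaving other notes unchanged."""
    table = subs.get(bar_number, {})
    return [table.get(n, n) for n in notes]

print(transform(2, ["F#", "A", "B"], substitutions_b))
```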

  43.

    See Gabrielsson 2002 for a discussion of the possible relations between perceived and felt emotion, and Evans and Schubert 2008 for relevant experimental data.

  44.

    Simon Boccanegra, Teatro La Fenice 2014–2015, conductor Myung-Whun Chung, RAI, with Simone Piazzola as Simon.

  45.

    Languages in which objects name themselves are called ‘Lagadonian languages’ in the philosophical literature (e.g. Lewis 1986).

  46.

    In more standard systems, propositional letters are true at certain possible worlds. Events are typically thought to be more fined-grained than worlds because distinct events co-exist in the same possible world. An event- rather than world-based semantics facilitates the comparison with our music semantics, which builds on the idea that sequences of musical notes or chords can be true of sequences of extra-musical events (treated as possibilia).
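The event-based picture can be made concrete with a toy model (our own encoding, with hypothetical chord names and event labels, not the paper's formal system): model the 'proposition' a musical unit denotes as the set of events at which it is true; a sequence of units is then true of a sequence of events pointwise.

```python
# Minimal sketch (our own encoding): a musical unit denotes the set of
# events it is true at; a sequence of units is true of a sequence of
# events just in case each unit's denotation contains its event.

def true_of_seq(units, events, denotation):
    """Pointwise truth: each unit's denotation must contain its event."""
    return len(units) == len(events) and all(
        e in denotation[u] for u, e in zip(units, events))

# Hypothetical denotations: a stable chord for low-energy events, an
# unstable chord for high-energy ones.
denotation = {
    "C":  {"rest", "arrival"},
    "G7": {"motion", "tension"},
}
print(true_of_seq(["G7", "C"], ["tension", "arrival"], denotation))
```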

  47.

    Pesetsky and Katz 2009 take prolongational reductions to be central to their ‘identity thesis’ for music and language. For them, time-span reductions share properties with prosodic structure in phonology, whereas prolongational reductions play the role of (and share properties with) syntactic structure. They further suggest that prolongational reductions need not be taken to be derivative from time-span reductions, as argued by Lerdahl and Jackendoff 1983 and Lerdahl 2001.

  48.

    It would be interesting to explore in the future a closer correspondence between the musical and the visual domains, in particular by seeking visual counterparts of the various cues that trigger inferences about the sources, be they drawn from normal auditory cognition or from harmonic considerations. In this connection, simple systems of music visualization give rise to interesting abstract animations, but the transposition to a different modality only preserves some of the inferences triggered by the music. For instance, in its basic form, Stephen Malinowski’s ‘Music Animation Machine’ only encodes pitch and duration; loudness, for instance, is not represented. Coloration can be added to encode instrumentation or harmony. Harmonic coloring has thus been used to provide an animated rendition of Stravinsky’s Rite of Spring [AV50]. But it is clear that even harmonic coloring only yields a crude (and not necessarily intuitive) encoding of the complex harmonic relations among notes and chords.

  49.

    Thanks to B. Spector (p.c.) for raising this question.

  50.

    Thanks to J. MacFarlane (p.c.) for raising this question.


  1. Arnal, L.H., A. Flinker, A. Kleinschmidt, A.L. Giraud, and D. Poeppel. 2015. Human screams occupy a privileged niche in the communication soundscape. Current Biology 25 (15): 2051–2056.

  2. Atkin, Albert. 2013. Peirce's theory of signs. In The Stanford encyclopedia of philosophy (Summer 2013 Edition), ed. Edward N. Zalta.

  3. Aucouturier, J.J., P. Johansson, L. Hall, R. Segnini, L. Mercadié, and K. Watanabe. 2016. Covert digital manipulation of vocal emotion alter speakers’ emotional states in a congruent direction. Proceedings of the National Academy of Sciences 113 (4): 948–953.

  4. Berg Larsen, Joakim. 2017. Conceptions of meaning in music. On the possibility of meaning in absolute music. Master’s thesis in philosophy, Arctic University of Norway. Retrieved online on Jan 6, 2018.

  5. Bernstein, Leonard. 1967. Charles Ives: American Pioneer. Young People's Concerts. Television series, February 23, 1967.

  6. Blumstein, Daniel T., Gregory A. Bryant, and Peter Kaye. 2012. The sound of arousal in music is context-dependent. Biology Letters 8: 744–747.

  7. Bonin, T.L., J.L. Trainor, M. Belyk, and P. Andrews. 2016. The source dilemma hypothesis: Perceptual uncertainty contributes to musical emotion. Cognition 154: 174–181.

  8. Bowling, D.L., K. Gill, J. Choi, J. Prinz, and D. Purves. 2010. Major and minor music compared to excited and subdued speech. Journal of the Acoustical Society of America 127: 491–503.

  9. Bowling, D.L., J. Sundararajan, S. Han, and D. Purves. 2012. Expression of emotion in Eastern and Western music mirrors vocalization. PLoS One 7 (3): e31942.

  10. Bregman, Albert S. 1994. Auditory scene analysis. Cambridge: MIT Press.

  11. Charnavel, Isabelle. 2016. First steps towards a generative theory of dance cognition: grouping structures in dance perception. Manuscript, Harvard University.

  12. Clarke, Eric. 2001. Meaning and the specification of motion in music. Musicae Scientiae 5: 213–234.

  13. Cohn, N., R. Jackendoff, P.J. Holcomb, and G.R. Kuperberg. 2014. The grammar of visual narrative: neural evidence for constituent structure in sequential image comprehension. Neuropsychologia 64: 63–70.

  14. Cook, Norman D. 2007. The sound symbolism of major and minor harmonies. Music Perception 24: 315–319.

  15. Cross, I., and G.E. Woodruff. 2008. Music as a communicative medium. In The prehistory of language, vol. 1, ed. R. Botha and C. Knight, 113–144. Oxford: Oxford University Press.

  16. Davidson, Donald. 1967. The logical form of action sentences. In The logic of decision and action, ed. N. Rescher, 81–94. Pittsburgh: University of Pittsburgh Press.

  17. de Vries, Mark. 2013. Multidominance and locality. Lingua 134: 149–169.

  18. Desain, P., and H. Honing. 1996. Physical motion as a metaphor for timing in music: the final ritard. In Proceedings of the International Computer Music Conference (pp. 458–460). International Computer Association.

  19. Eitan, Zohar, and Roni Y. Granot. 2006. How music moves. Music Perception 23 (3): 221–247.

  20. Evans, P., and E. Schubert. 2008. Relationships between expressed and felt emotions in music. Musicae Scientiae 12: 75–99.

  21. Fitch, Tecumseh W., and D. Reby. 2001. The descended larynx is not uniquely human. Proceedings of the Royal Society of London. Series B 268: 1669–1675.

  22. Forte, Allen. 1959. Schenker's conception of musical structure. Journal of Music Theory. 3: 1–30.

  23. Gabrielsson, Alf, and Eric Lindström. 2010. The role of structure in the musical expression of emotions. In Handbook of music and emotion: theory, research, and applications, ed. P.N. Juslin and J.A. Sloboda, 367–400. Oxford: Oxford University Press.

  24. Gabrielsson, A. 2002. Emotion perceived and emotion felt: Same or different? Musicae Scientiae, Special Issue, 123–147.

  25. Godoy, R.I., and M. Leman, eds. 2010. Musical gestures: sound, movement, and meaning. New York: Routledge.

  26. Granroth-Wilding, Mark, and Mark Steedman. 2014. A robust parser-interpreter for jazz chord sequences. Journal of New Music Research 43 (4): 355–374.

  27. Greenberg, Gabriel. 2013. Beyond resemblance. Philosophical Review 122: 2.

  28. Grice, Paul. 1957. Meaning. The Philosophical Review 66: 377–388.

  29. Heider, F., and M. Simmel. 1944. An experimental study of apparent behavior. American Journal of Psychology 57: 243–259.

  30. Heim, Irene, and Angelika Kratzer. 1998. Semantics in generative grammar. Oxford: Blackwell.

  31. Honing, H. 2003. The final ritard: on music, motion, and kinematic models. Computer Music Journal 27 (3): 66–72.

  32. Huron, David. 2006. Sweet anticipation: music and the psychology of expectation. Cambridge: MIT Press.

  33. Huron, David. 2015. Cues and signals: An ethological approach to music-related emotion. In Music and meaning, annals of semiotics 6/2015, ed. Brandt and Carmo. Liège: Presses Universitaires de Liège.

  34. Huron, David. 2016. Voice leading: the science behind a musical art. Cambridge: MIT Press.

  35. Ilie, G., and W.F. Thompson. 2006. A comparison of acoustic cues in music and speech for three dimensions of affect. Music Perception 23: 319–329.

  36. Ives, Charles. 1908. Foreword to The unanswered question. Available online.

  37. Jackendoff, Ray. 1982. Semantics and cognition. Cambridge: MIT Press.

  38. Jackendoff, Ray. 2009. Parallels and nonparallels between language and music. Music Perception 26 (3): 195–204.

  39. Juslin, P., and P. Laukka. 2003. Communication of emotions in vocal expression and music performance: different channels, same code? Psychological Bulletin 129 (5): 770–814.

  40. Juslin, P.N., and J.A. Sloboda, eds. 2010. Handbook of music and emotion: theory, research, and applications. Oxford: Oxford University Press.

  41. Kivy, Peter. 1990. Music alone. Ithaca: Cornell University Press.

  42. Koelsch, S. 2011. Towards a neural basis of processing musical semantics. Physics of Life Reviews 8 (2): 89–105.

  43. Koelsch, S. 2012. Musical semantics. In Brain and music. Oxford: Wiley-Blackwell.

  44. Koelsch, S., E. Kasper, D. Sammler, K. Schulze, T. Gunter, and A.D. Friederici. 2004. Music, language and meaning: brain signatures of semantic processing. Nature Neuroscience 7 (3): 302–307.

  45. Kominsky, J.F., B. Strickland, A.E. Wertz, C. Elsner, K. Wynn, and F.C. Keil. 2017. Categories and constraints in causal perception. Psychological Science 28 (11): 1649–1662.

  46. Kracht, Marcus. 2003. The mathematics of language (Studies in Generative Grammar, 63). Berlin: Mouton de Gruyter.

  47. Larson, Richard. 1995. Semantics. In An invitation to cognitive science, vol. I: language, ed. N.D. Osherson, L. Gleitman, and M. Liberman. Cambridge: MIT Press.

  48. Larson, Steve. 2012. Musical forces: motion, metaphor, and meaning in music. Bloomington: Indiana University Press.

  49. Lemasson, Alban, Karim Ouattara, Hélène Bouchet, and Klaus Zuberbühler. 2010. Speed of call delivery is related to context and caller identity in Campbell’s monkey males. Naturwissenschaften 97 (11): 1023–1027.

  50. Lerdahl, Fred, and Ray Jackendoff. 1983. A generative theory of tonal music. Cambridge: MIT Press.

  51. Lerdahl, Fred. 2001. Tonal pitch space. Oxford: Oxford University Press.

  52. Lerdahl, Fred, and Carol L. Krumhansl. 2007. Modeling tonal tension. Music Perception 24 (4): 329–366.

  53. Lewis, David K. 1970. General semantics. Synthese 22(1/2): 18–67. Repr. 1983. In Philosophical papers, Vol. I: 189–229. Oxford: Oxford University Press.

  54. Lewis, David. 1979. Attitudes de dicto and de se. Philosophical Review 88 (4): 513–543.

  55. Lewis, David K. 1986. On the plurality of worlds. Oxford: Blackwell.

  56. Longuet-Higgins, H.C. 1962a. Letter to a musical friend. The Music Review 23: 244–248.

  57. Longuet-Higgins, H.C. 1962b. Second letter to a musical friend. The Music Review 23: 271–280.

  58. Napoli, Donna Jo and Lisa Kraus. To appear. Suggestions for a parametric typology of dance. Leonardo. doi:

  59. Maus, Fred Everett. 1988. Music as drama. Music Theory Spectrum 10 (10th Anniversary Issue): 56–73.

  60. McDermott, J.H., A.J. Lehr, and A.J. Oxenham. 2010. Individual differences reveal the basis of consonance. Current Biology 20: 1035–1041.

  61. Meyer, L.B. 1956. Emotion and meaning in music. Chicago: University of Chicago Press.

  62. Monahan, Seth. 2013. Action and agency revisited. Journal of Music Theory 57: 2.

  63. Nudds, Matthew. 2007. Auditory perception and sounds. Manuscript.

  64. Ohala, J.J. 1994. The frequency code underlies the sound-symbolic use of voice pitch. In Sound symbolism, ed. L. Hinton, J. Nichols, and J.J. Ohala, 325–347. Cambridge: Cambridge University Press.

  65. Pankhurst, Tom. 2008. Schenkerguide. A brief handbook and website for Schenkerian analysis. New York: Routledge.

  66. Patel-Grosz, Pritty, Patrick G. Grosz, Tejaswinee Kelkar, and Alexander Refsum Jensenius. 2017. Exploring the semantics of dance. Slides of a talk given at the Harvard Language & Cognition lab (LangCog), 14 Feb 2017.

  67. Peirce, Charles S. 1868. On a new list of categories. Proceedings of the American Academy of Arts and Sciences 7: 287–298.

  68. Pesetsky, David, and Jonah Katz. 2009. The identity thesis for music and language. Manuscript, MIT.

  69. Rohrmeier, Martin. 2011. Towards a generative syntax of tonal harmony. Journal of Mathematics and Music 5 (1): 35–53.

  70. Rooth, Mats. 1996. Focus. In Handbook of contemporary semantic theory, ed. Lappin Shalom, 271–297. Oxford: Blackwell.

  71. Rosner, B.S., and E. Narmour. 1992. Harmonic closure: music theory and perception. Music Perception 9 (4): 383–411.

  72. Saslaw, Janna. 1996. Forces, containers, and paths: the role of body-derived image schemas in the conceptualization of music. Journal of Music Theory 40 (2): 217–243.

  73. Schlenker, Philippe. 2010. Semantics. In Linguistics encyclopedia, 3rd edition, ed. K. Malmkjaer, 462–477. Abingdon: Routledge.

  74. Schlenker, Philippe. 2011. Indexicality and de se reports. In Semantics, Volume 2, Article 61, ed. Maienborn von Heusinger and Portner, 1561–1604. Berlin: Mouton de Gruyter.

  75. Schlenker, Philippe. 2017. Outline of music semantics. Music Perception: An Interdisciplinary Journal 35 (1): 3–37.

  76. Schlenker, Philippe. To appear. Iconic pragmatics. Natural Language & Linguistic Theory.

  77. Schlenker, Philippe, Jonathan Lamberton, and Mirko Santoro. 2013. Iconic variables. Linguistics & Philosophy 36 (2): 91–149.

  78. Schwarzschild, Roger. 1999. GIVENness, AvoidF and other constraints on the placement of accent. Natural Language Semantics 7 (2): 141–177.

  79. Sievers, B., L. Polansky, M. Casey, and T. Wheatley. 2013. Music and movement share a dynamic structure that supports universal expressions of emotion. Proceedings of the National Academy of Sciences 110: 70–75.

  80. Sportiche, Dominique, Hilda Koopman, and Edward Stabler. 2013. An introduction to syntactic analysis and theory. Malden: Wiley-Blackwell.

    Google Scholar 

  81. Thompson, W.F., and L.L. Cuddy. 1992. Perceived key movement in four-voice harmony and single voices. Music Perception 9 (4): 427–438.

    Article  Google Scholar 

  82. Varzi, Achille. 2015. "Mereology", The Stanford Encyclopedia of Philosophy (Winter 2015 Edition), ed. Edward N. Zalta.

  83. Wolff, Francis. 2015. Pourquoi la musique? Fayard 2015.

  84. Zacks, Jeffrey M., Barbara Tversky, and Gowri Iyer. 2001. Perceiving, remembering, and communicating structure in events. Journal of Experimental Psychology: General 130: 29–58.

    Article  Google Scholar 



A summary of the main ideas can be found in Schlenker 2017, which greatly benefited from the critical comments of David Temperley and three anonymous referees for Music Perception. There are explicit overlaps between this earlier article and the present piece. I am very grateful to the Editors of Music Perception for allowing me to expand on the earlier article in the present piece. Critical comments on 'Outline' indirectly benefited the present piece as well. In addition, this article was greatly improved thanks to the very perceptive critical comments of two anonymous referees for Review of Philosophy & Psychology, as well as to numerous constructive suggestions by Editor Roberto Casati. Many thanks to all three of them. Remaining shortcomings are entirely my own.

For helpful conversations, I wish to thank Jean-Julien Aucouturier, John Bailyn, Karol Beffa, Arthur Bonetto, Laurent Bonnasse-Gahot, Clément Canonne, Emmanuel Chemla, Didier Demolin, Paul Egré, John Halle, Ray Jackendoff, Jonah Katz, Fred Lerdahl, John MacFarlane, Salvador Mascarenhas, Markus Neuwirth, Rob Pasternak, Claire Pelofi, Martin Rohrmeier, Benjamin Spector, Morton Subotnick, Francis Wolff, as well as audiences at New York University, SUNY Long Island, Ecole Normale Supérieure, the IRCAM workshop on 'Emotions and Archetypes: Music and Neurosciences' (June 8-9, 2016, IRCAM, Paris), the Barenboim-Said Academy (June 12, 2017), and EPFL (December 4, 2017, Lausanne). I learned much from initial conversations with Morton Subotnick before this project was conceived. Jonah Katz's presence in Paris a few years ago, and continued conversations with him, were extremely helpful. I also benefited from Emmanuel Chemla's insightful comments on many aspects of this project, as well as from Paul Egré's and Laurent Bonnasse-Gahot's very detailed comments on the long and/or the short version of this piece. Finally, I am grateful to Lucie Ravaux for practical help with the manuscript and references.


The research leading to these results received funding from the European Research Council under the European Union's Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement N°324115–FRONTSEM (PI: Schlenker). Research was conducted at Institut d’Etudes Cognitives, Ecole Normale Supérieure - PSL Research University. Institut d’Etudes Cognitives is supported by grants ANR-10-LABX-0087 IEC et ANR-10-IDEX-0001-02 PSL*.

Author information



Corresponding author

Correspondence to Philippe Schlenker.


Appendix I. Varieties of internal semantics

This Appendix discusses in greater detail the notion of an ‘internal semantics’ for music, briefly mentioned in Section 2.2.

Before we say a word about the ‘internal’ semantics of music, we consider how such a semantics can be constructed for a system as simple as the la li lu example of Section 2.1. The key observation is that a syntactic system that has no semantics relating it to the external world can still be endowed with a semantics that pertains to the form of the expressions themselves. In (54), we have done so for the context-free grammar defined in the main text in (1)b. Just as is standard for human language, each step in a derivation tree is interpreted by a semantic step. The result is not exciting: each syllable denotes itself, and each sequence denotes itself as well, with the proviso that the interpretation procedure adds pauses between groups of 2 syllables.Footnote 45 Some simple examples are given in (55).

(54) a. Lexical semantics:

  • [[la]] = la

  • [[lu]] = lu

  • [[li]] = li

  • b. Compositional semantics

Notation: ˆ is used to represent concatenation of expressions; for strings s and s’, sˆ_ˆs’ denotes the concatenation of s and s’ with a pause in between.

For any words w, w’ of the lexicon Lex and for any sequences l and s of categories L and S respectively,

[[[L w w’]]] = [[w]] ˆ [[w’]].

[[l s]] = [[l]] ˆ_ˆ[[s]]

(55) Examples

  • [[[L la lu]]] = [[la]] ˆ [[lu]] = laˆlu.

  • [[[L la lu] [L la li]]] = [[[L la lu]]] ˆ_ˆ[[[L la li]]] = laˆluˆ_ˆlaˆli

Now this semantics adds very little to the syntax. But one could develop a more subtle variety of this internal semantics, one that only keeps track of certain properties of the form of our syllable sequences. For example, in (56) we define a semantics that keeps track of the vowels that appear at the end of our 2-syllable groups. Thus la lu will ‘denote’ u, while la li will ‘denote’ i, and the sequence la lu la li will denote the sequence uˆi, i.e. the concatenation of the vowels u and i.

(56) Semantics based on vocalic paths.

  • [[[L la lu]]] = u.

  • [[[L la li]]] = i

For any sequences l and s of categories L and S respectively,

[[l s]] = [[l]] ˆ[[s]].

(57) Examples

  • [[[L la lu]]] = u.

  • [[[L la lu] [L la li]]] = [[[L la lu]]] ˆ[[[L la li]]] = uˆi

We can think of this semantics as associating with some strings a ‘vocalic path’ that tracks the sequence of some particularly important phonemes that appear in it – here they are the non-predictable vowels of each 2-syllable group.
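For concreteness, the two internal semantics in (54) and (56) can be implemented in a few lines of Python. The function names sem_full and sem_vocalic are ours, purely for illustration; ^ stands in for the concatenation sign ˆ, and a derivation is represented as a list of 2-syllable groups (category L).

```python
# A derivation is a list of 2-syllable groups, e.g.
# [["la", "lu"], ["la", "li"]] for the string la lu la li.

def sem_full(derivation):
    """Semantics (54): each group denotes its own syllables,
    and groups are concatenated with a pause ('_') in between."""
    groups = ["^".join(group) for group in derivation]
    return "^_^".join(groups)

def sem_vocalic(derivation):
    """Semantics (56): each group denotes the vowel of its final
    syllable; a string denotes the resulting 'vocalic path'."""
    return "^".join(group[-1][-1] for group in derivation)

print(sem_full([["la", "lu"], ["la", "li"]]))     # la^lu^_^la^li
print(sem_vocalic([["la", "lu"], ["la", "li"]]))  # u^i
```

The two functions walk the same derivation and differ only in how much of each group's form they keep, which is exactly the sense in which (56) is a more abstract internal semantics than (54).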

While no interesting analysis would postulate that music has the kind of semantics exemplified by (54), there are prominent examples of music semantics that develop more sophisticated versions of (56). Thus Granroth-Wilding and Steedman 2014 endow their formal syntax for jazz chord sequences with a semantics that encodes paths in a tonal pitch space whose structure is depicted in (59). In their analysis (framed within Combinatory Categorial Grammar), surface chords can be assigned syntactic categories that give rise to derivation trees. Each derivational step in the syntax goes hand in hand with a semantic step. And the semantics encodes movements in tonal pitch space.

A minimal example is given in (58), which provides the semantics of a sequence V7-I within a tonal pitch space whose structure is displayed in (59). The final I denotes a location in tonal pitch space, with coordinates <0, 0>. The penultimate V7 denotes a function from x to a position that ensures a 1-step leftward movement towards x, written as: λx. leftonto(x). Taking the location <0, 0> as an argument, the result is: leftonto(<0, 0>). Assuming the tonic (i.e. <0, 0>) is a C (circled in (59)), this would correspond to a movement from a G (also circled in (59)) to that C.

(58) Example of a syntactic and semantic derivation in Granroth-Wilding and Steedman’s (2014) framework (fragment of their Fig. 19)


(59) Structure of the tonal pitch space assumed in Granroth-Wilding and Steedman 2014 (following Longuet-Higgins 1962a, 1962b)


We believe that this analysis is close to an intuition developed in some of Lerdahl’s work (Lerdahl 2001), in which the meaning of music is essentially likened to a journey through tonal pitch space.
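To make the path idea concrete, here is a minimal Python sketch; it is ours, not Granroth-Wilding and Steedman's implementation. The name leftonto follows their notation, but the encoding of tonal pitch space as integer pairs with x the axis of fifths is our simplifying assumption.

```python
# Sketch of a path semantics in a Longuet-Higgins-style tonal
# space: a point is an (x, y) pair, with x the axis of fifths.
# This encoding is an illustrative assumption on our part.

def leftonto(point):
    """The position one step rightward on the fifths axis, i.e.
    the dominant that resolves ('moves left') onto `point`."""
    x, y = point
    return (x + 1, y)

tonic = (0, 0)                   # the final I chord (say, a C)
path = [leftonto(tonic), tonic]  # V7-I: a G resolving onto that C
print(path)                      # [(1, 0), (0, 0)]
```

Since leftonto is a function, it can be composed, so a chain of dominants resolving into a tonic denotes a multi-step path through the space rather than a single movement.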

Importantly, this semantics is ‘internal’ – and thus not a ‘real’ semantics, from our perspective – because it does not draw a connection between music and the (music-)external reality, unlike the semantics developed in this piece.

Appendix II. Music semantics vs. logical semantics

This Appendix compares in greater detail music semantics with the simple logical semantics sketched in Section 7.1.

In order to bring the comparison into sharper focus, we define a non-standard but particularly simple logic that shares some properties with our music semantics; the main differences will become easier to grasp against this shared background. In a nutshell, this logic is defined for a language made solely of propositional letters and conjunction. Since the only way to combine propositional letters is by conjunction, we don’t need an explicit conjunction sign, and thus we will solely investigate sentences of the form pi, pipk, pipkpr, etc., as specified by the syntax in (60)a. Each propositional letter is taken to hold true of events,Footnote 46 and concatenation is interpreted as conjunction, as shown by the semantics in (60)b.

(60) A purely conjunctive logic

  • a. Syntax

  • Atomic propositions: for every i ≥ 0, pi is an atomic proposition.

  • If pi is an atomic proposition and F is a proposition, piF is a proposition.

  • b. Semantics

  • Let I be a function such that for every i ≥ 0, I(pi) is a set of events.

  • For any propositional letter pi, pi is true of event e just in case e is in I(pi).

  • If pi is an atomic proposition and F is a proposition (whether atomic or not), piF is true of event e just in case pi is true of e and F is true of e.

Let us immediately illustrate with the examples in (61). If p2 holds true of events e, e’ and e", while p3 holds true of events e' and e", the (implicit) conjunction p2p3 holds true of e’ and e”, as does p3p2. If in addition p1 holds true of e and e’, p1p2p3 holds true of e’.

(61) Examples

  • I(p1) = {e, e’}

  • I(p2) = {e, e’, e”}

  • I(p3) = {e’, e”}

  • a. For any event f, p2p3 is true of f iff p2 is true of f and p3 is true of f, iff f is in {e, e’, e”} and {e’, e”}, iff f is in {e’, e"}, iff f = e' or f = e".

  • b. For any event f, p3p2 is true of f iff p3 is true of f and p2 is true of f, iff f = e’ or f = e” (by a.).

  • c. For any event f, p1p2p3 is true of f iff p1 is true of f and p2p3 is true of f, iff f is in {e, e’} and f is in {e’, e”} (by a.), iff f = e’.

We note that our rules are designed in such a way that a string is always semantically analyzed from beginning to end: a string p1p2p3 is analyzed by the semantics (in (60)b(ii)) as having the structure [p1[p2p3]]. Nothing deep hinges on this: given our conjunctive semantics, whether a string p1p2p3 is analyzed as [p1[p2p3]] (as we do) or as [[p1p2]p3] won’t affect the truth conditions, since the end result will just be the conjunction of p1, p2 and p3.
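The conjunctive logic in (60) is simple enough to implement directly. The sketch below (with names of our own choosing) mirrors the right-branching evaluation [p1[p2p3]], encodes the interpretation function I of (61) as a dictionary, and writes e1, e2 for e′, e″.

```python
# Purely conjunctive logic of (60): a sentence is a non-empty
# list of propositional letters; piF is true of an event e iff
# pi is true of e and F is true of e.

def is_true(sentence, event, I):
    head, *rest = sentence
    if event not in I[head]:      # pi is true of e iff e is in I(pi)
        return False
    return is_true(rest, event, I) if rest else True

# Interpretation function of (61), writing e1, e2 for e', e'':
I = {"p1": {"e", "e1"}, "p2": {"e", "e1", "e2"}, "p3": {"e1", "e2"}}

# (61)a: p2p3 is true of e1 and e2 only (and order is irrelevant).
print([f for f in ("e", "e1", "e2") if is_true(["p2", "p3"], f, I)])
# (61)c: p1p2p3 is true of e1 only.
print([f for f in ("e", "e1", "e2") if is_true(["p1", "p2", "p3"], f, I)])
```

As the first example shows, p2p3 and p3p2 are true of exactly the same events, which is one of the points of contrast with the music semantics noted below: this logic is insensitive to the order of its conjuncts.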

One could think of p1p2p3 as a series of musical events, which may be true of some events such as e, e’, e”, etc. But as noted in the text, the similarities with our music semantics end there: the conjunctive logic above has no counterparts of our preservation principles (Time, Loudness, Harmonic stability); and in that logic, p1p2 denotes events that satisfy both p1 and p2 (order irrelevant), rather than pairs of events <e1, e2 > that satisfy p1 and p2 (in that order), as in our music semantics.

Appendix III. Complements on the syntax/semantics interface

This Appendix provides some complements to the discussion of the syntax/semantics interface in Section 8. Part A gives details about exceptions to tree structure in Lerdahl and Jackendoff’s analysis, as alluded to in Section 8.3.1. Part B formulates some questions raised for the present analysis by Lerdahl and Jackendoff’s prolongational reductions.

A. Exceptions to tree structure in Lerdahl and Jackendoff’s analysis of grouping

We argued in Section 8.3.1 that a mereology-based reconstruction of musical structure leads one to expect that two musical groups could have a part in common (as in (36)e in the main text) in two types of cases: when the denoted events are best analyzed as having a part in common (overlap); and when two distinct events share the same auditory trace (occlusion).

Lerdahl and Jackendoff 1983 emphasize that such cases do arise in music. Since they take grouping structure to result from principles of perception rather than from syntactic rules, they do not take these ‘exceptions’ to refute their account. On the contrary, they explain these exceptions by appealing to analogous cases in visual perception. The exceptions they list are of the two types we expect: in case of overlap, the denoted events are construed as sharing a part; in cases of occlusion, the auditory trace of an event occludes that of another event.

❒ Overlap

Lerdahl and Jackendoff 1983 illustrate visual overlap by the case in which a single line serves as the boundary between two objects, and is thus best seen as belonging to both, as in (62)a, which is preferably analyzed as (62)b rather than as (62)c,d. In our terms, this is a case in which the optimal mereological decomposition of the underlying object should not be minimal – although an alternative possibility is that we are dealing with two different lines that have a unique visual trace.

(62) Lerdahl and Jackendoff’s visual analogue of overlap (Lerdahl and Jackendoff 1983 p. 59)


Cases of overlap are probably pervasive in event decomposition as well. A person’s walk is a succession of cycles in which a foot touches ground, goes up, and touches ground again. Each subevent in which the foot touches ground is both the completion of a cycle and the beginning of the next one – an event counterpart of the object perception case in (62).

Lerdahl and Jackendoff 1983 cite the very beginning of Mozart’s K. 279 sonata as an example of auditory overlap, as seen in (63). The I chord at the beginning of bar 3 seems both to conclude the first group and to initiate the second, hence it can be taken as the trace of an event that plays a dual role as the end of one event and at the beginning of another. Alternatively, and less plausibly perhaps, this could be a case in which two distinct events have the same auditory trace (this is precisely the uncertainty we had in our discussion of the visual example in (62)).

(63) An example of overlap: the beginning of Mozart’s K. 279 sonata (Lerdahl and Jackendoff 1983 p. 56) [AV46]


❒ Occlusion

The second case involves an object that partly occludes another object, as in (64). Here the most natural interpretation of (64)a is as (64)b, which involves occlusion, rather than as (64)c and (64)d, which don’t.

(64) Lerdahl and Jackendoff’s visual analogue of elision (Lerdahl and Jackendoff p. 59)


Here too, an event counterpart of object occlusion is not hard to find: a train passing by will visually, and sometimes auditorily, occlude numerous other events.

In music, this case is illustrated by what Lerdahl and Jackendoff call ‘elision’. Their description (as well as the visual analogy they draw) makes clear that these are really cases of auditory occlusion, as in their discussion of the beginning of the allegro of Haydn’s Symphony 104, given in a reduction in (65). As they write:

“One’s sense is not that the downbeat of measure 16 is shared (...); a more accurate description of the intuition is that the last event [of the first group] is elided by the fortissimo.”

(65) An example of elision: the beginning of the allegro of the First movement of Haydn’s Symphony 104 (Lerdahl and Jackendoff 1983 p. 57) [AV 47].


In sum, in several cases grouping structure departs from a simple tree structure, in ways that can be explained if musical groups are perceived as the auditory traces of events, whose mereological structure is reflected on the musical surface. In particular, there are cases of overlap in which a part is best seen as belonging to two events, and cases of occlusion in which the auditory trace of one event occludes that of another event.

B. A note on prolongational reductions

Lerdahl and Jackendoff 1983 and Lerdahl 2001 take another notion of structure, prolongational reductions, to play a central role in music perception.Footnote 47 Specifically, prolongational reductions provide a hierarchy of events "in terms of perceived patterns of tension and relaxation" (Lerdahl 2001, cited above). To make things concrete, consider again the time-span structure in (39) in the main text. Lerdahl and Jackendoff argue that it is incapable of representing the intuitive patterns of tension and relaxation represented in (66):

One might say that the phrase begins in relative repose, increases in tension (second half of measure 1 to the downbeat of measure 3), stretches the tension in a kind of dynamic reversal to the opening (downbeat of measure 3 to downbeat of measure 4), and then relaxes the tension (the rest of measure 4). It would be highly desirable for a reduction to express this kind of musical ebb and flow. Time-span reduction cannot do this, not only because in such cases as this it derives a sequence of events incompatible with such an interpretation ([(39)] as opposed to [(66)]), but because the kind of information it conveys, while essential, is couched in completely different terms. It says that particular pitch—events are heard in relation to a particular beat, within a particular group, but it says nothing about how music flows across these segments. (Lerdahl and Jackendoff 1983 p. 122).

(66) Prolongation of the initial I chord at the beginning of Mozart’s K. 331 piano sonata (Lerdahl and Jackendoff 1983) [AV48]


In the time-span structure in (39), the last bar forms a group, but it is headed by the V chord, which is harmonically essential (as it marks a half-cadence). As a result, the I chord at the beginning of the last bar plays a subordinate role. But intuitively it corresponds to the end of a tensing and relaxing motion that started on the same I chord, but at the beginning of bar 1. In Lerdahl and Jackendoff’s analysis, prolongational structures are derived top-down from time-span structures in such a way that subordinate time-span events can be ‘promoted’ to a higher hierarchical level if they play a key role in patterns of tension and relaxation.

From the present perspective, two main questions arise about prolongational reductions. First, could they have a counterpart in other areas of perception? In particular, if we could find visual scenes with (i) ‘headed’ events’ (in order to have a counterpart of time-span structures), and (ii) a natural notion of tension (e.g. in terms of more or less stable physical situation), could we also elicit intuitions about an equivalent of prolongational structures? This is what one would expect if the difference between these two kinds of structures derives from the kind of semantic information they seek to capture: event mereology for time-span structures, properties of the path of events in a certain space for prolongational structures.

Second, if prolongational reductions trace the ‘harmonic path’ followed by virtual sources in tonal pitch space, could one also investigate other types of paths, such as those defined by the melodic line or even the evolution of loudness? A key insight of Lerdahl and Jackendoff’s analysis of prolongational structure is that two harmonic events that are not linearly contiguous may still have a direct structural connection, as is the case of the two highlighted I chords in (66). But in the classic analysis of Schenker, the melodic line (and in particular a gradually descending melodic line called the ‘Urlinie’) has related properties: certain melodic elements can be ignored when tracing the general melodic line of a piece because they are structurally subordinate (Forte 1959; Pankhurst 2008). Could one develop a general theory in which Lerdahl and Jackendoff’s prolongational reductions and Schenker’s Urlinie are part of a broader typology? This and other questions pertaining to the interaction between prolongational reductions and music semantics must be left for future research.

Appendix IV. Extensions and further questions

This Appendix sketches some possible extensions of our analysis, and lays out further questions for future research.

❒ Context and granularity

In these Prolegomena, we only attempted to sketch the general form of a music semantics. One important issue in actual analyses will lie in determining the level of granularity of the interpretation. One may decide that each and every musical event corresponds to a world event; but often one may want a less fine-grained interpretation. The same issue arises when determining under what conditions a pictorial representation is compatible with (i.e. could denote) a real-world situation, as illustrated by the coarse-grained picture in (67).

(67) Coarse-grained pictorial representation of Barack Obama.


Two mechanisms are crucial if we are to ensure that these pictures represent their intended denotations.

  1. (i)

    First, we should make sure that the set of possible denotations is small enough – it may be restricted to the set of salient politicians in the situation.

  2. (ii)

    Second, we should make sure that not all details of the pictures are required to correspond to something in the intended denotation. For instance, in a pixelized representation, some edges are due to the requirement that squares are used to represent shapes, and must be in part disregarded.

Both mechanisms should prove important in music semantics.

  1. (i)

    First, semantic intuitions that would otherwise be very unclear can be sharpened by reducing the set of possible denotations. One way to do this is by way of titles or of explicit (linguistic) descriptions. This is an important device in ‘program music’, and we saw striking instances of this mechanism in Saint-Saëns’s Carnival.

  2. (ii)

Second, when analyzing a piece, one may decide to interpret each and every note as corresponding to a world event, or one may adopt a more coarse-grained interpretation. For instance, if we were to ask of a given movement of a swan whether it makes true the beginning of Saint-Saëns’s piece discussed in (20) in the main text, one may answer in the affirmative if there is a series of two movements, the second of which leads into a new spatial area (thus interpreting the modulation). Or one may wish to find a much more precise correspondence between the piece – e.g. its precise melodic movement – and the scene it is purportedly true of. Of course a coarse-grained interpretation will make the piece true in many more situations than a fine-grained one.

❒ Interpreting a piece

If our analysis is on the right track, a musician interpreting a piece may make musical decisions that further specify the semantic interpretation of the musical score. Ending a piece fortissimo may produce the impression that the source is intentional and is reaching a goal. Ending a piece diminuendo and rallentando may yield the impression that the source is gradually losing energy and dying out. Ending diminuendo without much altering the speed may yield the impression that the source is moving away. These are of course simplifications, but the general point is that the musical interpreter may and sometimes must make semantic decisions that are left open by the score. Even after these decisions are made, there will be a plurality of situations that the music is compatible with (true of), but the musician’s interpretation will usually reduce the set of situations that are compatible with the score.

❒ Aesthetic considerations

We have been silent on aesthetic considerations – simply because it is one thing to set up a music semantics, and quite another to assess the aesthetic value of music. If successful, music semantics should come to explain why bad and good music alike produce semantic effects: it is not its goal to offer a music aesthetics. Still, one might hope that some aesthetic considerations might in the end build on insights gained from music semantics. But nothing at all in the present enterprise suggests that the aesthetic effects of music (or for that matter its psychological effects) reduce to its semantics. This would be as absurd as claiming that the aesthetic or psychological effects of poetry are exhausted by its semantics.

❒ Semantic effects beyond music

The key idea of our semantic analysis of music is that denotational inferences may be drawn both from normal cognition and from the tonal properties of music. This idea could in principle be applied to other areas as well.

First, one could ask whether a kind of visual counterpart of music could be devised. It would be based on animations that convey information by way of a combination of standard representational properties and ones that are internal to a more abstract system. A very simple example can be found in animated heat maps [AV49]: while not quite direct, the geographical content of the map is based on standard principles of visual perception, modulo some simplifications (as a first approximation, a country is seen on a map as if it were perceived from very high up, say from space); simultaneously, there is a color code which is based on natural properties of colors: ‘warm’ colors (e.g. red) represent high degrees of the relevant property, and ‘cool’ colors represent low degrees. But of course the structure of colors is entirely different from the structure of tonal pitch space, and thus it is only at a conceptual level that a correspondence between the auditory and the visual domain can be found.Footnote 48

Second, one could ask whether related ideas could be applied to the analysis of abstract painting. Certainly some standard principles of visual perception are at play in abstract painting – which is the reason we usually don’t just see shapes on a canvas, but (possibly very abstract) objects – something that already played an important role in Heider and Simmel’s abstract animations. It remains to be seen whether certain non-natural properties of the paintings could be interpreted in a way that is comparable to the tonal properties of music.

Finally, one might attempt to apply related ideas to dance, for two reasons: like music, it triggers referential and emotional inferences on the basis of natural and more abstract properties of perception; and in addition, it is often coordinated with music – and hence one might ask how the two mediums are combined (do they give rise to a single semantic representation?). For relevant work, see Charnavel 2016; Napoli and Kraus to appear, and Patel-Grosz et al. 2017.

❒ Further questions

Two types of further questions could be asked. One pertains to the nature of semantic inferences in music. The other concerns the nature of the theory we have proposed.

One could ask whether semantic inferences in music are always conscious.Footnote 49 This need not be the case: we had to create minimal pairs and to ask appropriately abstract questions about the virtual sources in order to bring these inferences to consciousness. Certainly this is a matter of degree: some semantic effects are more subtle than others. It might be that, quite generally, inferences that are abstract and hard to put in words tend to be less conscious than more concrete and easily articulable ones (one could explore this question in other cognitive domains, such as visual cognition or the recognition of tastes and odors). Be that as it may, we believe that even when semantic inferences are unconscious, they are crucial to explain some of the psychological effects of music, as well as interpretive choices made by performers.

Turning to the nature of the theory we have proposed, one could ask whether we have not overly stretched the extension of the term ‘semantics’.Footnote 50 Our source-based semantics is in part based on our ability to draw inferences on the causal sources of a sound. One could object that the term ‘semantics’ is properly used only if it pertains to an intention to mean something, perhaps along the lines of Grice’s notion of ‘utterer’s meaning’ (Grice 1957). Without taking a position on this issue, we note that our final account does have a place for the equivalent of a kind of utterer’s meaning, since the existence of a music semantics makes it possible to reconstruct the intentions of a musical narrator that attempts to convey a certain (highly abstract) message. The situation is in this respect no different from that of a narrator expressing herself in gestures or by way of drawings or visual animations.

Audiovisual examples

The audiovisual examples can be downloaded at the following URL:

Credits for audiovisual examples.

AV00 Kominsky, J.F., Strickland, B., Wertz, A.E., Elsner, C., Wynn, K., & Keil, F.C.: 2017, Categories and constraints in causal perception. Psychological Science 28,11: 1649–1662. Video of the ‘violation’ condition (of laws of elastic collision). (Thanks to B. Strickland for providing this video.)

AV01 Musicnet materials, retrieved online on January 7, 2018 at

AV02 Stanley Kubrick, 2001: A Space Odyssey. Retrieved online on January 7, 2018 at

AV05, AV06, AV07, AV08, AV09 Musicanth materials, retrieved online on January 9, 2018 at Pianists: Vivian Troon, Roderick Elms. Conductor: Andrea Licata. Royal Philharmonic Orchestra.

AV13 Video cited in: Honing, H.: 2003, The final ritard: On music, motion, and kinematic models. Computer Music Journal, 27(3), 66–72.

AV17 Kurt Masur: Leipzig Gewandhaus Orchestra.

AV18 Musicanth materials, retrieved online on January 9, 2018 at Pianists: Vivian Troon, Roderick Elms. Conductor: Andrea Licata. Royal Philharmonic Orchestra.

AV20 Mozart, Don Giovanni, Metropolitan Opera, 1990, directed by Brian Large, stage production Franco Zeffirelli, conductor James Levine, with Kurt Moll as the Commendatore.

AV23 Tchaikovsky Ouverture solennelle “1812”, Op.49, by Berliner Philharmoniker. Conductor Claudio Abbado. Retrieved online on January 9, 2018 at

AV24 Puccini, Madama Butterfly. Asheville Lyric Opera. Pinkerton - Brian Cheney; Sharpless - Mark Owen Davis. Retrieved online on January 9, 2018 at

AV25, AV28 Musicanth materials, retrieved online on January 9, 2018 at Pianists: Vivian Troon, Roderick Elms. Conductor: Andrea Licata. Royal Philharmonic Orchestra.

AV31 Mary Ellen Bute (visuals) and Edwin Gerschefski (piano accompaniment), 1940, Tarantella (excerpt starting at 1:09). Retrieved online on January 13, 2018 at

AV32 Daniel Barenboim, Mozart: Complete Piano Sonatas and Variations, Piano Sonata No. 1 in C, K.279: I. Allegro.

AV36 Leonard Bernstein: Young People’s Concerts Vol. 2. Charles Ives American Pioneer (excerpt starting at 44:33). Retrieved online on January 14, 2018 at

AV37 Charles Ives, The Unanswered Question. James Sinclair, Conductor. Northern Sinfonia (excerpt starting at 1:00). Retrieved online on January 14, 2018 at

AV38 Verdi, Simon Boccanegra, Teatro La Fenice 2014–2015, conductor Myung-Whun Chung, RAI, with Simone Piazzola as Simon.

AV42 Psycho, 1960. Film directed and produced by Alfred Hitchcock, and written by Joseph Stefano. Music by Bernard Herrmann.

AV44 Verdi, Simon Boccanegra, Teatro La Fenice 2014–2015, conductor Myung-Whun Chung, RAI, with Simone Piazzola as Simon.

AV46 Daniel Barenboim, Mozart: Complete Piano Sonatas and Variations, Piano Sonata No. 1 in C, K.279: I. Allegro.

AV47 Haydn: Symphony #104 In D, H 1/104, “London” - 1. Adagio - Allegro. Adam Fischer: Austro-Hungarian Haydn Orchestra (starting at 2:29).

AV48 Margarete Babinsky. Piano Sonata in A, K.331: I. Andante Grazioso. Mozart Piano Sonatas.

AV49 Todd Mostak, Twitter Heatmapper. Retrieved online on January 14, 2018 from

AV50 Musicanim, graphical score generated for Stravinsky’s The Rite of Spring, on the basis of Jay Bacal’s synthetic MIDI recording. Retrieved online on January 14, 2018 from


Cite this article

Schlenker, P. Prolegomena to Music Semantics. Rev.Phil.Psych. 10, 35–111 (2019).
