1 Introduction

The goal of this paper is to confront and explore the larger implications of a problem that we have repeatedly observed in our ongoing work on the semantics of modification within noun phrases, which is one instantiation of concept combination. The problem is that, in the absence of context, the default interpretation for a modifier-noun combination is sometimes so strong as to make other possible interpretations seem impossible, whereas in context any interpretation, even the seemingly impossible, is possible. Here is just one example, involving so-called ethnic adjectives, which provide information about the ethnic origin, nationality or other locational origin of individuals.[Footnote 1] Kayne (1984) and many others have claimed that when ethnic adjectives like Canadian combine with eventuality-denoting nouns, the adjective must contribute information about the most external argument of that eventuality, typically the agent. When it does not, a prepositional phrase expressing the corresponding participant role must be used. Thus, in (1), where the context does not previously mention Yeltsin visiting Canada, the PP to Canada rather than the adjective Canadian is what the author chose, and indeed the adjective sounds very odd.

(1) Yeltsin met the prospective Democratic presidential candidate Bill Clinton on June 18. His itinerary also included an official visit to Canada/??an official Canadian visit. (BNC)

However, one does not have to go far to find counterexamples to Kayne’s claim. When context or background knowledge makes salient that some role other than agent is assigned to the location/ethnicity, the adjective is perfectly felicitous and attested, as in (2).

(2) Prince Edward and wife begin Canadian visit (http://metronews.ca/news/canada/365325/prince-edward-and-wife-begin-canadian-visit/)

Confronted with the contrast between the strong default interpretation and the possibility of any interpretation in context, linguists have tended to follow one of two routes, both of which we will discuss and exemplify below. The first involves taking the default interpretation as the crucial fact to account for, leaving the non-default interpretations in context unexplained. The second involves providing an analysis that is weak enough to capture all interpretations, and saying little or nothing about the strength of the default interpretation. In this paper, we argue that, in effect, both routes must be taken because two fundamentally different interpretative processes can be appealed to in the composition of modified noun phrases or, more generally, in concept composition. Specifically, we take default interpretations to be the result of what we will call conceptually afforded concept composition, and non-default interpretations to be the result of referentially afforded concept composition. We borrow the term affordance loosely from the psychology literature, specifically the interpretation of the term in Chemero (2003), as we discuss in further detail in Sect. 3.1.

This distinction builds on the long-standing observation that language mediates between concepts in our mind and the things they refer to in the world (Ogden and Richards 1923, among many others). We take these connections to concepts and to the world to be distinct aspects of language, each of which facilitates a different process of concept composition.[Footnote 2] Take for instance the phrase red box in the examples in (3). In the absence of any context, red, when modifying box (or indeed any noun denoting a physical object), refers to its color, and so we can usually paraphrase (3-a) as “Identify a box that is red in color and put the relevant scarf inside it”. However, it may also refer to other properties of the box referent, such as the intended color of its contents, if the discourse context makes the relevant property clear (3-b).

(3) a. Put the scarf in the red box.

    b. (Context: For a fundraising sale, Adam and Barbara are sorting donated scarves according to color in different, identical, brown cardboard boxes. Barbara distractedly puts a red scarf in the box containing blue scarves.)

       Adam: Hey, this one belongs in the red box!

We call cases like (3-a) conceptually afforded. In these cases, some component(s) of the concepts contributed by two expressions in a phrase match in a way that indicates how they should be composed, and interlocutors avail themselves of such a suggestion. This matching invites the hearer to identify red as the color of the box in (3-a).[Footnote 3] In contrast, in referentially afforded cases like (3-b), specific, independently available information about the referent described by the phrase is used to guide the way in which the concepts in question are composed.

This paper has three goals. First, we develop this distinction, which has a precedent in Asher (2011), in an explicit manner and support it with empirical evidence we gathered in previous work. Second, we suggest modeling conceptually afforded concept composition via (compositional) distributional semantics, which represents meaning as a function of the contexts in which words and phrases appear in naturally occurring language data, usually a large text corpus (Landauer and Dumais 1997; Turney and Pantel 2010). We consider this way of modeling concepts to be similar in some of its basic properties to the view of concepts espoused, for example, in Barsalou (2017). A fundamental hypothesis of some work in distributional semantics (e.g. Lenci 2008) is that the resulting semantic representations can be used to model the concepts associated with words. For this reason, we will present a brief introduction to distributional methods in Sect. 4. Finally, we propose a way to formally distinguish the two kinds of concept composition and integrate them into a more general framework for semantic analysis.

2 Two Approaches to Analyzing Modification

We start by discussing previous approaches to the problem outlined in the introduction. As mentioned, the existence of strong, but overridable, defaults in the interpretation of modifiers has led to two lines of analysis. The first involves the proposal of an inventory of primitive semantic relations to capture the defaults; the second, the use of an underspecified modification relation which gets resolved in context or via an appeal to indexicality. We consider these in turn.

The use of primitive semantic relations to mediate in modification has a long history. We cite two representative and well-known examples here. The first involves Levi’s (1978) analysis of relational adjectives such as microscopic or tropical (ethnic adjectives like Canadian are also considered a subclass of the relational adjectives). Relational adjectives (as their name indicates) are morphologically adjectives, but they are also noun-like in several respects: They are synchronically or diachronically derived from nouns; they are generally defined as introducing a relation between an individual of the sort described by the adjective’s nominal stem and that described by the modified noun (Bally 1944); and they have a more restricted syntactic distribution than other types of adjectives, occupying in English essentially the same position in nominal syntax as do noun modifiers of nouns, very close to the head noun (e.g. computer in computer store). Some examples from Levi (1978, pp. 27–28) are provided in (4), with typical paraphrases:

(4) a. microscopic analysis: analysis carried out using a microscope
    b. tropical butterflies: butterflies found in the tropics
    c. planetary mass: mass of a planet
    d. editorial comment: comment by an editor
    e. dramatic criticism: criticism of drama

Levi proposed that such examples are derived from an underlying structure that makes the relation in question explicit. She further proposed that an inventory of primitive relations could be specified: cause, have, make, use, be, in, for, from, and about. For the derivation of examples involving deverbal nominalizations, as in (4-d, e), she proposed somewhat more complex derivations that nonetheless also availed themselves of primitives, including in some cases agent and patient.

A second example of an appeal to primitive relations emerged in part from the strong tendencies in the interpretation of (non-relational) adjectives described in Pustejovsky (1995).

(5) a. red pen: pen that writes in red or that is red on the surface
    b. red apple: apple whose skin is red
    c. quick meal: meal that is quick to eat or to prepare

To account for these interpretations, Pustejovsky argues that the lexical entry for content words (including nouns) should include what he called a Qualia Structure with four features, each corresponding to a quale: formal, constitutive, agentive, and telic. The formal quale characterizes the general ontological properties of an object; the constitutive, its parts; the agentive, how it comes into existence; and the telic, its function. Pustejovsky proposes that adjectives can restrict the denotation of a noun by placing conditions on the values of the different qualia in the noun’s semantic representation.

The logical representations in (6) illustrate how this approach can be used in the sorts of modification that interest us here. In (6-a), the primitive agent specifies the relation between Canada and the visit; in (6-b), the primitive constitutive acts as an operator on the representation of apple to retrieve indirectly a part of the apple to which the property denoted by red can be ascribed.

(6) a. Canadian visit: \(\lambda e[\mathbf{visit}(e) \wedge \text{agent}(e, \mathbf{Canada})]\)
    b. red apple: \(\lambda x\exists y[\mathbf{apple}(x) \wedge \text{constitutive}(\mathbf{apple}) = \text{part-of}(y,x) \wedge \mathbf{red}(y)]\)

The use of semantic primitives to capture modification relations has two main advantages. First, it speaks to the very strong intuitions that the literature reports about default or productive interpretation processes (see e.g. Levi 1978, pp. 84–86). Second, similar defaults are observed cross-linguistically: for example, Pustejovsky’s theory has been applied to various languages, and Levi observes that she found evidence for a similar set of primitives in a study of Modern Hebrew (Levi 1978, p. 86). Clearly, there is something to be captured in these data.

However, the use of primitives of any sort, at least as the only compositional strategy, has also long been argued to be problematic. On the one hand, it is clearly too strong insofar as no set of necessary and sufficient primitives can be provided to account for all cases.[Footnote 4] Levi herself observes (p. 84; see also p. 238ff.) that the goal of her study is to account for patterns of modification that are productive, as opposed to possible: In other words, her aim was a theory of why, even if we can interpret, for example, a phrase such as Korean passengers as ‘passengers on Korean Airlines’,[Footnote 5] our first inclination is arguably not to do so but rather to interpret it as ‘passengers from Korea’. On the other hand, the use of primitives is too weak. As e.g. Clark (1992) and Murphy (2002) observe, even when such primitives might apply, they are insufficiently granular: There are cases in which they provide too little information about the exact nature of the relation instantiated by any given primitive. This is already apparent in the analysis of red apple in (6). The constitutive quale introduces a part of the apple, but it does not specify which part, and so the inference that it is the skin of the apple (or, more generally, its surface) is not directly accounted for. A representative example involving a relational adjective is an electrical fire, which could be paraphrased as ‘a fire caused by electricity’: This case is even more problematic than the apple example insofar as the paraphrase does not capture the fact that the term is used to refer to fires caused by malfunctions in electrical systems and not, for example, by lightning.

Given these problems, a second line of approach to modification has involved sacrificing the coarse generalizations embodied in primitives in favor of broader empirical coverage. On one version of the approach, modification is mediated by a maximally underspecified relation whose value, much like that of a pronoun, is resolved in context (examples include McNally and Boleda 2004; Kennedy and McNally 2010). On another (see e.g. Bosch 1983; Rothschild and Segal 2009), adjectives have as their lexical content functions from contexts to contents, that is, Kaplanian characters (Kaplan 1989). These analyses are respectively illustrated in (7).

(7) a. Canadian visit: \(\lambda e[\mathbf{visit}(e) \wedge R_{i}(e, \mathbf{Canada})]\)
    b. red apple: \(\lambda x[(\mathbf{red}_i(\mathbf{apple}))(x)]\)

Again, this approach has both advantages and disadvantages. On the positive side, it is appropriately flexible: There is no interpretation that cannot be accommodated under such an analysis. However, its flexibility is arguably also a disadvantage: It has nothing to say about the strength of default interpretations or the fact that we tend to generalize them to new examples (a point made, as noted above, by Levi). Moreover, these analyses have provided no substantive theory of how context intervenes to yield the interpretations that arise.

We know of only one explicit proposal that contemplates the possibility of combining these two general approaches to resolving modification, namely that in Asher (2011). Asher combines a classical, model-theoretically interpreted intensional logic with a separate, proof-theoretic logic of types that is intended to mirror language users’ systems of concepts. The latter is used to compute and resolve the basic relations between predicates in composition. For example, it will allow us to determine that, in principle, it must be possible to infer that red picks out a type that, when combined with the type picked out by pen, yields a type that corresponds to a pen that writes in red.[Footnote 6] Though he does not develop the possibility in detail, he suggests (p. 226) that Pustejovsky’s qualia could be introduced into his system as type-shifting operators that mediate in this process: For example, write could be the output of a general type coercion operator telic applied to the type pen, and this information could then be exploited in the semantic composition process. In addition, alongside the possibility of such operators, Asher’s system contemplates the possibility of contextually-valued type coercion operators for cases where the discourse structure makes it clear that default value operators such as telic would not apply.

The proposal we develop in the rest of this paper shares with Asher’s the intuition that there are (at least) two distinct sorts of composition processes involved in computing the interpretation of a sentence. Our contributions will consist in laying out the proposal in more explicit terms, providing new empirical support for this dual system, using distributional semantics as an alternative to Asher’s logic of types, and offering a specific proposal for formalizing the distinction using Discourse Representation Theory (Kamp 1981).

3 A Dual System for Semantic Composition

3.1 Conceptually Versus Referentially Afforded Composition

We begin with a very programmatic proposal concerning two ways in which the construction of meaning can be mediated. Our proposal is based on the following assumption.

Assumption

The construction of meaning draws on connections we make between linguistic expressions and our conceptual structure, on the one hand, and the world, on the other.

This assumption is of course familiar from traditional semiotic models and also resonates with the “dual content” model recently proposed in Del Pinal (2015), which provides (p. 44ff.) a useful overview of the different ways in which language, conceptual structure, and the world have been related to each other both in the philosophy of language and cognitive psychology literature. The assumption also underlies the classic Fregean model that distinguishes sense (Sinn), which Frege suggests forms part of the ‘common treasure of thoughts that [humanity] transmits from one generation to another’, and reference (Bedeutung) (Frege 1892, p. 29).[Footnote 7] However, in modern formal semantics in the Montagovian tradition, despite its Fregean roots, conceptual structure has largely been set aside. In this latter tradition, Fregean sense has largely been replaced by the notion of intension, modeled non-psychologistically as, for example, a function from possible worlds to truth values.

We recover the classically Fregean notion of sense as including conceptual-like information and propose that both conceptual and referential aspects of meaning play a role in composition. Specifically, we can think of them as affording concept combination in different ways. Our use of the term affordance is based on Chemero’s (2003) development of the notion, originally due to Gibson (1979); it is also inspired by Rietveld’s (2008) extension of the notion to higher cognition. Chemero defines affordance as a relation between features of situations and abilities of organisms, and argues that to perceive an affordance is to recognize that the feature in question facilitates an action by the organism. The classic example is a mug with a handle: If a person who has never seen a mug gets to interact with it, it is very likely that she will grab it by the handle. The mug, by its shape, affords the grabbing-by-the-handle action on the part of the person.

Our extension of this idea to the case of language is very simple. We take the connection to concepts, on the one hand, and to the world, on the other, to be distinct features of language, each of which facilitates (that is, affords) a distinct composition process. If we posit that language users have access to both of these features and the corresponding processes that they facilitate, the tension we observed between default and highly context-dependent interpretations in Sect. 2 disappears.

The default interpretations, we argue, can be understood as the result of conceptually afforded concept composition. These are the interpretations that are immediately available in the absence of discourse context, and they are productive, suggesting that they build on regularities in our lexical knowledge, that is, the connections between words and concepts. For instance, the fact that physical objects typically have colored surfaces will afford the interpretation of a color term modifying a noun denoting a physical object as describing surface color, as in red box (see example (3-a)) and red apple (5-b). The fact that a writing instrument produces text or images with a particular color, and that this color may vary from one writing instrument to the next (part of our concept of what a writing instrument is like), affords the interpretation of red pen as a pen that writes in red (example (5-a)). Note that the use of a color term with pen is easily extended to other writing instruments with the same general properties, such as pencil, crayon, or marker. Similarly, the fact that analyses are carried out using instruments, and that microscopes are instruments, affords the interpretation of microscopic analysis provided in (4-a) above. Different species of animals tend to require different climates, so again, the interpretation of (4-b), with tropical describing a climate, is on our view conceptually afforded.

Note that these interpretations arise from very detailed conceptual knowledge, presumably accessible from the words involved. The primitive-based analyses discussed in the previous section are too coarse-grained to allow for these interactions; the lack of conceptual information in typical formal semantic approaches does not allow for them either. Thus we need a richer and more nuanced lexical representation; in Sect. 4 we show how distributional semantics can serve this purpose.

The notion of conceptual affordance also allows us to make predictions about combinations of modifiers and nouns that will be infelicitous out of the blue. Interestingly, Vecchi et al. (2011) developed a computational model (using, it is worth noting, distributional semantics) that was able to partially distinguish between (out of the blue) deviant versus possible adjective-noun phrases. Vecchi et al. randomly selected a set of phrases that were unattested in a very large corpus and tested whether their model would group them in ways that correlated with whether or not the phrase was acceptable to human judges. Examples of unattested but semantically acceptable phrases included vulnerable gunman, huge joystick, and blind cook; deviant phrases included, for instance, blind pronunciation, parliamentary potato, and sharp glue. The acceptable phrases are similar to those we have hypothesized above to involve conceptually afforded composition. For instance, joysticks are physical objects and have a size dimension that can be modified by huge. In contrast, it is not obvious in the absence of a specific context along which conceptual dimension a pronunciation could be blind, what kind of relation might exist between potatoes and parliaments, or what it would mean for glue, which is not rigid, to be sharp.

Now, it is possible to find a semantic interpretation for the allegedly deviant phrases. For instance, imagine that potatoes were thrown at parliamentary members in a protest concerning the recent economic crisis in Spain, and that one of the potatoes knocked out the president of the parliament and was retrieved and put on display. This object could well be dubbed the parliamentary potato. We submit that such interpretations are the result of referentially-afforded concept composition: they are retrievable only once we have a specific candidate (or small set of candidates) for who or what is being referred to with the phrase, along with a salient set of candidate properties that could be described by the modifier.

To further illustrate referentially-afforded composition, let us return to the use of “red box” in (3-b) and the “Canadian visit” example in (2). In the first case, the situation presents the hearer with two brown cardboard boxes. The speaker can assume that the hearer knows that the boxes each have a context-specific property of being destined to hold objects of a specific color. The use of red to modify the box color in this case is incongruent with what we can arguably consider the basic concept associated with box—the concept cannot afford any meaningful interpretation of the modifier—but the box referents and their context-specific properties can. In the case of (2), recall that the problem is that ethnic adjectives tend to express the agent when combined with eventive nouns. Thus, by default we expect Canadian visit to describe a visit made by Canadians. However, in (2) the interpretation on which Canada is the location visited is afforded by specific information about individuals in the context, namely, that Prince Edward and his wife are members of the British royalty, that Canada denotes a place, and that Prince Edward and his wife are the agents of the action of beginning a (Canadian) visit.

In order for the distinction between conceptually and referentially afforded concept composition to have bite, we should have independent criteria for identifying the components of the specific concepts being combined. For now, we limit ourselves to the claim that once such criteria are established, it should be possible to predict when a combination of modifier and noun is easily interpreted in the absence of a specific discourse context.

3.2 Empirical Evidence Supporting the Distinction

Despite the caveat made in the preceding paragraph, we have been encouraged by the fact that the distinction between conceptually and referentially afforded concept composition gives us insight into puzzling data that we gathered in previous work and for which we had no explanation at the time. We now summarize these data and explain how our proposal predicts them.

First, the modification data reviewed so far point to the fact that modifier-noun combinations can have very plastic interpretations. Our proposal suggests that a large part of this plasticity corresponds to referentially afforded composition. This prediction is supported by empirical data we gathered about relational adjectives, which we introduced in Sect. 2.

As noted above, relational adjectives are typically denominal and, crucially, the adjective-forming morphology has been claimed to be essentially transparent (e.g. Spencer 1999). The only contribution of the adjectival morphology, then, would be to make explicit that there is some relation between the referent of the noun from which the adjective is derived and the referent of the noun that the relational adjective modifies.[Footnote 8]

By hypothesis, relational adjectives provide a way to pick up on, in a maximally condensed fashion, the myriad possible relations between the referent of the modified noun and the referent of the adjective’s nominal stem (e.g. ‘Canada’, in the case of Canadian). These relations can be identified thanks to general knowledge (e.g. national anthem) or be inferred from the meaning of the modified noun (particularly when the noun is deverbal, as in chemical reaction); however, we posit that in many cases the relations are in fact afforded by specific information we have about the referents in question in the discourse (e.g. Canadian visit in example (2) above).

Boleda (2007) reported that, in Catalan, relational adjectives appear much more often in definite noun phrases than do other types of adjectives: Specifically, in an analysis of a 16.5 million-word, balanced Catalan corpus, relational adjectives appeared almost 60% of the time in definite noun phrases (59%, with a standard deviation of 15%), while other types of adjectives did so a little over 30% of the time. Definite noun phrases are used to refer to individuals that are familiar either from the context or from prior discourse, and referentially afforded concept composition is only possible when the referent is known. Thus, the high proportion of uses of relational adjectives inside definite noun phrases suggests a tendency towards referential affordance in the composition of relational adjectives and nouns.[Footnote 9] Without a distinction between the two kinds of composition, it is far from clear how to account for the data in Boleda (2007).

Boleda and colleagues provided more data in the same direction in a statistical study of the British National Corpus (Boleda et al. 2012). The study compared nominal modification using ethnic adjectives (Canadian) to modification using prepositional phrases (from Canada, to Canada, etc.).[Footnote 10] The two types of expressions often seem synonymous: For instance, both Canadian visit and visit to Canada could be used in example (2). However, the results showed that ethnic adjectives are used especially when the discourse makes the semantic relationship between the head noun and the adjective explicit, that is, in contexts where previous information about the referent is available. Factors correlated with the use of these adjectives in the corpus (as opposed to their prepositional phrase counterparts) included, again, the definiteness of the DP containing the ethnic adjective, and also others like the occurrence of visited Canada before Canadian visit in a given discourse. To summarize, both studies constitute evidence for referentially afforded concept composition and show some effects that the use of this composition strategy by language users has on their linguistic output.

A second piece of evidence concerns another prediction of our proposal, namely, that the more context-dependent (or referentially afforded) concept composition is, the more difficult it can be expected to be to reconstruct out of context. The results of a study involving computational modeling of adjectival modification, reported in Boleda et al. (2013), are in line with this prediction.

Boleda et al. (2013) used computational semantic methods to produce meaning representations for adjective-noun phrases. They built representations for phrases like former commentator in two ways. On the one hand, they constructed a representation of the entire phrase directly from linguistic data, extracting statistics from a large textual corpus. We will call this representation the observed representation. On the other, they combined the representations for the individual words in the phrase (also obtained from a corpus) using a computational algorithm. For example, this algorithm took the representation for former and that for commentator to build a semantic representation for former commentator. Boleda et al. then compared this “artificial” or predicted representation with the observed one, to see how accurate the prediction was.[Footnote 11]
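The comparison between predicted and observed representations can be illustrated with a minimal sketch. The text above does not specify which composition algorithm was used, so the sketch assumes simple vector addition as a stand-in; cosine similarity is likewise an assumed measure of how close the prediction is to the observed vector. All vectors are hypothetical three-dimensional toy values chosen for illustration and are not data from the study.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def compose_additive(adjective, noun):
    """Assumed composition function: component-wise addition.
    This is a stand-in, not necessarily the algorithm used in the study."""
    return [a + n for a, n in zip(adjective, noun)]

# Hypothetical word vectors (toy values, not corpus counts).
former = [4.0, 1.0, 0.0]
commentator = [3.0, 0.0, 2.0]
colour = [0.0, 5.0, 1.0]

# Hypothetical "observed" phrase vectors, as if extracted directly from a corpus.
observed_former_commentator = [7.0, 1.0, 2.5]
observed_former_colour = [1.0, 2.0, 6.0]

# Similarity between the predicted (composed) and the observed representation.
sim_typical = cosine(compose_additive(former, commentator),
                     observed_former_commentator)
sim_atypical = cosine(compose_additive(former, colour),
                      observed_former_colour)

print(f"former commentator: {sim_typical:.3f}")
print(f"former colour:      {sim_atypical:.3f}")
```

With these toy values the predicted vector for former commentator lies much closer to its observed counterpart than the predicted vector for former colour does, which is the pattern the study reports for typical versus atypical modifiers.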

Their results showed that the more typical the property denoted by the adjective is for the entity described by the noun, the easier it is to model the semantics of the phrase. Specifically, the resulting predicted representation of the phrase is more similar to the observed one when the property is more typical. For instance, former can be said to be a typical attribute for role-denoting nouns such as commentator, father-in-law, teacher, or president, insofar as the concepts associated with these nouns arguably include the specification that the role has a potentially limited duration. Information about duration is supplied by adjectives like former, current, or future. And indeed, the predicted representation for former commentator obtained by combining the representations for its two component words was very similar to the observed one. In contrast, the representation for phrases like former colour was more difficult to predict from the component words alone: Colour does not denote a concept with a clear temporal specification, and the relationship between former and colour will depend on the object whose colour is being referred to.[Footnote 12]

We conclude that modification of nouns by adjectives describing typical attributes corresponds to conceptually afforded composition, and at least some uses of atypical modifiers correspond to referentially afforded composition. In the latter cases, without additional evidence from the specific discourse context it is hard to make sense of the semantic relationship between the adjective and the noun.[Footnote 13] Therefore, the distinction between conceptual and referential affordance in modification helps explain the results in Boleda et al. (2013).

Classical formal analyses of semantic composition involving adjectives are not well-equipped to take into account the degree of fit or typicality relation between the property denoted by the adjective and general features of the concept associated with the noun. Theories like Pustejovsky’s Generative Lexicon were designed to do this to some extent; however, as noted in Sect. 2, such theories cannot help with highly context-dependent meaning relations. Thus, the challenge is to find a way to incorporate the distinction between conceptually and referentially afforded concept composition into semantic theory, so as to broaden the theory’s empirical coverage. As a first step in addressing this challenge, we turn to distributional semantics.

4 Conceptually Afforded Composition with Distributional Semantics

We propose distributional semantics as a framework to account for conceptually afforded composition because we do not consider other approaches (e.g. standard formal semantics or primitive-based approaches such as the Generative Lexicon) to offer a rich enough representation of a word’s meaning to account for the range of effects discussed. However, as it is beyond the scope of this paper to exhaustively compare these different approaches, we limit ourselves here to simply providing enough background on distributional semantics for the reader to be able to follow the formalization presented in the next section, leaving more thorough discussion for future work.

Distributional semantic analyses (Landauer and Dumais 1997; Turney and Pantel 2010; Erk 2012) represent the semantics of a word as a function of the contexts it occurs in. Context can be defined in various ways, but the most typical approach is to define context as the words surrounding the target word in a corpus. A distributional representation for a word will then be a list of context counts, aggregated over the whole corpus and suitably transformed, that is, a vector. Figures 1 and 2 offer a toy example. Figure 1 is intended to illustrate how even a small context window reveals repeated examples of co-occurrences between a target word (here, moon) and other words that are suggestive of our knowledge about the target. Figure 2 exemplifies a partial vectorial representation for the words moon, sun, and dog.Footnote 14 The vectors show how the distributional representation mirrors some semantic similarities and differences between these words: All three can appear with shadow, but, while moon and sun appear with words such as planet or shine, dog does not. Moon and sun are similar in representation, but not identical: for example, full and crescent occur primarily with moon, while shine is a more typical context for sun than for moon.
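To make the construction concrete, the following minimal sketch builds context-count vectors from a toy corpus. The sentences, the window size of 3, the three context words, and the function names `context_counts` and `vector` are all illustrative assumptions; they are not the data behind Figs. 1 and 2. Real systems use far larger corpora and additionally transform the raw counts.

```python
from collections import Counter

# Toy corpus (invented for illustration; not the data behind Figs. 1 and 2).
SENTENCES = [
    "the full moon cast a shadow",
    "the moon will shine tonight",
    "the sun will shine all day",
    "the sun cast a shadow",
    "the dog chased its shadow",
]

def context_counts(sentences, target, window=3):
    """Count words occurring within `window` tokens of `target`, sentence by sentence."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            counts.update(tokens[j] for j in range(lo, hi) if j != i)
    return counts

# Each chosen context word is one dimension of the vector (an assumed selection).
CONTEXTS = ["shadow", "shine", "full"]

def vector(target):
    """Return the raw context counts for `target`, ordered as in CONTEXTS."""
    counts = context_counts(SENTENCES, target)
    return [counts[c] for c in CONTEXTS]

vectors = {word: vector(word) for word in ("moon", "sun", "dog")}
print(vectors)
```

Even with five sentences, the three words share the shadow dimension, while only moon and sun have a count for shine and only moon has one for full.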

Fig. 1 The basic data for distributional semantic representations: contexts

Fig. 2 Semantic representation: vectors of context counts

A vector for a word as used in distributional semantics ranges from a few hundred to a few thousand dimensions (that is, contexts or transformations thereof), thus providing a very rich, flexible representation for word meaning. However, this size also makes the vector difficult to inspect manually.Footnote 15 The power of distributional semantics lies in its use of well-defined linear algebra techniques to manipulate these vectors, yielding useful information about the semantics of the words involved. We visualize one such technique in Fig. 3, where simple, two-dimensional vectors for the words moon, sun, and dog are represented graphically. The two dimensions depicted in the graph (corresponding to word contexts) are shadow and shine, with the values shown in the left part of Fig. 3. The geometric distance (e.g., the Euclidean distance; see discontinuous lines) between the vectors for moon and sun is smaller than the distance between the vectors for moon and dog. Crucially, the algebraic techniques that we can visualize with two dimensions generalize to any number of dimensions. Thus, in distributional semantics, geometric distance corresponds to semantic distance.
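The distance comparison can be sketched in a few lines. The two-dimensional values below are invented for illustration and are not read off Fig. 3; the function name `euclidean` is likewise an assumption of this sketch.

```python
import math

# Hypothetical two-dimensional vectors over the contexts (shadow, shine).
# The numbers are invented for illustration, not taken from Fig. 3.
moon = (12, 30)
sun = (10, 40)
dog = (11, 0)

def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

d_moon_sun = euclidean(moon, sun)
d_moon_dog = euclidean(moon, dog)
print(d_moon_sun < d_moon_dog)  # moon is closer to sun than to dog

# The same function applies unchanged to vectors with more dimensions.
d_5d = euclidean((1, 2, 3, 4, 5), (1, 2, 3, 4, 7))
```

Nothing in `euclidean` depends on the number of dimensions, which is the sense in which the two-dimensional picture generalizes.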

Fig. 3 Semantic distance as geometric distance

Distributional semantic methods are highly successful at modeling word meaning because they are based on linguistic data naturally produced by humans, as manifest in large text corpora drawn from the internet and other sources. The representations are rich, with hundreds or thousands of dimensions providing different bits of contextual information. Also, distributional representations are naturally graded; for instance, the notion of semantic distance is a continuum, with words being more or less distant. This makes them useful for semantic phenomena such as the typicality effect observed in the previous section.

Recently, researchers have begun to explore compositional distributional semantics, giving a distributional representation not only to words but also to phrases and even sentences (Mitchell and Lapata 2010; Coecke et al. 2011; Socher et al. 2012; Baroni et al. 2014a; Pham 2016, among many others); the previous work we presented at the end of Sect. 3.2 falls into this line of research. Here, the challenge is typically framed as capturing how composition changes the values of the vectors. For instance, blood is not a relevant context for moon, but when red modifies moon it does become relevant (see Fig. 4). This kind of effect is achieved by applying composition operations to build the meaning representation of the phrase from the representations of its constituents. A very simple but stubbornly effective method is to add up the word vectors, as in Fig. 4 (Mitchell and Lapata 2010), but more sophisticated methods have been designed that sometimes yield better results (Baroni and Zamparelli 2010). Nothing we say in this paper depends on the chosen method for composition; hence we will simply use comp(\(\overrightarrow{\mathbf{red}}\), \(\overrightarrow{\mathbf{moon}}\)) for the distributional representation of the phrase red moon obtained by applying a composition function to its constituent word vectors, \(\overrightarrow{\mathbf{red}}\) and \(\overrightarrow{\mathbf{moon}}\) (we represent word vectors with an overhead arrow).
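As an illustration of the additive method only, the sketch below instantiates comp as element-wise addition. The paper deliberately leaves comp unspecified, so this is one possible choice rather than the model adopted here; the context words and counts are invented and are not the values in Fig. 4.

```python
# Hypothetical context counts; the dimensions are (blood, shine, planet).
CONTEXTS = ("blood", "shine", "planet")
red = [40, 3, 1]
moon = [0, 29, 30]

def comp(adjective, noun):
    """Additive composition: the phrase vector is the element-wise sum."""
    if len(adjective) != len(noun):
        raise ValueError("vectors must share the same dimensions")
    return [a + n for a, n in zip(adjective, noun)]

red_moon = comp(red, moon)

# Show how each context changes once the adjective is composed with the noun.
for context, before, after in zip(CONTEXTS, moon, red_moon):
    print(f"{context}: moon={before}, red moon={after}")
```

In this toy setting the blood dimension, which is zero for moon alone, becomes prominent in the composed vector, mirroring the effect described for red moon.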

Fig. 4 Vector composition by addition

Note, finally, that there is an alternative method for obtaining a distributional representation for a phrase, namely, to extract it directly from the corpus, just as representations for words are generated (Fig. 5). Because it is based on counts for actual occurrences of phrases in corpora, this representation should be a faithful rendering of the meaning of the phrase, and this is why we used it as a benchmark in the research in Boleda et al. (2013), against which to compare compositionally obtained (predicted) vectors. However, this technique can only be used for sufficiently frequent phrases. Since many possible phrases will occur rarely or not at all, composition is still needed to build a representation for many phrases.
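The benchmark comparison can be sketched as follows. The text does not state which similarity measure is used to compare predicted and corpus-extracted vectors, so cosine similarity is an assumption of this sketch, as are all the numbers, the additive composition, and the variable names.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Hypothetical counts over the contexts (blood, shine, planet).
red = [40, 3, 1]
moon = [0, 29, 30]

# Stand-in for a phrase vector counted directly from corpus occurrences.
observed_red_moon = [22, 18, 9]

# Predicted vector, here obtained by additive composition (one option among many).
predicted_red_moon = [a + n for a, n in zip(red, moon)]

score = cosine(predicted_red_moon, observed_red_moon)
print(round(score, 3))
```

A score close to 1 would indicate that the composed vector points in nearly the same direction as the corpus-extracted one; for rare or unattested phrases only the predicted vector is available.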

Fig. 5 Corpus-extracted distributional representation for phrase red moon

Because of their data-driven nature and their rich representation of meaning, compositional distributional representations for phrases are able to account for subtle nuances of meaning arising from the combination of modifiers and nouns. For instance, Baroni and Zamparelli (2010) report that the most similar element in a large semantic space to the phrase historical introduction is historical background; to small drop, droplet; to common understanding, common vision. Though crude and incomplete as an approximation of what concepts are (as the discussion in Barsalou, 2017, will make apparent), these representations have the advantage of being easy to construct and incorporable into a testable interpretive model. We therefore adopt them for modeling conceptually afforded composition in the formalization we offer in the next section.
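Finding "the most similar element in a semantic space" amounts to a nearest-neighbour search. The sketch below uses a tiny invented space and cosine similarity; the vectors, the three candidate items, and the helper `nearest` are hypothetical and do not reproduce the setup of Baroni and Zamparelli (2010).

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical semantic space; word and phrase vectors share the same dimensions.
space = {
    "droplet": [9, 1, 0],
    "flood": [1, 9, 2],
    "desert": [0, 1, 9],
}

# Stand-in for a composed vector for the phrase "small drop".
small_drop = [8, 2, 1]

def nearest(query, candidates):
    """Return the candidate whose vector is most similar to `query`."""
    return max(candidates, key=lambda item: cosine(query, candidates[item]))

print(nearest(small_drop, space))
```

Because phrases and words live in the same space, a composed phrase vector can be compared directly with single-word vectors.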

5 A Mixed Model for Two Types of Semantic Composition

We next sketch how the mechanisms of conceptual and referential affordance can both be incorporated into a single, mixed interpretive model (see Boleda and Herbelot 2016 for a review of previous work combining formal and distributional semantics).Footnote 16

We will use Discourse Representation Theory (DRT) as the scaffolding for our semantics. We use DRT because (1) the notion of discourse referent is crucial for implementing referentially afforded composition, and (2) the most recent research on compositional distributional semantics has not yet been able to show how such models can provide effective analyses of referential grounding or discourse dynamics (Bernardi et al. 2015; Sadrzadeh and Purver 2015). This latter state of affairs leads us to tentatively hypothesize that compositional distributional semantics is best used to model only those parts of semantic composition that are, in our terms, conceptually afforded.

For reasons of space, we must assume basic familiarity with DRT; the reader is referred to e.g. Kamp (1981) or Kamp and Reyle (1993) for background. Our implementation of DRT will be entirely standard, with just three exceptions. First, we need a means of connecting distributional semantic representations to Discourse Representation Structures (DRSs). Second, as a result of doing this we will introduce minor modifications in our treatment of nominal and adjectival predication with respect to what is more generally assumed. Finally, we will need a way to distinguish conceptually afforded from referentially afforded composition.

We incorporate distributional semantics by building on the idea in Zamparelli (1995) that nouns (and not just certain kinds of generic noun phrases) denote Carlsonian kinds (Carlson 1977).Footnote 17 The crucial step is to use distributional semantic representations rather than atomic abstract entities as models for kinds. However, as with the classic treatment of kinds as abstract entities, these distributional representations will be coded in the DRSs as constants. Since distributional representations are, mathematically speaking, vectors, the constants we use for them will be indicated with an overhead arrow (e.g. \(\overrightarrow{\mathbf{box}}\)), as noted in the previous section.

We further extend Zamparelli’s idea to adjectives, also interpreting them as vectors (e.g. \(\overrightarrow{\mathbf{red}}\)). Since adjectives are not assumed to denote natural kinds but rather to pick out properties, this proposal can be seen as generalizing Zamparelli’s, substituting concepts for both kinds and properties along the way. Distributional vectors will thus serve as very crude representations for concepts.Footnote 18 The crucial step will be to allow the distributional representations for nouns and adjectives to combine with each other to yield new representations of the same type, whose role in the DRT part of the semantics is exactly analogous to the role of the representations for unmodified nouns.

In the previous section we briefly sketched how the composition of two vectors works. We assume that the grammar of a language indicates when semantic composition for certain phrases involves the composition of vectors, as opposed to other sorts of semantic operations. The composition of vectors happens outside of the DRT model, but as the result is also a vector, it can, like the component vectors, be associated with a constant in a DRS, which we will represent as e.g. comp(\(\overrightarrow{\mathbf{red}}\), \(\overrightarrow{\mathbf{box}}\)). In other words, constants of the form \(\overrightarrow{\mathbf{red}}\), \(\overrightarrow{\mathbf{box}}\), and comp(\(\overrightarrow{\mathbf{red}}\), \(\overrightarrow{\mathbf{box}}\)) are all of the same type. Thus, distributional semantics will give us a relatively concrete algebraic model for simple and complex concepts on which both sorts of concepts are of fundamentally the same nature, much in the way lattice-theoretic structures serve as models for treating atomic entities and pluralities as fundamentally similar types of objects (Link 1983).

The next piece we need is a way to exploit nouns and adjectives with such interpretations within DRT, so that referents can be associated with the concepts that nouns and adjectives pick out. Carlson’s (1977) realization relation, which Zamparelli used and which we represent here as Realize, does this: This relation holds between an object and a kind just in case the object constitutes an instance of the kind.Footnote 19 Again following Zamparelli, we assume that the Realize relation is introduced by (possibly abstract) functional morphosyntax that turns a noun into an expression that denotes a set of entities. As a first approximation, then, we can represent a referential expression such as a box as in (8), where u is the discourse referent introduced by the phrase, which must satisfy the condition that it is a realization of the concept \(\overrightarrow{\mathbf{box}}\).

  1. (8)

    [u | Realize(u, \(\overrightarrow{\mathbf{box}}\))]

Now consider modification. Prior to the point in the syntax at which the Realize relation is introduced, the composition operations at work will combine vector-denoting expressions; this corresponds to concept composition. We model conceptually afforded composition as the result of composing adjective and noun vectors directly into a new vector, corresponding to a complex concept, which can then stand in the Realize relation to a discourse referent, as in (9).

  1. (9)

    [u | Realize(u, comp(\(\overrightarrow{\mathbf{red}}\), \(\overrightarrow{\mathbf{box}}\)))]
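The division of labor described for a box and a red box can be sketched as a small data structure: vectors are composed outside the DRS, and the result enters the DRS as the second argument of a Realize condition. The class names `DRS` and `Realize`, the additive `comp`, and the toy vectors are assumptions of this sketch, not part of the proposal itself.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vector = Tuple[float, ...]

def comp(adjective: Vector, noun: Vector) -> Vector:
    """Placeholder composition (element-wise sum); any comp function would do."""
    return tuple(a + n for a, n in zip(adjective, noun))

@dataclass(frozen=True)
class Realize:
    """Condition stating that `referent` realizes the concept `concept`."""
    referent: str
    concept: Vector

@dataclass
class DRS:
    """A discourse representation structure: referents plus conditions on them."""
    referents: List[str] = field(default_factory=list)
    conditions: List[Realize] = field(default_factory=list)

# Hypothetical toy vectors.
red: Vector = (5.0, 1.0, 0.0)
box: Vector = (0.0, 2.0, 7.0)

# "a box": the referent u realizes a simple concept.
a_box = DRS(["u"], [Realize("u", box)])

# "a red box": the vectors are composed first, then the result enters the DRS.
red_box = comp(red, box)
a_red_box = DRS(["u"], [Realize("u", red_box)])

print(a_box)
print(a_red_box)
```

Since `comp` returns a value of the same type as its inputs, the `Realize` condition does not need to distinguish simple from complex concepts.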

The syntactic rules of the language will have to make it clear when this sort of composition can be appealed to and when not; interestingly, studies of the syntax of modification clearly indicate that syntax could, indeed, encode this kind of information (see, e.g. McNally and Boleda 2004 and Bouchard 2005 on adjective ordering constraints of the sort exemplified by relational adjectives).

Now let us consider referentially afforded concept composition. As mentioned in Sect. 3, this is attested only when the referent of the nominal is already familiar in the discourse. This referent therefore plays a role in the interpretation of the combination of the modifier and noun. We see two ways in which this could be implemented. One would be to take the referent to modulate the composition operation that combines the adjective and noun vectors. This could be represented as in (10), where the subscript u indicates modulation by referent u.

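  1. (10)

    [u | Realize(u, comp\(_{u}\)(\(\overrightarrow{\mathbf{red}}\), \(\overrightarrow{\mathbf{box}}\)))]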

On this view, the concept associated with red would be exactly the same across all contexts, but its interaction with the concept contributed by the noun would vary from one context to the next, for example through the use of varying weights on the sums or products of the vectors.
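One way to picture this first option is a weighted sum whose weights depend on the referent. The sketch below is purely illustrative: the function name `comp_u`, the weight table, and the vectors are invented, and how context would actually supply such weights is left open.

```python
def comp_u(adjective, noun, weights):
    """Weighted additive composition; `weights` = (w_adj, w_noun) for one referent."""
    w_adj, w_noun = weights
    return [w_adj * a + w_noun * n for a, n in zip(adjective, noun)]

# Hypothetical vectors; the adjective vector is identical across contexts.
red = [4.0, 1.0, 0.0]
box = [0.0, 2.0, 6.0]

# Hypothetical referent-specific weights (how they would be derived is not modeled).
referent_weights = {
    "u1": (1.0, 1.0),   # referent for which red contributes as usual
    "u2": (0.25, 1.0),  # referent for which red contributes only weakly
}

for referent, weights in referent_weights.items():
    print(referent, comp_u(red, box, weights))
```

The inputs `red` and `box` never change; only the operation that combines them varies with the referent.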

Alternatively, the vector corresponding to the adjective could be modified as a function of the referent, i.e. reinterpreted as an ad hoc, referent-mediated property, as could be represented in (11).

  1. (11)

    [u | Realize(u, comp(\(\overrightarrow{\mathbf{red}}_{u}\), \(\overrightarrow{\mathbf{box}}\)))]

On this view, the composition operation as such is not altered in any way; rather the input to that operation is. In other words, red in this example would simply be associated with a different concept in the context in question. Further research will be needed to determine which of these options constitutes a better analysis of the facts, or, indeed, if they are empirically distinguishable. However, it is worth noting that this latter approach closely resembles the indexical interpretations of adjectives proposed by Bosch (1983) and Rothschild and Segal (2009), briefly introduced in Sect. 2.
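The second option can be pictured by leaving the composition function fixed and changing only its adjectival input. In the sketch below, `reinterpret` rescales the adjective vector dimension by dimension using a referent-specific profile; this mechanism, the profile values, and the vectors are all hypothetical stand-ins for whatever the referent contributes in context.

```python
def comp(adjective, noun):
    """Fixed composition operation (element-wise sum), the same in every context."""
    return [a + n for a, n in zip(adjective, noun)]

def reinterpret(adjective, referent_profile):
    """Rescale each dimension of the adjective vector by a referent-specific factor."""
    return [a * f for a, f in zip(adjective, referent_profile)]

# Hypothetical vectors and referent profile.
red = [4.0, 1.0, 0.0]
box = [0.0, 2.0, 6.0]
profile_u = [0.5, 3.0, 1.0]  # invented information tied to referent u

red_u = reinterpret(red, profile_u)

default_red_box = comp(red, box)      # context-independent adjective concept
referent_red_box = comp(red_u, box)   # ad hoc, referent-mediated adjective concept
print(default_red_box, referent_red_box)
```

Here `comp` is untouched, and the difference between the two results comes entirely from the adjective vector it receives.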

These analyses do not offer an account of how context intervenes to determine the referentially afforded interpretation; in this, unfortunately, we are in good company, as no theory we know of offers such an account, and the area is one in which much more research is needed.

We close this section with some very brief, speculative comments on how the proposed analysis relates to classical analyses of adjective modification of nouns within formal semantics. Such modification has been analyzed in two ways: Either by treating the adjective as a second-order property that takes the noun as its argument, or by treating it as a first-order property that is combined via conjunction or set intersection with the (first-order) property denoted by the noun (see e.g. Kamp 1975; Siegel 1976; Larson 1998, among many others, for proposals and discussion). The latter analysis is appropriate specifically for cases of so-called intersective modification, when the adjectival and nominal properties are each entailed to hold of the individual being described. The former is more general and can be used not only for intersective modification but also for non-intersective modification, namely, subsective modification, where the adjectival property does not obviously directly hold of the referent but the noun property does (cf. molecular biologist), and intensional modification, where the nominal property is not entailed to hold (or is entailed not to hold) at the time or world of ascription (former mayor or alleged thief).

All of the above-sketched implementations of concept composition are counterparts of non-intersective modification. In no case is the concept contributed by the adjective directly related to the referent. Moreover, as we have set things up, our analysis of concept composition directly captures the intuition developed in Landman (2001) and Partee (2010) that all adjective-noun combinations, even intensional modification, are, in some sense, subsective; that is, the nominal description is always somehow used to identify the referent, insofar as it contributes positively to the eventual complex description that the referent is related to via the Realize relation. Of course, it remains to explore how to reproduce the entailment effects of the world and temporal parameters that have played a role in traditional analyses of intensional adjectives. We note, however, that one surprising result of the study in Boleda et al. (2013) was that intensional adjectives turned out to be no more difficult to model in distributional semantics than other kinds of adjectives: All other things being equal, compositional distributional semantic techniques could predict the semantic representation for phrases containing intensional adjectives from the representations of the component words just as well as they could for phrases containing nonintensional adjectives (see Sect. 4, above).

6 Conclusions

Semantic composition is a dynamic process that cannot be understood without simultaneously considering what we are referring to and the concepts associated with the words we are using. Concepts, and thus the words associated with them, encode significant regularities. At the same time, they are plastic, insofar as we must use a finite vocabulary to describe a potentially infinite variety of situations and generalizations in the world. However, once a word is applied to a referent, that word is grounded in a very specific manner, and the referent can influence the way we understand the word and its associated concept(s) in the context of use. This interplay between our conceptual structure and the world is what motivated the first contribution of this paper, namely to propose that modification works in two ways: It can be conceptually afforded, when the modifier and the head introduce concepts that fit to form a complex concept, and the speaker and the hearer use this fit in their interpretations; or referentially afforded, when the result of combining the modifier with the noun depends on specific properties of the referent. This proposal has an antecedent in Asher (2011), but we have made it more explicit and have proposed a specific analysis combining distributional semantics and DRT.

Along the way we hope to have made a case for further exploring distributional representations within semantic theory. They are automatically induced (and thus easy to construct and empirically well-founded), have some psychological plausibility (Landauer and Dumais 1997 and subsequent work), and offer a wealth of empirical data. Distributional semantic representations also avoid some of the weaknesses of semantic primitives: Since they generally encode a relatively large number of features with continuous values,Footnote 20 they can express many more nuances of meaning than a small set of discrete features, while at the same time accounting for default interpretations. The key is to recognize their limitations. In this respect, we consider promising the division of labor between distributional semantics and a referential semantic framework like DRT.