
1 Introduction

Vector semantic models, otherwise known as distributional models, are based on the contextual aspects of language: the company each word keeps and its patterns of use in corpora of documents. Truth conditional models focus on the logical and denotational aspects of language: sets of objects with certain properties, and the application and composition of functions. Vector semantics and truth conditional models are based on different philosophies; in recent years there has been much effort to bring them together under one umbrella, see for example [13, 8, 9].

In a recent abstract [14], we sketched an approach to semantics that assigned vector meanings to linguistic phrases using a simply typed lambda calculus in the tradition of [10]. Our previous system was guided by a truth conditional interpretation and provided vector semantics very similar to the approaches of [13, 8, 9]. The difference was that the starting points of those approaches are categorial logics such as Pregroup Grammars and Combinatory Categorial Grammar (CCG). Our reason for using the lambda calculus was that it directly relates our semantics to higher order logic and makes standard ways of treating long distance dependencies and coordination accessible to vector-based semantics. In this short account, we follow the same lines, but whereas our previous work used a static interpretation of distributions, here we focus on a dynamic interpretation.

The lambda calculus approach we use is based on the Lambda Grammars of [11, 12], which were independently introduced as Abstract Categorial Grammars (ACGs) in [5]. The theory developed here, however, can be based on any syntax-semantics interface that works with a lambda calculus based semantics. Our approach is agnostic as to the choice of a syntactic theory. Lambda Grammars/ACGs are just a framework for thinking about type and term homomorphisms and we are using them entirely in semantics here. In a longer paper we will show in more detail how lambda logical forms (the abstract terms) can be obtained: (1) from standard linguistic trees with the help of a procedure that is essentially that of Heim and Kratzer [7]; (2) from LFG f-structures by means of a ‘glue logic’; (3) from Lambek proofs by means of semantic recipes; (4) and from CCG derivations by means of using the combinators associated with CCG rules.

The dynamic interpretation we work with here is the “context change potential” of [6]. We believe other dynamic approaches, such as the update semantics of [16] and the continuation-based semantics of [4], can also be used; we aim to pursue these in future work.

2 Heim’s Files and Distributional Contexts

Heim describes her contexts as files that have some kind of information written on (or in) them. Context changes are operations that update these files, e.g. by adding information to them or deleting information from them. Formally, a context is taken to be a set of sequence-world pairs, in which the sequences come from some domain \(\mathcal {D}_I\) of individuals, as follows:

$$ ctx\subseteq \{(g,w) \mid g :\mathbb {N} \rightarrow \mathcal{D}_I, w \ \text {a possible world}\} $$

(We follow Heim [6] here in letting the sequences in her sequence-world-pairs be infinite, although they are best thought of as finite.)

Sentence meanings are context change potentials (CCPs) in Heim’s work: functions from contexts to contexts. A sentence S comes provided with a sequence of instructions that, given any context ctx, update its information so that a new context, denoted

$$ ctx + S $$

results. The sequence of instructions that brings about this update is derived compositionally from the constituents of S.
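
As a standard illustration in the spirit of [6] (the particular update rule is our gloss, not spelled out above): updating a context with an atomic sentence containing an indexed pronoun keeps just those sequence-world pairs that verify it:

$$ ctx + \textit{she}_i\ \textit{smokes} = \{(g,w) \in ctx \mid g(i)\ \text {smokes in}\ w\} $$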

In distributional semantics, contexts are words somehow related to each other via their patterns of use, e.g. by co-occurring in a neighbourhood word window of a fixed size or via a dependency relation. In practice, one builds a context matrix M, with rows and columns labeled by words from a vocabulary \(\varSigma \) and with entries taking values from \(\mathbb {R}\); for a full description see [15]. M can be seen as the set of its vectors:

$$ \{\overrightarrow{v} \mid \overrightarrow{v} :\varSigma \rightarrow \mathbb {R}\} $$

where each \(\overrightarrow{v}\) is a row or column in M.

If we take Heim’s domain of individuals \(\mathcal{D}_I\) to be the vocabulary of a distributional model of meaning, that is \(\mathcal{D}_I := \varSigma \), then a context matrix can be seen as a so-called quantized version of a Heim context:

$$ \{(\overrightarrow{g},w) \mid \overrightarrow{g} :\varSigma \rightarrow \mathbb {R}, w \ \text {a possible world}\} $$

Thus a distributional context matrix is obtainable by endowing Heim’s contexts with \(\mathbb {R}\). In other words, we are assuming not only that a file has a set of individuals, but also that these individuals take some kind of values, e.g. from the reals.

The role of possible worlds in a distributional semantics is debatable, as vectors retrieved from a corpus are not naturally truth conditional. Keeping the possible worlds in the picture provides machinery for assigning a proposition to a distributional vector by other means and can become very useful. But for the rest of this abstract, we shall deprive ourselves of this advantage and only work with the following set as our context:

$$ \{\overrightarrow{g} \mid \overrightarrow{g} :\varSigma \rightarrow \mathbb {R}, \overrightarrow{g} \in M\} $$

Distributional versions of Heim’s CCPs can be defined based on her intuitions and definitions. In what follows we spell out how these instructions let contexts thread through vectorial semantics in a compositional manner.

3 Vectors, Matrices, Lambdas

Lambda Grammars of [11, 12] were independently introduced as Abstract Categorial Grammars (ACGs) in [5]. An ACG generates two languages, an abstract language and an object language. The abstract language will simply consist of all linear lambda terms (each lambda binder binds exactly one variable occurrence) over a given vocabulary typed with abstract types. The object language has its own vocabulary and its own types. It results from (1) specifying a type homomorphism from abstract types to object types and (2) specifying a term homomorphism from abstract terms to object terms. The term homomorphism must respect the type homomorphism. For more information about the procedure of obtaining an object language from an abstract language, see the papers mentioned or the explanation in [13].

Let the basic abstract types of our setting be D (for determiner phrases), S (for sentences), and N (for nominal phrases). Let the basic object types be I and R. The domain \(\mathcal {D}_I\) corresponding to I can be thought of as a vocabulary, \(\mathcal {D}_R\) models the set of reals \(\mathbb {R}\). The usual operations on \(\mathbb {R}\) can be defined using Tarski’s axioms (in full models that satisfy these axioms \(\mathcal {D}_R=\mathbb {R}\) will hold; in generalised models we get what boils down to a first-order approximation of \(\mathbb {R}\)). Objects of type \(I\rightarrow R\) are abbreviated to IR; these are identified with vectors with a fixed basis.

We will associate simple words like names, nouns and verbs with vectors, i.e. with objects of type IR, and will denote these with constants like \(\overrightarrow{{\textsf {woman}}}\), \(\overrightarrow{{\textsf {smoke}}}\), etc. The typed lambda calculus will be used to build, with the help of these vectors, certain functions that will then serve as the meanings of those words. The meanings of content words will typically be functions that are completely given by some vector, but they will not (necessarily) be identified with vectors (see also Table 1 below).

Sentences will be context change potentials. A context for us is a matrix, so it has type \(I^2R\), and a sentence has type \((I^2R)(I^2R)\). We abbreviate IR as V, \(I^2 R\) as M, and the sentence type MM as U (for ‘update’). Verbs take a vector for each of their arguments, plus an input context, and return a context as their output. For instance, an intransitive verb takes a vector for its subject plus a context and returns a modified context; thus it has type \(VMM=VU\). A transitive verb takes a vector for its subject, a vector for its object, and a context, and returns a context; thus it has type VVU. Nouns are essentially treated as vectors (V), but, since they must be made capable of dynamic behaviour, they are ‘lifted’ to the higher type (VU)U. Our dynamic type homomorphism \(\rho \) is defined by letting \(\rho (N)=(VU)U\), \(\rho (D)=V\) and \(\rho (S)=U\). Some consequences of this definition can be found in Table 1.

Table 1. Some abstract constants a typed with abstract types \(\tau \) and their term homomorphic images H(a) typed by \(\rho (\tau )\) (where \(\rho \) is a type homomorphism, i.e. \(\rho (AB)=\rho (A)\rho (B)\)). Here Z is a variable of type VU, Q is of type (VU)U, v of type V, c of type M, and p and q are of type U. The functions F, G, I, and J are explained in the main text. In the schematic entry for and, we write \(\rho (\overline{\alpha })\) for \(\rho (\alpha _1)\cdots \rho (\alpha _n)\), if \(\overline{\alpha }=\alpha _1\cdots \alpha _n\).
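
To make these type assignments concrete, the following minimal Haskell sketch (our illustration, not part of the system of [11, 12, 5]) models the vocabulary type I by strings and R by Double:

```haskell
type I = String       -- vocabulary indices (the object type I)
type R = Double       -- real values (the object type R)

type V = I -> R       -- IR:    vectors with a fixed basis
type M = I -> I -> R  -- I^2R:  context matrices
type U = M -> M       -- MM:    sentences as context change potentials

-- Consequences of the dynamic type homomorphism rho:
type IntransitiveVerb = V -> U         -- VU
type TransitiveVerb   = V -> V -> U    -- VVU
type Noun             = (V -> U) -> U  -- (VU)U, the 'lifted' noun type
```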

4 Context Update for Lambda Binders

Object terms corresponding to a content word a may update a context matrix c with the information in \(\overrightarrow{a}\) and in the vectors of a’s arguments. The result is a new context matrix \(c'\), with different value entries.

$$ \begin{pmatrix} m_{11} & \cdots & m_{1k}\\ m_{21} & \cdots & m_{2k}\\ \vdots & & \vdots \\ m_{n1} & \cdots & m_{nk} \end{pmatrix} + \overrightarrow{a}, u, v, \cdots = \begin{pmatrix} m'_{11} & \cdots & m'_{1k}\\ m'_{21} & \cdots & m'_{2k}\\ \vdots & & \vdots \\ m'_{n1} & \cdots & m'_{nk} \end{pmatrix} $$

An example of a set of elementary update instructions may be as follows (a toy implementation sketch follows the list).

  • The function denoted by \(\lambda vc.G(\overrightarrow{\mathsf{smoke}}, v, c)\) increases the value entry \(m_{ij}\) of c, for i and j the indices of smoke and its subject v.

  • The function denoted by \(\lambda uv.\lambda c.I(\overrightarrow{\mathsf{love}}, u,v, c)\) increases the value entries \(m_{ij}\), \(m_{jk}\), and \(m_{ik}\) of c, for i, j, and k the indices of loves, its subject u, and its object v.

  • The function denoted by \(\lambda vc.F(\overrightarrow{\mathsf{tall}}, v, c)\) increases the value entry \(m_{ij}\) of c, for i and j the indices of tall and the noun v it modifies. The entry for tall in Table 1 uses this function, but allows for further update of the context.

  • The function denoted by \(\lambda vc.J(\overrightarrow{\mathsf{know}}, v, c)\) increases the value entry \(m_{ij}\) of c, for i and j the indices of know and its subject v. The updated matrix is made the input for further update (by the context change potential of the sentence that is known) in Table 1.
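
As a toy implementation of these instructions (a sketch only: we reuse the type synonyms of the earlier Haskell fragment, identify each word vector with the vocabulary index labelling it, and let every update bump an entry by 1):

```haskell
-- Increase a single entry of a context matrix by one.
bump :: I -> I -> M -> M
bump i j m = \x y -> if x == i && y == j then m x y + 1 else m x y

-- G: intransitive-verb update, raising the (verb, subject) entry.
gUpd :: I -> I -> U
gUpd verb subj = bump verb subj

-- F: adjectival update, raising the (adjective, noun) entry.
fUpd :: I -> I -> U
fUpd adj noun = bump adj noun

-- I: transitive-verb update, raising the (verb, subject),
-- (subject, object), and (verb, object) entries.
iUpd :: I -> I -> I -> U
iUpd verb subj obj = bump verb obj . bump subj obj . bump verb subj
-- (J, for 'know', would additionally feed the updated matrix to the
-- embedded sentence's context change potential.)
```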

Logical words such as every and and are often treated as noise in distributional semantics and not included in the context matrix. We have partly followed this approach here by treating every as the identity function (the noun already has the required ‘quantifier’ type (VU)U). To see this, note that the entry for ‘every’, \(\lambda Q.Q\), is the identity function: it takes a Q and returns it unchanged. The alternative would be to have an entry along the lines of that of ‘tall’, but this would not make much sense. It is the content words that seem to be important in a distributional setting, not the function words.

The word and does have a function here though: it is treated as a generalised form of function composition. The entry for the word in Table 1 is schematic, as and does not only conjoin sentences, but also other phrases of any category. So, the type of the abstract constant connected with the word is \({(\overline{\alpha }S)(\overline{\alpha }S)(\overline{\alpha }S)}\), in which \(\overline{\alpha }\) can be any sequence of abstract types. Ignoring this generalisation for the moment, we obtain SSS as the abstract type for sentence conjunction, with a corresponding object type UUU, and meaning \(\lambda pqc.p(qc)\), which is just function composition. Since the abstract constant combines with its right conjunct first, this is defined in a way such that the context updated by and’s left argument will be further updated by its right argument. So ‘Sally smokes and John eats bananas’ will, given an initial matrix c, first update c to \(G(\overrightarrow{\mathsf{smoke}}, \overrightarrow{\mathsf{Sally}}, c)\), which is a matrix, and then update this further with ‘John eats bananas’ to \(I(\overrightarrow{\mathsf{eat}}, \overrightarrow{\mathsf{John}}, \overrightarrow{\mathsf{bananas}}, G(\overrightarrow{\mathsf{smoke}}, \overrightarrow{\mathsf{Sally}}, c))\).

This treatment is easily extended to coordination in all categories. For example, the reader may check that and admires loves (which corresponds to loves and admires) has \(\lambda uvc.I(\overrightarrow{\mathsf{admire}},u,v,I(\overrightarrow{\mathsf{love}},u,v,c))\) as its homomorphic image.
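
In the same toy Haskell setting (again only a sketch, reusing gUpd and iUpd from above), sentence conjunction is literally function composition, and its transitive-verb instance reproduces the loves and admires example:

```haskell
-- \p q c. p (q c): the left conjunct's update is supplied as q
-- (since 'and' combines with its right argument first), so it
-- applies to the context before the right conjunct's update p.
andU :: U -> U -> U
andU p q = \c -> p (q c)

-- 'Sally smokes and John eats bananas'
sallyAndJohn :: U
sallyAndJohn = andU (iUpd "eat" "John" "bananas") (gUpd "smoke" "Sally")

-- Coordination at transitive-verb type VVU: 'loves and admires'
andTV :: (I -> I -> U) -> (I -> I -> U) -> (I -> I -> U)
andTV p q = \u v c -> p u v (q u v c)

lovesAndAdmires :: I -> I -> U
lovesAndAdmires = andTV (iUpd "admire") (iUpd "love")
```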

The update instructions thread through the semantics of phrases and sentences compositionally. The sentence every tall woman smokes, for example, will be associated with the following lambda expression:

$$ (\mathsf{every\ tall\ woman}) \lambda \zeta .(\mathsf{smokes}\ \zeta ) $$

This in its turn has a term homomorphic image that is \(\beta \)-equivalent with the following:

$$\lambda c.G\left( \overrightarrow{\mathsf{smoke}}, \overrightarrow{\mathsf{woman}},F(\overrightarrow{\mathsf{tall}}, \overrightarrow{\mathsf{woman}},c)\right) $$

which describes a distributional context update for it. This term describes a first update of the context c according to the rule for the constant tall, and then a second update according to the rule for the constant smokes. As a result, the value entries at the crossings of \(\langle \mathsf{tall}, \mathsf{woman}\rangle \) and \(\langle \mathsf{smokes}, \mathsf{woman}\rangle \) get increased. Much longer chains of context updates can be ‘threaded’ in this way.
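
In the toy implementation above, this \(\beta \)-reduced term corresponds to the following sketch:

```haskell
-- 'every tall woman smokes': first the adjectival update F (tall),
-- then the verbal update G (smokes), threaded through the context.
everyTallWomanSmokes :: U
everyTallWomanSmokes = gUpd "smoke" "woman" . fUpd "tall" "woman"
```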

In the following we give some examples. In each case, the sentence in a. is followed by an abstract term in b. that captures its syntactic structure; the update potential in c. is the homomorphic image of this abstract term.

[Figure: example sentences (a.), their abstract terms (b.), and the corresponding update potentials (c.).]

5 Conclusion and Future Directions

In previous work, we showed how a static interpretation of the lambdas provides vectors for phrases and sentences of language. There, the object type of the vector of a word depended on its abstract type and could be an atomic vector, a matrix, a cube, or a tensor of higher rank. Means of combination then varied with the tensor rank of the type of each word. For instance, one could take the matrix multiplication of the matrix of an intransitive verb with the vector of its subject, whereas for a transitive verb the sequence of operations was a contraction between the cube of the verb and the vector of its object, followed by a matrix multiplication between the resulting matrix and the vector of the subject. A toolkit of functions needed to perform these operations was defined in previous work. That toolkit can be restated here for the type \(I^2R\), rather than the previous IR, to provide means of combining matrices and their updates, if needed.
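
Schematically, and only as a rough sketch of that static toolkit (the notation \(M_v\), \(C_v\), and \(\times _3\) is ours):

$$ \overrightarrow{\textit{subj v}} = M_{v}\, \overrightarrow{\textit{subj}}, \qquad \overrightarrow{\textit{subj v obj}} = \left( C_{v} \times _3 \overrightarrow{\textit{obj}}\right) \overrightarrow{\textit{subj}} $$

where \(M_v\) is the matrix of an intransitive verb v, \(C_v\) the cube of a transitive verb v, and \(\times _3\) contracts the cube with the object vector along its third index.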

In this work, we have shown how a dynamic interpretation of the lambdas also provides vectors for phrases and sentences of language. Truth conditional and vector models of language follow two very different philosophies: the vector models are based on contexts, the truth models on denotations. Dynamic interpretations of language, e.g. the approach of Heim, are also based on context update, and hence seem the more appropriate choice for a vector semantics. In this paper, we showed how Heim’s files can be turned into vector contexts and how her context change potentials can be used to provide vector interpretations for phrases and sentences. Our context update instructions were defined such that they let contexts thread through vector semantics in a compositional manner.

Amongst the things that remain to be done in a longer paper is developing a vector semantics for the lambda terms obtained via other syntactic models, e.g. CCG, LFG, and Lambek Grammars, as listed at the end of the introduction. We also aim to work with other update semantics, such as continuation-based approaches. One could also give a general formalisation wherein both the static approach of previous work and the dynamic one of this work cohabit. This can be done by working out a second pair of type-term homomorphisms that also work with the possible world part of Heim’s contexts. In this setting, the two concepts of meaning, truth theoretic and contextual, each with its own uses and possibilities, can work in tandem.