1 Introduction

‘Penelope likes eggs for brunch’ is closer in meaning to ‘Penelope likes eggs for lunch’ than ‘Cats mew’ is to ‘I love New York’. Likewise, ‘Tom is 16 years old’ is semantically closer to ‘Tom is 17 years old’ than ‘Dick is 78 years old’ is to ‘Harry is 57 years old’. And ‘Thus passes the glory of the world’ is closer in meaning to the Latin ‘Sic transit gloria mundi’ than the Mel Brooks-inspired alternative translation ‘Gloria is sick’. Such examples suggest that some statements are closer in meaning than others.

Logic, or more precisely formalisation, also relies on the idea that statements are closer or further apart in meaning in this same sense. Among the mostly implicit criteria we appeal to when formalising is that of semantic proximity. This criterion enjoins us to formalise the natural-language sentence s as a formal sentence \(\sigma\) if \(\sigma\) may be interpreted so as to be as close in meaning to s as possible.Footnote 1 For example, \(\exists x Fx\) is on this criterion a better first-order formalisation of \(s =\)‘Someone is French’ than \(\forall x Fx\) is. The reason is that some interpretation of \(\exists x Fx\) is closer in meaning to s than any interpretation of \(\forall x Fx\).

A second philosophical application of the idea that there can be greater or lesser semantic distance between two propositions comes from the philosophy of language.Footnote 2 Some theories of meaning formalise natural-language sentences in some preferred formal language(s) and provide satisfaction- or truth-conditions for the formal vocabulary.Footnote 3 These theories’ success turns partly on whether the resulting sentences’ satisfaction- or truth-conditions are sufficiently close in meaning to those of the original sentences.

A third connection is with the philosophy of science. Ever since Popper (1963, ch. 10), philosophers of science have studied formal models of verisimilitude in an attempt to model the intuitive idea that some theories or propositions are closer to the truth than others. Take the (somewhat hackneyed) example of the number of planets:Footnote 4 the statement that there are 9 of them is closer to the truth than the statement that there are 9 billion of them. The claim ‘\(s_1\) is more verisimilar than \(s_2\)’ is the special case of the claim ‘\(s_1\) is closer to \(s_3\) than \(s_2\) is to \(s_3\)’ in which \(s_3\) is the truth of the matter.

The existence of propositional similarity facts also follows from another consideration. Sharing identical or like properties makes for similarity. For example, a crimson flag is more similar to a scarlet flag than a blue flag is to a green flag, other things being equal. As a consequence, the sentence ‘the flag is crimson’ is truth-conditionally more similar to ‘the flag is scarlet’ than ‘the flag is blue’ is to ‘the flag is green’. The way the world has to be for the first sentence to be true is more similar to how it has to be for the second to be true than the way the world has to be for the third to be true is to how it has to be for the fourth to be true. Similarity of properties or objects thus makes for propositional similarity.

In short, the idea that sentences can be closer or further apart in meaning is highly intuitive. Moreover, it is a pillar of logic, semantic theory and the philosophy of science, and a consequence of other commitments regarding similarity. Our aim here is to define the relation expressed by ‘\(s_1\) is closer in meaning to \(s_2\) than \(s_3\) is to \(s_4\)’. Although the notion of verisimilitude has been extensively studied (Sect. 4 contains some references), the proposal presented here for comparing the ‘distance’ between two pairs of propositions is novel. Previous discussion in the literature proceeds by assuming an underlying metric that captures the distance between worlds. The present article, in contrast, eschews metric assumptions, which are unrealistic in all but a few applications of interest. We do not, for example, assume that there is a numerical distance between our world and the worldFootnote 5 most like ours in which Hillary Clinton won the 2016 US presidential election.

A point worth clarifying at the outset is that there are several notions of similarity among natural-language sentences. The sentence ‘Hillary Clinton is well’ is in a sense semantically very close to ‘Hillary Clinton is unwell’: both sentences assert something about a particular person’s health, as opposed to say ‘\(e^{i\pi } + 1 =0\)’ which has an entirely different subject matter.Footnote 6 Yet truth-conditionally, ‘Hillary Clinton is well’ and ‘Hillary Clinton is unwell’ are far from close. Likewise, the tautology ‘It’s raining or it’s not raining’ is similar to the contradiction ‘It’s raining and it’s not raining’ in terms of both linguistic expression and subject matter; yet as far as semantic value is concerned, the two sentences could not be further apart. In this paper, we focus on a notion of similarity among propositions based on similarity neither of linguistic expression nor of subject matter but of truth-conditions. The examples thus far have hopefully given readers a sense of the target notion, which it will now be our business to explicate and further clarify.

2 Preliminaries

We wish to account for comparative propositional similarity facts expressed by sentences of the form ‘\(s_1\) is closer in meaning to \(s_2\) than \(s_3\) is to \(s_4\)’. Our approach will be to derive such facts from similarity facts about worlds, avoiding any metric assumptions (for reasons that will emerge in Sect. 4). Section 3 presents the models and Sect. 4 comments on them. Before that, a few preliminaries about the framework are in order.

It will be useful to model worlds as sets of propositions and propositions as sets of worlds. We construe worlds as maximally consistent sets of literals (atoms/sentence letters or negations thereof) from a fixed language. This propositional language, which we call \({\mathcal {L}}\), has \(\{p_0, p_1 \ldots , p_n, \ldots \}\) as its set of atoms/sentence letters, is truth-functionally complete and has standard formation rules.Footnote 7 For example, under this construal

\(w^* = \{p_i: i \text { is even}\} \cup \{\lnot p_i: i \text { is odd}\} = \{p_0, \lnot p_1, p_2, \lnot p_3, \ldots \}\)

is a world.

Construing worlds as sets of propositions is a matter of formal convenience only: we do not assume that worlds really are sets of propositions. The assumption is only that, for certain purposes, worlds may be modelled as sets of propositions. To put it another way, we are interested in worlds only in as much as they do or don’t share some salient properties, expressed by the propositions \(p_0\), \(p_1\), \(p_2\), .... That is, when presented with a world w, we ask whether it satisfies \(p_0\), \(p_1\), \(p_2\), ...; the answer to these questions is all the information about w we are interested in.Footnote 8 Worlds in our models are thus really equivalence classes of worlds in which the given literals hold. The set of worlds may be thought of as the set (or class) of all possible worlds quotiented by \({\mathcal {L}}\)-equivalence, where \({\mathcal {L}}\) is the propositional language we are interested in. Two worlds are in the same \({\mathcal {L}}\)-equivalence class just when they agree on the truth-values of \({\mathcal {L}}\)-atoms, i.e. similarity respects of interest.

Our models also construe propositions as sets of possible worlds. Thus \(p_3 \vee p_4 = \{w: p_3 \vee p_4 \text { is true at } w\}\) is a proposition, and if \(w^* = \{p_i: i \text { is even}\} \cup \{\lnot p_i: i \text { is odd}\}\) as above, \(w^* \in p_3 \vee p_4\), since \(p_4\) (and hence \(p_3 \vee p_4\)) is true at \(w^*\). Modelling propositions as sets of possible worlds is a familiar technique in analytic philosophy, linguistics and logic. As above, we are not assuming that propositions really are sets of worlds. We could, if we were less sparing with notation, mark the relation of a proposition holding at a world with a notational primitive. But it will be simpler to construe worlds as sets of propositions, as long as we keep in mind what this means.Footnote 9

So: we use the identity sign when representing propositions as sets of worlds and worlds as sets of propositions for brevity, not out of philosophical commitment. Naturally, modelling propositions as sets of worlds is adequate for certain purposes but too coarse-grained for others. For example, modelling all mathematical, necessary and logical truths as the same proposition entails identifying them all. However, these limitations are as familiar as they are general. They no more threaten our comparative model(s) than they do similar models in metaphysics, the philosophy of language, etc. Finally, though it is natural to do so, we need not think of the worlds as metaphysically possible; we may equally think of them as epistemically possible, or in some other way. Our models are neutral between various such interpretations.

3 Comparative Models

To introduce our comparative models, recall two canonical versions of counterfactual logic, due to Stalnaker (1968) and Lewis (1973) respectively. To spell out the truth-conditions of ‘If A were the case then B would be the case’, Stalnaker and Lewis assume the existence of a three-place comparative similarity relation among worlds. This may be glossed as ‘\(w_2\) is more similar to \(w_1\) than \(w_3\) is to \(w_1\), as judged from the perspective of \(w_1\)’. Stalnaker and Lewis also make further, slightly different,Footnote 10 assumptions on worlds, and base their semantics for propositional counterfactual logic (which need not detain us here) on the metaphysical assumption that worlds stand in the similarity relation just specified. Note that the clause ‘as judged from the perspective of \(w_1\)’ indicates which standards of similarity determine the relative proximity of \(w_2\) and \(w_3\) to \(w_1\); but if we prescind from applications in counterfactual logic, there is no reason to suppose that these standards are somehow fixed by \(w_1\), the world of comparison. Taking the standards of similarity as independently fixed for now (i.e. fixed in a way we are not concerned with), we symbolise a three-place comparative similarity relation between worlds as

\((w_2, w_1) < (w_3, w_1)\),

read as ‘\(w_2\) is more similar to \(w_1\) than \(w_3\) is to \(w_1\)’. This three-place relation can be recaptured from the four-place relation

\((w_2, w_1) < (w_3, w_4)\),

read as ‘\(w_2\) is closer to \(w_1\) than \(w_3\) is to \(w_4\)’ by setting \(w_4 = w_1\).

Our comparative models exploit the existence of a four-place comparative similarity relation among worlds to define a four-place comparative similarity relation among sets of worlds or propositions. The latter may be written as:

\((\alpha , \beta ) < (\gamma , \delta )\),

read as ‘proposition \(\alpha\) is more similar to proposition \(\beta\) than proposition \(\gamma\) is to proposition \(\delta\)’, relative to some fixed standards of similarity. For expository clarity, we present a special case of the model in Sect. 3.1 before generalising it in Sects. 3.2–3.3.

3.1 A special case: the lexicographic model

Suppose there is a set of similarity respects, the first of which makes for greatest similarity, the second of which is next-most important, and so on. As always, we may think of these respects as the \({\mathcal {L}}\)-propositions \(p_0, p_1, \ldots\) (renumbering if necessary). Thus suppose we are interested in the outcome of a Formula 1 race, and we base our similarity comparisons among worlds first and foremost on who wins the race, second, on who comes second, and so on. In that case, world \(w_1\) is more similar to the actual world w than \(w_2\) is iff the highest-placed divergence between the drivers’ final positions in \(w_1\) and their final positions in the actual world is lower (i.e. occurs further down the finishing order) than the highest-placed divergence between the drivers’ final positions in \(w_2\) and their actual final positions. In the general case, \(p_0\) is the similarity respect that matters most, \(p_1\) matters second-most, etc.Footnote 11 Let’s see how to make formal sense of this idea.

As above, we take a world to be a maximally consistent set of \({\mathcal {L}}\)-literals, and an \({\mathcal {L}}\)-proposition to be the set of worlds at which it is true. (We assume in this subsection that the \({\mathcal {L}}\)-atoms \(p_0, p_1, \ldots\) are countably infinite.) We also order the subsets of the non-negative integers \(\omega = \{0, 1, \ldots , n, \cdots \}\) lexicographically, using the standard ordering on \(\omega\). More precisely, a set \(s_1 \subseteq \omega\) is smaller than \(s_2 \subseteq \omega\) if the least element of \(s_1\) is larger than the least element of \(s_2\) or \(s_1\) has no least element (i.e. \(s_1\) is empty) but \(s_2\) does (i.e. \(s_2\) is non-empty); or, if the least elements of \(s_1\) and \(s_2\) both exist and are equal, the second-least element of \(s_1\) is larger than the second-least element of \(s_2\) or \(s_1\) has no second-least element but \(s_2\) does; and so on. For example, under this ordering, which we label \({\mathbb {LEX}}\), the seven subsets \(\emptyset\), \(\{0\}\), \(\{1\}\), \(\{2\}\), \(\{1, 2\}\), \(\{2, 4\}\) and \(\omega\) of \(\omega\) are ordered as follows

\(\emptyset<_{\mathbb {LEX}} \{2\}<_{\mathbb {LEX}} \{2, 4\}<_{\mathbb {LEX}} \{1\}<_{\mathbb {LEX}} \{1, 2\}<_{\mathbb {LEX}} \{0\} <_{\mathbb {LEX}}\omega\)
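The displayed ordering can be checked mechanically on a finite fragment. The following Python sketch is an illustration only and not part of the formal model: it truncates \(\omega\) to its first five elements (an assumption made so that every set is finite), and the helper names `lex_less` and `lex_cmp` are ours. It relies on the observation that the definition above amounts to saying that \(s_1 <_{\mathbb {LEX}} s_2\) just when the least index on which the two sets differ belongs to \(s_2\).

```python
from functools import cmp_to_key

def lex_less(s1, s2):
    """True iff s1 <_LEX s2: the least index on which the sets differ lies in s2."""
    diff = set(s1) ^ set(s2)          # indices belonging to exactly one of the sets
    return bool(diff) and min(diff) in s2

def lex_cmp(s1, s2):
    """Three-way comparison built from lex_less, usable for sorting."""
    if lex_less(s1, s2):
        return -1
    if lex_less(s2, s1):
        return 1
    return 0

OMEGA = frozenset(range(5))  # assumption: finite stand-in for omega

subsets = [frozenset(), frozenset({0}), frozenset({1}), frozenset({2}),
           frozenset({1, 2}), frozenset({2, 4}), OMEGA]

ordered = sorted(subsets, key=cmp_to_key(lex_cmp))
print([sorted(s) for s in ordered])
# prints [[], [2], [2, 4], [1], [1, 2], [0], [0, 1, 2, 3, 4]]
```

On this fragment the sorted list reproduces the chain displayed above, with the truncated set standing in for \(\omega\).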

Now if \(w_i\) and \(w_j\) are worlds—maximally consistent sets of \({\mathcal {L}}\)-literals—we set

\(DIST_{WW} (w_i, w_j) =_{def} \{n \in \omega : p_n \notin w_i \cap w_j \wedge \lnot p_n \notin w_i \cap w_j \}\).

In other words, \(DIST_{WW} (w_i, w_j)\) is the set of indices n such that \(w_i\) and \(w_j\) disagree on \(p_n\)’s truth-value. Here as elsewhere we are implicitly thinking of \(l \in w\) as l’s being true in w, for l a literal. Thinking also of \(\{p_0, p_1 \ldots , p_n, \ldots \}\) as the (labelled) set of comparison respects in descending order of importance, the higher \(DIST_{WW}(w_i, w_j)\) is in the \({\mathbb {LEX}}\)-ordering, the more \(w_i\) and \(w_j\) diverge; conversely, the lower \(DIST_{WW}(w_i, w_j)\) is in the \({\mathbb {LEX}}\)-ordering, the more the two worlds are similar. Note that if \(w_i = w_j\), then \(DIST_{WW} (w_i, w_j) = \emptyset\), the least element in the \({\mathbb {LEX}}\)-ordering; this is as it should be, since \(w_i\) and \(w_j\) are then as similar as possible since they’re identical. \(DIST_{WW}\) is thus a qualitative (rather than quantitative) measure of the ‘distance’ between worlds.Footnote 12
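As a rough illustration of \(DIST_{WW}\), the following Python sketch represents a world over finitely many atoms as a set of literals, each literal being a pair (index, polarity). The restriction to four atoms and the names `world` and `dist_ww` are assumptions of the sketch, not part of the model. An index counts towards the distance exactly when neither \(p_n\) nor \(\lnot p_n\) is shared by the two worlds, i.e. when they disagree on \(p_n\).

```python
def world(true_atoms, n_atoms):
    """Maximally consistent set of literals over p_0..p_{n_atoms-1}.

    A literal is a pair (index, polarity): (n, True) stands for p_n,
    (n, False) for its negation.
    """
    return frozenset((n, n in true_atoms) for n in range(n_atoms))

def dist_ww(w_i, w_j):
    """Set of indices n on which w_i and w_j disagree about p_n."""
    shared = w_i & w_j                        # literals true in both worlds
    indices = {n for n, _ in w_i | w_j}
    return frozenset(n for n in indices
                     if (n, True) not in shared and (n, False) not in shared)

N = 4                        # assumption: a four-atom fragment of the language
w_star = world({0, 2}, N)    # p_0, not-p_1, p_2, not-p_3
w_other = world({0, 1}, N)   # p_0, p_1, not-p_2, not-p_3

print(sorted(dist_ww(w_star, w_other)))  # prints [1, 2]
print(sorted(dist_ww(w_star, w_star)))   # prints []
```

The second call returns the empty set, matching the remark that identical worlds are as similar as possible.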

Now let \(\alpha\) be a (non-contradictory) \({\mathcal {L}}\)-proposition, i.e. a (non-empty) set of worlds. We let

\(DIST_{WP} (w, \alpha ) =_{def}\) the \({\mathbb {LEX}}\)-least element of the set \(\{DIST_{WW}(w, w^*): w^* \in \alpha \}\).

The function \(DIST_{WP}\) is thus a measure of the ‘distance’ of a world to a set of worlds corresponding to a proposition.Footnote 13 The subscript ‘WP’ in ‘\(DIST_{WP}\)’ indicates that the distance in question is that of a world to a proposition; similarly, mutatis mutandis, for \(DIST_{WW}\) as well as for \(DIST_{PP}\) (to follow). According to \(DIST_{WP}\), then, w’s distance to \(\alpha\) is its limiting closest distance to an element of \(\alpha\), a limit that is always attained. In particular, if w is a member of \(\alpha\) then its distance is as small as possible (in the \({\mathbb {LEX}}\)-ordering), since it is the empty set. If \(\alpha\) were an arbitrary set of worlds, the set \(\{DIST_{WW}(w, w^*): w^* \in \alpha \}\) would not necessarily have a \({\mathbb {LEX}}\)-least element; but since \(\alpha\) is an \({\mathcal {L}}\)-proposition, it is guaranteed to have one. The proof is in Appendix 1.1.

Now let \(\alpha\) and \(\beta\) both be (non-contradictory) \({\mathcal {L}}\)-propositions. We set

\(DIST_{PP} (\alpha , \beta ) =_{def}\) the \({\mathbb {LEX}}\)-largest element of the set \(\{DIST_{WP} (w, \alpha ): w \in \beta \} \cup \{DIST_{WP} (w, \beta ): w \in \alpha \}\)

As above, if \(\alpha\) and \(\beta\) were arbitrary sets of worlds, the set in the definition of \(DIST_{PP} (\alpha , \beta )\) would not necessarily have a \({\mathbb {LEX}}\)-largest element; but since they are \({\mathcal {L}}\)-propositions, it is guaranteed to have one. The proof may be found in Appendix 1.2.

Let’s illustrate our definitions thus far with an example. Consider the \({\mathcal {L}}\)-propositions \(p_0 \vee p_1\) and \(p_1 \wedge \lnot p_2\), construed as sets of worlds, which are in turn construed as maximal consistent sets of \({\mathcal {L}}\)-literals. Since any world in \(p_1 \wedge \lnot p_2\) is in \(p_0 \vee p_1\), \(DIST_{WP} (w, p_0 \vee p_1) = \emptyset\) for any world w in \(p_1 \wedge \lnot p_2\). Let us next consider how far a world w in \(p_0 \vee p_1\) may be from \(p_1 \wedge \lnot p_2\). By considering the value of \(p_0\) in w, we see that in order to minimise w’s proximity to \(p_1 \wedge \lnot p_2\) (equivalently: maximise its distance from \(p_1 \wedge \lnot p_2\)), the literals \(p_0\), \(\lnot p_1\) and \(p_2\) must all be true in w; but for any \(p_i\) with \(i \ge 3\), there is a world in \(p_1 \wedge \lnot p_2\) that \(p_i\)-wise matches any given world w in \(p_0 \vee p_1\). Thus for example, if \(w = \{p_0, \lnot p_1, p_2\} \cup \{p_i: i \ge 3\}\) then the nearest world to w in \(p_1 \wedge \lnot p_2\) is the world \(\{p_0, p_1, \lnot p_2\} \cup \{p_i: i \ge 3\}\). Hence for this w, \(DIST_{WP} (w, p_1 \wedge \lnot p_2) = \{1, 2\}\), and the same goes for any world w in which \(p_1\) is false and \(p_2\) is true (and in which, consequently, \(p_0\) is true, since w is in \(p_0 \vee p_1\)). \(DIST_{PP} (p_0 \vee p_1, p_1 \wedge \lnot p_2)\) is then the \({\mathbb {LEX}}\)-maximum of \(\emptyset\) and \(\{1, 2\}\), and so \(DIST_{PP} (p_0 \vee p_1, p_1 \wedge \lnot p_2) = \{1, 2\}\).
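The worked example can be reproduced by brute force on a finite fragment. The Python sketch below is illustrative only. It assumes a language with just the four atoms \(p_0, \ldots , p_3\) (harmless here, since neither proposition mentions any atom beyond \(p_2\)), represents a world as a tuple of truth-values, and uses helper names of our own choosing.

```python
from functools import cmp_to_key
from itertools import product

N = 4  # assumption: finite fragment p_0..p_3 of the language
WORLDS = list(product([True, False], repeat=N))  # w[n] is the truth-value of p_n

def prop(pred):
    """A proposition as the list of worlds satisfying pred."""
    return [w for w in WORLDS if pred(w)]

def dist_ww(w1, w2):
    """Indices on which two worlds disagree."""
    return frozenset(n for n in range(N) if w1[n] != w2[n])

def lex_cmp(s1, s2):
    """Negative if s1 <_LEX s2, positive if s2 <_LEX s1, zero if equal."""
    diff = s1 ^ s2
    if not diff:
        return 0
    return -1 if min(diff) in s2 else 1

LEX = cmp_to_key(lex_cmp)

def dist_wp(w, alpha):
    """LEX-least distance from w to a world in alpha."""
    return min((dist_ww(w, v) for v in alpha), key=LEX)

def dist_pp(alpha, beta):
    """LEX-largest of the world-to-proposition distances, taken both ways."""
    candidates = ([dist_wp(w, alpha) for w in beta] +
                  [dist_wp(w, beta) for w in alpha])
    return max(candidates, key=LEX)

a = prop(lambda w: w[0] or w[1])       # p_0 or p_1
b = prop(lambda w: w[1] and not w[2])  # p_1 and not-p_2

print(sorted(dist_pp(a, b)))  # prints [1, 2]
```

Comparing two such values with `lex_cmp` then gives the strict relation \(\prec\) (negative result) and the tie \(\sim\) (zero result) on this fragment.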

Putting everything together, we may define a four-place similarity relation \(\prec\) among \({\mathcal {L}}\)-propositions:

\((\alpha , \beta ) \prec (\gamma , \delta ) =_{def} DIST_{PP} (\alpha , \beta ) <_{\mathbb {LEX}} DIST_{PP} (\gamma , \delta )\),

with the left-hand side read as ‘proposition \(\alpha\) is more similar to proposition \(\beta\) than proposition \(\gamma\) is to proposition \(\delta\)’. Similarly,

\((\alpha , \beta ) \sim (\gamma , \delta ) =_{def} DIST_{PP} (\alpha , \beta ) = DIST_{PP} (\gamma , \delta )\),

and

\((\alpha , \beta ) \preceq (\gamma , \delta ) =_{def} (\alpha , \beta ) \prec (\gamma , \delta ) \vee (\alpha , \beta ) \sim (\gamma , \delta )\).

(\(\preceq\) is the weak counterpart of the strict relation \(\prec\).) An easy check shows that ordered pairs of propositions under \(\preceq\) form a linear pre-order, i.e. a transitive, reflexive, linear,Footnote 14 but not necessarily antisymmetric,Footnote 15 order. Readers familiar with the Hausdorff metric will recognise \(DIST_{PP}\) as a sort of qualitative—metricless—version of it; this point will be explained in Sect. 4.2, when we stress the importance of doing without metric assumptions.

3.2 A general comparative model

Section 3.1 assumed that similarity among worlds is determined lexicographically with reference to a countable infinity of respects of similarity: the first respect matters most, the second matters second-most, and so on. From the order \(<_{\mathbb {LEX}}\) on the subsets of \(\omega\) thereby generated, we derived a four-place comparative relation \(\prec\) (or \(\preceq\) for the weak relation) on propositions.

The lexicographic model makes \(p_0\) an absolute dictator, in the sense that dissimilarity in this respect cannot be compensated for by similarity in others; likewise for similarity in the \(p_i\)-respect versus similarity in lesser-ranked respects. To avoid this consequence, a more general comparative model does away with the assumption that similarity of worlds is determined lexicographically; it simply generates \(\prec\) from an arbitrary linear pre-order. We may also relax the assumption that the number of atoms is countably infinite, since nothing turns on it. Our more general model thus assumes that \(\kappa\) is a (finite or infinite) cardinal and that worlds are maximally consistent sets of literals drawn from the atom set \(\{p_i: i \in \kappa \}\). Propositions are now well-formed formulas in a (truth-functionally complete) propositional language whose set of atoms is \(\{p_i: i \in \kappa \}\). We assume that \(\le _{\mathbb {ORD}}\) is a weak linear pre-order on \({\mathbb {P}}(\kappa )\), i.e. that \(\le _{\mathbb {ORD}}\) is reflexive, transitive, and linear, but not necessarily antisymmetric. If \(\le _{\mathbb {ORD}}\) happens to also be antisymmetric, we proceed as in Sect. 3.1, simply replacing \(\le _{\mathbb {LEX}}\) with \(\le _{\mathbb {ORD}}\). In the more general case in which \(\le _{\mathbb {ORD}}\) may not be antisymmetric, having defined \(DIST_{WW}\) as in Sect. 3.1, we define \(DIST_{WP} (w, \alpha )\) to be any (arbitrarily chosen) \({\mathbb {ORD}}\)-least member of

\(\{s \subseteq \kappa : s = DIST_{WW}(w, w^*) \text { for some } w^* \in \alpha \}\).

and \(DIST_{PP} (\alpha , \beta )\) to be any (arbitrarily chosen) \({\mathbb {ORD}}\)-largest element of

\(\{DIST_{WP} (w, \alpha ): w \in \beta \} \cup \{DIST_{WP} (w, \beta ): w \in \alpha \}\)

That \(\alpha\) and \(\beta\) are propositions rather than arbitrary sets of worlds guarantees that there are \({\mathbb {ORD}}\)-least and \({\mathbb {ORD}}\)-largest elements in the respective definitions of \(DIST_{WP}\) and \(DIST_{PP}\).Footnote 16 When \(\le _{\mathbb {ORD}}\) is not antisymmetric, we pick any one from an equivalence class of subsets of \(\kappa\); and if \(\le _{\mathbb {ORD}}\) happens to be a linear order (and so antisymmetric), then each equivalence class contains exactly one element, in which case we pick its only member.

Another example will help illustrate the more general account. Suppose as in Sect. 3.1 that worlds are represented as maximally consistent sets of \({\mathcal {L}}\)-literals, that the set of atoms of \({\mathcal {L}}\) is once more \(\{p_i: i \in \omega \}\), and that the distance of \(w_1\) to \(w_2\) is the set of atomic indices on which they differ. This time, however, the order on subsets of \(\omega\) is determined by the number of propositions on which worlds differ: the smaller the number, the more similar the two worlds are; when this number is the same, there is a tie, represented by \(\sim\) (equivalently the obtaining of both \(\preceq\) and \(\succeq\)). Under this ordering, which we label \(<_{\mathbb {NUM}}\), our illustrative subsets \(\emptyset\), \(\{0\}\), \(\{1\}\), \(\{2\}\), \(\{1, 2\}\), \(\{2, 4\}\) and \(\omega\) of the set \(\omega\) are ordered as follows:

\(\emptyset<_{\mathbb {NUM}} \{0\} \sim _{\mathbb {NUM}} \{1\} \sim _{\mathbb {NUM}} \{2\}<_{\mathbb {NUM}} \{2, 4\} \sim _{\mathbb {NUM}} \{1, 2\} <_{\mathbb {NUM}}\omega\)

It is easy to see that the two-place comparative relation on pairs of propositions induced by the \({\mathbb {NUM}}\)-ordering on \(DIST_{PP}\) is a linear pre-order, but not a linear order. The semantic proximity of any two worlds which differ over the status of \(p_1\) only is, for instance, the same as the semantic proximity of any two worlds which differ over the status of \(p_2\) only.
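The contrast with the lexicographic ordering can be seen in a small Python sketch, offered as an illustration only. As before, \(\omega\) is replaced by a five-element stand-in (an assumption of the sketch), so that cardinality is a finite number; sets of equal cardinality fall into the same tier, which is how ties arise.

```python
from itertools import groupby

OMEGA = frozenset(range(5))  # assumption: finite stand-in for omega

subsets = [frozenset(), frozenset({0}), frozenset({1}), frozenset({2}),
           frozenset({1, 2}), frozenset({2, 4}), OMEGA]

# NUM-ordering: compare sets by the number of indices they contain.
ranked = sorted(subsets, key=len)

# Sets of equal size are tied, so they are grouped into one tier.
tiers = [[sorted(s) for s in group] for _, group in groupby(ranked, key=len)]
print(tiers)
# prints [[[]], [[0], [1], [2]], [[1, 2], [2, 4]], [[0, 1, 2, 3, 4]]]
```

The second and third tiers each contain distinct sets that are tied, which is why the resulting order is a linear pre-order rather than a linear order.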

3.3 An Even More General Comparative Model

The models in Sects. 3.1–3.2 assumed a four-place similarity relation among worlds. In fact, the relation took a special form: in abstract terms, we assumed the existence of a two-place function DIST on the product of the space of worlds with codomain a linear pre-order, whose order relation we may call \(<_{\mathbb {ORD}}\), such that

\((w_1, w_2) \prec (w_3, w_4) =_{def} DIST(w_1, w_2) <_{\mathbb {ORD}} DIST(w_3, w_4)\),

and similarly for the weak relation \(\preceq\). A general model based on the above lines simply assumes the existence of such a function \(DIST: W^2 \rightarrow {\mathbb {ORD}}\), where W is the space of worlds and \(\langle {\mathbb {ORD}}, <_{\mathbb {ORD}} \rangle\) is a linear pre-order. In this case, we define \(DIST_{WP} (w, \alpha )\) to be any (arbitrarily chosen) \({\mathbb {ORD}}\)-least member of

\(\{s: s = DIST(w, w^*) \text { for some } w^* \in \alpha \}\).

and \(DIST_{PP} (\alpha , \beta )\) to be any (arbitrarily chosen) \({\mathbb {ORD}}\)-largest element of

\(\{DIST_{WP} (w, \alpha ): w \in \beta \} \cup \{DIST_{WP} (w, \beta ): w \in \alpha \}\)

Now \(DIST_{WP}\) and \(DIST_{PP}\) are well-defined only if the relevant sets are guaranteed to have, respectively, an \({\mathbb {ORD}}\)-least member and an \({\mathbb {ORD}}\)-largest element, for all propositions \(\alpha\) and \(\beta\); we may call these combined assumptions \({\mathbb {ORD}}\)-min-and-max. If \({\mathbb {ORD}}\)-min-and-max obtains, the present model is more general than either of those in Sects. 3.1–3.2, because we need no longer assume that the distance between two worlds is determined by the set of \({\mathcal {L}}\)-atoms on which they differ. Indeed, in that case worlds need no longer be construed as maximally consistent sets of \({\mathcal {L}}\)-literals; they can be anything whatsoever. In general, of course, without some background assumptions, there is no reason to suppose that \({\mathbb {ORD}}\)-min-and-max holds.Footnote 17

It’s worth illustrating this third, most general, model with an example. An interesting use of similarity in metaphysics is the apparently lexicographic explication of similarity among worlds found in Lewis (1979). This account is intended to underpin his possible-worlds account of counterfactuals and thus ultimately of causation. Lewis’s famous four conditions are:

(1) It is of the first importance to avoid big, widespread, diverse violations of law.

(2) It is of the second importance to maximize the spatiotemporal region throughout which perfect match of particular fact prevails.

(3) It is of the third importance to avoid even small, localized, simple violations of law.

(4) It is of little or no importance to secure approximate similarity of particular fact, even in matters that concern us greatly. (Lewis 1979, pp. 47–48)

Here is a natural way of making these remarks more precise. Assume that each of the four quantities in (1)–(4) can be measured; e.g. \(M_1\) may be a linear order such that if a world \(w_1\)’s violations of law are small, narrowly confined, and of the same kind, then \(w_1\)’s measure in the \(M_1\)-dimension is \(m_1\), a smaller element of \(M_1\) than \(m_2\), which represents the amount of \(w_2\)’s violations of law, which are big, widespread and diverse. Similarly for \(M_2\), \(M_3\) and \(M_4\), mutatis mutandis. The order \(\langle {\mathbb {ORD}}, <_{\mathbb {ORD}} \rangle\) is then a lexicographic order on \(M_1 \times M_2 \times M_3 \times M_4\). In other words, \(\langle m_1, m_2, m_3, m_4 \rangle < \langle m^*_1, m^*_2, m^*_3, m^*_4 \rangle\) if \(m_1 < m^*_1\), or if \(m_1 = m^*_1\) and \(m_2 < m^*_2\), etc. Each world is then assigned a quadruple \(DIST(w, w_@) = \langle m_1, m_2, m_3, m_4 \rangle\) that measures w’s divergence from the reference world \(w_@\). If we assume \({\mathbb {ORD}}\)-min-and-max in this context, our definition formally captures Lewis’s comparative similarity relation. (Naturally, there may be other ways of interpreting Lewis’s four conditions.)
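To make the lexicographic order on quadruples concrete, here is a toy Python sketch. Everything in it is hypothetical: the three worlds, and the integer scores standing in for elements of \(M_1, \ldots , M_4\), are invented purely for illustration, with smaller numbers representing less divergence from \(w_@\). The sketch uses the fact that Python compares tuples lexicographically.

```python
# Hypothetical divergence scores (m1, m2, m3, m4) from the reference world w_@.
# m1: big, widespread violations of law     m2: extent of imperfect match
# m3: small, localized violations of law    m4: approximate dissimilarity of fact
# All numbers are invented for illustration; smaller means closer to w_@.
dist_to_actual = {
    "w_a": (0, 5, 1, 9),  # no big violation, sizeable region of mismatch
    "w_b": (0, 2, 3, 0),  # no big violation, smaller region of mismatch
    "w_c": (1, 0, 2, 0),  # one big violation, otherwise perfect match
}

# Tuples are compared component by component, the first component dominating.
ranking = sorted(dist_to_actual, key=dist_to_actual.get)
print(ranking)  # prints ['w_b', 'w_a', 'w_c']
```

The hypothetical world `w_c` comes last despite its perfect match of particular fact, because any difference in the first component settles the comparison before later components are consulted.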

Finally, all the material in Sect. 3 may be amended so as to define the three-place ‘\(\alpha\) is more similar to \(\beta\) than \(\gamma\) is’. The ‘distance’ between \(\alpha\) and \(\beta\) can be defined as above in terms of the atomic indices worlds in these sets differ over, and this ‘distance’ may then be compared to that between \(\beta\) and \(\gamma\). We have opted for a four-place relation here not out of any strong conviction, but mainly because, as we saw in Sect. 1, we intuitively grasp four-place comparative similarity facts of the form ‘\(\alpha\) is more similar to \(\beta\) than \(\gamma\) is to \(\delta\)’, and from this four-place relation the three-place one is immediately definable. The literature also contains arguments that the four-place relation is methodologically preferable to the three-place one.Footnote 18 Whether or not our treatment based on a four-place relation is best recast in terms of a three-place relation is a question we will not further address. We note merely that nothing in Sect. 3 precludes or prejudges a positive answer to it.

This completes the comparative model’s exposition. We now turn to some philosophical remarks.

4 Philosophical Commentary

In the previous section, we proposed an analysis of the four-place relation expressed by ‘\(s_1\) is closer in meaning to \(s_2\) than \(s_3\) is to \(s_4\)’. As well as propositional similarity, the account potentially has other applications, for example to the question of higher-order resemblances amongst properties. We saw in Sect. 1 that comparative similarity facts between properties can give rise to comparative similarity facts between propositions: our example turned on crimson being more like scarlet than blue is like green. One could draw on the Sect. 3 ideas to ground an account of comparative similarity among properties, replacing worlds with individuals and propositions with properties. That way, higher-order resemblance between properties could be reduced to ordinary resemblances between particulars.

To keep this paper to a manageable length, however, we cannot here embark on this or any other application. Nor is a detailed comparison with other accounts feasible, simply because there are too many of them. In lieu of that, Sect. 4.1 concisely situates the account vis-à-vis the verisimilitude literature. In Sect. 4.2, we explain why our comparative models are preferable to metric approaches. In Sect. 4.3, we consider another constraint that might be imposed on the models. Section 4.4 concludes with a brief explanation of why we take world similarity as primitive.

4.1 Generic Problems with Likeness Accounts

Approaches to truthlikenessFootnote 19 are standardly divided into three categories: the content, consequence and likeness approaches. Ours is akin to the last of these, the likeness approach. The focus of these approaches is on defining what it is for X to be at least as close to the truth as Y, where X and Y are variously defined as propositions or theories or models, and ‘the truth’ may be a proposition or world or model. Hilpinen (1976) and Tichý (1974) are pioneers of the approach and Niiniluoto (1987) a classic synthesis advocating a possible-worlds likeness approach.Footnote 20

Now our topic is propositional similarity rather than verisimilitude, so our aims and these writers’ are correspondingly different. Many of the constraints on acceptable measures of verisimilitude, for instance, mention truth and falsity (e.g. ‘some false statements may be more truthlike than some true statements’).Footnote 21 These cannot sensibly be imposed on an account of propositional similarity, which must be independent of which world is the actual one: for which world is actual makes no difference to whether \(s_1\) is more similar to \(s_2\) than \(s_3\) is to \(s_4\). By the same token, accounts which make essential use of the class of true models, or correct consequences, or true theories, or the like, are different in both spirit and implementation from ours. The verisimilitude literature does contain discussion of how to assign a measure to pairs of sets of worlds (or models).Footnote 22 However, these treatments generally presuppose that the distance between such sets is represented by a scalar quantity, typically a real number, and for that reason differ significantly from our non-metric approach.

All that said, despite the differences, the present proposal does have several points of affinity with likeness accounts and some comparisons may be drawn.Footnote 23 Rather than attempt to review the large literature, we situate the account by seeing how it deals with the main generic difficulties associated with such approaches. Oddie (2014, sec. 1.4) usefully summarises these as:Footnote 24 how is likeness determined? What is the correct functional dependence of the truthlikeness of a proposition on the likeness of its members to the actual world (the extension problem)? Finally, does the account of likeness depend on which language is used to formulate propositions (the problem of language dependence)?

In answer to the first question, likeness in the Sect. 3 models is constituted by agreement over propositional atoms and unlikeness by divergence over them. Agreement over which atoms or collection thereof counts most is left completely open, as it is determined by the order relation (Sect. 3.3’s \(<_{\mathbb {ORD}}\)), which can be anything we like. Our account is thus inherently flexible. Furthermore, unlike metric accounts, which are constrained by the size of the real numbers (the continuum) and their metric properties (e.g. reals are separable, so admit only countable ascending or descending chains), our model allows similarity to be defined on propositional languages of arbitrary cardinality. (See Sect. 4.2 for more on this point.)

Moving on to the second question, functional dependence in our model is given by the measure \(DIST_{PP}\). We do not claim that this measure of ‘distance’ is uniquely motivated. It is, however, a natural one in our metricless setting. As the literature attests, likeness approaches based on metrics have to make an arbitrary choice from a wide range of functions that map collections of distances between worlds into a single distance between sets of worlds. Take Tichý’s famous (1974) toy example, in which there are three atomic propositions h (‘it’s hot’), r (‘it’s rainy’) and w (‘it’s windy’), and in which the state of the world is \(h \wedge r \wedge w\). The distance between conjunctions of literals drawn from these three atoms is then given by the taxicab metric, so that e.g. the distance between \(h \wedge r \wedge w\) and \(h \wedge \lnot r \wedge w\) is 1 because these two worlds disagree only on the truth-value of a single atom, namely r. Suppose now that we wish to assign a distance from the set of h-worlds to the set of r-worlds. There are four of each, and 15 distances to aggregate into a single output that represents the distance between h and r.Footnote 25 Any of a wide range of such aggregating functions is compatible with intuitive constraints. More generally, as the verisimilitude literature demonstrates, in trying to define the distance from a world w to a set of worlds \(\alpha\), given a distance function on worlds, one may use a wide array of measures: the average distance between w and the elements of \(\alpha\), or the infimum of such distances, or the supremum, or some weighted average of the last two. The choices are endless, and have familiar pros and cons. As Schurz and Weingartner remark, extant approaches have not successfully solved the problem of extending truthlikeness of worlds to truthlikeness of propositions because this extrapolation is ‘intuitively underdetermined’ (2010, p. 423). In our non-metric setting, far fewer technical options are available, because there are no real values for measures to take as inputs. In fact, we know of no other non-metric account to rival that presented in Sect. 3.
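The underdetermination in Tichý’s toy example can be made concrete with a short Python sketch. It is an illustration only: the encoding of worlds as triples of truth-values, the names `taxicab`, `h_worlds`, `r_worlds` and `aggregates`, and the particular aggregators chosen are our assumptions, not part of any cited account. Counting pairs as unordered is one reading on which the figure of 15 distances comes out; we do not claim it is the intended one.

```python
from itertools import product
from statistics import mean

# Assumption: a world is a triple of truth-values for the atoms (h, r, w).
WORLDS = list(product([True, False], repeat=3))

def taxicab(u, v):
    """Number of atoms on which worlds u and v disagree."""
    return sum(a != b for a, b in zip(u, v))

h_worlds = [w for w in WORLDS if w[0]]  # the four worlds where h holds
r_worlds = [w for w in WORLDS if w[1]]  # the four worlds where r holds

# All (h-world, r-world) pairs, and the same pairs with order ignored.
ordered_pairs = [(u, v) for u in h_worlds for v in r_worlds]
unordered_pairs = {frozenset(pair) for pair in ordered_pairs}

distances = [taxicab(u, v) for u, v in ordered_pairs]

# Three of the many candidate aggregators; they disagree with one another.
aggregates = {
    "infimum": min(distances),
    "supremum": max(distances),
    "average": mean(distances),
}

print(len(ordered_pairs), len(unordered_pairs))  # 16 15
print(aggregates)  # {'infimum': 0, 'supremum': 3, 'average': 1.5}
```

The three aggregators return 0, 3 and 1.5 respectively for the same pair of propositions, which is the arbitrariness the text describes.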

By not assuming distances between worlds, we avoid the extension problem in its starkest form. The distance between propositions is simply a function of the atoms over which their constituents differ, which function exactly depending on the order. In a sense, our account does not so much avoid the problem as openly embrace it. Propositional similarity is ultimately grounded in atomic difference; how these atomic differences are then weighed entirely depends on the order (\(\le _{\mathbb {ORD}}\) in the most general setting).

Finally, in answer to the third question, our account is not problematically language-sensitive. Choosing similarity respects and an order on sets of these respects fixes the propositional similarity facts. As an illustration, suppose we let \(q_0\) be \(p_0\), \(q_1\) be \(p_0 \vee p_1\) and more generally define \(q_n\) as \(p_0 \vee p_1 \vee \cdots \vee p_n\). Clearly, the \(p_i\) and the \(q_i\) are interdefinable using Boolean operations. If the similarity respects are the \(p_i\), as in the lexicographic account say, whether we express propositions using the \(q_i\) or the \(p_i\) makes no difference to their similarity. The similarity between \(q_0\) and \(q_1\) is the same whether we express these as \(q_0\) and \(q_1\) or alternatively as \(p_0\) and \(p_0 \vee p_1\). One might contend that we have avoided language dependence only by fixing similarity respects at the outset. But it is hard to see how any similarity judgements could get off the ground without privileging some similarity respects over others, since everything is similar to anything else in infinitely many different ways.

This, at any rate, is a sketch of how our account fares with respect to the main charges brought against likeness accounts. To further motivate the avoidance of metric assumptions, we now explain why the very move from a non-metric to a metric account saddles us with problems of its own making.

4.2 Metric and Pseudometric Accounts

Our account in Sect. 3 avoided any metric assumptions. Those who are familiar with the Hausdorff metric from real analysis will recognise, however, that our comparative similarity relation is Hausdorff-like. It is a sort of non-metric implementation of the idea behind the Hausdorff metric. A comparison of our non-metric approach with this metric will help bring out the former’s virtues. To keep this paper self-contained, we include a definition and exposition of the Hausdorff metric; but since it would obtrude too much if laid out here, we relegate the material to Appendix 1.4.

In thumbnail form (Appendix 1.4 has the full details), the Hausdorff distance \({\mathbb {D}}\) between two sets of worlds \(\alpha\) and \(\beta\) measures the largest shortest distance between a world in one set and the other set. From a metric d on the space of worlds W we may thereby define a pseudometric \({\mathbb {D}}\) (see Appendix 1.4) on the set of non-contradictory propositions construed as sets of worlds. We may exploit a pseudometric \({\mathbb {D}}\) to define a four-place comparative similarity relation \(\preceq _{{\mathbb {D}}}\) on propositions as follows:

\((\alpha , \beta ) \preceq _{{\mathbb {D}}} (\gamma , \delta ) =_{def} {\mathbb {D}}(\alpha , \beta ) \le {\mathbb {D}}(\gamma , \delta )\).

We thereby recover a four-place comparative similarity relation, this time based on metric assumptions. As Appendix 1.4 explains, if propositions are closed and bounded subsets of W then \({\mathbb {D}}\) turns out to be a metric on all propositions save the contradiction. But even if they are not, there’s no harm done; for our purposes, that \({\mathbb {D}}\) is a pseudometric is enough.Footnote 26
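For finite sets of worlds, the construction just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Appendix 1.4 definition: worlds are triples of truth-values for three hypothetical atoms, the world metric `d` is taken to be the taxicab metric, and the names `prop`, `hausdorff` and `preceq` are ours.

```python
from itertools import product

# Assumption: a world is a triple of truth-values for three atoms (h, r, w).
WORLDS = list(product([True, False], repeat=3))

def d(u, v):
    """Assumed world metric: taxicab (Hamming) distance."""
    return sum(a != b for a, b in zip(u, v))

def prop(pred):
    """The proposition (set of worlds) at which pred holds."""
    return frozenset(w for w in WORLDS if pred(*w))

def hausdorff(alpha, beta):
    """Largest shortest distance between a world in one set and the other set."""
    if not alpha or not beta:
        raise ValueError("undefined when either proposition is the contradiction")
    a_to_b = max(min(d(u, v) for v in beta) for u in alpha)
    b_to_a = max(min(d(u, v) for u in alpha) for v in beta)
    return max(a_to_b, b_to_a)

def preceq(alpha, beta, gamma, delta):
    """Four-place relation: alpha is at least as close to beta as gamma is to delta."""
    return hausdorff(alpha, beta) <= hausdorff(gamma, delta)

H = prop(lambda h, r, w: h)
R = prop(lambda h, r, w: r)
ALL_TRUE = prop(lambda h, r, w: h and r and w)
ALL_FALSE = prop(lambda h, r, w: not (h or r or w))

print(hausdorff(H, R))                      # 1
print(hausdorff(ALL_TRUE, ALL_FALSE))       # 3
print(preceq(H, R, ALL_TRUE, ALL_FALSE))    # True
```

The explicit error for empty sets reflects the restriction to non-contradictory propositions.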

We note that the Hausdorff metric \({\mathbb {D}}\) has received a bad press in the verisimilitude literature. Niiniluoto (1987, p. 245) for instance points out that in the special case in which \(\beta = \{w\}\), \({\mathbb {D}}(\alpha , \{w\})\) is the supremum of the distances from w to the worlds in \(\alpha\), so that only the maximum distance (supremum) between a world in \(\alpha\) and the set \(\{w\}\) is taken into account. This is an unwelcome consequence for a measure of world-to-proposition distance. But as a criticism of a proposition-to-proposition measure, it draws a blank, since the metric takes into account all worlds in \(\alpha\) and \(\beta\), and in our setting no propositions correspond to a single world.

We provide four reasons for preferring our non-metric comparative model in Sect. 3 to the metric/pseudometric one suggested in this subsection. Some of these reasons generalise to objections against any metric approach to propositional similarity.

1. In the general case, the pseudometric \({\mathbb {D}}\) may not be a metric. If it isn’t, \({\mathbb {D}}\) will elide the difference between two sets of worlds being identical and being as close to one another as possible. For example, it is natural to suppose that a proposition such as ‘John is no more than 2 m tall’ is the closure in the space of worlds of the proposition ‘John is less than 2 m tall’. Now under any sensible notion of set distance in a metric space, the distance of any set from its closure should always be 0. So the \({\mathbb {D}}\)-difference between these propositions should be 0, and will be 0 if \({\mathbb {D}}\) is the Hausdorff pseudometric, despite the propositions’ distinctness. Consequently, on this account the two propositions are just as similar to one another as the first is to itself. But this is the wrong result: ‘John is no more than 2 m tall’ and ‘John is less than 2 m tall’ may be infinitesimally close to one another, as we might put it, but they are less similar to one another than either of them is to itself. There is a difference between zero and infinitesimal proximity, a difference which the pseudometric approach obliterates.Footnote 27 Our Sect. 3 models, however, can respect this difference.

2. Second, the pseudometric \({\mathbb {D}}\) is highly sensitive to the underlying world metric d. It is easy to come up with examples in which \((X, \tau )\) is a topological space, \(d_1\) and \(d_2\) are metrics compatible with the topology \(\tau\) on X, and yet there are points x, y, z and w of X such that \(d_1(x, y) < d_1 (z, w)\) and \(d_2(x, y) > d_2 (z, w)\), i.e. \(d_1\) and \(d_2\) disagree on four-place comparative facts of the form ‘x is closer to y than z is to w’. In the same way, the four-place relation \(\prec _{{\mathbb {D}}}\) derived from \({\mathbb {D}}\) depends sensitively on the world metric d.Footnote 28

Metrics \(d_1\) and \(d_2\) may thus be very similar yet generate respective pseudometrics \({\mathbb {D}}_1\) and \({\mathbb {D}}_2\) that disagree on four-place comparative propositional similarity facts. The second problem for metric/pseudometric models, then, stems from the fact that data about world similarity is hardly ever quantitative. Looking back at the motivating examples in Sect. 1 we see that the facts about propositional similarity encountered there were all comparative. They were facts of the form ‘\(s_1\) is closer in meaning to \(s_2\) than \(s_3\) is to \(s_4\)’, or three-placed ones of the form, ‘\(s_1\) is closer in meaning to \(s_2\) than \(s_3\) is’, rather than quantitative facts of the form ‘the distance in meaning between \(s_1\) and \(s_2\) is twice that between \(s_3\) and \(s_4\)’. Now there may be cases of this kind in which we have metric information; but they will be rare. To assume a metric on worlds and use it to derive a pseudo-metric on propositions is therefore to impose a spurious precision on the subject matter, at least for most applications of interest. Propositional metrics depend overly sensitively on world metrics.

3. The third problem with pseudometric models is the cardinality constraint they impose. Thought of as a binary relation on pairs of propositions, the derived relation \(\prec _{{\mathbb {D}}}\) cannot, for example, admit uncountable-ascending or uncountable-descending chains, since the real numbers admit no such chains.Footnote 29 But we would like to allow such possibilities rather than rule them out through the model’s formal features. We may, for example, wish to allow for uncountably many similarity respects.Footnote 30 The Sect. 3 models comfortably handle arbitrarily large numbers of worlds or similarity respects.

4. Our comparative models are defined for all propositions other than the contradiction. To plug this gap, one fix would be to set \(DIST_{PP} (\alpha , \bot )\) to be some subset of indices, when \(\bot\) is any contradictory \({\mathcal {L}}\)-sentence. Setting \(DIST_{PP} (\bot , \bot ) = \emptyset\) is forced if we wish to respect the condition that a proposition is no further in meaning from itself than any two propositions are from one another.Footnote 31 A natural choice for \(DIST_{PP} (\alpha , \bot )\) when \(\alpha\) is not contradictory is the full set of indices, so that \(DIST_{PP} (\alpha , \bot ) = \omega\) when this index set is \(\omega\) (as in Sect. 3.1’s lexicographic example) and \(\alpha\) is not a contradiction. This latter clause captures the thought that, as a classical logician would put it, a contradiction disagrees with any other sentence about the value of all atomic propositions, since it takes all of them to be both true and false. Naturally, from a relevance logic perspective (which there is no room to explore), a different choice might be made here.

Contrast this with a pseudometric model, for which there is no technical fix. For example, we cannot just set the \({\mathbb {D}}\)-distance from a contradiction to any other proposition to 0 on pain of violating the triangle inequality for \({\mathbb {D}}\).Footnote 32 In a metric space, the Hausdorff distance \({\mathbb {D}}(S, F)\) between a fixed closed, bounded, non-empty subset F and a closed, bounded and non-empty subset S which tends towards the empty set will not in general tend to a particular limit.Footnote 33 So unlike the Sect. 3 models, the pseudometric model seems essentially incomplete. The wisdom of adopting a non-metric account of propositional similarity is once more apparent.
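The sensitivity to the underlying metric described in the second reason above can be illustrated with a toy case. The sketch below is our own example, set on the real line rather than on a space of worlds: `d1` is the usual distance and `d2` is the distance after cubing. Since cubing is a homeomorphism of the real line onto itself, both metrics induce the same topology; the four points are chosen by us for illustration.

```python
def d1(x, y):
    """The usual metric on the real line."""
    return abs(x - y)

def d2(x, y):
    """A second metric inducing the same topology: distance after cubing."""
    return abs(x ** 3 - y ** 3)

# Hypothetical points chosen so that the two metrics reverse the comparison.
x, y, z, w = 2.0, 2.5, 0.0, 1.0

print(d1(x, y) < d1(z, w))  # True:  0.5 < 1.0
print(d2(x, y) > d2(z, w))  # True:  7.625 > 1.0
```

On `d1`, x is closer to y than z is to w; on `d2`, the comparison is reversed, even though the two metrics agree on every topological fact.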

For all the reasons just outlined, we take our comparative models to be superior to metric accounts. Of course, when there is metric information relating worlds to worlds, one might essay a metric account of propositional similarity. But such cases tend to be the exception rather than the norm.

4.3 The Negation Constraint

Here is a natural-sounding constraint one might imagine applies to propositional similarity, truth-conditionally understood: the degree of similarity between \(\alpha\) and \(\beta\) should match that between \(\lnot \alpha\) and \(\lnot \beta\). Call this the negation constraint.

There are two readings of the negation constraint, depending on how the word ‘match’ is understood. The stronger reading construes match as identity, i.e. insists that the degree of similarity of \(\alpha\) and \(\beta\) equal that of \(\lnot \alpha\) and \(\lnot \beta\). The weaker reading does not insist on identity, but requires that the respective degrees be close. Though the weaker reading is vague, Sect. 3’s comparative models respect it, since \(DIST_{PP}(\alpha , \beta )\) is in general close in the ordering to \(DIST_{PP}(\lnot \alpha , \lnot \beta )\). Of course, how close will depend on \({\mathbb {ORD}}\); but the fact that \(DIST_{PP}(\alpha , \beta )\) is a subset of the indices of atoms in \(\alpha\) and \(\beta\) sets an upper bound on the difference between them. And when \(\alpha\) and \(\beta\) are literals, as well as in certain other special cases, \(DIST_{PP}(\alpha , \beta )\) simply equals \(DIST_{PP}(\lnot \alpha , \lnot \beta )\).

The constraint’s stronger reading is that \(DIST_{PP}(\alpha , \beta ) = DIST_{PP}(\lnot \alpha , \lnot \beta )\) for all \(\alpha\) and \(\beta\). This constraint is not respected by our Sect. 3 models. For example, as we saw in connection with Sect. 3.1’s lexicographic model, \(DIST_{PP} (p_0 \vee p_1, p_1 \wedge \lnot p_2) = \{1, 2\}\); yet

\(DIST_{PP} (\lnot (p_0 \vee p_1), \lnot (p_1 \wedge \lnot p_2)) = DIST_{PP} (\lnot p_0 \wedge \lnot p_1, \lnot p_1 \vee p_2) = \{0, 1\}\),

as the reader may verify. In general, whether \(DIST_{PP}(\alpha , \beta ) = DIST_{PP}(\lnot \alpha , \lnot \beta )\) will depend on \(\alpha\) and \(\beta\) as well as the underlying order \({\mathbb {ORD}}\).
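The two values just cited can be checked mechanically. The following Python sketch is not the Sect. 3 definition itself, which is not reproduced here; it rests on three assumptions of ours: that only the atoms \(p_0\), \(p_1\), \(p_2\) need be considered, that \(DIST_{PP}\) is the Hausdorff-like ‘largest smallest’ set of atoms over which worlds of the two propositions differ, and that sets of indices are ranked lexicographically with lower indices weighing more. The names `diff`, `key`, `directed` and `dist_pp` are hypothetical.

```python
from itertools import product

N = 3  # assumption: restrict attention to the atoms p_0, p_1, p_2
WORLDS = list(product([False, True], repeat=N))

def diff(u, v):
    """Set of atom indices on which worlds u and v disagree."""
    return frozenset(i for i in range(N) if u[i] != v[i])

def key(s):
    """Assumed lexicographic rank: disagreement on a lower index outweighs
    any amount of disagreement on higher indices."""
    return tuple(i in s for i in range(N))

def prop(pred):
    """The worlds at which pred holds."""
    return [w for w in WORLDS if pred(*w)]

def directed(alpha, beta):
    """Largest, over alpha-worlds, of the smallest difference set to a beta-world."""
    return max((min((diff(u, v) for v in beta), key=key) for u in alpha), key=key)

def dist_pp(alpha, beta):
    """Hausdorff-like distance: the larger of the two directed values."""
    return max(directed(alpha, beta), directed(beta, alpha), key=key)

a = prop(lambda p0, p1, p2: p0 or p1)
b = prop(lambda p0, p1, p2: p1 and not p2)
not_a = prop(lambda p0, p1, p2: not (p0 or p1))
not_b = prop(lambda p0, p1, p2: not (p1 and not p2))

print(sorted(dist_pp(a, b)))          # [1, 2]
print(sorted(dist_pp(not_a, not_b)))  # [0, 1]
```

Under these assumptions the sketch reproduces \(\{1, 2\}\) for the original pair and \(\{0, 1\}\) for the negated pair.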

One way to motivate the stronger constraint is by thinking of propositions as functions from worlds to truth-values. Models in which degrees of similarity are invariant under the permutation of truth-values are formally attractive. When it comes to applications to possible worlds, however, truth-values may not be permutable. Truth-value symmetry may be desirable in a purely mathematical setting; but as soon as intended interpretations—applied models—are in play, any reason to demand such symmetry fades away.

In fact, the constraint’s stronger reading is problematic. Propositions \(\alpha\) and \(\beta\) may be jointly inconsistent while their negations are jointly consistent. For example, if we let \(\alpha = p \wedge q\) and \(\beta = \lnot p\), then \(\alpha\) and \(\beta\) are inconsistent; but \(\lnot \alpha\) and \(\lnot \beta\) are respectively equivalent to \(\lnot p \vee \lnot q\) and p, which are jointly consistent. The motivation for our models also clashes directly with the negation constraint’s stronger reading. Using the lexicographic model as an illustration, suppose \(s_1\) is \(p_0 \wedge p_1\) and \(s_2\) is \(\lnot p_0 \wedge p_2\). Sentence \(s_1\) is very dissimilar to \(s_2\), as it differs from it along the most important dimension of similarity; \(DIST_{PP} (p_0 \wedge p_1, \lnot p_0 \wedge p_2) = \max (\{0, 1\}, \{0, 2\}) = \{0, 1\}\). However, not-\(s_1\)’s similarity to not-\(s_2\) is greater than \(s_1\)’s to \(s_2\), the reason being that not-\(s_1\) and not-\(s_2\) are compatible along the most important similarity dimension. For any way the world has to be for not-\(s_1\) to be true there is a way the world is in which not-\(s_2\) is true that is more similar to this first world than any \(s_1\)-world is to any \(s_2\)-world. More succinctly, \(s_1\) and \(s_2\) are constrained to differ along the most important dimension of similarity, whereas not-\(s_1\) and not-\(s_2\) are not; from a lexicographic perspective, that means that not-\(s_1\) is more similar to not-\(s_2\) than \(s_1\) is to \(s_2\). This is precisely the result our comparative models deliver.
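The comparison between \(s_1\), \(s_2\) and their negations can likewise be checked by brute force. The sketch below is self-contained and makes assumptions of our own rather than quoting Sect. 3: three atoms only, a Hausdorff-like ‘largest smallest’ difference set, and a lexicographic ranking in which index 0 matters most. All function names are hypothetical.

```python
from itertools import product

N = 3  # assumption: only p_0, p_1, p_2 are in play
WORLDS = list(product([False, True], repeat=N))

def rank(s):
    """Assumed lexicographic rank of a set of indices (index 0 weighs most)."""
    return tuple(i in s for i in range(N))

def diff(u, v):
    """Indices of the atoms on which worlds u and v disagree."""
    return frozenset(i for i in range(N) if u[i] != v[i])

def dist_pp(alpha, beta):
    """Hausdorff-like distance between two non-empty lists of worlds."""
    def directed(xs, ys):
        return max((min((diff(u, v) for v in ys), key=rank) for u in xs), key=rank)
    return max(directed(alpha, beta), directed(beta, alpha), key=rank)

s1 = [w for w in WORLDS if w[0] and w[1]]          # p_0 and p_1
s2 = [w for w in WORLDS if not w[0] and w[2]]      # not-p_0 and p_2
not_s1 = [w for w in WORLDS if w not in s1]
not_s2 = [w for w in WORLDS if w not in s2]

d_pos = dist_pp(s1, s2)
d_neg = dist_pp(not_s1, not_s2)

print(sorted(d_pos))              # [0, 1]
print(sorted(d_neg))              # [1]
print(rank(d_neg) < rank(d_pos))  # True: the negations are closer
```

On these assumptions the negations differ at most over \(p_1\), whereas \(s_1\) and \(s_2\) must differ over \(p_0\), which matches the verdict stated in the text.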

In sum, our comparative models respect the negation constraint’s weaker reading. They do not respect the stronger reading, for good reason.

4.4 More About Similarity?

Finally, a few words about taking similarity as primitive. Our models assumed, rather than derived, similarity respects between different worlds; these respects are ‘exogenous’ to the models. To think this problematic, an omission, is to misunderstand our aims. The models derived a four-place comparative similarity relation on propositions from a four-place comparative similarity relation on worlds. There is no reason to suppose that the relevant similarity respects will be the same for all applications of interest. As David Lewis once noted,Footnote 34 it seems unlikely that the same ordering of respects of comparison will underlie facts about verisimilitude as well as facts about counterfactuals. Notions of similarity vary from context to context, which is why our models allowed different orderings. Moreover, all talk of similarity of worlds is neutral between similarity in a particular respect (which itself perhaps aggregates sub-respects of similarity) and overall similarity; either is compatible with our models.Footnote 35

5 Conclusion

The task this paper set itself was to derive comparative propositional similarity facts from world similarity facts. We sought to avoid a metric approach, common in the literature, but unrealistic in many contexts. The models presented in Sect. 3 did just that. They help make formal sense of the notion, central to philosophy, of propositional similarity in the usual case where there is no metric to be had.Footnote 36