Non-metric Propositional Similarity

Paseau, A. C.

doi:10.1007/s10670-020-00303-7

Original Research
Open access
Published: 14 August 2020

Volume 87, pages 2307–2328, (2022)

Erkenntnis

Abstract

The idea that sentences can be closer or further apart in meaning is highly intuitive. Not only that, it is also a pillar of logic, semantic theory and the philosophy of science, and follows from other commitments about similarity. The present paper proposes a novel way of comparing the ‘distance’ between two pairs of propositions. We define ‘\(p_1\) is closer in meaning to \(p_2\) than \(p_3\) is to \(p_4\)’ and thereby give a precise account of comparative propositional similarity facts. Notably, our definition eschews metric assumptions, which are unrealistic in most applications of interest.

1 Introduction

‘Penelope likes eggs for brunch’ is closer in meaning to ‘Penelope likes eggs for lunch’ than ‘Cats mew’ is to ‘I love New York’. Likewise, ‘Tom is 16 years old’ is semantically closer to ‘Tom is 17 years old’ than ‘Dick is 78 years old’ is to ‘Harry is 57 years old’. And ‘Thus passes the glory of the world’ is closer in meaning to the Latin ‘Sic transit gloria mundi’ than the Mel Brooks-inspired alternative translation ‘Gloria is sick’ is. Such examples suggest that some pairs of statements are closer in meaning than others.

Logic, or more precisely formalisation, also relies on the idea that statements are closer or further apart in meaning in this same sense. Among the mostly implicit criteria we appeal to when formalising is that of semantic proximity. This criterion enjoins us to formalise the natural-language sentence s as a formal sentence \(\sigma\) if \(\sigma\) may be interpreted so as to be as close in meaning to s as possible.^{Footnote 1} For example, \(\exists x Fx\) is on this criterion a better propositional formalisation of \(s =\)‘Someone is French’ than \(\forall x Fx\) is. The reason is that some interpretation of \(\exists x Fx\) is closer in meaning to s than any interpretation of \(\forall x Fx\).

A second philosophical application of the idea that there can be greater or lesser semantic distance between two propositions comes from the philosophy of language.^{Footnote 2} Some theories of meaning formalise natural-language sentences in some preferred formal language(s) and provide satisfaction- or truth-conditions for the formal vocabulary.^{Footnote 3} These theories’ success turns partly on whether the resulting sentences’ satisfaction- or truth-conditions are sufficiently close in meaning to those of the original sentences.

A third connection is with the philosophy of science. Ever since Popper (1963, ch. 10), philosophers of science have studied formal models of verisimilitude in an attempt to model the intuitive idea that some theories or propositions are closer to the truth than others. Take the (somewhat hackneyed) example of the number of planets:^{Footnote 4} the statement that there are 9 of them is closer to the truth than the statement that there are 9 billion of them. The claim ‘\(s_1\) is more verisimilar than \(s_2\)’ is the special case of the claim ‘\(s_1\) is closer to \(s_3\) than \(s_2\) is to \(s_3\)’ in which \(s_3\) is the truth of the matter.

The existence of propositional similarity facts also follows from another consideration. Sharing identical or like properties makes for similarity. For example, a crimson flag is more similar to a scarlet flag than a blue flag is to a green flag, other things being equal. As a consequence, the sentence ‘the flag is crimson’ is truth-conditionally more similar to ‘the flag is scarlet’ than ‘the flag is blue’ is to ‘the flag is green’. The way the world has to be for the first sentence to be true is more similar to how it has to be for the second to be true than the way the world has to be for the third to be true is to how it has to be for the fourth to be true. Similarity of properties or objects thus makes for propositional similarity.

In short, the idea that sentences can be closer or further apart in meaning is highly intuitive. Moreover, it is a pillar of logic, semantic theory and the philosophy of science, and a consequence of other commitments regarding similarity. Our aim here is to define the relation expressed by ‘\(s_1\) is closer in meaning to \(s_2\) than \(s_3\) is to \(s_4\)’. Although the notion of verisimilitude has been extensively studied (Sect. 4 contains some references), the proposal presented here for comparing the ‘distance’ between two pairs of propositions is novel. Previous discussion in the literature proceeds by assuming an underlying metric that captures the distance between worlds. The present article, in contrast, eschews metric assumptions, which are unrealistic in all but a few applications of interest. We do not, for example, assume that there is a numerical distance between our world and the world^{Footnote 5} most like ours in which Hillary Clinton won the 2016 US presidential election.

A point worth clarifying at the outset is that there are several notions of similarity among natural-language sentences. The sentence ‘Hillary Clinton is well’ is in a sense semantically very close to ‘Hillary Clinton is unwell’: both sentences assert something about a particular person’s health, as opposed to say ‘\(e^{i\pi } + 1 =0\)’ which has an entirely different subject matter.^{Footnote 6} Yet truth-conditionally, ‘Hillary Clinton is well’ and ‘Hillary Clinton is unwell’ are far from close. Likewise, the tautology ‘It’s raining or it’s not raining’ is similar to the contradiction ‘It’s raining and it’s not raining’ in terms of both linguistic expression and subject matter; yet as far as semantic value is concerned, the two sentences could not be further apart. In this paper, we focus on a notion of similarity among propositions based on similarity neither of linguistic expression nor of subject matter but of truth-conditions. The examples thus far have hopefully given readers a sense of the target notion, which it will now be our business to explicate and further clarify.

2 Preliminaries

We wish to account for comparative propositional similarity facts expressed by sentences of the form ‘\(s_1\) is closer in meaning to \(s_2\) than \(s_3\) is to \(s_4\)’. Our approach will be to derive such facts from similarity facts about worlds, avoiding any metric assumptions (for reasons that will emerge in Sect. 4). Section 3 presents the models and Sect. 4 comments on them. Before that, a few preliminaries about the framework are in order.

It will be useful to model worlds as sets of propositions and propositions as sets of worlds. We construe worlds as maximally consistent sets of literals (atoms/sentence letters or negations thereof) from a fixed language. This propositional language, which we call \({\mathcal {L}}\), has \(\{p_0, p_1 \ldots , p_n, \ldots \}\) as its set of atoms/sentence letters, is truth-functionally complete and has standard formation rules.^{Footnote 7} For example, under this construal

\(w^* = \{p_i:\) i is even \(\} \cup \{\lnot p_i:\) i is odd \(\} = \{p_0, \lnot p_1, p_2, \lnot p_3, \ldots \}\)

is a world.

Construing worlds as sets of propositions is a matter of formal convenience only: we do not assume that worlds really are sets of propositions. The assumption is only that, for certain purposes, worlds may be modelled as sets of propositions. To put it another way, we are interested in worlds only in as much as they do or don’t share some salient properties, expressed by the propositions \(p_0\), \(p_1\), \(p_2\), .... That is, when presented with a world w, we ask whether it satisfies \(p_0\), \(p_1\), \(p_2\), ...; the answer to these questions is all the information about w we are interested in.^{Footnote 8} Worlds in our models are thus really equivalence classes of worlds in which the given literals hold. The set of worlds may be thought of as the set (or class) of all possible worlds quotiented by \({\mathcal {L}}\)-equivalence, where \({\mathcal {L}}\) is the propositional language we are interested in. Two worlds are in the same \({\mathcal {L}}\)-equivalence class just when they agree on the truth-values of \({\mathcal {L}}\)-atoms, i.e. similarity respects of interest.

Our models also construe propositions as sets of possible worlds. Thus \(p_3 \vee p_4 = \{w: p_3 \vee p_4\) is true at \(w\}\) is a proposition, and if \(w^* = \{p_i:\) i is even \(\} \cup \{\lnot p_i:\) i is odd \(\}\) as above, \(w^* \in p_3 \vee p_4\), since \(p_4\) (and hence \(p_3 \vee p_4\)) is true at \(w^*\). Modelling propositions as sets of possible worlds is a familiar technique in analytic philosophy, linguistics and logic. As above, we are not assuming that propositions really are sets of worlds. We could, if we were less sparing with notation, mark the relation of a proposition holding at a world with a notational primitive. But it will be simpler to construe worlds as sets of propositions, as long as we keep in mind what this means.^{Footnote 9}

So: we use the identity sign when representing propositions as sets of worlds and worlds as sets of propositions for brevity, not out of philosophical commitment. Naturally, modelling propositions as sets of worlds is adequate for certain purposes but too coarse-grained for others. For example, modelling all mathematical, necessary and logical truths as the same proposition entails identifying them all. However, these limitations are as familiar as they are general. They no more threaten our comparative model(s) than they do similar models in metaphysics, the philosophy of language, etc. Finally, though natural, we need not think of the worlds as metaphysically possible; we may equally think of them as epistemically possible, or in some other way. Our models are neutral between various such interpretations.

3 Comparative Models

To introduce our comparative models, recall two canonical versions of counterfactual logic, due to Stalnaker (1968) and Lewis (1973) respectively. To spell out the truth-conditions of ‘If A were the case then B would be the case’, Stalnaker and Lewis assume the existence of a three-place comparative similarity relation among worlds. This may be glossed as ‘\(w_2\) is more similar to \(w_1\) than \(w_3\) is to \(w_1\), as judged from the perspective of \(w_1\)’. Stalnaker and Lewis also make further, slightly different,^{Footnote 10} assumptions on worlds, and base their semantics for propositional counterfactual logic (which need not detain us here) on the metaphysical assumption that worlds stand in the similarity relation just specified. Note that the clause ‘as judged from the perspective of \(w_1\)’ indicates which standards of similarity determine the relative proximity of \(w_2\) and \(w_3\) to \(w_1\); but if we prescind from applications in counterfactual logic, there is no reason to suppose that these standards are somehow fixed by \(w_1\), the world of comparison. Taking the standards of similarity as independently fixed for now (i.e. fixed in a way we are not concerned with), we symbolise a three-place comparative similarity relation between worlds as

\((w_2, w_1) < (w_3, w_1)\),

read as ‘\(w_2\) is more similar to \(w_1\) than \(w_3\) is to \(w_1\)’. This three-place relation can be recaptured from the four-place relation

\((w_2, w_1) < (w_3, w_4)\),

read as ‘\(w_2\) is closer to \(w_1\) than \(w_3\) is to \(w_4\)’ by setting \(w_4 = w_1\).

Our comparative models exploit the existence of a four-place comparative similarity relation among worlds to define a four-place comparative similarity relation among sets of worlds or propositions. The latter may be written as:

\((\alpha , \beta ) < (\gamma , \delta )\),

read as ‘proposition \(\alpha\) is more similar to proposition \(\beta\) than proposition \(\gamma\) is to proposition \(\delta\)’, relative to some fixed standards of similarity. For expository clarity, we present a special case of the model in Sect. 3.1 before generalising it in Sects. 3.2–3.3.

3.1 A Special Case: The Lexicographic Model

Suppose there is a set of similarity respects, the first of which makes for greatest similarity, the second of which is next-most important, and so on. As always, we may think of these respects as the \({\mathcal {L}}\)-propositions \(p_0, p_1, \ldots\) (renumbering if necessary). Thus suppose we are interested in the outcome of a Formula 1 race, and we base our similarity comparisons among worlds first and foremost on who wins the race, second, on who comes second, and so on. In that case, world \(w_1\) is more similar than \(w_2\) is to the actual world w iff the highest-placed divergence between the drivers’ final positions in \(w_1\) and their final positions in the actual world comes further down the finishing order than the highest-placed divergence between the drivers’ final positions in \(w_2\) and their actual final positions. In the general case, \(p_0\) is the similarity respect that matters most, \(p_1\) matters second-most, etc.^{Footnote 11} Let’s see how to make formal sense of this idea.

As above, we take a world to be a maximally consistent set of \({\mathcal {L}}\)-literals, and an \({\mathcal {L}}\)-proposition to be the set of worlds at which it is true. (We assume in this subsection that the \({\mathcal {L}}\)-atoms \(p_0, p_1, \ldots\) are countably infinite.) We also order the subsets of the non-negative integers \(\omega = \{0, 1, \ldots , n, \cdots \}\) lexicographically, using the standard ordering on \(\omega\). More precisely, a set \(s_1 \subseteq \omega\) is smaller than \(s_2 \subseteq \omega\) if the least element of \(s_1\) is larger than the least element of \(s_2\) or \(s_1\) has no least element (i.e. \(s_1\) is empty) but \(s_2\) does (i.e. \(s_2\) is non-empty); or, if the least elements of \(s_1\) and \(s_2\) both exist and are equal, the second-least element of \(s_1\) is larger than the second-least element of \(s_2\) or \(s_1\) has no second-least element but \(s_2\) does; and so on. For example, under this ordering, which we label \({\mathbb {LEX}}\), the seven subsets \(\emptyset\), \(\{0\}\), \(\{1\}\), \(\{2\}\), \(\{1, 2\}\), \(\{2, 4\}\) and \(\omega\) of \(\omega\) are ordered as follows

\(\emptyset<_{\mathbb {LEX}} \{2\}<_{\mathbb {LEX}} \{2, 4\}<_{\mathbb {LEX}} \{1\}<_{\mathbb {LEX}} \{1, 2\}<_{\mathbb {LEX}} \{0\} <_{\mathbb {LEX}}\omega\)
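As a rough illustration, the \({\mathbb {LEX}}\)-ordering can be sketched in Python for subsets of a finite initial segment of \(\omega\). The names `lex_key`, `N` and `omega_trunc` are assumptions of the sketch, as is the truncation of \(\omega\) to its first `N` elements. The sketch relies on the observation that comparing two sets in the \({\mathbb {LEX}}\)-ordering amounts to comparing their characteristic sequences lexicographically: at the first index where two sets differ, the set containing that index is the larger one.

```python
N = 5  # assumed finite horizon: only the indices 0..N-1 are considered

def lex_key(s, n=N):
    """Characteristic sequence of s over 0..n-1 (tuples compare lexicographically)."""
    return tuple(1 if i in s else 0 for i in range(n))

omega_trunc = frozenset(range(N))  # stand-in for omega, truncated to N indices

subsets = [frozenset({0}), frozenset(), frozenset({1, 2}), omega_trunc,
           frozenset({2, 4}), frozenset({1}), frozenset({2})]

# Sort from LEX-least to LEX-largest.
ordered = sorted(subsets, key=lex_key)
for s in ordered:
    print(sorted(s))
```

Under these assumptions the printed order matches the chain displayed above, with the truncated set playing the role of \(\omega\).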

Now if \(w_i\) and \(w_j\) are worlds—maximally consistent sets of \({\mathcal {L}}\)-literals—we set

\(DIST_{WW} (w_i, w_j) =_{def} \{n \in \omega : p_n \notin w_i \cap w_j \wedge \lnot p_n \notin w_i \cap w_j \}\).

In other words, \(DIST_{WW} (w_i, w_j)\) is the set of indices n such that \(w_i\) and \(w_j\) disagree on \(p_n\)’s truth-value. Here as elsewhere we are implicitly thinking of \(l \in w\) as l’s being true in w, for l a literal. Thinking also of \(\{p_0, p_1 \ldots , p_n, \ldots \}\) as the (labelled) set of comparison respects in descending order of importance, the higher \(DIST_{WW}(w_i, w_j)\) is in the \({\mathbb {LEX}}\)-ordering, the more \(w_i\) and \(w_j\) diverge; conversely, the lower \(DIST_{WW}(w_i, w_j)\) is in the \({\mathbb {LEX}}\)-ordering, the more the two worlds are similar. Note that if \(w_i = w_j\), then \(DIST_{WW} (w_i, w_j) = \emptyset\), the least element in the \({\mathbb {LEX}}\)-ordering; this is as it should be, since \(w_i\) and \(w_j\) are then as similar as possible since they’re identical. \(DIST_{WW}\) is thus a qualitative (rather than quantitative) measure of the ‘distance’ between worlds.^{Footnote 12}
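A minimal sketch of \(DIST_{WW}\), following the gloss just given (the set of indices on which two worlds disagree), might look as follows. It assumes, purely for illustration, that a world is represented as a finite tuple of truth-values whose n-th entry is the value of \(p_n\); the names `dist_ww`, `w_star` and `w_other` are hypothetical.

```python
def dist_ww(w_i, w_j):
    """Set of atom indices n on which the two worlds disagree about p_n."""
    return frozenset(n for n, (a, b) in enumerate(zip(w_i, w_j)) if a != b)

# Worlds over the atoms p0..p3, written as tuples of truth-values.
w_star = (True, False, True, False)   # p0, not-p1, p2, not-p3
w_other = (True, True, True, True)    # every atom true

print(sorted(dist_ww(w_star, w_other)))  # indices where the two worlds differ
print(sorted(dist_ww(w_star, w_star)))   # identical worlds: the empty set
```

The value returned is a set of indices rather than a number, which reflects the qualitative character of the measure.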

Now let \(\alpha\) be a (non-contradictory) \({\mathcal {L}}\)-proposition, i.e. a (non-empty) set of worlds. We let

\(DIST_{WP} (w, \alpha ) =_{def}\) the \({\mathbb {LEX}}\)-least element of the set \(\{DIST_{WW}(w, w^*): w^* \in \alpha \}\).

The function \(DIST_{WP}\) is thus a measure of the ‘distance’ of a world to a set of worlds corresponding to a proposition.^{Footnote 13} The subscript ‘WP’ in ‘\(DIST_{WP}\)’ indicates that the distance in question is that of a world to a proposition; similarly, mutatis mutandis, for \(DIST_{WW}\) as well as for \(DIST_{PP}\) (to follow). According to \(DIST_{WP}\), then, w’s distance to \(\alpha\) is its limiting closest distance to an element of \(\alpha\), a limit that is always attained. In particular, if w is a member of \(\alpha\) then its distance is as small as possible (in the \({\mathbb {LEX}}\)-ordering), since it is the empty set. If \(\alpha\) were an arbitrary set of worlds, the set \(\{DIST_{WW}(w, w^*): w^* \in \alpha \}\) would not necessarily have a \({\mathbb {LEX}}\)-least element; but since \(\alpha\) is an \({\mathcal {L}}\)-proposition, it is guaranteed to have one. The proof is in Appendix 1.1.

Now let \(\alpha\) and \(\beta\) both be (non-contradictory) \({\mathcal {L}}\)-propositions. We set

\(DIST_{PP} (\alpha , \beta ) =_{def}\) the \({\mathbb {LEX}}\)-largest element of the set \(\{DIST_{WP} (w, \alpha ): w \in \beta \} \cup \{DIST_{WP} (w, \beta ): w \in \alpha \}\)

As above, if \(\alpha\) and \(\beta\) were arbitrary sets of worlds, the set in the definition of \(DIST_{PP} (\alpha , \beta )\) would not necessarily have a \({\mathbb {LEX}}\)-largest element; but since they are \({\mathcal {L}}\)-propositions, it is guaranteed to have one. The proof may be found in Appendix 1.2.

Let’s illustrate our definitions thus far with an example. Consider the \({\mathcal {L}}\)-propositions \(p_0 \vee p_1\) and \(p_1 \wedge \lnot p_2\), construed as sets of worlds, which are in turn construed as maximal consistent sets of \({\mathcal {L}}\)-literals. Since any world in \(p_1 \wedge \lnot p_2\) is in \(p_0 \vee p_1\), \(DIST_{WP} (w, p_0 \vee p_1) = \emptyset\) for any world w in \(p_1 \wedge \lnot p_2\). Let us next consider how far a world w in \(p_0 \vee p_1\) may be from \(p_1 \wedge \lnot p_2\). By considering the value of \(p_0\) in w, we see that in order to minimise w’s proximity to \(p_1 \wedge \lnot p_2\) (equivalently: maximise its distance from \(p_1 \wedge \lnot p_2\)), the literals \(p_0\), \(\lnot p_1\) and \(p_2\) must all be true in w; but for any \(p_i\) with \(i \ge 3\), there is a world in \(p_1 \wedge \lnot p_2\) that \(p_i\)-wise matches any given world w in \(p_0 \vee p_1\). Thus for example, if \(w = \{p_0, \lnot p_1, p_2\} \cup \{p_i: i \ge 3\}\) then the nearest world to w in \(p_1 \wedge \lnot p_2\) is the world \(\{p_0, p_1, \lnot p_2\} \cup \{p_i: i \ge 3\}\). Hence for this w, \(DIST_{WP} (w, p_1 \wedge \lnot p_2) = \{1, 2\}\), and the same goes for any world w in which \(p_1\) is false and \(p_2\) is true (and in which, consequently, \(p_0\) is true, since w is in \(p_0 \vee p_1\)). \(DIST_{PP} (p_0 \vee p_1, p_1 \wedge \lnot p_2)\) is then the \({\mathbb {LEX}}\)-maximum of \(\emptyset\) and \(\{1, 2\}\), and so \(DIST_{PP} (p_0 \vee p_1, p_1 \wedge \lnot p_2) = \{1, 2\}\).
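The example can be checked mechanically with a small Python sketch. It assumes a finite horizon of four atoms \(p_0, \ldots , p_3\) (enough here, since the two propositions mention only \(p_0\), \(p_1\) and \(p_2\)), represents worlds as tuples of truth-values, and uses hypothetical helper names (`lex_key`, `dist_ww`, `dist_wp`, `dist_pp`) that are not part of the formal account.

```python
from itertools import product

N = 4  # assumed finite horizon: atoms p0..p3

def lex_key(s):
    """Characteristic sequence of s; tuple comparison then mirrors the LEX-ordering."""
    return tuple(1 if i in s else 0 for i in range(N))

def dist_ww(w1, w2):
    """Indices of the atoms on which two worlds disagree."""
    return frozenset(i for i in range(N) if w1[i] != w2[i])

def dist_wp(w, alpha):
    """LEX-least distance from world w to a world in proposition alpha."""
    return min((dist_ww(w, v) for v in alpha), key=lex_key)

def dist_pp(alpha, beta):
    """LEX-largest of the world-to-proposition distances, taken in both directions."""
    candidates = [dist_wp(w, alpha) for w in beta] + [dist_wp(w, beta) for w in alpha]
    return max(candidates, key=lex_key)

worlds = list(product([True, False], repeat=N))
alpha = [w for w in worlds if w[0] or w[1]]       # p0 or p1
beta = [w for w in worlds if w[1] and not w[2]]   # p1 and not-p2

result = dist_pp(alpha, beta)
print(sorted(result))
```

On this finite model the computed value is the set \(\{1, 2\}\), in agreement with the reasoning above.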

Putting everything together, we may define a four-place similarity relation \(\prec\) among \({\mathcal {L}}\)-propositions:

\((\alpha , \beta ) \prec (\gamma , \delta ) =_{def} DIST_{PP} (\alpha , \beta ) <_{\mathbb {LEX}} DIST_{PP} (\gamma , \delta )\),

with the left-hand side read as ‘proposition \(\alpha\) is more similar to proposition \(\beta\) than proposition \(\gamma\) is to proposition \(\delta\)’. Similarly,

\((\alpha , \beta ) \sim (\gamma , \delta ) =_{def} DIST_{PP} (\alpha , \beta ) = DIST_{PP} (\gamma , \delta )\),

and

\((\alpha , \beta ) \preceq (\gamma , \delta ) =_{def} (\alpha , \beta ) \prec (\gamma , \delta ) \vee (\alpha , \beta ) \sim (\gamma , \delta )\).

(\(\preceq\) is the weak counterpart of the strict relation \(\prec\).) An easy check shows that ordered pairs of propositions under \(\preceq\) form a linear pre-order, i.e. a transitive, reflexive, linear,^{Footnote 14} but not necessarily antisymmetric,^{Footnote 15} order. Readers familiar with the Hausdorff metric will recognise \(DIST_{PP}\) as a sort of qualitative—metricless—version of it; this point will be explained in Sect. 4.2, when we stress the importance of doing without metric assumptions.
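The relations \(\prec\) and \(\sim\) can likewise be sketched in Python on a finite model. As before, the four-atom horizon, the tuple representation of worlds and all function names (`prop`, `strictly_closer`, `equally_close`, and the helpers) are assumptions made for illustration only.

```python
from itertools import product

N = 4  # assumed finite horizon: atoms p0..p3
WORLDS = list(product([True, False], repeat=N))

def lex_key(s):
    return tuple(1 if i in s else 0 for i in range(N))

def dist_ww(w1, w2):
    return frozenset(i for i in range(N) if w1[i] != w2[i])

def dist_pp(alpha, beta):
    def dist_wp(w, gamma):
        return min((dist_ww(w, v) for v in gamma), key=lex_key)
    candidates = [dist_wp(w, alpha) for w in beta] + [dist_wp(w, beta) for w in alpha]
    return max(candidates, key=lex_key)

def prop(pred):
    """The proposition (set of worlds) at which the predicate holds."""
    return [w for w in WORLDS if pred(w)]

def strictly_closer(a, b, c, d):
    """(a, b) strictly precedes (c, d): a is more similar to b than c is to d."""
    return lex_key(dist_pp(a, b)) < lex_key(dist_pp(c, d))

def equally_close(a, b, c, d):
    """(a, b) ~ (c, d): the two pairs are equally far apart."""
    return dist_pp(a, b) == dist_pp(c, d)

p0, not_p0 = prop(lambda w: w[0]), prop(lambda w: not w[0])
p3, not_p3 = prop(lambda w: w[3]), prop(lambda w: not w[3])

print(strictly_closer(p3, not_p3, p0, not_p0))
print(equally_close(p0, not_p0, not_p0, p0))
```

In the lexicographic model \(p_3\) counts as closer to its negation than \(p_0\) is to its negation, simply because \(p_3\) is a lower-ranked respect of similarity than \(p_0\).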

3.2 A General Comparative Model

Section 3.1 assumed that similarity among worlds is determined lexicographically with reference to a countable infinity of respects of similarity: the first respect matters most, the second matters second-most, and so on. From the order \(<_{\mathbb {LEX}}\) on the subsets of \(\omega\) thereby generated, we derived a four-place comparative relation \(\prec\) (or \(\preceq\) for the weak relation) on propositions.

The lexicographic model makes \(p_0\) an absolute dictator, in the sense that dissimilarity in this respect cannot be compensated for by similarity in others; likewise for similarity in the \(p_i\)-respect versus similarity in lesser-ranked respects. To avoid this consequence, a more general comparative model does away with the assumption that similarity of worlds is determined lexicographically; it simply generates \(\prec\) from an arbitrary linear pre-order. We may also relax the assumption that the number of atoms is countably infinite, since nothing turns on it. Our more general model thus assumes that \(\kappa\) is a (finite or infinite) cardinal and that worlds are maximally consistent sets of literals drawn from the atom set \(\{p_i: i \in \kappa \}\). Propositions are now well-formed formulas in a (truth-functionally complete) propositional language whose set of atoms is \(\{p_i: i \in \kappa \}\). We assume that \(\le _{\mathbb {ORD}}\) is a weak linear pre-order on \({\mathbb {P}}(\kappa )\), i.e. that \(\le _{\mathbb {ORD}}\) is reflexive, transitive, and linear, but not necessarily antisymmetric. If \(\le _{\mathbb {ORD}}\) happens to also be antisymmetric, we proceed as in Sect. 3.1, simply replacing \(\le _{\mathbb {LEX}}\) with \(\le _{\mathbb {ORD}}\). In the more general case in which \(\le _{\mathbb {ORD}}\) may not be antisymmetric, having defined \(DIST_{WW}\) as in Sect. 3.1, we define \(DIST_{WP} (w, \alpha )\) to be any (arbitrarily chosen) \({\mathbb {ORD}}\)-least member of

\(\{s \subseteq \kappa : s = DIST_{WW}(w, w^*) \text { for some } w^* \in \alpha \}\).

and \(DIST_{PP} (\alpha , \beta )\) to be any (arbitrarily chosen) \({\mathbb {ORD}}\)-largest element of

\(\{DIST_{WP} (w, \alpha ): w \in \beta \} \cup \{DIST_{WP} (w, \beta ): w \in \alpha \}\)

That \(\alpha\) and \(\beta\) are propositions rather than arbitrary sets of worlds guarantees that there are \({\mathbb {ORD}}\)-least and \({\mathbb {ORD}}\)-largest elements in the respective definitions of \(DIST_{WP}\) and \(DIST_{PP}\).^{Footnote 16} When \(\le _{\mathbb {ORD}}\) is not antisymmetric, we pick any one from an equivalence class of subsets of \(\kappa\); and if \(\le _{\mathbb {ORD}}\) happens to be a linear order (and so antisymmetric), then each equivalence class contains exactly one element, in which case we pick its only member.

Another example will help illustrate the more general account. Suppose as in Sect. 3.1 that worlds are represented as maximally consistent sets of \({\mathcal {L}}\)-literals, that the set of atoms of \({\mathcal {L}}\) is once more \(\{p_i: i \in \omega \}\), and that the distance of \(w_1\) to \(w_2\) is the set of atomic indices on which they differ. This time, however, the order on subsets of \(\omega\) is determined by the number of propositions on which worlds differ, the smaller the number the more the two subsets are similar; when this number is the same, there is a tie, represented by \(\sim\) (equivalently the obtaining of both \(\preceq\) and \(\succeq\)). Under this ordering, which we label \(<_{\mathbb {NUM}}\), our illustrative subsets \(\emptyset\), \(\{0\}\), \(\{1\}\), \(\{2\}\), \(\{1, 2\}\), \(\{2, 4\}\) and \(\omega\) of the set \(\omega\) are ordered as follows:

\(\emptyset<_{\mathbb {NUM}} \{0\} \sim _{\mathbb {NUM}} \{1\} \sim _{\mathbb {NUM}} \{2\}<_{\mathbb {NUM}} \{2, 4\} \sim _{\mathbb {NUM}} \{1, 2\} <_{\mathbb {NUM}}\omega\)

It is easy to see that the two-place comparative relation on pairs of propositions induced by the \({\mathbb {NUM}}\)-ordering on \(DIST_{PP}\) is a linear pre-order, but not a linear order. The semantic proximity of any two worlds which differ over the status of \(p_1\) only is, for instance, the same as the semantic proximity of any two worlds which differ over the status of \(p_2\) only.
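The contrast with the lexicographic ordering can be made concrete with a short Python sketch of the \({\mathbb {NUM}}\)-ordering. It is restricted to finite sets, with \(\omega\) replaced by a truncated stand-in of `N` indices; `num_key`, `N`, `tiers` and `ranking` are hypothetical names introduced for the sketch.

```python
N = 5  # assumed finite horizon standing in for omega

def num_key(s):
    """Rank a finite set of atom indices by its cardinality alone."""
    return len(s)

subsets = [frozenset(), frozenset({0}), frozenset({1}), frozenset({2}),
           frozenset({2, 4}), frozenset({1, 2}), frozenset(range(N))]

# Group the subsets into tie classes, smallest cardinality first.
tiers = {}
for s in subsets:
    tiers.setdefault(num_key(s), []).append(sorted(s))
ranking = [tiers[k] for k in sorted(tiers)]

for tier in ranking:
    print(tier)
```

Each printed line is one tie class: sets in the same class are \(\sim _{\mathbb {NUM}}\)-related, which is why the induced relation is a pre-order rather than an order.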

3.3 An Even More General Comparative Model

The models in Sects. 3.1–3.2 assumed a four-place similarity relation among worlds. In fact, the relation took a special form: in abstract terms, we assumed the existence of a two-place function DIST on the product of the space of worlds with codomain a linear pre-order, whose order relation we may call \(<_{\mathbb {ORD}}\), such that

\((w_1, w_2) \prec (w_3, w_4) =_{def} DIST(w_1, w_2) <_{\mathbb {ORD}} DIST(w_3, w_4)\),

and similarly for the weak relation \(\preceq\). A general model based on the above lines simply assumes the existence of such a function \(DIST: W^2 \rightarrow {\mathbb {ORD}}\), where W is the space of worlds and \(\langle {\mathbb {ORD}}, <_{\mathbb {ORD}} \rangle\) is a linear pre-order. In this case, we define \(DIST_{WP} (w, \alpha )\) to be any (arbitrarily chosen) \({\mathbb {ORD}}\)-least member of

\(\{s: s = DIST(w, w^*) \text { for some } w^* \in \alpha \}\).

and \(DIST_{PP} (\alpha , \beta )\) to be any (arbitrarily chosen) \({\mathbb {ORD}}\)-largest element of

\(\{DIST_{WP} (w, \alpha ): w \in \beta \} \cup \{DIST_{WP} (w, \beta ): w \in \alpha \}\)

Now \(DIST_{WP}\) and \(DIST_{PP}\) are well-defined only if the relevant sets are guaranteed to have, respectively, an \({\mathbb {ORD}}\)-least member and an \({\mathbb {ORD}}\)-largest element, for all propositions \(\alpha\) and \(\beta\); we may call these combined assumptions \({\mathbb {ORD}}\)-min-and-max. If \({\mathbb {ORD}}\)-min-and-max obtains, the present model is more general than either of those in Sects. 3.1–3.2, because we need no longer assume that the distance between two worlds is determined by the set of \({\mathcal {L}}\)-atoms on which they differ. Indeed, in that case worlds need no longer be construed as maximally consistent sets of \({\mathcal {L}}\)-literals; they can be anything whatsoever. In general, of course, without some background assumptions, there is no reason to suppose that \({\mathbb {ORD}}\)-min-and-max holds.^{Footnote 17}

It’s worth illustrating this third, most general, model with an example. An interesting use of similarity in metaphysics is the apparently lexicographic explication of similarity among worlds found in Lewis (1979). This account is intended to underpin his possible-worlds account of counterfactuals and thus ultimately of causation. Lewis’s famous four conditions are:

(1) It is of the first importance to avoid big, widespread, diverse violations of law.

(2) It is of the second importance to maximize the spatiotemporal region throughout which perfect match of particular fact prevails.

(3) It is of the third importance to avoid even small, localized, simple violations of law.

(4) It is of little or no importance to secure approximate similarity of particular fact, even in matters that concern us greatly. (Lewis 1979, pp. 47–48)

Here is a natural way of making these remarks more precise. Assume that each of the four quantities in (1)–(4) can be measured; e.g. \(M_1\) may be a linear order such that if a world \(w_1\)’s violations of law are small, narrowly confined, and of the same kind, then \(w_1\)’s measure in the \(M_1\)-dimension is \(m_1\), a smaller element of \(M_1\) than \(m_2\), which represents the amount of \(w_2\)’s violations of law, which are big, widespread and diverse. Similarly for \(M_2\), \(M_3\) and \(M_4\), mutatis mutandis. The order \(\langle {\mathbb {ORD}}, <_{\mathbb {ORD}} \rangle\) is then a lexicographic order on \(M_1 \times M_2 \times M_3 \times M_4\). In other words, \(\langle m_1, m_2, m_3, m_4 \rangle < \langle m^*_1, m^*_2, m^*_3, m^*_4 \rangle\) if \(m_1 < m^*_1\), or if \(m_1 = m^*_1\) and \(m_2 < m^*_2\), etc. Each world is then assigned a quadruple \(DIST(w, w_@) = \langle m_1, m_2, m_3, m_4 \rangle\) that measures w’s divergence from the reference world \(w_@\). If we assume \({\mathbb {ORD}}\)-min-and-max in this context, our definition formally captures Lewis’s comparative similarity relation. (Naturally, there may be other ways of interpreting Lewis’s four conditions.)
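The lexicographic order on quadruples can be illustrated in Python, whose built-in tuple comparison is lexicographic. The worlds `w_a`, `w_b`, `w_c` and their coordinate values are entirely hypothetical. Integers are used only as a convenient linear order, with no metric significance attached to them, and the sketch assumes that in every coordinate a smaller value means less divergence from the reference world.

```python
# Hypothetical divergence quadruples (m1, m2, m3, m4) from the reference world.
# Assumption: in every coordinate a smaller value means less divergence, so m2
# is read as the extent of mismatch of particular fact, not the extent of match.
dist = {
    "w_a": (0, 5, 2, 9),
    "w_b": (1, 0, 0, 0),
    "w_c": (0, 5, 3, 0),
}

# Python compares tuples lexicographically: first coordinate first, and so on.
ranking = sorted(dist, key=dist.get)
print(ranking)  # most similar world first
```

Here `w_b` comes last despite its low values in the later coordinates, because no agreement in lesser respects compensates for a worse value in the first.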

Finally, all the material in Sect. 3 may be amended so as to define the three-place ‘\(\alpha\) is more similar to \(\beta\) than \(\gamma\) is’. The ‘distance’ between \(\alpha\) and \(\beta\) can be defined as above in terms of the atomic indices worlds in these sets differ over, and this ‘distance’ may then be compared to that between \(\beta\) and \(\gamma\). We have opted for a four-place relation here not out of any strong conviction, but mainly because, as we saw in Sect. 1, we intuitively grasp four-place comparative similarity facts of the form ‘\(\alpha\) is more similar to \(\beta\) than \(\gamma\) is to \(\delta\)’, and from this four-place relation the three-place one is immediately definable. The literature also contains arguments that the four-place relation is methodologically preferable to the three-place one.^{Footnote 18} Whether or not our treatment based on a four-place relation is best recast in terms of a three-place relation is a question we will not further address. We note merely that nothing in Sect. 3 precludes or prejudges a positive answer to it.

This completes the comparative model’s exposition. We now turn to some philosophical remarks.

4 Philosophical Commentary

In the previous section, we proposed an analysis of the four-place relation expressed by ‘\(s_1\) is closer in meaning to \(s_2\) than \(s_3\) is to \(s_4\)’. As well as propositional similarity, the account potentially has other applications, for example to the question of higher-order resemblances amongst properties. We saw in Sect. 1 that comparative similarity facts between properties can give rise to comparative similarity facts between propositions: our example turned on crimson being more like scarlet than blue is like green. One could explore the Sect. 3 ideas to ground an account of comparative similarity, replacing worlds with individuals and propositions with properties. That way, higher-order resemblance between properties could be reduced to ordinary resemblances between particulars.

To keep this paper to manageable length, however, we cannot here embark on this or any other applications. Nor is a detailed comparison with other accounts feasible, simply because there are too many of them. In lieu of that, Sect. 4.1 concisely situates the account vis-à-vis the verisimilitude literature. In Sect. 4.2, we explain why our comparative models are preferable to metric approaches. In Sect. 4.3, we consider another constraint that might be imposed on the models. Section 4.4 concludes with a brief explanation of why we take world similarity as primitive.

4.1 Generic Problems with Likeness Accounts

Approaches to truthlikeness^{Footnote 19} are standardly divided into three categories: the content, consequence and likeness approaches. Ours is akin to the last of these, the likeness approach. The focus of these approaches is on defining what it is for X to be at least as close to the truth as Y, where X and Y are variously defined as propositions or theories or models, and ‘the truth’ may be a proposition or world or model. Hilpinen (1976) and Tichý (1974) are pioneers of the approach and Niiniluoto (1987) a classic synthesis advocating a possible-worlds likeness approach.^{Footnote 20}

Now our topic is propositional similarity rather than verisimilitude, so our aims and these writers’ are correspondingly different. Many of the constraints on acceptable measures of verisimilitude, for instance, mention truth and falsity (e.g. ‘some false statements may be more truthlike than some true statements’).^{Footnote 21} These cannot sensibly be imposed on an account of propositional similarity, which must be independent of which world is the actual one: for which world is actual makes no difference to whether \(s_1\) is more similar to \(s_2\) than \(s_3\) is to \(s_4\). By the same token, accounts which make essential use of the class of true models, or correct consequences, or true theories, or the like, are different in both spirit and implementation from ours. The verisimilitude literature does contain discussion of how to assign a measure to pairs of sets of worlds (or models).^{Footnote 22} However, these treatments generally presuppose that the distance between such sets is represented by a scalar quantity, typically a real number, and for that reason differ significantly from our non-metric approach.

All that said, despite the differences, the present proposal does have several points of affinity with likeness accounts and some comparisons may be drawn.^{Footnote 23} Rather than attempt to review the large literature, we situate the account by seeing how it deals with the main generic difficulties associated with such approaches. Oddie (2014, sec. 1.4) usefully summarises these as three questions:^{Footnote 24} How is likeness determined? What is the correct functional dependence of the truthlikeness of a proposition on the likeness of its members to the actual world (the extension problem)? Finally, does the account of likeness depend on which language is used to formulate propositions (the problem of language dependence)?

In answer to the first question, likeness in the Sect. 3 models is constituted by agreement over propositional atoms and unlikeness by divergence over them. Agreement over which atoms or collection thereof counts most is left completely open, as it is determined by the order relation (Sect. 3.3’s \(<_{\mathbb {ORD}}\)), which can be anything we like. Our account is thus inherently flexible. Furthermore, unlike metric accounts, which are constrained by the size of the real numbers (the continuum) and their metric properties (e.g. reals are separable, so admit only countable ascending or descending chains), our model allows similarity to be defined on propositional languages of arbitrary cardinality. (See Sect. 4.2 for more on this point.)

Moving on to the second question, functional dependence in our model is given by the measure \(DIST_{PP}\). We do not claim that this measure of ‘distance’ is uniquely motivated. It is, however, a natural one in our metricless setting. As the literature attests, likeness approaches based on metrics have to make an arbitrary choice from a wide range of functions that map collections of distances between worlds into a single distance between sets of worlds. Take Tichý’s famous (1974) toy example, in which there are three atomic propositions h (‘it’s hot’), r (‘it’s rainy’) and w (‘it’s windy’), and in which the state of the world is \(h \wedge r \wedge w\). The distance between conjunctions of literals drawn from these three atoms is then given by the taxicab metric, so that e.g. the distance between \(h \wedge r \wedge w\) and \(h \wedge \lnot r \wedge w\) is 1 because these two worlds disagree only on the truth-value of a single atom, namely r. Suppose now that we wish to assign a distance from the set of h-worlds to the set of r-worlds. There are four of each, and 15 distances to aggregate into a single output that represents the distance between h and r.^{Footnote 25} Any from a wide range of such aggregating functions is compatible with intuitive constraints. More generally, as the verisimilitude literature demonstrates, in trying to define the distance from a world w to a set of worlds \(\alpha\), given a distance function on worlds, one may use a wide array of measures: the average distance between w and the elements of \(\alpha\), or the infimum of such distances, or the supremum, or some weighted average of the last two. The choices are endless, and have familiar pros and cons. As Schurz and Weingartner remark, extant approaches have not successfully solved the problem of extending truthlikeness of worlds to truthlikeness of propositions because this extrapolation is ‘intuitively underdetermined’ (2010, p. 423). 
In our non-metric setting, far fewer technical options are available—because there are no real values for measures to take as inputs. In fact, we know of no other non-metric account to rival that presented in Sect. 3.
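To make the underdetermination concrete, here is a minimal Python sketch of Tichý's toy example. It assumes that worlds are triples of truth-values for (h, r, w) and that world distance is the taxicab distance; the names `taxicab`, `pair_distance` and `aggregates` are illustrative only and are not drawn from the literature. The sketch collects the 15 distinct pairwise distances between h-worlds and r-worlds and applies three candidate aggregating functions to them.

```python
from fractions import Fraction
from itertools import product

# Worlds are triples of truth-values for the atoms (h, r, w).
WORLDS = list(product([True, False], repeat=3))

def taxicab(u, v):
    """Number of atoms on which two worlds disagree."""
    return sum(a != b for a, b in zip(u, v))

h_worlds = [u for u in WORLDS if u[0]]  # worlds at which h is true
r_worlds = [u for u in WORLDS if u[1]]  # worlds at which r is true

# Distinct unordered pairs {a, b} with a an h-world and b an r-world.
pairs = {frozenset((a, b)) for a in h_worlds for b in r_worlds}

def pair_distance(pair):
    """Distance between the members of a pair (0 when both members coincide)."""
    members = tuple(pair)
    return taxicab(members[0], members[-1])

distances = sorted(pair_distance(p) for p in pairs)

# Three of the many aggregating functions one might choose.
aggregates = {
    "minimum": min(distances),
    "maximum": max(distances),
    "mean": Fraction(sum(distances), len(distances)),
}
```

On these assumptions the three aggregators return 0, 3 and 23/15 respectively, so the ‘distance’ between h and r depends heavily on which function is chosen.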

By not assuming distances between worlds, we avoid the extension problem in its starkest form. The distance between propositions is simply a function of the atoms over which their constituents differ, which function exactly depending on the order. In a sense, our account does not so much avoid the problem as openly embrace it. Propositional similarity is ultimately grounded in atomic difference; how these atomic differences are then weighed entirely depends on the order (\(\le _{\mathbb {ORD}}\) in the most general setting).

Finally—in answer to the third question—our account is not problematically language-sensitive. Choosing similarity respects and an order on sets of these respects fixes the propositional similarity facts. As an illustration, suppose we let \(q_0\) be \(p_0\), \(q_1\) be \(p_0 \vee p_1\) and more generally define \(q_n\) as \(p_0 \vee p_1 \vee \cdots \vee p_n\). Clearly, the \(p_i\) and the \(q_i\) are interdefinable using Boolean operations. If the similarity respects are the \(p_i\), as in the lexicographic account say, whether we express propositions using the \(q_i\) or the \(p_i\) makes no difference to their similarity. The similarity between \(q_0\) and \(q_1\) is the same whether we express these as \(q_0\) and \(q_1\) or alternatively as \(p_0\) and \(p_0 \vee p_1\). One might contend that we have avoided language dependence only by fixing similarity respects at the outset. But it is hard to see how any similarity judgements could get off the ground without privileging some similarity respects over others, since everything is identical to anything else in infinitely many different ways.

This, at any rate, is a sketch of how our account fares with respect to the main charges brought against likeness accounts. To further motivate the avoidance of metric assumptions, we now explain why the very move from a non-metric to a metric account saddles us with problems of its own making.

4.2 Metric and Pseudometric Accounts

Our account in Sect. 3 avoided any metric assumptions. Those who are familiar with the Hausdorff metric from real analysis will recognise, however, that our comparative similarity relation is Hausdorff-like. It is a sort of non-metric implementation of the idea behind the Hausdorff metric. A comparison of our non-metric approach with this metric will help bring out the former’s virtues. To keep this paper self-contained, we include a definition and exposition of the Hausdorff metric; but since it would obtrude too much if laid out here, we relegate the material to Appendix 1.4.

On a thumbnail (Appendix 1.4 has the full details), the Hausdorff distance \({\mathbb {D}}\) between two sets of worlds \(\alpha\) and \(\beta\) measures the largest shortest distance between a world in one set and the other set. From a metric d on the space of worlds W we may thereby define a pseudometric \({\mathbb {D}}\) (see Appendix 1.4) on the set of non-contradictory propositions construed as sets of worlds. We may exploit a pseudometric \({\mathbb {D}}\) to define a four-place comparative similarity relation \(\preceq _{{\mathbb {D}}}\) on propositions as follows:

\((\alpha , \beta ) \preceq _{{\mathbb {D}}} (\gamma , \delta ) =_{def}{\mathbb {D}}(\alpha , \beta ) \le {\mathbb {D}}(\gamma , \delta )\).

We thereby recover a four-place comparative similarity relation, this time based on metric assumptions. As Appendix 1.4 explains, if propositions are closed and bounded subsets of W then \({\mathbb {D}}\) turns out to be a metric on all propositions save the contradiction. But even if they are not, there’s no harm done; for our purposes, that \({\mathbb {D}}\) is a pseudometric is enough.^{Footnote 26}
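As a rough illustration of the thumbnail description above, the following Python sketch computes the standard Hausdorff distance between finite, non-empty sets of worlds and the comparative relation it induces. It assumes three atoms (h, r, w) and takes the number of atoms on which two worlds disagree as the world metric d; the function names `directed`, `hausdorff` and `at_least_as_close` are hypothetical labels, not notation from the paper.

```python
from itertools import product

# Worlds are triples of truth-values for the atoms (h, r, w).
WORLDS = list(product([True, False], repeat=3))

def d(u, v):
    """Assumed world metric: number of atoms on which u and v disagree."""
    return sum(a != b for a, b in zip(u, v))

def directed(alpha, beta):
    """Largest distance from a world in alpha to its nearest world in beta."""
    return max(min(d(a, b) for b in beta) for a in alpha)

def hausdorff(alpha, beta):
    """Hausdorff distance between two non-empty finite sets of worlds.

    Raises ValueError on an empty set, i.e. on the contradiction.
    """
    return max(directed(alpha, beta), directed(beta, alpha))

def at_least_as_close(alpha, beta, gamma, delta):
    """True when alpha is at least as close to beta as gamma is to delta."""
    return hausdorff(alpha, beta) <= hausdorff(gamma, delta)

h = [u for u in WORLDS if u[0]]   # the proposition h
r = [u for u in WORLDS if u[1]]   # the proposition r
hrw = [(True, True, True)]        # the single world h & r & w
```

With these inputs, `hausdorff(h, r)` is 1 and `hausdorff(h, hrw)` is 2, so `at_least_as_close(h, r, h, hrw)` holds while the converse comparison fails.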

We note that the Hausdorff metric \({\mathbb {D}}\) has received a bad press in the verisimilitude literature. Niiniluoto (1987, p. 245) for instance points out that in the special case in which \(\beta = \{w\}\), \({\mathbb {D}}(\alpha , \{w\})\) is the supremum of the distances from w to the worlds in \(\alpha\), so that only the maximum distance (supremum) between a world in \(\alpha\) and the set \(\{w\}\) is taken into account. This is an unwelcome consequence for a measure of world-to-proposition distance. But as a criticism of a proposition-to-proposition measure, it draws a blank, since the metric takes into account all worlds in \(\alpha\) and \(\beta\), and in our setting no propositions correspond to a single world.

We provide four reasons for preferring our non-metric comparative model in Sect. 3 to the metric/pseudometric model suggested in this subsection. Some of these reasons generalise to objections against any metric approach to propositional similarity.

1. In the general case, the pseudometric \({\mathbb {D}}\) may not be a metric. If it isn’t, \({\mathbb {D}}\) will elide the difference between two sets of worlds being identical and being as close to one another as possible. For example, it is natural to suppose that a proposition such as ‘John is no more than 2 m tall’ is the closure in the space of worlds of the proposition ‘John is less than 2 m tall’. Now under any sensible notion of set distance in a metric space, the distance of any set from its closure should always be 0. So the \({\mathbb {D}}\)-difference between these propositions should be 0, and will be 0 if \({\mathbb {D}}\) is the Hausdorff pseudometric, despite the propositions’ distinctness. Consequently, on this account the two propositions are just as similar to one another as the first is to itself. But this is the wrong result: ‘John is no more than 2 m tall’ and ‘John is less than 2 m tall’ may be infinitesimally close to one another, as we might put it, but they are less similar to one another than either of them is to itself. There is a difference between zero and infinitesimal proximity, a difference which the pseudometric approach obliterates.^{Footnote 27} Our Sect. 3 models, however, can respect this difference.

2. Second, the pseudometric \({\mathbb {D}}\) is highly sensitive to the underlying world metric d. It is easy to come up with examples in which \((X, \tau )\) is a topological space, \(d_1\) and \(d_2\) are metrics compatible with the topology \(\tau\) on X, and yet there are points x, y, z and w of X such that \(d_1(x, y) < d_1 (z, w)\) and \(d_2(x, y) > d_2 (z, w)\), i.e. \(d_1\) and \(d_2\) disagree on four-place comparative facts of the form ‘x is closer to y than z is to w’. In the same way, the four-place relation \(\prec _{{\mathbb {D}}}\) derived from \({\mathbb {D}}\) depends sensitively on the world metric d.^{Footnote 28}

Metrics \(d_1\) and \(d_2\) may thus be very similar yet generate respective pseudometrics \({\mathbb {D}}_1\) and \({\mathbb {D}}_2\) that disagree on four-place comparative propositional similarity facts. The second problem for metric/pseudometric models, then, stems from the fact that data about world similarity is hardly ever quantitative. Looking back at the motivating examples in Sect. 1 we see that the facts about propositional similarity encountered there were all comparative. They were facts of the form ‘\(s_1\) is closer in meaning to \(s_2\) than \(s_3\) is to \(s_4\)’, or three-placed ones of the form, ‘\(s_1\) is closer in meaning to \(s_2\) than \(s_3\) is’, rather than quantitative facts of the form ‘the distance in meaning between \(s_1\) and \(s_2\) is twice that between \(s_3\) and \(s_4\)’. Now there may be cases of this kind in which we have metric information; but they will be rare. To assume a metric on worlds and use it to derive a pseudometric on propositions is therefore to impose a spurious precision on the subject matter, at least for most applications of interest. Propositional metrics depend overly sensitively on world metrics.

3. The third problem with pseudometric models is the cardinality constraint they impose. Thought of as a binary relation on pairs of worlds, the derived relation \(\prec _{{\mathbb {D}}}\) cannot, for example, admit uncountable-ascending or uncountable-descending chains, since the real numbers admit no such chains.^{Footnote 29} But we would like to allow such possibilities rather than rule them out through the model’s formal features. We may, for example, wish to allow for uncountably many similarity respects.^{Footnote 30} The Sect. 3 models comfortably handle arbitrarily large numbers of worlds or similarity respects.

4. Our comparative models are defined for all propositions other than the contradiction. To plug this gap, one fix would be to set \(DIST_{PP} (\alpha , \bot )\) to be some subset of indices, when \(\bot\) is any contradictory \({\mathcal {L}}\)-sentence. Setting \(DIST_{PP} (\bot , \bot ) = \emptyset\) is forced if we wish to respect the condition that a proposition is no further in meaning from itself than any two propositions are from one another.^{Footnote 31} A natural choice for \(DIST_{PP} (\alpha , \bot )\) when \(\alpha\) is not contradictory is the full set of indices, so that \(DIST_{PP} (\alpha , \bot ) = \omega\) when this index set is \(\omega\) (as in Sect. 3.1’s lexicographic example) and \(\alpha\) is not a contradiction. This latter clause captures the thought that, as a classical logician would put it, a contradiction disagrees with any other sentence about the value of all atomic propositions, since it takes all of them to be both true and false. Naturally, from a relevance logic perspective (which there is no room to explore), a different choice might be made here.

Contrast this with a pseudometric model, for which there is no technical fix. For example, we cannot just set the \({\mathbb {D}}\)-distance from a contradiction to any other proposition to 0 on pain of violating the triangle inequality for \({\mathbb {D}}\).^{Footnote 32} In a metric space, the Hausdorff distance \({\mathbb {D}}(S, F)\) between a fixed closed, bounded, non-empty subset F and a closed, bounded and non-empty subset S which tends towards the empty set will not in general tend to a particular limit.^{Footnote 33} So unlike the Sect. 3 models, the pseudometric model seems essentially incomplete. The wisdom of adopting a non-metric account of propositional similarity is once more apparent.

For all the reasons just outlined, we take our comparative models to be superior to metric accounts. Of course, when there is metric information relating worlds to worlds, one might essay a metric account of propositional similarity. But such cases tend to be the exception rather than the norm.
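The second of these reasons, sensitivity to the underlying world metric, can be checked numerically. The Python sketch below assumes a weighted-sum metric in which disagreement over atom \(p_n\) contributes a weight of base\(^{-n}\), and it truncates worlds to the atoms \(p_0, \ldots , p_3\) on the assumption that the worlds compared agree on every later atom. The helper `weighted_metric` and the truncation are illustrative choices, not part of the Sect. 3 models.

```python
from fractions import Fraction

def weighted_metric(base):
    """Return a world metric that weights disagreement over atom p_n by base**(-n)."""
    def d(u, v):
        return sum(
            (Fraction(1) / base ** n
             for n, (a, b) in enumerate(zip(u, v)) if a != b),
            Fraction(0),
        )
    return d

# Truth-values for p0..p3; all later atoms are assumed true in every world.
w1 = (True, True, True, True)     # every atom true
w2 = (False, True, True, True)    # differs from w1 on p0 only
w3 = (True, False, False, True)   # differs from w1 on p1 and p2

d1 = weighted_metric(Fraction(2))     # weights 1, 1/2, 1/4, ...
d2 = weighted_metric(Fraction(3, 2))  # weights 1, 2/3, 4/9, ...

# The two metrics reverse the comparison between the pairs (w1, w2) and (w1, w3).
d1_ranks_w3_closer = d1(w1, w3) < d1(w1, w2)
d2_ranks_w2_closer = d2(w1, w2) < d2(w1, w3)
```

Under \(d_1\) the distances are 1 and 3/4, under \(d_2\) they are 1 and 10/9, so a modest change of weights is enough to flip a four-place comparative fact.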

4.3 The Negation Constraint

Here is a natural-sounding constraint one might imagine applies to propositional similarity, truth-conditionally understood: the degree of similarity between \(\alpha\) and \(\beta\) should match that between \(\lnot \alpha\) and \(\lnot \beta\). Call this the negation constraint.

There are two readings of the negation constraint, depending on how the word ‘match’ is understood. The stronger reading construes match as identity, i.e. insists that the degree of similarity of \(\alpha\) and \(\beta\) equal that of \(\lnot \alpha\) and \(\lnot \beta\). The weaker reading does not insist on identity, but requires that the respective degrees be close. Though the weaker reading is vague, Sect. 3’s comparative models respect it, since \(DIST_{PP}(\alpha , \beta )\) is in general close in the ordering to \(DIST_{PP}(\lnot \alpha , \lnot \beta )\). Of course, how close will depend on \({\mathbb {ORD}}\); but the fact that \(DIST_{PP}(\alpha , \beta )\) is a subset of the indices of atoms in \(\alpha\) and \(\beta\) sets an upper bound on the difference between them. And when \(\alpha\) and \(\beta\) are literals, as well as in certain other special cases, \(DIST_{PP}(\alpha , \beta )\) simply equals \(DIST_{PP}(\lnot \alpha , \lnot \beta )\).

The constraint’s stronger reading is that \(DIST_{PP}(\alpha , \beta ) = DIST_{PP}(\lnot \alpha , \lnot \beta )\) for all \(\alpha\) and \(\beta\). This constraint is not respected by our Sect. 3 models. For example, as we saw in connection with Sect. 3.1’s lexicographic model, \(DIST_{PP} (p_0 \vee p_1, p_1 \wedge \lnot p_2) = \{1, 2\}\); yet

\(DIST_{PP} (\lnot (p_0 \vee p_1), \lnot (p_1 \wedge \lnot p_2)) = DIST_{PP} (\lnot p_0 \wedge \lnot p_1, \lnot p_1 \vee p_2) = \{0, 1\}\),

as the reader may verify. In general, whether \(DIST_{PP}(\alpha , \beta ) = DIST_{PP}(\lnot \alpha , \lnot \beta )\) will depend on \(\alpha\) and \(\beta\) as well as the underlying order \({\mathbb {ORD}}\).

One way to motivate the stronger constraint is by thinking of propositions as functions from worlds to truth-values. Models in which degrees of similarity are invariant under the permutation of truth-values are formally attractive. When it comes to applications to possible worlds, however, truth-values may not be permutable. Truth-value symmetry may be desirable in a purely mathematical setting; but as soon as intended interpretations—applied models—are in play, any reason to demand such symmetry fades away.

In fact, the constraint’s stronger reading is problematic. Propositions \(\alpha\) and \(\beta\) may be contradictory while their negations are jointly consistent. For example, if we let \(\alpha = p \wedge q\) and \(\beta = \lnot p\), then \(\alpha\) and \(\beta\) are inconsistent; but \(\lnot \alpha\) and \(\lnot \beta\) are respectively equivalent to \(\lnot p \vee \lnot q\) and p, which are jointly consistent. The motivation for our models also clashes directly with the negation constraint’s stronger reading. Using the lexicographic model as an illustration, suppose \(s_1\) is \(p_0 \wedge p_1\) and \(s_2\) is \(\lnot p_0 \wedge p_2\). Sentence \(s_1\) is very dissimilar to \(s_2\), as it differs from it along the most important dimension of similarity; \(DIST_{PP} (p_0 \wedge p_1, \lnot p_0 \wedge p_2) = \max (\{0, 1\}, \{0, 2\}) = \{0, 1\}\). However, not-\(s_1\)’s similarity to not-\(s_2\) is greater than \(s_1\)’s to \(s_2\), the reason being that not-\(s_1\) and not-\(s_2\) are compatible along the most important similarity dimension. For any way the world has to be for not-\(s_1\) to be true there is a way the world is in which not-\(s_2\) is true that is more similar to this first world than any \(s_1\)-world is to any \(s_2\)-world. More succinctly, \(s_1\) and \(s_2\) are constrained to differ along the most important dimension of similarity, whereas not-\(s_1\) and not-\(s_2\) are not; from a lexicographic perspective, that means that not-\(s_1\) is more similar to not-\(s_2\) than \(s_1\) is to \(s_2\). This is precisely the result our comparative models deliver.
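The values cited in this subsection can be reproduced with a small Python sketch. It rests on an assumed reconstruction of the lexicographic model, restricted to the three atoms \(p_0, p_1, p_2\): world-to-world ‘distance’ is the set of indices on which two worlds disagree, sets of indices are ordered lexicographically with lower indices weighing more, and \(DIST_{PP}\) is computed in Hausdorff style as the greater of the two directed ‘largest nearest’ values. The names `dist_ww`, `rank` and `dist_pp` are hypothetical, and Sect. 3 remains the authority on the actual definitions.

```python
from itertools import product

N = 3  # atoms p0, p1, p2; a world is a tuple of their truth-values
WORLDS = list(product([True, False], repeat=N))

def dist_ww(u, v):
    """Indices of the atoms on which two worlds disagree."""
    return frozenset(i for i in range(N) if u[i] != v[i])

def rank(indices):
    """Sort key for the assumed lexicographic order: lower indices weigh more."""
    return tuple(i in indices for i in range(N))

def dist_pp(alpha, beta):
    """Hausdorff-style 'distance' between two satisfiable propositions."""
    a_worlds = [u for u in WORLDS if alpha(u)]
    b_worlds = [u for u in WORLDS if beta(u)]

    def directed(xs, ys):
        # Largest (in the order) of the nearest-world difference sets.
        nearest = (min((dist_ww(x, y) for y in ys), key=rank) for x in xs)
        return max(nearest, key=rank)

    return max(directed(a_worlds, b_worlds), directed(b_worlds, a_worlds), key=rank)

def neg(prop):
    """Negation of a proposition given as a predicate on worlds."""
    return lambda u: not prop(u)

alpha = lambda u: u[0] or u[1]        # p0 v p1
beta = lambda u: u[1] and not u[2]    # p1 & ~p2
s1 = lambda u: u[0] and u[1]          # p0 & p1
s2 = lambda u: (not u[0]) and u[2]    # ~p0 & p2
```

On this reconstruction `dist_pp(alpha, beta)` is {1, 2} while the negated pair yields {0, 1}, and `dist_pp(s1, s2)` is {0, 1} while the negated pair yields {1}, which ranks lower in the order. Both results agree with the figures given in the text.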

In sum, our comparative models respect the negation constraint’s weaker reading. They do not respect the stronger reading, for good reason.

4.4 More About Similarity?

Finally, a few words about taking similarity as primitive. Our models assumed, rather than derived, similarity respects between different worlds; these respects are ‘exogenous’ to the models. To think this problematic—an omission—is to misunderstand our aims. The models derived a four-place similarity relation from a four-place comparative similarity relation on worlds. There is no reason to suppose that the relevant similarity respects will be the same for all applications of interest. As David Lewis once noted,^{Footnote 34} it seems unlikely that the same ordering of respects of comparison will underlie facts about verisimilitude as well as facts about counterfactuals. Various notions of similarity vary from context to context, which is why our models allowed different orderings. Moreover, all talk of similarity of worlds is neutral between similarity in a particular respect (which itself perhaps aggregates sub-respects of similarity) or overall similarity; either is compatible with our models.^{Footnote 35}

5 Conclusion

The task this paper set itself was to derive comparative propositional similarity facts from world similarity facts. We sought to avoid a metric approach, common in the literature, but unrealistic in many contexts. The models presented in Sect. 3 did just that. They help make formal sense of the notion, central to philosophy, of propositional similarity in the usual case where there is no metric to be had.^{Footnote 36}

Notes

As Benson Mates writes: ‘...to formulate precise and workable rules for symbolizing sentences of the natural language is a hopeless task. In the more complicated cases, at least we are reduced to giving the empty-sounding advice: ask yourself what the natural language sentence means, and then try to find a sentence of [the formal language] \({\mathfrak {L}}\) which, relative to the given interpretation, has as nearly as possible the same meaning.’ (Mates 1972, p. 84) Another noteworthy articulation of the criterion of semantic proximity may be found in Sainsbury (2001, pp. 52 and 372).
We speak of propositions and sentences interchangeably, it being understood in the latter case that we mean the propositions expressed by these sentences in a context.
Davidson (1976) generated a notable such tradition.
Borrowed from Graham Oddie (2014, Sect. 1).
Or worlds in the case of a tie.
For a recent philosophical discussion of the notion of subject matter, see Yablo (2014).
The atom set \(\{p_0, p_1 \ldots , p_n, \ldots \}\) is typically (but not necessarily) countably infinite.
We could model the holding true of a proposition at a world by introducing a two-place relation H between propositions and worlds, and write Hpw when p is true at w. For no other reason than notational simplicity, we write \(w \in p\) instead.
Our models are compatible with all accounts of possible worlds. All we need for our purposes are the assumptions that there are propositions, that there are worlds, and that propositions are true or false (but not both) at worlds. Thus, pretty much any account of propositions—Fregean, Russellian, or other—will do. That possible worlds are consistent follows by definition.
The main difference being that Stalnaker accepts, whereas Lewis rejects, the so-called limit assumption, viz. that if there is a world in which some proposition holds then there is a closest such world (relative to a given world). For a textbook account, see ch. 8 of Sider (2010).
The indices are implicitly understood as coding the order of interest; strictly speaking the lexicographic set \(\{p_0, \ldots , p_n, \ldots \}\) is \(\{\langle p_0, 0 \rangle , \ldots , \langle p_n, n \rangle , \ldots \}\). Of course, a real Formula 1 race has only finitely many drivers, in which case the set of atoms would be finite; see Sect. 3.2 for how to extend the model to this more realistic case.
Readers familiar with the Cantor space might recognise that we are thinking of the space of worlds as \(2^\omega\), the Cantor space’s domain. The function \(DIST_{WW}\) is a way of encoding the comparative information contained in a metric on the space of worlds without going so far as to assume one.
Henceforth quotation marks around the word ‘distance’ are implicitly understood.
Some authors write ‘complete’ instead of ‘linear’; the condition is that \(x \le y \vee y \le x\) for all elements x and y. Although linearity implies reflexivity (set \(x = y\)), we nevertheless include reflexivity as a separate requirement, for perspicuousness.
Meaning that \((\alpha , \beta ) \sim (\gamma , \delta )\) does not imply that \((\alpha , \beta ) = (\gamma , \delta )\).
The argument is analogous to that in Appendices 1.1 and 1.2.
Although it may look as if we assumed a little less than a four-place similarity relation on worlds, since we only assumed the existence of a relation on pairs of worlds, the two conditions are equivalent given plausible assumptions. See Appendix 1.3 for the argument.
Tim Williamson has argued that, in general, this three-place relation is expressively inadequate for formulating comparative similarity claims, his example being ‘there are three things of which the first does not differ from anything by as much as the other two differ from each other’ (Williamson 1988, p. 471). As Williamson shows (1988, p. 467), a four-place comparative similarity relation cannot be defined from a three-place one. Moreover, the logic of the four-place comparative similarity relation is finitely axiomatisable, whereas that of the three-place comparative similarity relation is not (Williamson 1988, section 4).
We take ‘truthlikeness’ and ‘verisimilitude’ as synonyms, unlike some writers such as e.g. Zwart (2001, p. 27).
Other papers outside the verisimilitude literature directly emerging from Popper (1963) and that relate to the present paper in different ways include Bigelow (1976), Blumson (2018) and Williamson (1988).
See Niiniluoto (1987, pp. 232–233) for a list of such constraints.
An example being Niiniluoto (1987, pp. 242–256).
According to the definition on page 25 of Zwart (2001), the lexicographic account with infinitely many propositional atoms is not strictly speaking a likeness approach. When the set of atoms is finite, the lexicographic account is a likeness one according to Zwart’s definition, though this does not extend to the general orderings discussed in Sect. 3.3.
Chapters 1–3 of Zwart (2001) offer an accessible and mathematically precise overview; Niiniluoto (2018) is a very recent review.
Since there are 4 h-worlds and 4 r-worlds, there are 4 \(\times\) 4 = 16 h-world and r-world pairs; one of these, the distance between \(h \wedge r \wedge w\) and \(h \wedge r \wedge \lnot w\), is counted twice over by the product.
As should be clear, a pseudometric \({\mathbb {D}}\) uniquely defines a four-place relation \(\preceq\), but not vice versa. For example, if the pseudometric \({\mathbb {D}}\) satisfies \((\alpha , \beta ) \preceq (\gamma , \delta )\) iff \({\mathbb {D}}(\alpha , \beta ) \le {\mathbb {D}}(\gamma , \delta )\) then so does, say, the pseudometric \(2 {\mathbb {D}}\) i.e. 2 times \({\mathbb {D}}\), which multiplies all \({\mathbb {D}}\)-distances by 2. The assumption that a particular pseudometric exists is thus a stronger hypothesis than that a particular comparative strict relation \(\prec\) exists. (Or that a particular such weak relation \(\preceq\) exists.) That said, the existence of a four-place comparative relation \(\prec\) on a countable (finite or countably infinite) set satisfying some natural assumptions is equivalent to the existence of a metric. A version of this argument may be found in Suppes et al. (1989, p. 162).
Note that if the worlds in our models are not possible worlds but equivalence classes of possible worlds up to \({\mathcal {L}}\)-equivalence then no distinction can be drawn between (1) a pair \(\langle w, w \rangle\) whose first member equals its second member, and (2) the pair \(\langle w, w' \rangle\) with the same w and \(w'\) a distinct world from w’s same equivalence class. In contrast to the objection just raised, this is a perfectly acceptable consequence of thinking of \({\mathcal {L}}\)-propositions as the only similarity respects that matter.
Here is an illustration of \(\prec _{{\mathbb {D}}}\)’s sensitive dependence on d (via \({\mathbb {D}}\)). Assume that we assign each similarity respect \(p_0, p_1, p_2 \ldots\) a weight, i.e. a positive real number; as usual, we imagine that context fixes these respects and weights. Thus if two worlds differ with respect to \(p_0\), this contributes \(x_0\) to their distance (or dissimilarity), if they differ with respect to \(p_1\) this contributes \(x_1\), and so on, with the \(x_i\) all positive reals (where i is a natural number). In other words, if \(w_i\) and \(w_j\) are worlds, i.e. maximally consistent sets of \({\mathcal {L}}\)-literals, and \(x_n\) is a positive real number for each n, we may set \(d(w_i, w_j) =_{def} \Sigma x_n\), for n such that \(p_n \notin w_i \cap w_j\) and \(\lnot p_n \notin w_i \cap w_j\). Suppose now that \(w_1 = \{p_n: n \ge 0 \}\), \(w_2 = \{\lnot p_0\} \cup \{p_n: n \ge 1 \}\) and \(w_3 = \{p_0, \lnot p_1, \lnot p_2 \} \cup \{p_n: n \ge 3 \}\). And suppose further that \(d_1\) is the metric defined by taking each \(x_n\) as \(2^{-n}\), and that the metric \(d_2\) is instead obtained by setting \(x_n\) as \(1.5^{-n}\). Then \(1 = d_1(w_1, w_2) > d_1(w_1, w_3) = 1/2 + 1/4 = 3/4\), whereas \(1 = d_2(w_1, w_2) < d_2(w_1, w_3) = 2/3 + 4/9 = 10/9\). Evidently, this difference in the four-place comparative relation on propositions derived from the metrics \(d_1\) or \(d_2\) carries over to differences in \({\mathbb {D}}_1\) or \({\mathbb {D}}_2\) respectively derived from these metrics. A quick way to see this is that when \({\mathbb {D}}\) is derived from d as above, \({\mathbb {D}} (\{w_1 \}, \{ w_2 \}) = d(w_1, w_2)\), so that \({\mathbb {D}}_1(\{w_1\}, \{w_2\}) > {\mathbb {D}}_1(\{w_1\}, \{w_3\})\) whereas \({\mathbb {D}}_2(\{w_1\}, \{w_2\}) < {\mathbb {D}}_2(\{w_1\}, \{w_3\})\). A similar sort of argument can be also run for propositional inputs to \({\mathbb {D}}\).
Because the reals are separable, i.e. they admit a countable order-dense subset (e.g. the rationals).
Here we are in agreement with Blumson (2018), who presents a forceful dilemma for those who would represent dissimilarity among particulars by distance in a metric space (d above). If comparative dissimilarity is a relation between actual particulars, then the metric d is severely underdetermined (see the previous objection). But if it is supposed to be a relation among possible particulars, such a metric is unlikely to exist for the kind of reason just given. See Blumson (2018) for more details.
Assuming \(\emptyset\) is the order’s minimal element.
The \({\mathbb {D}}\)-distance from \(\alpha\) to \(\gamma\), if positive, would then be greater than the sum of the \({\mathbb {D}}\)-distances from \(\alpha\) to \(\beta\) and from \(\beta\) to \(\gamma\) (0 + 0), where \(\beta\) is a contradiction, in violation of the triangle inequality. So if the \({\mathbb {D}}\)-distance from any contradiction were 0, the only consistent \({\mathbb {D}}\)-distance for all pairs of propositions would be 0.
‘S tends towards the empty set’ here abbreviates: the diameter of S tends to 0, where S’s diameter is defined by diam(S) = \(\sup _{x\in S, y \in S} d(x,y)\). The value of \({\mathbb {D}}(S, F)\) as S tends to the empty set will in general depend on how S tends to the empty set.
Lewis (1986, p. 21).
See Morreau (2011) and Kroedel and Huber (2014) for discussion of how, if at all, one might aggregate respects of similarity into overall comparative similarity. For relevant work on counterfactuals, see Kratzer (1989) and Veltman (2005).
Thanks to Ben Blumson, Robert Leek, Tim Williamson and several anonymous referees for comments. I’m also grateful to the audience at a conference in honour of Dan Isaacson’s retirement in June 2013, where an embryonic version of these ideas was first presented.
A closed subset S of a metric space is the complement of an open set. If x is an element of an open set S then there is a positive real number r such that all points whose distance from x is less than r are also in S. And if S is a bounded subset of a metric space then there is a fixed positive real number r such that the distance between any two of S’s elements is less than r. For more on the Hausdorff metric, see e.g. (Čech 1969, pp. 121–124).
The Hausdorff metric thus differs from the function which takes the distance between A and B to be the d-distance between their ‘centres’, 1.5 in this instance since the horizontal distance between A’s centre point and B’s centre point is 1.5. This alternative measure has the unacceptable consequence for the application we have in mind that any two sets with the same centre have distance 0. Moreover, the notion of a ‘centre’ may also be undefinable for certain shapes.
For example, if \(x \notin \alpha\) and \(\alpha\) is unbounded, then for any \(r > 0\), some element of \(\alpha\) has d-distance greater than r from x. (Otherwise all points of \(\alpha\) would be within r of x and so by the triangle inequality for d, the d-distance between any two elements of \(\alpha\) would be no more than 2r, contrary to the assumption of \(\alpha\)’s unboundedness.) Elaborated a little, this argument shows that if \(\alpha\) is unbounded and \(\beta\) is bounded then the \({\mathbb {D}}\)-distance between them cannot be finite and hence is undefined.
Consider for example the open subset of the two-dimensional plane (with Euclidean metric) consisting of points \(\langle x, y \rangle\) such that \(x^2 + y^2 < 1\), whose closure is the set of points satisfying \(x^2 + y^2 \le 1\). The \({\mathbb {D}}\)-distance between these two subsets is 0 yet they are not identical.
There are more subsets of W than there are subsets of W corresponding to \({\mathcal {L}}\)-sentences; e.g. if \({\mathcal {L}}\) is countably infinite (of size \(\aleph _0\)) the set W is of size \(2^{\aleph _0}\) and the set of its subsets is of size \(2^{2^{\aleph _0}}\). Strictly speaking, then, we consider not \({\mathbb {D}}\) itself but \({\mathbb {D}}\)’s restriction to \({\mathcal {L}}\)-sentences.
An \({\mathcal {L}}\)-contradiction is any sentence such as \(p_0 \wedge \lnot p_0\) that is unsatisfiable in the standard semantics for propositional logic.
It’s not hard to check that if \({\mathbb {D}} (\alpha , \beta ) + {\mathbb {D}} (\beta , \gamma ) < {\mathbb {D}} (\alpha , \gamma )\) then there would be \(a \in \alpha\), \(b \in \beta\) and \(c \in \gamma\) such that \(d(a, b) + d(b,c) < d(a, c)\), contravening the triangle inequality for d.

References

Bigelow, J. (1976). Semantics of probability. Synthese, 36, 459–72.
Blumson, B. (2018). Distance and dissimilarity. Philosophical Papers, 48, 211–239.
Čech, E. (1969). Point sets (2nd ed.). San Francisco: Academia.
Davidson, D. (1976). Inquiries into truth and interpretation. Oxford: Clarendon Press.
Hilpinen, R. (1976). Approximate truth and truthlikeness. In M. Przelecki, K. Szaniawski, & R. Wojcicki (Eds.), Formal methods in the methodology of the empirical sciences (pp. 19–42). Kufstein: Reidel.
Kratzer, A. (1989). An investigation of the lumps of thought. Linguistics and Philosophy, 12, 607–653.
Kroedel, T., & Huber, F. (2014). Counterfactual dependence and arrow. Noûs, 47, 453–66.
Lewis, D. K. (1973). Counterfactuals. Oxford: Blackwell.
Lewis, D. K. (1979). Counterfactual dependence and time’s arrow. Noûs, 13, repr. in his Philosophical Papers (Vol. II, pp. 32–52).
Lewis, D. K. (1986). On the plurality of worlds. Oxford: Blackwell.
Mates, B. (1972). Elementary logic (2nd ed.). Oxford: Oxford University Press.
Morreau, M. (2011). It simply does not add up: Trouble with overall similarity. Journal of Philosophy, 107, 469–90.
Niiniluoto, I. (1987). Truthlikeness. Kufstein: Reidel.
Niiniluoto, I. (2018). Truthlikeness: Old and new debates. Synthese, 197(4), 1581–1599.
Oddie, G. (2014). Truthlikeness. Stanford Encyclopedia of Philosophy.
Popper, K. R. (1963). Conjectures and refutations. Abingdon: Routledge.
Sainsbury, M. (2001). Logical forms (2nd ed.). Oxford: Blackwell.
Schurz, G., & Weingartner, P. (2010). Zwart and Franssen’s impossibility theorem holds for possible-world-accounts but not for consequence-accounts to verisimilitude. Synthese, 172, 415–36.
Sider, T. (2010). Logic for philosophy. Oxford: Oxford University Press.
Stalnaker, R. (1968). A theory of conditionals. In N. Rescher (ed.), Studies in logical theory: American philosophical quarterly monograph series (Vol. 2, pp. 98–112). Oxford: Blackwell.
Suppes, P., Krantz, D., Luce, R. D., & Tversky, A. (1989). Foundations of measurement, Vol. 2: Geometrical, threshold and probabilistic representations. Cambridge: Academic Press.
Tichý, P. (1974). On Popper’s definitions of verisimilitude. The British Journal for the Philosophy of Science, 25, 155–60.
Veltman, F. (2005). Making counterfactual assumptions. Journal of Semantics, 22, 159–180.
Williamson, T. (1988). First-order logics for comparative similarity. Notre Dame Journal of Formal Logic, 29, 457–481.
Yablo, S. (2014). Aboutness. Princeton: Princeton University Press.
Zwart, S. D. (2001). Refined Verisimilitude, Synthese Library (vol. 307). Berlin: Springer.

Author information

Authors and Affiliations

Wadham College, University of Oxford, Oxford, UK
A. C. Paseau

Corresponding author

Correspondence to A. C. Paseau.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

The four appendices supply arguments mentioned in the main text.

1.1 (relevant to Sect. 3.1)

We show that \(DIST_{WP} (w, \alpha )\) has a \({\mathbb {LEX}}\)-least element, where \(\alpha\) is an \({\mathcal {L}}\)-proposition. Write \(\alpha\) in disjunctive normal form as a finite disjunction of finite conjunctions of literals. The distance between w and a finite conjunction of literals \(\bigwedge _{i \le N} l_i\) is the index set of the set of atoms on which they differ (i.e. the set of indices of literals in \(\bigwedge _{i \le N} l_i\) whose negation appears in w, or are the negation of an atom appearing in w). It follows that the distance between w and a finite disjunction of such conjunctions is the \({\mathbb {LEX}}\)-least of a finite set of (finite) subsets of \(\omega\).
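The first step of this argument can be illustrated with a minimal Python sketch. It only computes, for each disjunct of a disjunctive normal form, the set of atom indices on which the world and the disjunct differ; selecting the \({\mathbb {LEX}}\)-least of these sets requires the order defined in Sect. 3.1, which is not reproduced here. The representation of worlds as dictionaries and of literals as (index, polarity) pairs, and the names `differing_indices` and `candidate_distances`, are assumptions made for illustration only.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A literal is (atom_index, polarity): (3, True) stands for p_3, (3, False) for ¬p_3.
Literal = Tuple[int, bool]
Conjunction = List[Literal]  # assumed consistent: no atom occurs with both polarities


def differing_indices(world: Dict[int, bool], conj: Conjunction) -> FrozenSet[int]:
    """Indices of the atoms on which the world and the conjunction disagree."""
    return frozenset(i for i, polarity in conj if world[i] != polarity)


def candidate_distances(world: Dict[int, bool], dnf: List[Conjunction]) -> Set[FrozenSet[int]]:
    """One finite index set per disjunct; the LEX-least of these is the distance."""
    return {differing_indices(world, conj) for conj in dnf}


# Hypothetical example: w makes p_0 true, p_1 false, p_2 true,
# and alpha is (p_0 ∧ p_1) ∨ (¬p_0 ∧ ¬p_2).
world = {0: True, 1: False, 2: True}
dnf = [[(0, True), (1, True)], [(0, False), (2, False)]]
print(candidate_distances(world, dnf))
```

Because the disjunction is finite and each disjunct mentions finitely many atoms, the candidate set is a finite set of finite subsets of \(\omega\), which is what the existence of a least element turns on.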

1.2 (relevant to Sect. 3.1)

We show that \(DIST_{PP} (\alpha , \beta )\) has a \({\mathbb {LEX}}\)-largest element (where \(\alpha\) and \(\beta\) are \({\mathcal {L}}\)-propositions). Let \(IndAtom(\alpha )\) be the (finite) set of indices of the \({\mathcal {L}}\)-atoms that appear in \(\alpha\); similarly for \(IndAtom(\beta )\). It is easy to see that \(DIST_{WP} (w, \alpha ) \subseteq IndAtom(\alpha )\) for any w, and a fortiori for \(w \in \beta\); the argument is as in Appendix 1.1. Similarly, \(DIST_{WP} (w, \beta ) \subseteq IndAtom(\beta )\). Thus \(DIST_{PP} (\alpha , \beta )\) is the \({\mathbb {LEX}}\)-largest element of some subset of the finite set \({\mathbb {P}}(IndAtom(\alpha ) \cup IndAtom(\beta ))\).

1.3 (relevant to footnote 17 in Sect. 3.3)

The existence of a four-place similarity relation \(\preceq\) on worlds is equivalent to the existence of a function \(DIST: W^2 \rightarrow {\mathbb {ORD}}\), given some plausible conditions on \(\preceq\) and DIST. The conditions on \(\preceq\) are:

(1) \((w_1, w_2) \preceq (w_1, w_2)\);
(2) \((w_1, w_2) \preceq (w_3, w_4)\) iff \((w_2, w_1) \preceq (w_3, w_4)\) iff \((w_1, w_2) \preceq (w_4, w_3)\);
(3) if \((w_1, w_2) \preceq (w_3, w_4)\) and \((w_3, w_4) \preceq (w_5, w_6)\) then \((w_1, w_2) \preceq (w_5, w_6)\);
(4) \((w_1, w_2) \preceq (w_3, w_4)\) or \((w_3, w_4) \preceq (w_1, w_2)\).

There is a four-place relation \(\preceq\) satisfying conditions (1), (2), (3) and (4) iff there is a function \(DIST: W^2 \rightarrow {\mathbb {ORD}}\), where \(\langle {\mathbb {ORD}}, \le _{\mathbb {ORD}} \rangle\) is some linear pre-order (i.e. is reflexive, transitive and linear) and DIST satisfies \(DIST(w_1, w_2) = DIST(w_2, w_1)\). The proof is immediate, and follows simply by reading the following equivalence from left to right and right to left respectively:

\((w_1, w_2) \preceq (w_3, w_4)\) iff \(DIST(w_1, w_2) \le _{\mathbb {ORD}} DIST(w_3, w_4)\)
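The right-to-left direction of this equivalence can be checked mechanically on a small finite case. The following Python sketch is illustrative only: the three worlds, the integer values standing in for \({\mathbb {ORD}}\), and the names `dist`, `preceq` and `check_conditions` are all hypothetical. It defines \(\preceq\) from a symmetric DIST and verifies conditions (1) to (4) by brute force.

```python
from itertools import product

# Three hypothetical worlds and a hypothetical symmetric DIST; the integers stand in
# for the linearly pre-ordered set ORD (assumed values, not taken from the paper).
WORLDS = ["w1", "w2", "w3"]
_DIST = {
    frozenset({"w1", "w2"}): 1,
    frozenset({"w1", "w3"}): 2,
    frozenset({"w2", "w3"}): 2,
}


def dist(u, v):
    """Symmetric by construction; same-world pairs get 0 (an extra assumption)."""
    return 0 if u == v else _DIST[frozenset({u, v})]


def preceq(p, q):
    """(w1, w2) ⪯ (w3, w4) iff DIST(w1, w2) <= DIST(w3, w4)."""
    return dist(*p) <= dist(*q)


PAIRS = list(product(WORLDS, repeat=2))


def check_conditions():
    """Return the truth values of conditions (1)-(4) over all pairs of worlds."""
    c1 = all(preceq(p, p) for p in PAIRS)
    c2 = all(
        preceq((a, b), (c, d)) == preceq((b, a), (c, d)) == preceq((a, b), (d, c))
        for (a, b) in PAIRS
        for (c, d) in PAIRS
    )
    c3 = all(
        preceq(p, r)
        for p in PAIRS for q in PAIRS for r in PAIRS
        if preceq(p, q) and preceq(q, r)
    )
    c4 = all(preceq(p, q) or preceq(q, p) for p in PAIRS for q in PAIRS)
    return c1, c2, c3, c4


print(check_conditions())
```

Conditions (1), (3) and (4) are inherited from the linear pre-order on the values, and condition (2) from the symmetry of DIST, so the check should succeed for any symmetric assignment of values, not only the one assumed here.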

1.4 (relevant to Sect. 4.2)

A metric on a set X is a particular type of function \(d:X \times X \rightarrow {\mathbb {R}}^+\), where \({\mathbb {R}}^+\) is the set of non-negative real numbers. d is a metric just when the following conditions are satisfied, for all \(x,y,z \in X\):

Positivity: i. \(d(x,y) = 0 \text { if } x = y\); ii. \(d(x,y) = 0 \text { only if } x = y\).
Symmetry: \(d(x,y) = d(y, x)\).
Triangle Inequality: \(d(x,z) \le d(x, y) + d(y,z)\).

A pseudometric is a function \(d:X \times X \rightarrow {\mathbb {R}}^+\) that satisfies these conditions with the possible exception of the second Positivity clause.

One of the most familiar metrics is the Euclidean one on three-dimensional Euclidean space (\({\mathbb {R}}^3\)): the Euclidean distance between two three-tuples \({\underline{x}} = \langle x_1, x_2 , x_3 \rangle\) and \({\underline{y}} = \langle y_1, y_2 , y_3 \rangle\) in \({\mathbb {R}}^3\) is given by: \(d({\underline{x}}, {\underline{y}}) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2}\). The function \(d^*({\underline{x}}, {\underline{y}}) = |x_1 - y_1|\), which maps a pair of such points to their first coordinates’ absolute difference, is an example of a pseudometric that is not a metric; for instance, the \(d^*\)-distance of any two points in \({\mathbb {R}}^3\) lying on a plane with constant x-coordinate is 0.
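The contrast between d and \(d^*\) can be made concrete with a short Python sketch. The function names `euclid` and `d_star` and the three sample points are chosen here purely for illustration. Two distinct points with the same first coordinate have \(d^*\)-distance 0 but positive Euclidean distance, so \(d^*\) fails the second Positivity clause while still satisfying Symmetry and, on these points, the Triangle Inequality.

```python
import math


def euclid(x, y):
    """Euclidean distance between two points of R^3."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def d_star(x, y):
    """Absolute difference of the first coordinates: a pseudometric, not a metric."""
    return abs(x[0] - y[0])


# Illustrative points; p and q are distinct but share their first coordinate.
p = (1.0, 0.0, 0.0)
q = (1.0, 2.0, -3.0)
r = (4.0, 1.0, 1.0)

print(d_star(p, q), euclid(p, q))                   # d* collapses p and q; d does not
print(d_star(p, r) <= d_star(p, q) + d_star(q, r))  # triangle inequality on these points
```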

Suppose that d is a metric on the underlying space X. This metric induces a metric \({\mathbb {D}}\) on the non-empty, closed and bounded subsets of the space—the so-called Hausdorff metric, familiar to mathematical analysts and topologists. Denoting elements of X suggestively as \(w, w_1, w_2\) etc.—because we are interested in the case in which X is the set of worlds—and letting Greek letters stand for sets of worlds (think of these as propositions), the Hausdorff metric’s definition is:

\({\mathbb {D}} (\alpha , \beta ) = \max \{\sup _{w_1\in \alpha } d(w_1, \beta ), \sup _{w_2\in \beta } d(w_2, \alpha )\},\)

where

\(d(w, \xi ) = \inf _{w^*\in \xi } d(w,w^*).\)

In words: the distance \(d(w, \xi )\) from a point w to a set \(\xi\) of points is the infimum of the distance from w to the members of \(\xi\); and the distance \({\mathbb {D}} (\alpha , \beta )\) between two sets of points is the maximum (limiting) ‘greatest separation’ between a point in the first subset and the second subset or between a point in the second subset and the first subset. The function \({\mathbb {D}}\) thus defined on closed, bounded and non-empty subsets of X is a metric.^{Footnote 37}

A simple example will help illustrate the definition. Suppose A is a square and B is a rectangle of twice the length but the same breadth as A, and whose leftmost edge coincides with A’s rightmost edge.

Say A has side length 1, whereas B has length 2 and breadth 1. Assuming the usual Euclidean metric, \(\sup _{a \in A}\) \(d(a, B) = 1\), attained when a is any point on A’s left edge (the edge furthest from B); similarly, \(\sup _{b \in B}\) \(d(b, A) = 2\), attained when b is any point on B’s right edge (the edge furthest from A). Hence \({\mathbb {D}} (A, B) = \max \{1, 2 \} = 2\).^{Footnote 38}
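The calculation can be reproduced numerically. The Python sketch below is an approximation under stated assumptions: A and B are replaced by finite grids of sample points (A on \([0,1] \times [0,1]\), B on \([1,3] \times [0,1]\), an assumed placement consistent with the description), so suprema and infima become maxima and minima. The helper names `grid`, `point_to_set`, `directed` and `hausdorff` are hypothetical. Because the two grids share their rows, the sampled values coincide with the exact ones in this particular example.

```python
import math


def grid(x0, x1, y0, y1, step=0.25):
    """Sample points of the rectangle [x0, x1] x [y0, y1] on a regular grid."""
    nx = round((x1 - x0) / step)
    ny = round((y1 - y0) / step)
    return [(x0 + i * step, y0 + j * step) for i in range(nx + 1) for j in range(ny + 1)]


def point_to_set(w, xi):
    """d(w, xi): least distance from the point w to the sampled set xi."""
    return min(math.dist(w, v) for v in xi)


def directed(alpha, beta):
    """Greatest separation of a point of alpha from the set beta."""
    return max(point_to_set(w, beta) for w in alpha)


def hausdorff(alpha, beta):
    """D(alpha, beta): the larger of the two directed separations."""
    return max(directed(alpha, beta), directed(beta, alpha))


A = grid(0, 1, 0, 1)  # unit square
B = grid(1, 3, 0, 1)  # rectangle of length 2 and breadth 1, adjoining A on the right

print(directed(A, B), directed(B, A), hausdorff(A, B))
print(math.dist((0.5, 0.5), (2.0, 0.5)))  # distance between the two centres
```

For general shapes a finite sample only approximates the Hausdorff distance, and the quality of the approximation depends on the step size.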

The Hausdorff metric is defined only on subsets of X that are non-empty, bounded and closed. For if \(\alpha\) were empty, all the sets involved in the definition of its \({\mathbb {D}}\)-distance from another subset \(\beta\) would be empty, so that \(\beta\)’s \({\mathbb {D}}\)-distance from the empty set \(\alpha\) could not be assigned a real number. And if \(\alpha\) were unbounded, its \({\mathbb {D}}\)-distance from a non-empty bounded subset \(\beta\) would be undefined because, intuitively, it would be infinite.^{Footnote 39} Finally, if \(\alpha\) were not closed, its \({\mathbb {D}}\)-distance from its closure would if defined be 0, so that \({\mathbb {D}}\) could not be a metric.^{Footnote 40}

Given a metric d on worlds, we define the Hausdorff metric \({\mathbb {D}}\) on propositions, i.e. sets of worlds, with underlying set X the set W of possible worlds.^{Footnote 41} However, as observed, the Hausdorff metric is defined only on closed, bounded and non-empty subsets of the underlying space X. So we must ask whether propositions construed as sets of worlds are closed, bounded and non-empty subsets of W.

The first sticking point is that contradictions are empty subsets of W since they contain no worlds.^{Footnote 42} The problem may be obviated by excluding the contradiction from the domain of propositions on which \({\mathbb {D}}\) is defined (but see Sect. 4.3). The second sticking point is whether the sets of worlds to which propositions correspond are closed and bounded. This is less of a worry than it might initially seem. \({\mathbb {D}}\) is a pseudometric even if \({\mathcal {L}}\)-propositions (i.e. propositions expressible by \({\mathcal {L}}\)-sentences) are not closed or bounded subsets of W. Symmetry for \({\mathbb {D}}\) is immediate from its definition, which is symmetric for any sets \(\alpha\) and \(\beta\) of the underlying space. The triangle inequality for \({\mathbb {D}}\) follows by a routine argument from the triangle inequality for d.^{Footnote 43} The first half of Positivity for \({\mathbb {D}}\) is also immediate: any subset of the original space has \({\mathbb {D}}\)-distance 0 from itself. Thus \({\mathbb {D}}\) is at least a pseudometric, i.e. a metric save that it may not obey the condition that if \({\mathbb {D}}(\alpha , \beta ) = 0\) then \(\alpha = \beta\). And as Sect. 4.3 shows, that \({\mathbb {D}}\) is a pseudometric is all that’s needed.
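One way of spelling out the routine argument for the triangle inequality is sketched below. It is offered as a reconstruction rather than as the paper's own proof, and it assumes that \(\alpha\), \(\beta\) and \(\gamma\) are non-empty and that all the suprema involved are finite. Fix \(a \in \alpha\) and \(b \in \beta\). Taking the infimum over \(c \in \gamma\) in \(d(a,c) \le d(a,b) + d(b,c)\) gives the first line; taking the infimum over \(b \in \beta\) then gives the last.

```latex
\begin{align*}
d(a,\gamma) &\le d(a,b) + d(b,\gamma) \\
            &\le d(a,b) + {\mathbb D}(\beta,\gamma)
              && \text{since } d(b,\gamma) \le \sup_{b' \in \beta} d(b',\gamma) \le {\mathbb D}(\beta,\gamma), \\
d(a,\gamma) &\le d(a,\beta) + {\mathbb D}(\beta,\gamma)
             \le {\mathbb D}(\alpha,\beta) + {\mathbb D}(\beta,\gamma).
\end{align*}
```

The same reasoning with the roles of \(\alpha\) and \(\gamma\) exchanged bounds \(d(c,\alpha)\) for each \(c \in \gamma\) by \({\mathbb {D}}(\gamma,\beta) + {\mathbb {D}}(\beta,\alpha)\). Taking suprema over a and c, then the maximum, and using the symmetry of \({\mathbb {D}}\) yields \({\mathbb {D}}(\alpha,\gamma) \le {\mathbb {D}}(\alpha,\beta) + {\mathbb {D}}(\beta,\gamma)\).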

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Paseau, A.C. Non-metric Propositional Similarity. Erkenn 87, 2307–2328 (2022). https://doi.org/10.1007/s10670-020-00303-7

Received: 04 July 2019
Accepted: 27 July 2020
Published: 14 August 2020
Issue Date: October 2022
DOI: https://doi.org/10.1007/s10670-020-00303-7
