In the previous section, we proposed an analysis of the four-place relation expressed by ‘\(s_1\) is closer in meaning to \(s_2\) than \(s_3\) is to \(s_4\)’. As well as propositional similarity, the account potentially has other applications, for example to the question of higher-order resemblances amongst properties. We saw in Sect. 1 that comparative similarity facts between properties can give rise to comparative similarity facts between propositions: our example turned on crimson being more like scarlet than blue is to green. One could exploit the Sect. 3 ideas to ground an account of comparative similarity between properties, replacing worlds with individuals and propositions with properties. That way, higher-order resemblance between properties could be reduced to ordinary resemblances between particulars.
To keep this paper to manageable length, however, we cannot here embark on this or any other applications. Nor is a detailed comparison with other accounts feasible, simply because there are too many of them. In lieu of that, Sect. 4.1 concisely situates the account vis-à-vis the verisimilitude literature. In Sect. 4.2, we explain why our comparative models are preferable to metric approaches. In Sect. 4.3, we consider another constraint that might be imposed on the models. Section 4.4 concludes with a brief explanation of why we take world similarity as primitive.
Generic Problems with Likeness Accounts
Approaches to truthlikenessFootnote 19 are standardly divided into three categories: the content, consequence and likeness approaches. Ours is akin to the last of these. The focus of these approaches is on defining what it is for X to be at least as close to the truth as Y, where X and Y are variously taken to be propositions, theories or models, and ‘the truth’ may be a proposition, a world or a model. Hilpinen (1976) and Tichý (1974) are pioneers of the likeness approach, and Niiniluoto (1987) is a classic synthesis advocating a possible-worlds likeness approach.Footnote 20
Now our topic is propositional similarity rather than verisimilitude, so our aims and these writers’ are correspondingly different. Many of the constraints on acceptable measures of verisimilitude, for instance, mention truth and falsity (e.g. ‘some false statements may be more truthlike than some true statements’).Footnote 21 These cannot sensibly be imposed on an account of propositional similarity, which must be independent of which world is the actual one: for which world is actual makes no difference to whether \(s_1\) is more similar to \(s_2\) than \(s_3\) is to \(s_4\). By the same token, accounts which make essential use of the class of true models, or correct consequences, or true theories, or the like, are different in both spirit and implementation from ours. The verisimilitude literature does contain discussion of how to assign a measure to pairs of sets of worlds (or models).Footnote 22 However, these treatments generally presuppose that the distance between such sets is represented by a scalar quantity, typically a real number, and for that reason differ significantly from our non-metric approach.
All that said, despite the differences, the present proposal does have several points of affinity with likeness accounts, and some comparisons may be drawn.Footnote 23 Rather than attempt to review the large literature, we situate the account by seeing how it deals with the main generic difficulties associated with such approaches. Oddie (2014, sec. 1.4) usefully summarises these as three questions.Footnote 24 First, how is likeness determined? Second, what is the correct functional dependence of the truthlikeness of a proposition on the likeness of its members to the actual world (the extension problem)? Finally, does the account of likeness depend on which language is used to formulate propositions (the problem of language dependence)?
In answer to the first question, likeness in the Sect. 3 models is constituted by agreement over propositional atoms and unlikeness by divergence over them. Which atoms, or collections thereof, count most is left completely open, as it is determined by the order relation (Sect. 3.3’s \(<_{\mathbb {ORD}}\)), which can be anything we like. Our account is thus inherently flexible. Furthermore, unlike metric accounts, which are constrained by the size of the real numbers (the continuum) and their metric properties (e.g. the reals are separable, so every well-ordered ascending or descending chain of reals is countable), our model allows similarity to be defined on propositional languages of arbitrary cardinality. (See Sect. 4.2 for more on this point.)
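To illustrate how much the choice of order matters, here is a minimal sketch in which a "difference set" is a finite set of atom indices (the atoms on which two worlds disagree), with lower indices treated as more important. The lexicographic comparison follows the lexicographic account; `count_less`, which ranks by the number of disagreeing atoms, is a hypothetical alternative of our own and not an order proposed in Sect. 3. Both function names are illustrative.

```python
def lex_less(x, y):
    """Lexicographic order on finite sets of atom indices: x precedes y
    (x counts as 'closer') iff the least index in their symmetric difference
    lies in y. Lower indices are treated as more important respects."""
    d = x ^ y
    return bool(d) and min(d) in y

def count_less(x, y):
    """Hypothetical alternative order: fewer disagreeing atoms means closer;
    ties are broken lexicographically so that the order remains total."""
    if len(x) != len(y):
        return len(x) < len(y)
    return lex_less(x, y)

one_important = frozenset({0})   # disagreement on p_0 only
two_minor = frozenset({1, 2})    # disagreement on p_1 and p_2 only

# The two orders rank the same pair of difference sets in opposite ways.
print(lex_less(two_minor, one_important))    # True
print(count_less(one_important, two_minor))  # True
```

Nothing in the underlying atomic differences forces either verdict; the order alone settles which difference counts as greater.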
Moving on to the second question, functional dependence in our model is given by the measure \(DIST_{PP}\). We do not claim that this measure of ‘distance’ is uniquely motivated. It is, however, a natural one in our metricless setting. As the literature attests, likeness approaches based on metrics have to make an arbitrary choice from a wide range of functions that map collections of distances between worlds into a single distance between sets of worlds. Take Tichý’s famous (1974) toy example, in which there are three atomic propositions h (‘it’s hot’), r (‘it’s rainy’) and w (‘it’s windy’), and in which the state of the world is \(h \wedge r \wedge w\). The distance between conjunctions of literals drawn from these three atoms is then given by the taxicab metric, so that e.g. the distance between \(h \wedge r \wedge w\) and \(h \wedge \lnot r \wedge w\) is 1 because these two worlds disagree only on the truth-value of a single atom, namely r. Suppose now that we wish to assign a distance from the set of h-worlds to the set of r-worlds. There are four of each, and 15 distances to aggregate into a single output that represents the distance between h and r.Footnote 25 Any of a wide range of such aggregating functions is compatible with intuitive constraints. More generally, as the verisimilitude literature demonstrates, in trying to define the distance from a world w to a set of worlds \(\alpha\), given a distance function on worlds, one may use a wide array of measures: the average distance between w and the elements of \(\alpha\), or the infimum of such distances, or the supremum, or some weighted average of the last two. The choices are endless, and have familiar pros and cons. As Schurz and Weingartner remark, extant approaches have not successfully solved the problem of extending truthlikeness of worlds to truthlikeness of propositions because this extrapolation is ‘intuitively underdetermined’ (2010, p. 423).
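To see how much hangs on the choice of aggregating function, here is a minimal sketch of Tichý's toy example. It assumes worlds are triples of truth-values for h, r and w, and that the taxicab metric counts the atoms on which two worlds disagree; the three aggregators are only a sample of the options just mentioned, and all names are illustrative. Counting unordered pairs merges the two orderings of the shared pair of \(h \wedge r\) worlds, which is presumably how the count of 15 arises.

```python
from itertools import product
from statistics import mean

# Worlds are valuations of the atoms (h, r, w): True means the atom holds.
WORLDS = list(product([True, False], repeat=3))

def taxicab(u, v):
    """Taxicab distance on valuations: the number of atoms on which u and v disagree."""
    return sum(a != b for a, b in zip(u, v))

h_worlds = [u for u in WORLDS if u[0]]  # the four worlds where h holds
r_worlds = [u for u in WORLDS if u[1]]  # the four worlds where r holds

# One distance per ordered pair (h-world, r-world): 16 in all.
pair_distances = [taxicab(u, v) for u in h_worlds for v in r_worlds]

# As unordered pairs, the two orderings of the shared pair of h-and-r worlds
# coincide, leaving 15 pairs.
unordered_pairs = {frozenset((u, v)) for u in h_worlds for v in r_worlds}

# A few candidate ways of aggregating the pair distances into one number.
aggregators = {"minimum": min, "maximum": max, "average": mean}
results = {name: agg(pair_distances) for name, agg in aggregators.items()}
print(results)  # {'minimum': 0, 'maximum': 3, 'average': 1.5}
```

The same inputs yield 0, 3 or 1.5 depending on the aggregator; the world metric itself does not favour any one of these outputs.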
In our non-metric setting, far fewer technical options are available—because there are no real values for measures to take as inputs. In fact, we know of no other non-metric account to rival that presented in Sect. 3.
By not assuming distances between worlds, we avoid the extension problem in its starkest form. The distance between propositions is simply a function of the atoms over which their constituents differ, and exactly which function depends on the order. In a sense, our account does not so much avoid the problem as openly embrace it. Propositional similarity is ultimately grounded in atomic difference; how these atomic differences are then weighed depends entirely on the order (\(\le _{\mathbb {ORD}}\) in the most general setting).
Finally, in answer to the third question, our account is not problematically language-sensitive. Choosing similarity respects and an order on sets of these respects fixes the propositional similarity facts. As an illustration, suppose we let \(q_0\) be \(p_0\), \(q_1\) be \(p_0 \vee p_1\) and more generally define \(q_n\) as \(p_0 \vee p_1 \vee \cdots \vee p_n\). Clearly, each \(q_i\) is definable from the \(p_i\) using Boolean operations. If the similarity respects are the \(p_i\), as in the lexicographic account say, whether we express propositions using the \(q_i\) or the \(p_i\) makes no difference to their similarity. The similarity between \(q_0\) and \(q_1\) is the same whether we express these as \(q_0\) and \(q_1\) or alternatively as \(p_0\) and \(p_0 \vee p_1\). One might contend that we have avoided language dependence only by fixing similarity respects at the outset. But it is hard to see how any similarity judgements could get off the ground without privileging some similarity respects over others, since any two things resemble one another in infinitely many respects.
This, at any rate, is a sketch of how our account fares with respect to the main charges brought against likeness accounts. To further motivate the avoidance of metric assumptions, we now explain why the very move from a non-metric to a metric account saddles us with problems of its own making.
Metric and Pseudometric Accounts
Our account in Sect. 3 avoided any metric assumptions. Those who are familiar with the Hausdorff metric from real analysis will recognise, however, that our comparative similarity relation is Hausdorff-like. It is a sort of non-metric implementation of the idea behind the Hausdorff metric. A comparison of our non-metric approach with this metric will help bring out the former’s virtues. To keep this paper self-contained, we include a definition and exposition of the Hausdorff metric; but since it would obtrude too much if laid out here, we relegate the material to Appendix 1.4.
In brief (Appendix 1.4 has the full details), the Hausdorff distance \({\mathbb {D}}\) between two sets of worlds \(\alpha\) and \(\beta\) measures the largest shortest distance between a world in one set and the other set. From a metric d on the space of worlds W we may thereby define a pseudometric \({\mathbb {D}}\) (see Appendix 1.4) on the set of non-contradictory propositions construed as sets of worlds. We may exploit a pseudometric \({\mathbb {D}}\) to define a four-place comparative similarity relation \(\preceq _{{\mathbb {D}}}\) on propositions as follows:
\((\alpha , \beta ) \preceq _{{\mathbb {D}}} (\gamma , \delta ) \;:\Longleftrightarrow\; {\mathbb {D}}(\alpha , \beta ) \le {\mathbb {D}}(\gamma , \delta )\).
We thereby recover a four-place comparative similarity relation, this time based on metric assumptions. As Appendix 1.4 explains, if propositions are closed and bounded subsets of W then \({\mathbb {D}}\) turns out to be a metric on all propositions save the contradiction. But even if they are not, there’s no harm done; for our purposes, that \({\mathbb {D}}\) is a pseudometric is enough.Footnote 26
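For finite sets of worlds the suprema and infima in the Hausdorff distance become maxima and minima, so the definition and the derived relation \(\preceq _{{\mathbb {D}}}\) can be sketched directly. This is a minimal illustration only, assuming Tichý's three-atom worlds with the taxicab (Hamming) distance as the world metric d; `hausdorff` and `preceq_D` are illustrative names, and the general definition in Appendix 1.4 is not reproduced here.

```python
from itertools import product

def hausdorff(a, b, d):
    """Hausdorff distance between finite non-empty sets a and b under metric d:
    the largest, over points of either set, of the distance to the nearest
    point of the other set."""
    a_to_b = max(min(d(x, y) for y in b) for x in a)
    b_to_a = max(min(d(x, y) for x in a) for y in b)
    return max(a_to_b, b_to_a)

def hamming(u, v):
    """Assumed world metric: the number of atoms on which u and v disagree."""
    return sum(s != t for s, t in zip(u, v))

def preceq_D(alpha, beta, gamma, delta, d=hamming):
    """(alpha, beta) is related to (gamma, delta) iff alpha is at least as
    close to beta as gamma is to delta, by Hausdorff distance."""
    return hausdorff(alpha, beta, d) <= hausdorff(gamma, delta, d)

# Tichý-style worlds: valuations of (h, r, w).
WORLDS = list(product([True, False], repeat=3))
H = [u for u in WORLDS if u[0]]                         # h
R = [u for u in WORLDS if u[1]]                         # r
HR = [u for u in WORLDS if u[0] and u[1]]               # h ∧ r
NOT_H_NOT_R = [u for u in WORLDS if not (u[0] or u[1])]  # ¬h ∧ ¬r

print(hausdorff(H, R, hamming))             # 1
print(hausdorff(HR, NOT_H_NOT_R, hamming))  # 2
print(preceq_D(H, R, HR, NOT_H_NOT_R))      # True
```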
We note that the Hausdorff metric \({\mathbb {D}}\) has received a bad press in the verisimilitude literature. Niiniluoto (1987, p. 245), for instance, points out that in the special case in which \(\beta = \{w\}\), \({\mathbb {D}}(\alpha , \{w\})\) is the supremum of the distances from w to the worlds in \(\alpha\), so that only the maximum distance (supremum) between a world in \(\alpha\) and w is taken into account. This is an unwelcome consequence for a measure of world-to-proposition distance. But as a criticism of a proposition-to-proposition measure, it draws a blank, since the metric takes into account all worlds in \(\alpha\) and \(\beta\), and in our setting no propositions correspond to a single world.
We provide four reasons for preferring our non-metric comparative models of Sect. 3 to the metric/pseudometric model suggested in this subsection. Some of these reasons generalise to objections against any metric approach to propositional similarity.
1. In the general case, the pseudometric \({\mathbb {D}}\) may not be a metric. If it isn’t, \({\mathbb {D}}\) will elide the difference between two sets of worlds being identical and being as close to one another as possible. For example, it is natural to suppose that a proposition such as ‘John is no more than 2 m tall’ is the closure in the space of worlds of the proposition ‘John is less than 2 m tall’. Now under any sensible notion of set distance in a metric space, the distance of any set from its closure should always be 0. So the \({\mathbb {D}}\)-distance between these propositions should be 0, and will be 0 if \({\mathbb {D}}\) is the Hausdorff pseudometric, despite the propositions’ distinctness. Consequently, on this account the two propositions are just as similar to one another as the first is to itself. But this is the wrong result: ‘John is no more than 2 m tall’ and ‘John is less than 2 m tall’ may be infinitesimally close to one another, as we might put it, but they are less similar to one another than either of them is to itself. There is a difference between zero and infinitesimal proximity, a difference which the pseudometric approach obliterates.Footnote 27 Our Sect. 3 models, however, can respect this difference.
2. Second, the pseudometric \({\mathbb {D}}\) is highly sensitive to the underlying world metric d. It is easy to come up with examples in which \((X, \tau )\) is a topological space, \(d_1\) and \(d_2\) are metrics compatible with the topology \(\tau\) on X, and yet there are points x, y, z and w of X such that \(d_1(x, y) < d_1 (z, w)\) and \(d_2(x, y) > d_2 (z, w)\), i.e. \(d_1\) and \(d_2\) disagree on four-place comparative facts of the form ‘x is closer to y than z is to w’. In the same way, the four-place relation \(\prec _{{\mathbb {D}}}\) derived from \({\mathbb {D}}\) depends sensitively on the world metric d.Footnote 28
Metrics \(d_1\) and \(d_2\) may thus be very similar yet generate respective pseudometrics \({\mathbb {D}}_1\) and \({\mathbb {D}}_2\) that disagree on four-place comparative propositional similarity facts. The second problem for metric/pseudometric models, then, stems from the fact that data about world similarity is hardly ever quantitative. Looking back at the motivating examples in Sect. 1 we see that the facts about propositional similarity encountered there were all comparative. They were facts of the form ‘\(s_1\) is closer in meaning to \(s_2\) than \(s_3\) is to \(s_4\)’, or three-place ones of the form ‘\(s_1\) is closer in meaning to \(s_2\) than \(s_3\) is’, rather than quantitative facts of the form ‘the distance in meaning between \(s_1\) and \(s_2\) is twice that between \(s_3\) and \(s_4\)’. Now there may be cases in which we have metric information; but they will be rare. To assume a metric on worlds and use it to derive a pseudometric on propositions is therefore to impose a spurious precision on the subject matter, at least for most applications of interest. Propositional metrics depend overly sensitively on world metrics.
3. The third problem with pseudometric models is the cardinality constraint they impose. Thought of as a binary relation on pairs of propositions, the derived relation \(\prec _{{\mathbb {D}}}\) cannot, for example, admit uncountable well-ordered ascending or descending chains, since the real numbers admit no such chains.Footnote 29 But we would like to allow such possibilities rather than rule them out through the model’s formal features. We may, for example, wish to allow for uncountably many similarity respects.Footnote 30 The Sect. 3 models comfortably handle arbitrarily large numbers of worlds or similarity respects.
4. Our comparative models are defined for all propositions other than the contradiction. To plug this gap, one fix would be to set \(DIST_{PP} (\alpha , \bot )\) to be some subset of indices, when \(\bot\) is any contradictory \({\mathcal {L}}\)-sentence. Setting \(DIST_{PP} (\bot , \bot ) = \emptyset\) is forced if we wish to respect the condition that a proposition is no further in meaning from itself than any two propositions are from one another.Footnote 31 A natural choice for \(DIST_{PP} (\alpha , \bot )\) when \(\alpha\) is not contradictory is the full set of indices, so that \(DIST_{PP} (\alpha , \bot ) = \omega\) when the index set is \(\omega\) (as in Sect. 3.1’s lexicographic example) and \(\alpha\) is not a contradiction. This latter clause captures the thought that, as a classical logician would put it, a contradiction disagrees with any other sentence about the value of every atomic proposition, since it takes all of them to be both true and false. Naturally, from a relevance logic perspective (which there is no room to explore), a different choice might be made here.
Contrast this with a pseudometric model, for which there is no technical fix. For example, we cannot just set the \({\mathbb {D}}\)-distance from a contradiction to any other proposition to 0 on pain of violating the triangle inequality for \({\mathbb {D}}\).Footnote 32 In a metric space, the Hausdorff distance \({\mathbb {D}}(S, F)\) between a fixed closed, bounded, non-empty subset F and a closed, bounded and non-empty subset S which tends towards the empty set will not in general tend to a particular limit.Footnote 33 So unlike the Sect. 3 models, the pseudometric model seems essentially incomplete. The wisdom of adopting a non-metric account of propositional similarity is once more apparent.
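The first reason above can be checked by a direct calculation. Suppose, purely for illustration, that worlds differ only in John's height, so that \(W = (0, \infty)\) with \(d(x, y) = |x - y|\); let \(\alpha = (0, 2)\) (‘John is less than 2 m tall’) and \(\beta = (0, 2]\), its closure in W (‘John is no more than 2 m tall’). The first term below vanishes because \(\alpha \subseteq \beta\); in the second, every \(y < 2\) already lies in \(\alpha\), so only \(y = 2\) contributes, and its infimum is 0 although it is not attained:

\[{\mathbb {D}}(\alpha , \beta ) = \max \Big (\sup _{x \in \alpha } \inf _{y \in \beta } |x - y|,\; \sup _{y \in \beta } \inf _{x \in \alpha } |x - y|\Big ) = \max \Big (0,\; \inf _{x \in (0, 2)} |2 - x|\Big ) = 0.\]

So \({\mathbb {D}}(\alpha , \beta ) = 0 = {\mathbb {D}}(\alpha , \alpha )\) even though \(\alpha \ne \beta\), which is exactly the elision described in the first reason.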
For all the reasons just outlined, we take our comparative models to be superior to metric accounts. Of course, when there is metric information relating worlds to worlds, one might essay a metric account of propositional similarity. But such cases tend to be the exception rather than the norm.
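Even where metric information is available, the second reason above still bites: two metrics inducing the same topology can order pairs of worlds differently. A minimal sketch, assuming worlds are modelled as points of the plane: the Euclidean and taxicab metrics both induce the standard topology on \({\mathbb {R}}^2\), yet they disagree about whether x is closer to y than z is to w. The points chosen are hypothetical.

```python
import math

# Worlds modelled, purely for illustration, as points of the plane.
def euclidean(u, v):
    return math.dist(u, v)

def taxicab(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

x, y = (0.0, 0.0), (1.0, 1.0)
z, w = (0.0, 0.0), (1.5, 0.0)

# Euclidean: sqrt(2) (about 1.414) < 1.5, so x is closer to y than z is to w.
d1_says_closer = euclidean(x, y) < euclidean(z, w)
# Taxicab: 2.0 > 1.5, so the comparison is reversed.
d2_says_further = taxicab(x, y) > taxicab(z, w)

print(d1_says_closer, d2_says_further)  # True True
```

Since the Hausdorff distance between one-point sets is just the distance between the points, the relations \(\preceq _{{\mathbb {D}}}\) induced by the two metrics already disagree on the corresponding one-point sets.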
The Negation Constraint
Here is a natural-sounding constraint one might imagine applies to propositional similarity, truth-conditionally understood: the degree of similarity between \(\alpha\) and \(\beta\) should match that between \(\lnot \alpha\) and \(\lnot \beta\). Call this the negation constraint.
There are two readings of the negation constraint, depending on how the word ‘match’ is understood. The stronger reading construes match as identity, i.e. insists that the degree of similarity of \(\alpha\) and \(\beta\) equal that of \(\lnot \alpha\) and \(\lnot \beta\). The weaker reading does not insist on identity, but requires that the respective degrees be close. Though the weaker reading is vague, Sect. 3’s comparative models respect it, since \(DIST_{PP}(\alpha , \beta )\) is in general close in the ordering to \(DIST_{PP}(\lnot \alpha , \lnot \beta )\). Of course, how close will depend on \({\mathbb {ORD}}\); but since negation introduces no new atoms, both \(DIST_{PP}(\alpha , \beta )\) and \(DIST_{PP}(\lnot \alpha , \lnot \beta )\) are subsets of the indices of the atoms in \(\alpha\) and \(\beta\), which bounds the difference between them. And when \(\alpha\) and \(\beta\) are literals, as well as in certain other special cases, \(DIST_{PP}(\alpha , \beta )\) simply equals \(DIST_{PP}(\lnot \alpha , \lnot \beta )\).
The constraint’s stronger reading is that \(DIST_{PP}(\alpha , \beta ) = DIST_{PP}(\lnot \alpha , \lnot \beta )\) for all \(\alpha\) and \(\beta\). This constraint is not respected by our Sect. 3 models. For example, as we saw in connection with Sect. 3.1’s lexicographic model, \(DIST_{PP} (p_0 \vee p_1, p_1 \wedge \lnot p_2) = \{1, 2\}\); yet
\(DIST_{PP} (\lnot (p_0 \vee p_1), \lnot (p_1 \wedge \lnot p_2)) = DIST_{PP} (\lnot p_0 \wedge \lnot p_1, \lnot p_1 \vee p_2) = \{0, 1\}\),
as the reader may verify. In general, whether \(DIST_{PP}(\alpha , \beta ) = DIST_{PP}(\lnot \alpha , \lnot \beta )\) will depend on \(\alpha\) and \(\beta\) as well as the underlying order \({\mathbb {ORD}}\).
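The verification can be mechanised. The following is a minimal sketch that reconstructs \(DIST_{PP}\) for finitely many atoms under our own assumptions, not a transcription of Sect. 3: a world is a valuation of \(p_0, p_1, p_2\); the difference between two worlds is the set of indices on which they disagree; difference sets are compared lexicographically (the least index in the symmetric difference decides); and \(DIST_{PP}(\alpha , \beta )\) is the Hausdorff-style maximum, over worlds in either proposition, of the minimal difference to the other. This reconstruction reproduces the values stated in this section, and it leaves the contradiction undefined. All function names are hypothetical.

```python
from itertools import product

# Assumption for illustration: only the atoms p_0, p_1, p_2 matter (indices 0, 1, 2).
N = 3
WORLDS = list(product([False, True], repeat=N))  # a world is a valuation of p_0..p_2

def diff(w, v):
    """Indices of the atoms on which worlds w and v disagree."""
    return frozenset(i for i in range(N) if w[i] != v[i])

def lex_less(x, y):
    """Lexicographic order: x precedes y iff the least index in their
    symmetric difference lies in y (p_0 is the most important respect)."""
    d = x ^ y
    return bool(d) and min(d) in y

def lex_min(sets):
    best = None
    for s in sets:
        if best is None or lex_less(s, best):
            best = s
    return best

def lex_max(sets):
    best = None
    for s in sets:
        if best is None or lex_less(best, s):
            best = s
    return best

def extension(prop):
    """The worlds at which a proposition (a predicate on worlds) is true."""
    return [w for w in WORLDS if prop(w)]

def directed(a, b):
    """Worst case, over worlds in a, of the closest difference to a world in b."""
    return lex_max(lex_min(diff(w, v) for v in b) for w in a)

def dist_pp(prop1, prop2):
    """Hausdorff-style reconstruction of DIST_PP for non-contradictory propositions."""
    a, b = extension(prop1), extension(prop2)
    if not a or not b:
        raise ValueError("the contradiction is left undefined in this sketch")
    return lex_max([directed(a, b), directed(b, a)])

# DIST_PP(p_0 ∨ p_1, p_1 ∧ ¬p_2)
print(sorted(dist_pp(lambda w: w[0] or w[1], lambda w: w[1] and not w[2])))          # [1, 2]
# DIST_PP(¬p_0 ∧ ¬p_1, ¬p_1 ∨ p_2)
print(sorted(dist_pp(lambda w: not w[0] and not w[1], lambda w: not w[1] or w[2])))  # [0, 1]
```

Under the same assumptions, the literals \(p_0\) and \(p_1\) and their negations both come out at \(\{0\}\), in line with the weaker reading, while for the negations of the two sentences in the example that follows the sketch returns \(\{1\}\), which precedes \(\{0, 1\}\).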
One way to motivate the stronger constraint is by thinking of propositions as functions from worlds to truth-values. Models in which degrees of similarity are invariant under the permutation of truth-values are formally attractive. When it comes to applications to possible worlds, however, truth-values may not be permutable. Truth-value symmetry may be desirable in a purely mathematical setting; but as soon as intended interpretations—applied models—are in play, any reason to demand such symmetry fades away.
In fact, the constraint’s stronger reading is problematic. Propositions \(\alpha\) and \(\beta\) may be jointly inconsistent while their negations are jointly consistent. For example, if we let \(\alpha = p \wedge q\) and \(\beta = \lnot p\), then \(\alpha\) and \(\beta\) are inconsistent; but \(\lnot \alpha\) and \(\lnot \beta\) are respectively equivalent to \(\lnot p \vee \lnot q\) and p, which are jointly consistent. The motivation for our models also clashes directly with the negation constraint’s stronger reading. Using the lexicographic model as an illustration, suppose \(s_1\) is \(p_0 \wedge p_1\) and \(s_2\) is \(\lnot p_0 \wedge p_2\). Sentence \(s_1\) is very dissimilar to \(s_2\), as it differs from it along the most important dimension of similarity: \(DIST_{PP} (p_0 \wedge p_1, \lnot p_0 \wedge p_2) = \max (\{0, 1\}, \{0, 2\}) = \{0, 1\}\), the maximum being taken in the lexicographic order. However, not-\(s_1\)’s similarity to not-\(s_2\) is greater than \(s_1\)’s to \(s_2\), the reason being that not-\(s_1\) and not-\(s_2\) are compatible along the most important similarity dimension. For any world in which not-\(s_1\) is true, there is a world in which not-\(s_2\) is true that is more similar to it than any \(s_1\)-world is to any \(s_2\)-world. More succinctly, \(s_1\) and \(s_2\) are constrained to differ along the most important dimension of similarity, whereas not-\(s_1\) and not-\(s_2\) are not; from a lexicographic perspective, that means that not-\(s_1\) is more similar to not-\(s_2\) than \(s_1\) is to \(s_2\). This is precisely the result our comparative models deliver.
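The two facts this argument rests on can be checked by brute force over the eight valuations of \(p_0, p_1, p_2\). The sketch assumes, as in the lexicographic model, that index 0 (the atom \(p_0\)) is the most important respect, so that any difference set omitting 0 precedes any difference set containing it; it checks only those two facts, and its names are illustrative.

```python
from itertools import product

WORLDS = list(product([False, True], repeat=3))  # valuations of (p_0, p_1, p_2)

def s1(w):
    return w[0] and w[1]        # p_0 ∧ p_1

def s2(w):
    return (not w[0]) and w[2]  # ¬p_0 ∧ p_2

def diff(u, v):
    """Indices of the atoms on which worlds u and v disagree."""
    return {i for i in range(3) if u[i] != v[i]}

S1 = [w for w in WORLDS if s1(w)]
S2 = [w for w in WORLDS if s2(w)]
NOT_S1 = [w for w in WORLDS if not s1(w)]
NOT_S2 = [w for w in WORLDS if not s2(w)]

# Fact 1: every s1-world differs from every s2-world on p_0.
s_pairs_split_on_p0 = all(0 in diff(u, v) for u in S1 for v in S2)

# Fact 2: each not-s1-world has a not-s2-world agreeing with it on p_0,
# and each not-s2-world has a not-s1-world agreeing with it on p_0.
negations_agree_on_p0 = all(
    any(0 not in diff(u, v) for v in NOT_S2) for u in NOT_S1
) and all(any(0 not in diff(u, v) for u in NOT_S1) for v in NOT_S2)

print(s_pairs_split_on_p0, negations_agree_on_p0)  # True True
```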
In sum, our comparative models respect the negation constraint’s weaker reading. They do not respect the stronger reading, for good reason.
More About Similarity?
Finally, a few words about taking similarity as primitive. Our models assumed, rather than derived, similarity respects between different worlds; these respects are ‘exogenous’ to the models. To think this problematic (an omission) is to misunderstand our aims. The models derived a four-place comparative similarity relation on propositions from a four-place comparative similarity relation on worlds. There is no reason to suppose that the relevant similarity respects will be the same for all applications of interest. As David Lewis once noted,Footnote 34 it seems unlikely that the same ordering of respects of comparison will underlie facts about verisimilitude as well as facts about counterfactuals. Which notion of similarity is relevant varies from context to context, which is why our models allowed different orderings. Moreover, all talk of similarity of worlds is neutral between similarity in a particular respect (which itself perhaps aggregates sub-respects of similarity) and overall similarity; either is compatible with our models.Footnote 35