Non-metric Propositional Similarity

The idea that sentences can be closer or further apart in meaning is highly intuitive. Not only that, it is also a pillar of logic, semantic theory and the philosophy of science, and follows from other commitments about similarity. The present paper proposes a novel way of comparing the ‘distance’ between two pairs of propositions. We define ‘$p_1$ is closer in meaning to $p_2$ than $p_3$ is to $p_4$’ and thereby give a precise account of comparative propositional similarity facts. Notably, our definition eschews metric assumptions, which are unrealistic in most applications of interest.


Introduction
'Penelope likes eggs for brunch' is closer in meaning to 'Penelope likes eggs for lunch' than 'Cats mew' is to 'I love New York'. Likewise, 'Tom is 16 years old' is semantically closer to 'Tom is 17 years old' than 'Dick is 78 years old' is to 'Harry is 57 years old'. And 'Thus passes the glory of the world' is closer in meaning to the Latin 'Sic transit gloria mundi' than the Mel Brooks-inspired alternative translation 'Gloria is sick'. Such examples suggest that some statements are closer in meaning than others.
Logic, or more precisely formalisation, also relies on the idea that statements are closer or further apart in meaning in this same sense. Among the mostly implicit criteria we appeal to when formalising is that of semantic proximity. This criterion enjoins us to formalise the natural-language sentence $s$ as a formal sentence $\phi$ if $\phi$ may be interpreted so as to be as close in meaning to $s$ as possible. 1 For example, $\exists x Fx$ is on this criterion a better formalisation of $s$ = 'Someone is French' than $\forall x Fx$ is. The reason is that some interpretation of $\exists x Fx$ is closer in meaning to $s$ than any interpretation of $\forall x Fx$.
A second philosophical application of the idea that there can be greater or lesser semantic distance between two propositions comes from the philosophy of language. 2 Some theories of meaning formalise natural-language sentences in some preferred formal language(s) and provide satisfaction- or truth-conditions for the formal vocabulary. 3 These theories' success turns partly on whether the resulting sentences' satisfaction- or truth-conditions are sufficiently close in meaning to those of the original sentences.
A third connection is with the philosophy of science. Ever since Popper (1963, ch. 10), philosophers of science have studied formal models of verisimilitude in an attempt to model the intuitive idea that some theories or propositions are closer to the truth than others. Take the (somewhat hackneyed) example of the number of planets: 4 the statement that there are 9 of them is closer to the truth than the statement that there are 9 billion of them. The claim '$s_1$ is more verisimilar than $s_2$' is the special case of the claim '$s_1$ is closer to $s_3$ than $s_2$ is to $s_3$' in which $s_3$ is the truth of the matter.
The existence of propositional similarity facts also follows from another consideration. Sharing identical or like properties makes for similarity. For example, a crimson flag is more similar to a scarlet flag than a blue flag is to a green flag, other things being equal. As a consequence, the sentence 'the flag is crimson' is truth-conditionally more similar to 'the flag is scarlet' than 'the flag is blue' is to 'the flag is green'. The way the world has to be for the first sentence to be true is more similar to how it has to be for the second to be true than the way the world has to be for the third to be true is to how it has to be for the fourth to be true. Similarity of properties or objects thus makes for propositional similarity.
In short, the idea that sentences can be closer or further apart in meaning is highly intuitive. Moreover, it is a pillar of logic, semantic theory and the philosophy of science, and a consequence of other commitments regarding similarity. Our aim here is to define the relation expressed by '$s_1$ is closer in meaning to $s_2$ than $s_3$ is to $s_4$'. Although the notion of verisimilitude has been extensively studied (Sect. 4 contains some references), the proposal presented here for comparing the 'distance' between two pairs of propositions is novel. Previous discussion in the literature proceeds by assuming an underlying metric that captures the distance between worlds. The present article, in contrast, eschews metric assumptions, which are unrealistic in all but a few applications of interest. We do not, for example, assume that there is a numerical distance between our world and the world 5 most like ours in which Hillary Clinton won the 2016 US presidential election.

A point worth clarifying at the outset is that there are several notions of similarity among natural-language sentences. The sentence 'Hillary Clinton is well' is in a sense semantically very close to 'Hillary Clinton is unwell': both sentences assert something about a particular person's health, as opposed to, say, '$e^{i\pi} + 1 = 0$', which has an entirely different subject matter. 6 Yet truth-conditionally, 'Hillary Clinton is well' and 'Hillary Clinton is unwell' are far from close. Likewise, the tautology 'It's raining or it's not raining' is similar to the contradiction 'It's raining and it's not raining' in terms of both linguistic expression and subject matter; yet as far as semantic value is concerned, the two sentences could not be further apart. In this paper, we focus on a notion of similarity among propositions based on similarity neither of linguistic expression nor of subject matter but of truth-conditions. The examples thus far have hopefully given readers a sense of the target notion, which it will now be our business to explicate and further clarify.

Preliminaries
We wish to account for comparative propositional similarity facts expressed by sentences of the form '$s_1$ is closer in meaning to $s_2$ than $s_3$ is to $s_4$'. Our approach will be to derive such facts from similarity facts about worlds, avoiding any metric assumptions (for reasons that will emerge in Sect. 4). Section 3 presents the models and Sect. 4 comments on them. Before that, a few preliminaries about the framework are in order.
It will be useful to model worlds as sets of propositions and propositions as sets of worlds. We construe worlds as maximally consistent sets of literals (atoms/sentence letters or negations thereof) from a fixed language. This propositional language, which we call L, has $\{p_0, p_1, \ldots, p_n, \ldots\}$ as its set of atoms/sentence letters, is truth-functionally complete and has standard formation rules. 7 For example, under this construal $w^* = \{p_i : i \text{ is even}\} \cup \{\neg p_i : i \text{ is odd}\}$ is a world. Construing worlds as sets of propositions is a matter of formal convenience only: we do not assume that worlds really are sets of propositions. The assumption is only that, for certain purposes, worlds may be modelled as sets of propositions. To put it another way, we are interested in worlds only in as much as they do or don't share some salient properties, expressed by the propositions $p_0, p_1, p_2, \ldots$. That is, when presented with a world $w$, we ask whether it satisfies $p_0, p_1, p_2, \ldots$; the answer to these questions is all the information about $w$ we are interested in. 8 Worlds in our models are thus really equivalence classes of worlds in which the given literals hold. The set of worlds may be thought of as the set (or class) of all possible worlds quotiented by L-equivalence, where L is the propositional language we are interested in. Two worlds are in the same L-equivalence class just when they agree on the truth-values of L-atoms, i.e. on the similarity respects of interest.
Our models also construe propositions as sets of possible worlds. Thus $p_3 \vee p_4 = \{w : p_3 \vee p_4 \text{ is true at } w\}$ is a proposition, and if $w^* = \{p_i : i \text{ is even}\} \cup \{\neg p_i : i \text{ is odd}\}$ as above, $w^* \in p_3 \vee p_4$, since $p_4$ (and hence $p_3 \vee p_4$) is true at $w^*$. Modelling propositions as sets of possible worlds is a familiar technique in analytic philosophy, linguistics and logic. As above, we are not assuming that propositions really are sets of worlds. We could, if we were less sparing with notation, mark the relation of a proposition holding at a world with a notational primitive. But it will be simpler to construe worlds as sets of propositions, as long as we keep in mind what this means. 9 So: we use the identity sign when representing propositions as sets of worlds and worlds as sets of propositions for brevity, not out of philosophical commitment. Naturally, modelling propositions as sets of worlds is adequate for certain purposes but too coarse-grained for others. For example, modelling all mathematical, necessary and logical truths as the same proposition entails identifying them all. However, these limitations are as familiar as they are general. They no more threaten our comparative model(s) than they do similar models in metaphysics, the philosophy of language, etc. Finally, though it is natural to think of the worlds as metaphysically possible, we need not do so; we may equally think of them as epistemically possible, or in some other way. Our models are neutral between various such interpretations.
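The two-way modelling just described (worlds as sets of literals, propositions as sets of worlds) can be made concrete in a few lines. The following is only a minimal sketch under a simplifying assumption that the paper does not make: the language is cut down to five atoms p_0, ..., p_4, so that all worlds can be enumerated. The names `all_worlds`, `proposition` and `literals` are hypothetical helpers introduced for illustration.

```python
from itertools import product

N_ATOMS = 5  # assumption: a finite fragment p_0..p_4 of the language L


def all_worlds(n=N_ATOMS):
    """Every world over n atoms, as a tuple of truth-values (index i is p_i)."""
    return list(product([False, True], repeat=n))


def proposition(formula, n=N_ATOMS):
    """The proposition expressed by `formula`: the set of worlds where it holds."""
    return frozenset(w for w in all_worlds(n) if formula(w))


def literals(world):
    """The same world written as a maximally consistent set of literals."""
    return frozenset(f"p{i}" if v else f"~p{i}" for i, v in enumerate(world))


# w*: the even-indexed atoms are true, the odd-indexed ones false.
w_star = tuple(i % 2 == 0 for i in range(N_ATOMS))
p3_or_p4 = proposition(lambda w: w[3] or w[4])

print(sorted(literals(w_star)))
print(w_star in p3_or_p4)  # p_4 is true at w*, so w* belongs to p_3 v p_4
```

The tuple and the set of literals carry the same information about a world, which is the sense in which identifying the two is a matter of convenience.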

Comparative Models
To introduce our comparative models, recall two canonical versions of counterfactual logic, due to Stalnaker (1968) and Lewis (1973) respectively. To spell out the truth-conditions of 'If A were the case then B would be the case', Stalnaker and Lewis assume the existence of a three-place comparative similarity relation among worlds. This may be glossed as '$w_2$ is more similar to $w_1$ than $w_3$ is to $w_1$, as judged from the perspective of $w_1$'. Stalnaker and Lewis also make further, slightly different, 10 assumptions on worlds, and base their semantics for propositional counterfactual logic (which need not detain us here) on the metaphysical assumption that worlds stand in the similarity relation just specified. Note that the clause 'as judged from the perspective of $w_1$' indicates which standards of similarity determine the relative proximity of $w_2$ and $w_3$ to $w_1$; but if we prescind from applications in counterfactual logic, there is no reason to suppose that these standards are somehow fixed by $w_1$, the world of comparison. Taking the standards of similarity as independently fixed for now (i.e. fixed in a way we are not concerned with), we symbolise a three-place comparative similarity relation between worlds as $(w_2, w_1) < (w_3, w_1)$, read as '$w_2$ is more similar to $w_1$ than $w_3$ is to $w_1$'. This three-place relation can be recaptured from the four-place relation $(w_2, w_1) < (w_3, w_4)$, read as '$w_2$ is closer to $w_1$ than $w_3$ is to $w_4$', by setting $w_4 = w_1$. Our comparative models exploit the existence of a four-place comparative similarity relation among worlds to define a four-place comparative similarity relation among sets of worlds or propositions. The latter may be written as
$$(\alpha, \beta) \prec (\gamma, \delta),$$
read as 'proposition $\alpha$ is more similar to proposition $\beta$ than proposition $\gamma$ is to proposition $\delta$', relative to some fixed standards of similarity. For expository clarity, we present a special case of the model in Sect. 3.1 before generalising it in Sects. 3.2-3.3.

A special case: the lexicographic model
Suppose there is a set of similarity respects, the first of which makes for greatest similarity, the second of which is next-most important, and so on. As always, we may think of these respects as the L-propositions $p_0, p_1, \ldots$ (renumbering if necessary). Thus suppose we are interested in the outcome of a Formula 1 race, and we base our similarity comparisons among worlds first and foremost on who wins the race, second, on who comes second, and so on. In that case, world $w_1$ is more similar than $w_2$ is to the actual world $w$ iff the highest-placed divergence between the drivers' final positions in $w_1$ and their actual final positions occurs further down the finishing order than the highest-placed divergence between the drivers' final positions in $w_2$ and their actual final positions. In the general case, $p_0$ is the similarity respect that matters most, $p_1$ matters second-most, etc. 11 Let's see how to make formal sense of this idea.
As above, we take a world to be a maximally consistent set of L-literals, and an L-proposition to be the set of worlds at which it is true. (We assume in this subsection that the L-atoms $p_0, p_1, \ldots$ are countably infinite.) We also order the subsets of the non-negative integers $\mathbb{N} = \{0, 1, \ldots, n, \ldots\}$ lexicographically, using the standard ordering on $\mathbb{N}$. More precisely, a set $s_1 \subseteq \mathbb{N}$ is smaller than $s_2 \subseteq \mathbb{N}$ if the least element of $s_1$ is larger than the least element of $s_2$ or $s_1$ has no least element (i.e. $s_1$ is empty) but $s_2$ does (i.e. $s_2$ is non-empty); or, if the least elements of $s_1$ and $s_2$ both exist and are equal, the second-least element of $s_1$ is larger than the second-least element of $s_2$ or $s_1$ has no second-least element but $s_2$ does; and so on. For example, under this lexicographic ordering, which we label $<_{\mathrm{lex}}$, the seven subsets $\emptyset$, $\{0\}$, $\{1\}$, $\{2\}$, $\{1, 2\}$, $\{2, 4\}$ and $\mathbb{N}$ of $\mathbb{N}$ are ordered as follows:
$$\emptyset <_{\mathrm{lex}} \{2\} <_{\mathrm{lex}} \{2, 4\} <_{\mathrm{lex}} \{1\} <_{\mathrm{lex}} \{1, 2\} <_{\mathrm{lex}} \{0\} <_{\mathrm{lex}} \mathbb{N}.$$
Now if $w_i$ and $w_j$ are worlds (maximally consistent sets of L-literals), we set
$$\mathrm{DIST}_{WW}(w_i, w_j) = \{n \in \mathbb{N} : (p_n \in w_i \wedge \neg p_n \in w_j) \vee (\neg p_n \in w_i \wedge p_n \in w_j)\}.$$
In other words, $\mathrm{DIST}_{WW}(w_i, w_j)$ is the set of indices $n$ such that $w_i$ and $w_j$ disagree on $p_n$'s truth-value. Here as elsewhere we are implicitly thinking of $l \in w$ as $l$'s being true in $w$, for $l$ a literal. Thinking also of $\{p_0, p_1, \ldots, p_n, \ldots\}$ as the (labelled) set of comparison respects in descending order of importance, the higher $\mathrm{DIST}_{WW}(w_i, w_j)$ is in the $<_{\mathrm{lex}}$-ordering, the more $w_i$ and $w_j$ diverge; conversely, the lower $\mathrm{DIST}_{WW}(w_i, w_j)$ is in the $<_{\mathrm{lex}}$-ordering, the more similar the two worlds are. Note that if $w_i = w_j$, then $\mathrm{DIST}_{WW}(w_i, w_j) = \emptyset$, the least element in the $<_{\mathrm{lex}}$-ordering; this is as it should be, since $w_i$ and $w_j$ are then as similar as possible, being identical. $\mathrm{DIST}_{WW}$ is thus a qualitative (rather than quantitative) measure of the 'distance' between worlds. 12 Now let $\alpha$ be a (non-contradictory) L-proposition, i.e. a (non-empty) set of worlds. We let
$$\mathrm{DIST}_{WP}(w, \alpha) = \text{the } <_{\mathrm{lex}}\text{-least element of } \{\mathrm{DIST}_{WW}(w, w') : w' \in \alpha\}.$$
The function $\mathrm{DIST}_{WP}$ is thus a measure of the 'distance' of a world to a set of worlds corresponding to a proposition. 13 The subscript 'WP' in '$\mathrm{DIST}_{WP}$' indicates that the distance in question is that of a world to a proposition; similarly, mutatis mutandis, for $\mathrm{DIST}_{WW}$ as well as for $\mathrm{DIST}_{PP}$ (to follow). According to $\mathrm{DIST}_{WP}$, then, $w$'s distance to $\alpha$ is its limiting closest distance to an element of $\alpha$, a limit that is always attained. In particular, if $w$ is a member of $\alpha$ then its distance is as small as possible (in the $<_{\mathrm{lex}}$-ordering), since it is the empty set. If $\alpha$ were an arbitrary set of worlds, the set $\{\mathrm{DIST}_{WW}(w, w') : w' \in \alpha\}$ would not necessarily have a $<_{\mathrm{lex}}$-least element; but since $\alpha$ is an L-proposition, it is guaranteed to have one. The proof is in Appendix 1.1. Now let $\alpha$ and $\beta$ both be (non-contradictory) L-propositions. We set
$$\mathrm{DIST}_{PP}(\alpha, \beta) = \text{the } <_{\mathrm{lex}}\text{-largest element of } \{\mathrm{DIST}_{WP}(w, \beta) : w \in \alpha\} \cup \{\mathrm{DIST}_{WP}(w, \alpha) : w \in \beta\}.$$
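The lexicographic ordering on index sets and the first two distance functions lend themselves to a small executable check. The sketch below is illustrative only and rests on an assumption the text does not make: there are just five atoms, so the full index set {0, ..., 4} stands in for the infinite seventh subset of the example. The sort key `lex_key` is a hypothetical device; it compares indicator tuples, which ranks one set below another exactly when the least index on which they differ belongs to the other set.

```python
from itertools import product

N_ATOMS = 5  # assumption: finitely many atoms p_0..p_4 instead of countably many


def lex_key(s, n=N_ATOMS):
    """Sort key for the lexicographic order on subsets of {0, ..., n-1}."""
    return tuple(1 if i in s else 0 for i in range(n))


def dist_ww(w1, w2):
    """Indices of the atoms on which two worlds disagree."""
    return frozenset(i for i, (a, b) in enumerate(zip(w1, w2)) if a != b)


def dist_wp(w, prop):
    """Lexicographically least disagreement set between w and a world in prop."""
    return min((dist_ww(w, v) for v in prop), key=lex_key)


full = frozenset(range(N_ATOMS))  # finite stand-in for the seventh subset
subsets = [frozenset(), {0}, {1}, {2}, {1, 2}, {2, 4}, full]
ordered = sorted(subsets, key=lex_key)
print([sorted(s) for s in ordered])
# [[], [2], [2, 4], [1], [1, 2], [0], [0, 1, 2, 3, 4]]

worlds = list(product([False, True], repeat=N_ATOMS))
p1_and_not_p2 = [w for w in worlds if w[1] and not w[2]]
w = (True, False, True, False, False)  # p_0, not p_1, p_2, not p_3, not p_4
print(sorted(dist_wp(w, p1_and_not_p2)))  # [1, 2]
```

Note that the key is used only to compare index sets; no arithmetic is performed on it, in keeping with the qualitative character of the measure.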

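As above, if $\alpha$ and $\beta$ were arbitrary sets of worlds, the set of world-to-proposition distances from which $\mathrm{DIST}_{PP}(\alpha, \beta)$ is selected would not necessarily have a largest element in the lexicographic ordering $<_{\mathrm{lex}}$; but since they are L-propositions, it is guaranteed to have one. The proof may be found in Appendix 1.2.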
Let's illustrate our definitions thus far with an example. Consider the L-propositions $p_0 \vee p_1$ and $p_1 \wedge \neg p_2$, construed as sets of worlds, which are in turn construed as maximal consistent sets of L-literals. Since any world in $p_1 \wedge \neg p_2$ is in $p_0 \vee p_1$, $\mathrm{DIST}_{WP}(w, p_0 \vee p_1) = \emptyset$ for any world $w$ in $p_1 \wedge \neg p_2$. Let us next consider how far a world $w$ in $p_0 \vee p_1$ may be from $p_1 \wedge \neg p_2$. By considering the value of $p_0$ in $w$, we see that in order to minimise $w$'s proximity to $p_1 \wedge \neg p_2$ (equivalently: maximise its distance from $p_1 \wedge \neg p_2$), the literals $p_0$, $\neg p_1$ and $p_2$ must all be true in $w$; but for any $p_i$ with $i \geq 3$ (as for $p_0$), there is a world in $p_1 \wedge \neg p_2$ that $p_i$-wise matches any given world. Hence $\mathrm{DIST}_{WP}(w, p_1 \wedge \neg p_2) = \{1, 2\}$, and the same goes for any world $w$ in which $p_1$ is false and $p_2$ is true (and in which, consequently, $p_0$ is true). It follows that $\mathrm{DIST}_{PP}(p_0 \vee p_1, p_1 \wedge \neg p_2) = \{1, 2\}$.
Putting everything together, we may define a four-place similarity relation $\prec$ among L-propositions:
$$(\alpha, \beta) \prec (\gamma, \delta) \iff \mathrm{DIST}_{PP}(\alpha, \beta) <_{\mathrm{lex}} \mathrm{DIST}_{PP}(\gamma, \delta),$$
with the left-hand side read as 'proposition $\alpha$ is more similar to proposition $\beta$ than proposition $\gamma$ is to proposition $\delta$'. Similarly,
$$(\alpha, \beta) \preceq (\gamma, \delta) \iff \mathrm{DIST}_{PP}(\alpha, \beta) \leq_{\mathrm{lex}} \mathrm{DIST}_{PP}(\gamma, \delta).$$
($\preceq$ is the weak counterpart of the strict relation $\prec$.) An easy check shows that ordered pairs of propositions under $\preceq$ form a linear pre-order, i.e. a transitive, reflexive, linear, 14 but not necessarily antisymmetric, 15 order. Readers familiar with the Hausdorff metric will recognise $\mathrm{DIST}_{PP}$ as a sort of qualitative (metricless) version of it; this point will be explained in Sect. 4.2, when we stress the importance of doing without metric assumptions.
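The worked example can be replayed mechanically. The sketch below makes two assumptions that should be read as such: the language is restricted to four atoms (higher atoms play no role in the example), and the proposition-to-proposition distance is taken to be the largest world-to-proposition distance in either direction, in the manner of the Hausdorff metric mentioned above. The helper names, including `closer`, are hypothetical.

```python
from itertools import product

N_ATOMS = 4  # assumption: atoms p_0..p_3 only; higher atoms never matter here


def lex_key(s):
    """Sort key for the lexicographic order on subsets of {0, ..., N_ATOMS-1}."""
    return tuple(1 if i in s else 0 for i in range(N_ATOMS))


def dist_ww(w1, w2):
    """Indices of the atoms on which two worlds disagree."""
    return frozenset(i for i in range(N_ATOMS) if w1[i] != w2[i])


def dist_wp(w, prop):
    """Least disagreement set between w and the worlds in prop."""
    return min((dist_ww(w, v) for v in prop), key=lex_key)


def dist_pp(a, b):
    """Largest world-to-proposition distance, taken in both directions."""
    candidates = [dist_wp(w, b) for w in a] + [dist_wp(w, a) for w in b]
    return max(candidates, key=lex_key)


def closer(a, b, c, d):
    """Strict four-place relation: a is closer to b than c is to d."""
    return lex_key(dist_pp(a, b)) < lex_key(dist_pp(c, d))


worlds = list(product([False, True], repeat=N_ATOMS))
p0_or_p1 = [w for w in worlds if w[0] or w[1]]
p1_and_not_p2 = [w for w in worlds if w[1] and not w[2]]
p3 = [w for w in worlds if w[3]]
not_p3 = [w for w in worlds if not w[3]]

print(sorted(dist_pp(p0_or_p1, p1_and_not_p2)))       # [1, 2]
print(closer(p3, not_p3, p0_or_p1, p1_and_not_p2))    # True
```

The second line of output reflects the lexicographic weighting: a proposition and its negation that differ only over the low-ranked atom p_3 come out closer than two propositions whose furthest-apart worlds differ over p_1 and p_2.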

A general comparative model
Section 3.1 assumed that similarity among worlds is determined lexicographically with reference to a countable infinity of respects of similarity: the first respect matters most, the second matters second-most, and so on. From the lexicographic order $<_{\mathrm{lex}}$ on the subsets of $\mathbb{N}$ thereby generated, we derived a four-place comparative relation $\prec$ (or $\preceq$ for the weak relation) on propositions. The lexicographic model makes $p_0$ an absolute dictator, in the sense that dissimilarity in this respect cannot be compensated for by similarity in others; likewise for similarity in the $p_i$-respect versus similarity in lesser-ranked respects. To avoid this consequence, a more general comparative model does away with the assumption that similarity of worlds is determined lexicographically; it simply generates $\prec$ from an arbitrary linear pre-order. We may also relax the assumption that the number of atoms is countably infinite, since nothing turns on it. Our more general model thus assumes that $\kappa$ is a (finite or infinite) cardinal and that worlds are maximally consistent sets of literals drawn from the atom set $\{p_i : i \in \kappa\}$. Propositions are now well-formed formulas in a (truth-functionally complete) propositional language whose set of atoms is $\{p_i : i \in \kappa\}$. We assume that $\leq_{\mathbb{R}}$ is a weak linear pre-order on $\mathbb{P}(\kappa)$, i.e. that $\leq_{\mathbb{R}}$ is reflexive, transitive, and linear, but not necessarily antisymmetric. If $\leq_{\mathbb{R}}$ happens to also be antisymmetric, we proceed as in Sect. 3.1, simply replacing $\leq_{\mathrm{lex}}$ with $\leq_{\mathbb{R}}$. In the more general case in which $\leq_{\mathbb{R}}$ may not be antisymmetric, having defined $\mathrm{DIST}_{WW}$ as in Sect. 3.1, we define $\mathrm{DIST}_{WP}(w, \alpha)$ to be any (arbitrarily chosen) $\mathbb{R}$-least member of
$$\{\mathrm{DIST}_{WW}(w, w') : w' \in \alpha\}$$
and $\mathrm{DIST}_{PP}(\alpha, \beta)$ to be any (arbitrarily chosen) $\mathbb{R}$-largest element of
$$\{\mathrm{DIST}_{WP}(w, \beta) : w \in \alpha\} \cup \{\mathrm{DIST}_{WP}(w, \alpha) : w \in \beta\}.$$
That $\alpha$ and $\beta$ are propositions rather than arbitrary sets of worlds guarantees that there are $\mathbb{R}$-least and $\mathbb{R}$-largest elements in the respective definitions of $\mathrm{DIST}_{WP}$ and $\mathrm{DIST}_{PP}$.
16 When $\leq_{\mathbb{R}}$ is not antisymmetric, we pick any one from an equivalence class of subsets of $\kappa$; and if $\leq_{\mathbb{R}}$ happens to be a linear order (and so antisymmetric), then each equivalence class contains exactly one element, in which case we pick its only member.
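The step from the lexicographic model to an arbitrary linear pre-order amounts to making the ordering a parameter. The following sketch illustrates this under stated assumptions: a three-atom toy language, a linear pre-order encoded by a sort key (adequate for finite cases only), and a purely hypothetical `weight_key` chosen to show that p_0 need no longer be a dictator. Where several elements tie, `min` and `max` return one of them, which mirrors the 'arbitrarily chosen' clause.

```python
from itertools import product

N_ATOMS = 3  # assumption: a three-atom toy language p_0, p_1, p_2


def dist_ww(w1, w2):
    """Indices of the atoms on which two worlds disagree."""
    return frozenset(i for i in range(N_ATOMS) if w1[i] != w2[i])


def dist_wp(w, prop, key):
    """Some key-least disagreement set between w and the worlds in prop."""
    return min((dist_ww(w, v) for v in prop), key=key)


def dist_pp(a, b, key):
    """Some key-largest world-to-proposition distance, in both directions."""
    candidates = [dist_wp(w, b, key) for w in a] + [dist_wp(w, a, key) for w in b]
    return max(candidates, key=key)


def weakly_closer(a, b, c, d, key):
    """Weak four-place relation: a is at least as close to b as c is to d."""
    return key(dist_pp(a, b, key)) <= key(dist_pp(c, d, key))


def lex_key(s):
    """The lexicographic order of the special case, for comparison."""
    return tuple(1 if i in s else 0 for i in range(N_ATOMS))


def weight_key(s, weights=(4, 3, 2)):
    """Hypothetical linear pre-order: rank index sets by total weight."""
    return sum(weights[i] for i in s)


worlds = list(product([False, True], repeat=N_ATOMS))
p0 = [w for w in worlds if w[0]]
not_p0 = [w for w in worlds if not w[0]]
p1_and_p2 = [w for w in worlds if w[1] and w[2]]
neither = [w for w in worlds if not w[1] and not w[2]]

print(weakly_closer(p0, not_p0, p1_and_p2, neither, lex_key))     # False
print(weakly_closer(p0, not_p0, p1_and_p2, neither, weight_key))  # True
```

Under the lexicographic key a difference over p_0 outranks a joint difference over p_1 and p_2; under the hypothetical weighting the joint difference counts for more, so the verdict flips.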
Another example will help illustrate the more general account. Suppose as in Sect. 3.1 that worlds are represented as maximally consistent sets of L-literals, that the set of atoms of L is once more $\{p_i : i \in \mathbb{N}\}$, and that the distance of $w_1$ to $w_2$ is the set of atomic indices on which they differ. This time, however, the order on subsets of $\mathbb{N}$ is determined by the number of propositions on which worlds differ: the smaller the number, the more similar the two worlds are; when this number is the same, there is a tie, represented by $\sim$ (equivalently the obtaining of both $\preceq$ and $\succeq$). Under this ordering, which we label $\leq_{\mathbb{N}}$, the subsets $\emptyset$, $\{0\}$, $\{1\}$, $\{2\}$, $\{1, 2\}$, $\{2, 4\}$ and $\mathbb{N}$ of the set $\mathbb{N}$ are ordered as follows:
$$\emptyset <_{\mathbb{N}} \{0\} \sim \{1\} \sim \{2\} <_{\mathbb{N}} \{1, 2\} \sim \{2, 4\} <_{\mathbb{N}} \mathbb{N}.$$
It is easy to see that the two-place comparative relation on pairs of propositions induced by the $\leq_{\mathbb{N}}$-ordering on $\mathrm{DIST}_{PP}$ is a linear pre-order, but not a linear order.

The semantic proximity of any two worlds which differ over the status of p 1 only is, for instance, the same as the semantic proximity of any two worlds which differ over the status of p 2 only.
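The ordering by number of differing atoms, with its ties, can be displayed by grouping index sets into tiers. This is a sketch only, assuming five atoms so that the full index set {0, ..., 4} can stand in for an infinite set; the choice of sample subsets is illustrative.

```python
from itertools import groupby

N_ATOMS = 5  # assumption: finitely many atoms, so every index set is finite
full = frozenset(range(N_ATOMS))
subsets = [frozenset(), frozenset({0}), frozenset({1}), frozenset({2}),
           frozenset({1, 2}), frozenset({2, 4}), full]

# Rank index sets by how many atoms they contain; equal sizes are ties.
ranked = sorted(subsets, key=len)
tiers = [[sorted(s) for s in group] for _, group in groupby(ranked, key=len)]
print(tiers)
# [[[]], [[0], [1], [2]], [[1, 2], [2, 4]], [[0, 1, 2, 3, 4]]]
```

Each inner list is one equivalence class of the pre-order: its members are distinct sets ranked equally, which is exactly the failure of antisymmetry that separates a linear pre-order from a linear order.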

An Even More General Comparative Model
The models in Sects. 3.1-3.2 assumed a four-place similarity relation among worlds. In fact, the relation took a special form: in abstract terms, we assumed the existence of a two-place function DIST on the product of the space of worlds with codomain a linear pre-order, whose order relation we may call $<_{\mathbb{R}}$, such that
$$(w_1, w_2) < (w_3, w_4) \iff \mathrm{DIST}(w_1, w_2) <_{\mathbb{R}} \mathrm{DIST}(w_3, w_4),$$
and similarly for the weak relation $\leq$. A general model based on the above lines simply assumes the existence of such a function $\mathrm{DIST} : W^2 \to \mathbb{R}$, where $W$ is the space of worlds and $\langle \mathbb{R}, <_{\mathbb{R}} \rangle$ is a linear pre-order. In this case, we define $\mathrm{DIST}_{WP}(w, \alpha)$ to be any (arbitrarily chosen) $\mathbb{R}$-least member of
$$\{\mathrm{DIST}(w, w') : w' \in \alpha\}$$
and $\mathrm{DIST}_{PP}(\alpha, \beta)$ to be any (arbitrarily chosen) $\mathbb{R}$-largest element of
$$\{\mathrm{DIST}_{WP}(w, \beta) : w \in \alpha\} \cup \{\mathrm{DIST}_{WP}(w, \alpha) : w \in \beta\}.$$
Now $\mathrm{DIST}_{WP}$ and $\mathrm{DIST}_{PP}$ are well-defined only if the relevant sets are guaranteed to have, respectively, an $\mathbb{R}$-least member and an $\mathbb{R}$-largest element, for all propositions $\alpha$ and $\beta$; we may call these combined assumptions $\mathbb{R}$-min-and-max. If $\mathbb{R}$-min-and-max obtains, the present model is more general than either of those in Sects. 3.1-3.2, because we need no longer assume that the distance between two worlds is determined by the set of L-atoms on which they differ. Indeed, in that case worlds need no longer be construed as maximally consistent sets of L-literals; they can be anything whatsoever. In general, of course, without some background assumptions, there is no reason to suppose that $\mathbb{R}$-min-and-max holds. 17
It's worth illustrating this third, most general, model with an example. An interesting use of similarity in metaphysics is the apparently lexicographic explication of similarity among worlds found in Lewis (1979). This account is intended to underpin his possible-worlds account of counterfactuals and thus ultimately of causation. Lewis's famous four conditions are:
(1) It is of the first importance to avoid big, widespread, diverse violations of law.
(2) It is of the second importance to maximize the spatiotemporal region throughout which perfect match of particular fact prevails.
(3) It is of the third importance to avoid even small, localized, simple violations of law.
(4) It is of little or no importance to secure approximate similarity of particular fact, even in matters that concern us greatly. (Lewis 1979, pp. 47-48)
Here is a natural way of making these remarks more precise. Assume that each of the four quantities in (1)-(4) can be measured; e.g. $M_1$ may be a linear order such that if a world $w_1$'s violations of law are small, narrowly confined, and of the same kind, then $w_1$'s measure in the $M_1$-dimension is $m_1$, a smaller element of $M_1$ than $m_2$, which represents the amount of $w_2$'s violations of law, which are big, widespread and diverse. Similarly for $M_2$, $M_3$ and $M_4$, mutatis mutandis. The order $\langle \mathbb{R}, <_{\mathbb{R}} \rangle$ is then a lexicographic order on $M_1 \times M_2 \times M_3 \times M_4$. In other words,
$$\langle m_1, m_2, m_3, m_4 \rangle <_{\mathbb{R}} \langle m_1^*, m_2^*, m_3^*, m_4^* \rangle \iff m_1 < m_1^*, \text{ or } m_1 = m_1^* \text{ and } m_2 < m_2^*, \text{ etc.}$$
Each world is then assigned a quadruple $\mathrm{DIST}(w, w_@) = \langle m_1, m_2, m_3, m_4 \rangle$ that measures $w$'s divergence from the reference world $w_@$. If we assume $\mathbb{R}$-min-and-max in this context, our definition formally captures Lewis's comparative similarity relation. (Naturally, there may be other ways of interpreting Lewis's four conditions.)
Finally, all the material in Sect. 3 may be amended so as to define the three-place '$\alpha$ is more similar to $\beta$ than $\gamma$ is'. The 'distance' between $\alpha$ and $\beta$ can be defined as above in terms of the atomic indices worlds in these sets differ over, and this 'distance' may then be compared to that between $\gamma$ and $\beta$. We have opted for a four-place relation here not out of any strong conviction, but mainly because, as we saw in Sect. 1, we intuitively grasp four-place comparative similarity facts of the form '$\alpha$ is more similar to $\beta$ than $\gamma$ is to $\delta$', and from this four-place relation the three-place one is immediately definable. The literature also contains arguments that the four-place relation is methodologically preferable to the three-place one. 18 Whether or not our treatment based on a four-place relation is best recast in terms of a three-place relation is a question we will not further address. We note merely that nothing in Sect. 3 precludes or prejudges a positive answer to it.
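The lexicographic reading of Lewis's four conditions can be illustrated with tuples, which Python compares lexicographically. The sketch is hedged in two ways: the integer scores are invented stand-ins for elements of the linear orders M_1 to M_4 (only their relative order is used, never their arithmetic), and the convention that a smaller score means less divergence from the reference world is an assumption of the illustration. The function `lewis_key` is hypothetical.

```python
def lewis_key(m1, m2, m3, m4):
    """A world's divergence from the reference world, one score per condition.

    Assumption: within each dimension a smaller score means less divergence.
    """
    return (m1, m2, m3, m4)  # Python compares tuples lexicographically


# Hypothetical worlds, scored on conditions (1)-(4) in that order.
w_a = lewis_key(0, 5, 2, 9)
w_b = lewis_key(1, 0, 0, 0)
w_c = lewis_key(0, 5, 3, 0)

print(w_a < w_b)  # True: the first condition settles the matter
print(w_a < w_c)  # True: tie on (1) and (2), decided by (3); (4) is never consulted
print(sorted([w_b, w_c, w_a]) == [w_a, w_c, w_b])  # True
```

The second comparison shows the dictatorial structure at work: a large advantage on the fourth condition cannot offset a small disadvantage on the third.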
This completes the comparative model's exposition. We now turn to some philosophical remarks.

Philosophical Commentary
In the previous section, we proposed an analysis of the four-place relation expressed by '$s_1$ is closer in meaning to $s_2$ than $s_3$ is to $s_4$'. As well as propositional similarity, the account potentially has other applications, for example to the question of higher-order resemblances amongst properties. We saw in Sect. 1 that comparative similarity facts between properties can give rise to comparative similarity facts between propositions: our example turned on crimson being more like scarlet than blue is to green. One could explore the Sect. 3 ideas to ground an account of comparative similarity among properties, replacing worlds with individuals and propositions with properties. That way, higher-order resemblance between properties could be reduced to ordinary resemblances between particulars.

18 Tim Williamson has argued that, in general, this three-place relation is expressively inadequate for formulating comparative similarity claims, his example being 'there are three things of which the first does not differ from anything by as much as the other two differ from each other' (Williamson 1988, p. 471). As Williamson shows (1988, p. 467), a four-place comparative similarity relation cannot be defined from a three-place one. Moreover, the logic of the four-place comparative similarity relation is finitely axiomatisable, whereas that of the three-place comparative similarity relation is not (Williamson 1988, section 4).
To keep this paper to manageable length, however, we cannot here embark on this or any other application. Nor is a detailed comparison with other accounts feasible, simply because there are too many of them. In lieu of that, Sect. 4.1 concisely situates the account vis-à-vis the verisimilitude literature. In Sect. 4.2, we explain why our comparative models are preferable to metric approaches. In Sect. 4.3, we consider another constraint that might be imposed on the models. Section 4.4 concludes with a brief explanation of why we take world similarity as primitive.

Generic Problems with Likeness Accounts
Approaches to truthlikeness 19 are standardly divided into three categories: the content, consequence and likeness approaches. Ours is akin to the last of these, the likeness approach. The focus of these approaches is on defining what it is for X to be at least as close to the truth as Y, where X and Y are variously defined as propositions or theories or models, and 'the truth' may be a proposition or world or model. Hilpinen (1976) and Tichý (1974) are pioneers of the approach and Niiniluoto (1987) a classic synthesis advocating a possible-worlds likeness approach. 20 Now our topic is propositional similarity rather than verisimilitude, so our aims and these writers' are correspondingly different. Many of the constraints on acceptable measures of verisimilitude, for instance, mention truth and falsity (e.g. 'some false statements may be more truthlike than some true statements'). 21 These cannot sensibly be imposed on an account of propositional similarity, which must be independent of which world is the actual one: for which world is actual makes no difference to whether $s_1$ is more similar to $s_2$ than $s_3$ is to $s_4$. By the same token, accounts which make essential use of the class of true models, or correct consequences, or true theories, or the like, are different in both spirit and implementation from ours. The verisimilitude literature does contain discussion of how to assign a measure to pairs of sets of worlds (or models). 22 However, these treatments generally presuppose that the distance between such sets is represented by a scalar quantity, typically a real number, and for that reason differ significantly from our non-metric approach.

19 We take 'truthlikeness' and 'verisimilitude' as synonyms, unlike some writers such as e.g. Zwart (2001, p. 27).
20 Other papers outside the verisimilitude literature directly emerging from Popper (1963) and that relate to the present paper in different ways include Bigelow (1976), Blumson (2018) and Williamson (1988).
21 See Niiniluoto (1987, pp. 232-233) for a list of such constraints.
22 An example being Niiniluoto (1987, pp. 242-256).
All that said, despite the differences, the present proposal does have several points of affinity with likeness accounts and some comparisons may be drawn. 23 Rather than attempt to review the large literature, we situate the account by seeing how it deals with the main generic difficulties associated with such approaches. Oddie (2014, sec. 1.4) usefully summarises these as: 24 how is likeness determined? What is the correct functional dependence of the truthlikeness of a proposition on the likeness of its members to the actual world (the extension problem)? Finally, does the account of likeness depend on which language is used to formulate propositions (the problem of language dependence)?
In answer to the first question, likeness in the Sect. 3 models is constituted by agreement over propositional atoms and unlikeness by divergence over them. Agreement over which atoms or collection thereof counts most is left completely open, as it is determined by the order relation (Sect. 3.3's < ℝ ), which can be anything we like. Our account is thus inherently flexible. Furthermore, unlike metric accounts, which are constrained by the size of the real numbers (the continuum) and their metric properties (e.g. reals are separable, so admit only countable ascending or descending chains), our model allows similarity to be defined on propositional languages of arbitrary cardinality. (See Sect. 4.2 for more on this point.) Moving on to the second question, functional dependence in our model is given by the measure DIST PP . We do not claim that this measure of 'distance' is uniquely motivated. It is, however, a natural one in our metricless setting. As the literature attests, likeness approaches based on metrics have to make an arbitrary choice from a wide range of functions that map collections of distances between worlds into a single distance between sets of worlds. Take Tichý's famous (1974) toy example, in which there are three atomic propositions h ('it's hot'), r ('it's rainy') and w ('it's windy'), and in which the state of the world is h ∧ r ∧ w . The distance between conjunctions of literals drawn from these three atoms is then given by the taxicab metric, so that e.g. the distance between h ∧ r ∧ w and h ∧ ¬r ∧ w is 1 because these two worlds disagree only on the truth-value of a single atom, namely r. Suppose now that we wish to assign a distance from the set of h-worlds to the set of r-worlds. There are four of each, and 15 distances to aggregate into a single output that represents the distance between h and r. 25 Any from a wide range of such aggregating functions is compatible with intuitive constraints. 
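The counts in Tichý's toy example, and the way different aggregating functions pull apart, can be checked by brute force. This is a minimal sketch: worlds are triples of truth-values for (h, r, w), the taxicab distance counts the atoms on which two worlds disagree, and the three aggregators shown (minimum, maximum, mean) are merely samples from the wide range the text mentions, not a recommendation.

```python
from itertools import product
from statistics import mean

worlds = list(product([True, False], repeat=3))  # truth-values of (h, r, w)
h_worlds = [v for v in worlds if v[0]]
r_worlds = [v for v in worlds if v[1]]


def taxicab(u, v):
    """Number of atoms on which two worlds disagree."""
    return sum(a != b for a, b in zip(u, v))


ordered_pairs = [(u, v) for u in h_worlds for v in r_worlds]
# Unordered pairs: (u, v) and (v, u) collapse into one frozenset.
unordered_pairs = {frozenset((u, v)) for u, v in ordered_pairs}
distances = [taxicab(u, v) for u, v in ordered_pairs]

print(len(h_worlds), len(r_worlds))               # 4 4
print(len(ordered_pairs), len(unordered_pairs))   # 16 15
print(min(distances), max(distances), mean(distances))  # 0 3 1.5
```

The three aggregates already disagree (0, 3 and 1.5), which is one way of seeing why a metric approach must choose among many candidate functions.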
More generally, as the verisimilitude literature demonstrates, in trying to define the distance from a world w to a set of worlds, given a distance function on worlds, one may use a wide array of measures: the average distance between w and the elements of the set, or the infimum of such distances, or the supremum, or some weighted average of the last two. The choices are endless, and have familiar pros and cons. As Schurz and Weingartner remark, extant approaches have not successfully solved the problem of extending truthlikeness of worlds to truthlikeness of propositions because this extrapolation is 'intuitively underdetermined' (2010, p. 423). In our non-metric setting, far fewer technical options are available, because there are no real values for measures to take as inputs. In fact, we know of no other non-metric account to rival that presented in Sect. 3.
23 According to the definition on page 25 of Zwart (2001), the lexicographic account with infinitely many propositional atoms is not strictly speaking a likeness approach. When the set of atoms is finite, the lexicographic account is a likeness one according to Zwart's definition, though this does not extend to the general orderings discussed in Sect. 3.3.
24 Chapters 1-3 of Zwart (2001) offer an accessible and mathematically precise overview; Niiniluoto (2018) is a very recent review.
25 Since there are 4 h-worlds and 4 r-worlds, there are 4 × 4 = 16 h-world and r-world pairs; one of these, the distance between h ∧ r ∧ w and h ∧ r ∧ ¬w , is counted twice over by the product.
By not assuming distances between worlds, we avoid the extension problem in its starkest form. The distance between propositions is simply a function of the atoms over which their constituents differ, which function exactly depending on the order. In a sense, our account does not so much avoid the problem as openly embrace it. Propositional similarity is ultimately grounded in atomic difference; how these atomic differences are then weighed entirely depends on the order ( ≤ ℝ in the most general setting).
Finally, in answer to the third question, our account is not problematically language-sensitive. Choosing similarity respects and an order on sets of these respects fixes the propositional similarity facts. As an illustration, suppose we let q 0 be p 0 , q 1 be p 0 ∨ p 1 and more generally define q n as p 0 ∨ p 1 ∨ ⋯ ∨ p n . Clearly, the p i and the q i are interdefinable using Boolean operations. If the similarity respects are the p i , as in the lexicographic account say, whether we express propositions using the q i or the p i makes no difference to their similarity. The similarity between q 0 and q 1 is the same whether we express these as q 0 and q 1 or alternatively as p 0 and p 0 ∨ p 1 . One might contend that we have avoided language dependence only by fixing similarity respects at the outset. But it is hard to see how any similarity judgements could get off the ground without privileging some similarity respects over others, since everything is identical to anything else in infinitely many different ways.
This, at any rate, is a sketch of how our account fares with respect to the main charges brought against likeness accounts. To further motivate the avoidance of metric assumptions, we now explain why the very move from a non-metric to a metric account saddles us with problems of its own making.

Metric and Pseudometric Accounts
Our account in Sect. 3 avoided any metric assumptions. Those who are familiar with the Hausdorff metric from real analysis will recognise, however, that our comparative similarity relation is Hausdorff-like. It is a sort of non-metric implementation of the idea behind the Hausdorff metric. A comparison of our non-metric approach with this metric will help bring out the former's virtues. To keep this paper self-contained, we include a definition and exposition of the Hausdorff metric; but since it would obtrude too much if laid out here, we relegate the material to Appendix 1.4.
In thumbnail (Appendix 1.4 has the full details), the Hausdorff distance between two sets of worlds measures the largest shortest distance between a world in one set and the other set. From a metric d on the space of worlds W we may thereby define a pseudometric (see Appendix 1.4) on the set of non-contradictory propositions construed as sets of worlds. We may exploit a pseudometric to define a four-place comparative similarity relation ⪯ on propositions as follows: one pair of propositions bears ⪯ to another pair if and only if the pseudometric distance between the members of the first pair is less than or equal to that between the members of the second. We thereby recover a four-place comparative similarity relation, this time based on metric assumptions. As Appendix 1.4 explains, if propositions are closed and bounded subsets of W then the pseudometric turns out to be a metric on all propositions save the contradiction. But even if they are not, there's no harm done; for our purposes, that it is a pseudometric is enough. 26 We note that the Hausdorff metric has received a bad press in the verisimilitude literature. Niiniluoto (1987, p. 245) for instance points out that in the special case in which one of the two sets is a singleton {w}, the Hausdorff distance between the other set and {w} is the supremum of the distances from w to the worlds in that other set, so that only the maximum distance (supremum) between a world and the set {w} is taken into account. This is an unwelcome consequence for a measure of world-to-proposition distance. But as a criticism of a proposition-to-proposition measure, it draws a blank, since the metric takes into account all worlds in both sets, and in our setting no propositions correspond to a single world.
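For readers who want the thumbnail in symbols, the standard textbook definition can be written as below. The notation is assumed here for illustration (d_H for the Hausdorff distance, A and B for non-empty sets of worlds) and need not match that of Appendix 1.4.

```latex
% Hausdorff distance induced by a metric d on worlds (notation assumed here)
\[
  d_H(A, B) \;=\; \max\Bigl\{\, \sup_{a \in A} \inf_{b \in B} d(a, b),\;
                              \sup_{b \in B} \inf_{a \in A} d(a, b) \Bigr\}
\]
% Special case in which one of the sets is a singleton {w}
\[
  d_H(A, \{w\}) \;=\; \max\Bigl\{\, \sup_{a \in A} d(a, w),\;
                                  \inf_{a \in A} d(a, w) \Bigr\}
               \;=\; \sup_{a \in A} d(a, w)
\]
```

The second line is the singleton case on which Niiniluoto's criticism turns: the infimum over the one-element set is trivial, and the supremum always dominates the infimum, so only the largest world-to-world distance survives.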
We provide four reasons for preferring our non-metric comparative model in Sect. 3 to the metric/pseudometric one suggested in this subsection. Some of these reasons generalise to objections against any metric approach to propositional similarity.
1. In the general case, the pseudometric may not be a metric. If it isn't, it will elide the difference between two sets of worlds being identical and being as close to one another as possible. For example, it is natural to suppose that a proposition such as 'John is no more than 2 m tall' is the closure in the space of worlds of the proposition 'John is less than 2 m tall'. Now under any sensible notion of set distance in a metric space, the distance of any set from its closure should always be 0. So the pseudometric distance between these propositions should be 0, and will be 0 if the pseudometric is the Hausdorff pseudometric, despite the propositions' distinctness. Consequently, on this account the two propositions are just as similar to one another as the first is to itself. But this is the wrong result: 'John is no more than 2 m tall' and 'John is less than 2 m tall' may be infinitesimally close to one another, as we might put it, but they are less similar to one another than either of them is to itself. There is a difference between zero and infinitesimal proximity, a difference which the pseudometric approach obliterates. 27 Our Sect. 3 models, however, can respect this difference.
26 As should be clear, a pseudometric uniquely defines a four-place relation ⪯ , but not vice versa. For example, if a pseudometric induces ⪯ (so that one pair of propositions bears ⪯ to another iff the distance between the first pair is less than or equal to the distance between the second), then so does, say, the pseudometric obtained by multiplying all of its distances by 2. The assumption that a particular pseudometric exists is thus a stronger hypothesis than that a particular comparative strict relation ≺ exists. (Or that a particular such weak relation ⪯ exists.) That said, the existence of a four-place comparative relation ≺ on a countable (finite or countably infinite) set satisfying some natural assumptions is equivalent to the existence of a metric. A version of this argument may be found in Suppes et al. (1989, p. 162).
27 Note that if the worlds in our models are not possible worlds but equivalence classes of possible worlds up to L-equivalence then no distinction can be drawn between (1) a pair ⟨w, w⟩ whose first member equals its second member, and (2) the pair ⟨w, w ′ ⟩ with the same w and w ′ a distinct world from w's same equivalence class. In contrast to the objection just raised, this is a perfectly acceptable consequence of thinking of L-propositions as the only similarity respects that matter.

2. Second, the pseudometric is highly sensitive to the underlying world metric d. It is easy to come up with examples in which X is a topological space, d 1 and d 2 are metrics compatible with the topology on X, and yet there are points x, y, z and w of X such that d 1 (x, y) < d 1 (z, w) and d 2 (x, y) > d 2 (z, w) , i.e. d 1 and d 2 disagree on four-place comparative facts of the form 'x is closer to y than z is to w'. In the same way, the four-place relation ≺ derived from the pseudometric depends sensitively on the world metric d. 28 Metrics d 1 and d 2 may thus be very similar yet generate respective pseudometrics that disagree on four-place comparative propositional similarity facts. The second problem for metric/pseudometric models, then, stems from the fact that data about world similarity is hardly ever quantitative. Looking back at the motivating examples in Sect. 1 we see that the facts about propositional similarity encountered there were all comparative. They were facts of the form ' s 1 is closer in meaning to s 2 than s 3 is to s 4 ', or three-placed ones of the form, ' s 1 is closer in meaning to s 2 than s 3 is', rather than quantitative facts of the form 'the distance in meaning between s 1 and s 2 is twice that between s 3 and s 4 '. Now there may be cases of this kind in which we have metric information; but they will be rare. To assume a metric on worlds and use it to derive a pseudometric on propositions is therefore to impose a spurious precision on the subject matter, at least for most applications of interest. Propositional metrics depend overly sensitively on world metrics.
28 Here is an illustration of ≺ 's sensitive dependence on d (via the pseudometric). Assume that we assign each similarity respect p 0 , p 1 , p 2 … a weight, i.e. a positive real number; as usual, we imagine that context fixes these respects and weights. Thus if two worlds differ with respect to p 0 , this contributes x 0 to their distance (or dissimilarity), if they differ with respect to p 1 this contributes x 1 , and so on, with the x i all positive reals (where i is a natural number). In other words, if w i and w j are worlds, i.e. maximally consistent sets of L-literals, and x n is a positive real number for each n, we may set d(w i , w j ) = def Σx n , for n such that p n ∉ w i ∩ w j and ¬p n ∉ w i ∩ w j . Suppose now that w 1 = {p n ∶ n ≥ 0} , w 2 = {¬p 0 } ∪ {p n ∶ n ≥ 1} and w 3 = {p 0 , ¬p 1 , ¬p 2 } ∪ {p n ∶ n ≥ 3} . And suppose further that d 1 is the metric defined by taking each x n as 2^(−n) , and that the metric d 2 is instead obtained by setting x n as 1.5^(−n) . Then 1 = d 1 (w 1 , w 2 ) > d 1 (w 1 , w 3 ) = 1∕2 + 1∕4 = 3∕4 , whereas 1 = d 2 (w 1 , w 2 ) < d 2 (w 1 , w 3 ) = 2∕3 + 4∕9 = 10∕9 . Evidently, this difference in the four-place comparative relation derived from the metrics d 1 or d 2 carries over to differences in the pseudometrics respectively derived from these metrics. A quick way to see this is that when the pseudometric is derived from d as above, the distance it assigns to {w 1 } and {w 2 } is d(w 1 , w 2 ) , so that the pseudometric derived from d 1 puts {w 1 } further from {w 2 } than from {w 3 }, whereas the pseudometric derived from d 2 puts {w 1 } closer to {w 2 } than to {w 3 }.
3. The third problem with pseudometric models is the cardinality constraint they impose. Thought of as a binary relation on pairs of worlds, the derived relation ≺ cannot, for example, admit uncountable ascending or uncountable descending chains, since the real numbers admit no such chains. 29 But we would like to allow such possibilities rather than rule them out through the model's formal features. We may, for example, wish to allow for uncountably many similarity respects. 30 The Sect. 3 models comfortably handle arbitrarily large numbers of worlds or similarity respects.
4. Our comparative models are defined for all propositions other than the contradiction. To plug this gap, one fix would be to set DIST PP (φ, ⊥) to be some subset of indices, when ⊥ is any contradictory L-sentence. Setting DIST PP (⊥, ⊥) = ∅ is forced if we wish to respect the condition that a proposition is no further in meaning from itself than any two propositions are from one another. 31 A natural choice for DIST PP (φ, ⊥) when φ is not contradictory is the full set of indices; in Sect. 3.1's lexicographic example, for instance, DIST PP (φ, ⊥) would be that example's entire index set whenever φ is not a contradiction. This latter clause captures the thought that, as a classical logician would put it, a contradiction disagrees with any other sentence about the value of all atomic propositions, since it takes all of them to be both true and false. Naturally, from a relevance logic perspective (which there is no room to explore), a different choice might be made here.
Contrast this with a pseudometric model, for which there is no technical fix. For example, we cannot just set the pseudometric distance from a contradiction to any other proposition to 0 on pain of violating the triangle inequality. 32 In a metric space, the Hausdorff distance between a fixed closed, bounded, non-empty subset F and a closed, bounded and non-empty subset S which tends towards the empty set will not in general tend to a particular limit. 33 So unlike the Sect. 3 models, the pseudometric model seems essentially incomplete. The wisdom of adopting a non-metric account of propositional similarity is once more apparent.
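Returning to the second of the four reasons, the weighted-sum illustration in footnote 28 can be checked with exact rational arithmetic. The sketch below is a minimal one: the function names are hypothetical, and each pair of worlds is represented only by the finite set of indices on which the two worlds disagree, which is all the weighted sum depends on.

```python
from fractions import Fraction

def weighted_distance(differing, base):
    """Sum of base**(-n) over the indices n on which two worlds differ."""
    return sum(Fraction(1) / base**n for n in differing)

# Indices of disagreement, read off the definitions of w1, w2, w3 in footnote 28.
diff_w1_w2 = {0}     # w1 and w2 differ only on p0
diff_w1_w3 = {1, 2}  # w1 and w3 differ only on p1 and p2

def d1(differing):
    """World metric with weights x_n = 2**(-n)."""
    return weighted_distance(differing, Fraction(2))

def d2(differing):
    """World metric with weights x_n = 1.5**(-n)."""
    return weighted_distance(differing, Fraction(3, 2))

print(d1(diff_w1_w2), d1(diff_w1_w3))  # d1 ranks w3 as closer to w1 than w2 is
print(d2(diff_w1_w2), d2(diff_w1_w3))  # d2 reverses that ranking
```

Both metrics come from the same family of weightings, yet they disagree on which of w 2 and w 3 is closer to w 1 , which is the sensitivity the footnote describes.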
For all the reasons just outlined, we take our comparative models to be superior to metric accounts. Of course, when there is metric information relating worlds to worlds, one might essay a metric account of propositional similarity. But such cases tend to be the exception rather than the norm.

The Negation Constraint
Here is a natural-sounding constraint one might imagine applies to propositional similarity, truth-conditionally understood: the degree of similarity between propositions φ and ψ should match that between ¬φ and ¬ψ. Call this the negation constraint.
There are two readings of the negation constraint, depending on how the word 'match' is understood. The stronger reading construes match as identity, i.e. insists that the degree of similarity of φ and ψ equal that of ¬φ and ¬ψ. The weaker reading does not insist on identity, but requires that the respective degrees be close. Though the weaker reading is vague, Sect. 3's comparative models respect it, since DIST PP (φ, ψ) is in general close in the ordering to DIST PP (¬φ, ¬ψ) . Of course, how close will depend on the order ≤ ℝ ; but the fact that DIST PP (φ, ψ) is a subset of the indices of atoms in φ and ψ sets an upper bound on the difference between them. And when φ and ψ are literals, as well as in certain other special cases, DIST PP (φ, ψ) simply equals DIST PP (¬φ, ¬ψ).
31 Assuming ∅ is the order's minimal element.
32 By the triangle inequality, the pseudometric distance from φ to ψ could then be no greater than the sum of the distances from φ to ⊥ and from ⊥ to ψ (0 + 0), where ⊥ is a contradiction, and so could not be positive. So if the distance from any contradiction were 0, the only consistent distance for all pairs of propositions would be 0.
33 'S tends towards the empty set' here abbreviates: the diameter of S tends to 0, where S's diameter is defined by diam(S) = sup x∈S,y∈S d(x, y) . The value of the Hausdorff distance between S and F as S tends to the empty set will in general depend on how S tends to the empty set.
The constraint's stronger reading is that DIST PP (φ, ψ) = DIST PP (¬φ, ¬ψ) for all φ and ψ. This constraint is not respected by our Sect. 3 models. For example, it fails in Sect. 3.1's lexicographic model, as the reader may verify. In general, whether DIST PP (φ, ψ) = DIST PP (¬φ, ¬ψ) will depend on φ and ψ as well as the underlying order ≤ ℝ .
One way to motivate the stronger constraint is by thinking of propositions as functions from worlds to truth-values. Models in which degrees of similarity are invariant under the permutation of truth-values are formally attractive. When it comes to applications to possible worlds, however, truth-values may not be permutable. Truth-value symmetry may be desirable in a purely mathematical setting; but as soon as intended interpretations (applied models) are in play, any reason to demand such symmetry fades away.
In fact, the constraint's stronger reading is problematic. Propositions φ and ψ may be jointly inconsistent while their negations are jointly consistent. For example, if we let φ = p ∧ q and ψ = ¬p , then φ and ψ are inconsistent; but ¬φ and ¬ψ are respectively equivalent to ¬p ∨ ¬q and p, which are jointly consistent. The motivation for our models also clashes directly with the negation constraint's stronger reading. Using the lexicographic model as an illustration, suppose s 1 is p 0 ∧ p 1 and s 2 is ¬p 0 ∧ p 2 . Sentence s 1 is very dissimilar to s 2 , as it differs from it along the most important dimension of similarity; DIST PP (p 0 ∧ p 1 , ¬p 0 ∧ p 2 ) = max({0, 1}, {0, 2}) = {0, 1} . However, not-s 1 's similarity to not-s 2 is greater than s 1 's to s 2 , the reason being that not-s 1 and not-s 2 are compatible along the most important similarity dimension. For any way the world has to be for not-s 1 to be true there is a way the world is in which not-s 2 is true that is more similar to this first world than any s 1 -world is to any s 2 -world. More succinctly, s 1 and s 2 are constrained to differ along the most important dimension of similarity, whereas not-s 1 and not-s 2 are not; from a lexicographic perspective, that means that not-s 1 is more similar to not-s 2 than s 1 is to s 2 . This is precisely the result our comparative models deliver.
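The lexicographic illustration can be reproduced with a small sketch. Sect. 3's definitions are not restated here, so the code rests on assumptions that should be checked against them: that DIST PP is a Hausdorff-style max-min over the sets of atoms on which two worlds differ; that the lexicographic order ranks index sets by the most important (lowest) index on which they disagree; and that it suffices to consider the three atoms p 0 , p 1 , p 2 that occur in the sentences. All function names are hypothetical.

```python
from itertools import product

ATOMS = 3  # assumption: restrict attention to p0, p1, p2
WORLDS = list(product([True, False], repeat=ATOMS))

def diff(u, v):
    """Set of atom indices on which two worlds disagree."""
    return frozenset(i for i in range(ATOMS) if u[i] != v[i])

def lex_key(indices):
    """Sort key: sets are compared on the lowest index where they differ."""
    return tuple(int(i in indices) for i in range(ATOMS))

def directed(xs, ys):
    """Largest, over worlds in xs, of the smallest difference set to ys."""
    return max((min((diff(x, y) for y in ys), key=lex_key) for x in xs),
               key=lex_key)

def dist_pp(prop_a, prop_b):
    """Assumed Hausdorff-style comparison of two non-contradictory propositions."""
    a_worlds = [w for w in WORLDS if prop_a(w)]
    b_worlds = [w for w in WORLDS if prop_b(w)]
    return max(directed(a_worlds, b_worlds), directed(b_worlds, a_worlds),
               key=lex_key)

def s1(w):
    return w[0] and w[1]        # p0 and p1

def s2(w):
    return (not w[0]) and w[2]  # not-p0 and p2

d_pos = dist_pp(s1, s2)
d_neg = dist_pp(lambda w: not s1(w), lambda w: not s2(w))

print(sorted(d_pos), sorted(d_neg))
```

Under these assumptions the sketch returns {0, 1} for the pair s 1 , s 2 , agreeing with the value stated in the text, and a lexicographically smaller set for the pair of negations, so the stronger reading of the negation constraint fails on this example.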
In sum, our comparative models respect the negation constraint's weaker reading. They do not respect the stronger reading, for good reason.

More About Similarity?
Finally, a few words about taking similarity as primitive. Our models assumed, rather than derived, similarity respects between different worlds; these respects are 'exogenous' to the models. To think this problematic, an omission, is to misunderstand our aims. The models derived a four-place similarity relation on propositions from a four-place comparative similarity relation on worlds. There is no reason to suppose that the relevant similarity respects will be the same for all applications of interest. As David Lewis once noted, 34 it seems unlikely that the same ordering of respects of comparison will underlie facts about verisimilitude as well as facts about counterfactuals. Various notions of similarity vary from context to context, which is why our models allowed different orderings. Moreover, all talk of similarity of worlds is neutral between similarity in a particular respect (which itself perhaps aggregates subrespects of similarity) or overall similarity; either is compatible with our models. 35

Conclusion
The task this paper set itself was to derive comparative propositional similarity facts from world similarity facts. We sought to avoid a metric approach, common in the literature, but unrealistic in many contexts. The models presented in Sect. 3 did just that. They help make formal sense of the notion, central to philosophy, of propositional similarity in the usual case where there is no metric to be had. 36

Given a metric d on worlds, we define the Hausdorff metric on propositions, i.e. sets of worlds, with underlying set X the set W of possible worlds. 41 However, as observed, the Hausdorff metric is defined only on closed, bounded and non-empty subsets of the underlying space X. So we must ask whether propositions construed as sets of worlds are closed, bounded and non-empty subsets of W.
The first sticking point is that contradictions are empty subsets of W since they contain no worlds. 42 The problem may be obviated by excluding the contradiction from the domain of propositions on which the Hausdorff distance is defined (but see Sect. 4.3). The second question is whether the sets of worlds propositions correspond to are closed and bounded. This is less of a worry than it might initially seem. The Hausdorff distance is a pseudometric even if L-propositions (i.e. propositions expressible by L-sentences) are not closed or bounded subsets of W. Symmetry is immediate from its definition, which is symmetric for any two sets of the underlying space. The triangle inequality follows by a routine argument from the triangle inequality for d. 43 The first half of Positivity is also immediate: any subset of the original space has distance 0 from itself. Thus the Hausdorff distance is at least a pseudometric, i.e. a metric save that it may not obey the condition that if the distance between two sets is 0 then the sets are identical. And as Sect. 4.3 shows, that it is a pseudometric is all that's needed.