1 Introduction

There are numerous words across languages expressing that items are similar or indistinguishable in some sense, for example in German and English ähnlich/similar, so/such, and gleich/same. It seems reasonable to assume that the common core of the meaning of these words is a relation of similarity, which is considered in Cognitive Science as “… an organizing principle by which individuals classify objects, form concepts, and make generalizations” (Tversky 1977, p. 327). Still, there are significant differences between similarity expressions, one of them being gradability: While ähnlich and similar are gradable, so and such as well as gleich and same are not, see (1)–(3).Footnote 1

figure a

The starting point of this paper is the analysis of the German demonstrative so in Umbach and Gust (2014) arguing that German so as well as, e.g., Polish tak and English such are similarity demonstratives, that is, demonstratives expressing similarity (instead of identity) to the target of the demonstration (see Sect. 2). The similarity analysis is spelled out with the help of multi-dimensional attribute spaces defining similarity as indistinguishability with respect to, basically, a set of dimensions of comparison.

German ähnlich and English similar express similarity, too. But while so and such are demonstratives, ähnlich and similar are two-place predicates, and while similarity as denoted by so and such is reflexive,Footnote 2 it will be shown that this is not the case for ähnlich and similar. The most challenging difference, however, is gradability, which will be the focus of this paper.

Considering their scale structure, ähnlich and similar are clearly not open scale—increase of similarity is not open-ended. But at the same time they resist common tests for being upper-closed (see Kennedy and McNally 2005). For example, combination with vollkommen/completely yields heavily marked results. Intuitively, however, there is a maximum for ähnlich and similar which is expressed by the adjectives gleich and same, see (4).

figure b

In this paper, we start from the idea that the meaning of the three types of similarity expressions—so/such, ähnlich/similar, and gleich/same—is based on a single similarity relation. Differences in meaning are characterized in terms of additional constraints. The research questions addressed in this paper will be

figure w

In this paper, we will consider only nominal phrases (ignoring e.g., ähnlich aussehen/look similar and also ähneln/resemble; for resemble see Meier 2009) and we will only consider anaphoric/deictic uses (ignoring reciprocal constructions like Anna and Berta are similar, see footnote 11 in Sect. 3). Since the German and English expressions under consideration are close in meaning and distribution they will be analyzed in parallel.

This paper is organized as follows: In Sect. 2, the similarity analysis for so/such will be outlined as far as required in the subsequent sections. In Sect. 3, differences in distribution and meaning between the three types of similarity expressions will be explored. In Sect. 4, an analysis will be suggested accounting for the gradability of ähnlich/similar which is inspired by Klein (1980). Formal details are provided in the Appendix.

2 Similarity Demonstratives

There is a class of demonstratives found across languages modifying verbal, nominal and/or degree expressions, for example German so/solch, English such, Polish tak and Turkish böyle (see König and Umbach 2018). Some of them are uniform across categories, like German so and Polish tak; others are restricted to particular syntactic categories, like English such. In (5), German so and English such modify a noun.

figure c

In Umbach and Gust (2014), demonstratives like so/such are called similarity demonstratives and are analyzed in a framework spelling out similarity as indistinguishability with respect to given dimensions of comparison. This section provides a summary of the analysis and a brief overview of the formal framework. Details are provided in the Appendix.

The analysis starts from the common idea that the target of the demonstration is an individual or event. But while standard demonstratives like this denote identity between the demonstration target and the referent (as is in-built in Kaplan’s 1989 system), similarity demonstratives denote similarity rather than identity. Accordingly, so/such include a deictic component and a similarity component which jointly create sets of items similar to the target of demonstration. For example, so ein Tisch/such a table in (5) denote a set of tables similar to the table pointed at. This analysis entails that so/such are directly referential in the sense of Kaplan, which will be one key point in distinguishing so/such from ähnlich/similar and gleich/same in Sect. 3.

Similarity depends on dimensions of comparison.Footnote 3 The selection of the relevant dimensions is another key point in comparing the three varieties of similarity expressions. In the formal framework (Gust and Umbach 2015, Gust and Umbach to appear), dimensions of comparison define multidimensional attribute spaces and are equipped with measure functions mapping individuals to points in those spaces. Dimensions and measure functions are two components of what is called a representation. The third component is a set of classifiers, which are predicates on points in attribute spaces. They can be seen as defining a “grid”Footnote 4 where points within a cell are indistinguishable. Classifiers derived from basic ones by logical operations provide coarser (by disjunction) or finer granularity (by conjunction), which will be essential in devising a gradable notion of similarity in Sect. 4.2. Slightly simplifying, a representation \( {\mathcal{F}} \) is defined as a quadruple including a domain D, an attribute space F, a measure function μ: D → F and a set of classifiers P*, \( {\mathcal{F}} = \left\langle F, \mu, P^{*}, D \right\rangle \) (see Appendix, Definition 2).

Similarity is defined as a three-place relation combining two individuals to be compared and a representation, sim(x, y, \( {\mathcal{F}} \)), such that two individuals are similar relative to a representation if and only if the points in the attribute space they are mapped to are indistinguishable relative to the given set of classifiers (Appendix, Definitions 3 and 4). Similarity defined in this way is an equivalence relation.Footnote 5

Consider, for example, the phrases so einen Tisch/such a table in (5). The semantic interpretation is shown in (6). Let us assume, for the sake of the example, that relevant dimensions of comparison are height, material, legs, and extras, and that tables are “measured” by the function in (7). Now suppose that the table the speaker points at is mapped to 〈55 cm, metal, 4, {}〉 and the set of classifiers is such that points within a range of height:40–60; material:{metal, plastics}; legs:2–4; extras:{} cannot be distinguished. Then (5) is true iff Anna’s table is mapped to a point within this range.Footnote 6,Footnote 7,Footnote 8

figure f
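The table example can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the definition given in the Appendix: the names `Representation`, `sim` and `TABLES`, the values assigned to Anna's table and to a bar table, and the encoding of each range as a Boolean classifier are all hypothetical. Indistinguishability is modelled here as agreement on every classifier, which makes `sim` an equivalence relation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

# A point in the (hypothetical) attribute space:
# (height in cm, material, number of legs, extras)
Point = Tuple[float, str, int, FrozenSet[str]]


@dataclass
class Representation:
    measure: Callable[[str], Point]              # mu: D -> F
    classifiers: List[Callable[[Point], bool]]   # P*: predicates on points


def sim(x: str, y: str, rep: Representation) -> bool:
    """x and y are similar iff no classifier tells their points apart."""
    px, py = rep.measure(x), rep.measure(y)
    return all(p(px) == p(py) for p in rep.classifiers)


# Assumed measurements; only the target's values come from the text.
TABLES: Dict[str, Point] = {
    "target": (55, "metal", 4, frozenset()),
    "anna": (45, "plastics", 3, frozenset()),
    "bar_table": (110, "metal", 1, frozenset()),
}

rep = Representation(
    measure=TABLES.__getitem__,
    classifiers=[
        lambda p: 40 <= p[0] <= 60,                # height: 40-60
        lambda p: p[1] in {"metal", "plastics"},   # material
        lambda p: 2 <= p[2] <= 4,                  # legs: 2-4
        lambda p: p[3] == frozenset(),             # extras: none
    ],
)

print(sim("anna", "target", rep))       # True
print(sim("bar_table", "target", rep))  # False
```

On this encoding, the sentence with so einen Tisch/such a table comes out true for Anna's table because its point falls into the same cell as the demonstration target, while the bar table is distinguished by the height and legs classifiers.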

According to the similarity analysis, demonstratives like so/such create classes of similar items, e.g. similar tables. There is some evidence that in the nominal and verbal case (though not in the adjectival case) these similarity classes constitute ad-hoc kinds (see Umbach and Gust 2014). Anderson and Morzycki (2015) present an alternative analysis claiming that demonstratives like German so, English such and Polish tak are pro-kind expressions, adapting Carlson’s (1980) kind-referring analysis of such. The final results of the two accounts are fairly close (in the case of nominal and verbal phrases). However, Umbach and Gust do not just postulate that there are kinds denoted by so phrases, but in addition show how these kinds emerge, namely by similarity. Moreover, by referring to a common similarity relation, this framework offers a basis to compare different types of similarity expressions, which is the topic of this paper.

Finally, it is important to note that the notion of similarity in this framework is qualitative (property-based), unlike that in Gärdenfors’ (2000) conceptual spaces which is quantitative (distance-based) (see Sect. 4.2).Footnote 9,Footnote 10 Even more importantly, unlike Gärdenfors’ conceptual spaces, multi-dimensional attribute spaces in the Umbach and Gust framework are integrated into referential semantics by means of generalized measure functions mapping referents to points in multi-dimensional attribute spaces. Note that this is just a generalization of degree semantics (e.g. Kennedy 1999) from the one-dimensional to the multi-dimensional case.

3 Three Types of Similarity Expressions

In this section the three types of similarity expressions—so/such, ähnlich/similar and gleich/same—will be compared focusing on semantic characteristics (for lexical and distributional data see Umbach 2014). First, ähnlich/similar as well as gleich/same are relational adjectives comparing two individuals. The second argument may be explicit (Ann’s car is similar to Berta’s car) or anaphoric (Ann’s car is similar).Footnote 11 In contrast, so/such are demonstratives (to be used deictically as well as anaphorically). Even though the target of the demonstration (or antecedent) is not identical to the referent of the phrase—the referent of such a table is not (necessarily) identical to the table pointed to—it would be misleading to think of so/such as expressions relating two distinct individuals. This is obvious when considering reciprocal readings which are licensed by ähnlich/similar (as well as gleich/same), but not by so/such (Anna and Berta have similar cars/*have such cars). Instead, these demonstratives create an ad-hoc set of items similar to the target—a set of tables similar to the table pointed to—which is then used to introduce a novel discourse referent (note that so/such are incompatible with definite determiners, *so der Tisch/*such the table).

Furthermore, while ähnlich/similar as well as gleich/same are predicates denoting pairs of individuals and may vary across indices, so/such are demonstratives. They refer directly to the target pointed at and block indexical shift (Kaplan 1989). This is shown in (8): (8a) is clearly true. But even though Adam and Ben both drive a Porsche, (8b) is false because the counterfactual index is irrelevant to the target of the demonstration—the speaker is still pointing to the old VW. In contrast, ähnlich/similar (as well as gleich/same) are evaluated at the counterfactual index, and thus (8c) is true.Footnote 12

figure g

Another difference between the three types of similarity expressions is given by the selection of the dimensions of comparison. In the case of so/such, dimensions are first of all determined by the lexical meaning of the noun—dimensions to be considered for something to be a table or be a bike. Other dimensions can be relevant as long as they relate to properties suited to create a subkind of the kind corresponding to the noun. Take the noun bike. For something to be such a bike it has to be similar to the bike pointed at in relevant bike dimensions. There may be additional dimensions which are not specific for bikes, surfacing in properties like rusty or dented. But properties like bought last year from her neighbor or fantastic would not qualify for comparison. This is why the namely continuations in (9a) and (b) are unmarked whereas in (c) and (d) they are clearly bad. In the case of so/such, dimensions of comparison are not restricted to those determined by the lexical meaning of the noun, but they must not relate to indexical (in a broad sense) or evaluative properties, because indexical and evaluative properties are unsuited to create subkinds (experimental evidence is described in Umbach and Stolterfoht in prep., see also König and Umbach 2018, Sect. 5).

figure h

Selection of dimensions is different in the case of ähnlich/similar. Consider the example in (10). First, while so/such phrases are perfect as kind-denoting terms in generic sentences, ähnlich/similar phrases are not, see (10a, b). Secondly, changing the (unacceptable) generic sentences in (10b) into the episodic sentence in (10c) reveals a clear difference in meaning: so ein Geschenk/such a present is something rare and valuable which can reasonably be considered as showing appreciation for the guest. A Panda bear serves this purpose, but an old manuscript or painting would do as well. In contrast, ein ähnliches Geschenk/a similar present need not be valuable, but it has to be similar to a Panda bear. When asked what a similar present could be, informants mention tigers, rhinos, crocodiles, etc. This is strong evidence that the ähnlich/similar version of similarity selects dimensions made salient by the antecedent.

figure i

In the case of gleich/same, there is a type and a token interpretation (Nunberg 1984). (11) may mean that Anna and Berta drive cars of the same type, or that Anna and Berta share a car (token).Footnote 13 The token interpretation yields referential identity, x = y, but the type interpretation is, first of all, just similarity—being indistinguishable with respect to dimensions given by the lexical meaning of the noun. Different from so/such and ähnlich/similar, additional dimensions are blocked for gleich/same. Suppose that Anna drives a Ford Fiesta. Then the same car on a type interpretation has to be a Ford Fiesta. But even if Anna’s car is rusty and dented, the same car could be spotless. Obviously, non-car-specific dimensions like conditions of usage are irrelevant. Moreover, while such a car may deviate from the values of the antecedent in some dimensions—e.g. by having two instead of four doors—the same car has to be exactly like the antecedent in every car dimension.

figure j

We will assume that for every noun there is a lexically associated canonical set of dimensions (called N-related dimensions). They are provided by criteria of application—what it means to be a table—and are not to be mistaken for criteria of identity.Footnote 14 Our hypothesis on the selection of dimensions of comparison is thisFootnote 15:

figure k

Let us finally consider reflexivity. In the example in (13) so eine Feuerwehr/such a fire brigade in (a) is anaphorically related to the previously mentioned team of fire fighters, which is the team the mayor intends to praise. So the referent of the so/such phrase is identical to the antecedent. When so/such is substituted by ähnlich/similar, as in (b), the mayor seems to praise a fire brigade different from the successful team, which appears strange in this context. A similar effect is found with gleich/same—(c) again gives the impression that there is another fire brigade (for (d) see below).

figure l

(13a) clearly shows that in the case of so/such similarity is reflexive. (13b) shows that in the case of ähnlich/similar reflexive pairs are excluded. But we started out from the idea that the three varieties of similarity expressions are based on one common similarity relation—it would be unintuitive to have an irreflexive similarity relation sim’ in addition to the ‘regular’ reflexive one. More importantly, (13c) shows the same effect as in (13b): there seem to be two distinct fire brigades. It would be absurd, however, to claim that gleich/same are not reflexive. We will therefore postulate distinctiveness as a precondition of usage (due to the two-place character of the lexical items).Footnote 16

Postulating distinctiveness as a precondition yields the required result for ähnlich/similar. Note, however, that in the case of gleich/same the distinctiveness effect is slightly different from what was found for ähnlich/similar. (13c) is strange only if there is no different description of the fire brigade available. But if the mayor earlier in his speech mentioned the fire brigade the community had 10 years ago, he could refer to the actual one by “the same fire brigade [as 10 years ago]” in the sense of token identity (suppose the group of fire fighters did not change), see (13d). So gleich/same do not require distinct referents but instead distinct senses—Arten des Gegebenseins—as in Frege’s distinction between sense and reference. The sentence The morning star is the same star as the morning star is decidedly odd whereas The morning star is the same star as the evening star is fine, which led Frege to distinguish sense and reference (Frege 1892). Accordingly, (13d) is fine because although the fire brigade referent is identical to the one 10 years ago (on the token reading) there are two different senses—fire brigade now, fire brigade 10 years ago.

Therefore, while ähnlich/similar presuppose distinctiveness of referents, gleich/same—on the token reading!—require distinctiveness of descriptions, or ways of identification. The type reading of gleich/same, on the other hand, requires that referents are distinct, which is trivial because otherwise it would not be a type reading.

Summing up, all of the three variants of similarity expressions can be analyzed as being based on a single similarity relation, sim(x, y, \( {\mathcal{F}})\). Their differences are due to differences in selecting dimensions of comparison and in different preconditions of usage.

figure m

Two remarks: First, for reasons of space, we do not touch upon the issue of constraints on determiners (see Umbach 2014). Secondly, the precondition of usage in (b) may be formulated as a presupposition. This is not possible in (c) because way of identification is an intensional notion, which is not (yet) available in the similarity framework (see Appendix).

4 Gradability of ähnlich/similar

This section focuses, first, on the question of how ähnlich/similar compares to other gradable predicates, and what it means for two items to be more similar than some other two items. In the second part of this section, cognitive models of similarity are considered from the point of view of gradability, and the basic ideas of the model suggested in this paper are introduced (technical details are given in the Appendix). Finally, we will give a tentative answer to the question of why ähnlich/similar are gradable but neither so/such nor gleich/same are.

4.1 What Does It Mean to Be More Similar?

For relative gradable adjectives, the truth of the positive form depends on the relevant comparison class—Anna is tall may be true when comparing Anna to her classmates and false when comparing her to her basketball teammates. Absolute gradable adjectives do not require comparison classes because they make use of minimal or maximal degrees of the gradable property—The door is closed is true only if it is maximally closed, and false if it is ajar (cf. Kennedy and McNally 2005). So unlike relative adjectives, absolute ones include a lower or upper bound (or both).

Neither ähnlich nor similar admit reference to overt comparison classes, see (15a). The examples improve slightly when referring to a relativizing state of affairs, see (15b). Examples are unmarked when referring to dimensions of comparison (15c), which is no surprise since similarity generally requires dimensions.

figure n

Maxima can be linguistically indicated with the help of degree modifiers like vollständig and completely. As noted earlier, neither ähnlich nor similar admit these modifiers.Footnote 17 In fact, the combinations vollständig ähnlich and completely similar appear inconsistent, see (16a). Intuitively, if two items are similar, they do not fully agree in their properties, and if agreement is complete, the items are no longer called ähnlich/similar but instead gleich/same. So there is an upper bound, a maximum at which two items cannot possibly be more similar than they are. But this maximum is denoted by gleich/same, on either a token or a type reading, see (16b).Footnote 18

figure o

The intuition that gleich/same denote maximal similarity is based on the idea that the more features two items share, the more similar they are.Footnote 19 It is important to note, however, that this is one of two opposite perspectives. If there is a fixed set of features, then two items are more similar than two other items if they share more of these features.Footnote 20 If, on the other hand, the set of features is variable, then two items may be similar w.r.t. a reduced feature set, even if they were not similar in the original set. Take lens resolution in a camera, which is responsible for the details that can be distinguished. If lens resolution is given, similarity can only be increased by changing the facts in the world. But if lens resolution is decreased similarity is increased in the sense that two items may be similar even if they were not similar in the original resolution (while facts in the world did not change). The second perspective is the one taken in the next section.

Considering gleich/same from this perspective, both the token and the type reading entail maximal discriminating capacity in the following sense: The type reading implies similarity, i.e. indistinguishability, in any representation spanned by N-related dimensions regardless how fine-grained it might be, and the token reading implies similarity in any representation at all (i.e. including accidental properties).

4.2 Gradability and Granularity

In Cognitive Science, models of similarity are either distance-based or feature-based. Distance-based models, for example Gärdenfors’ (2000) Conceptual Spaces, start out from distances between points in a geometrical space representing objects of the domain in question. Similarity is determined by distance—the closer the points are (in a given metric) the more similar are the corresponding objects. Similarity is an intrinsic component of geometric representations and is exploited, e.g., in defining convexity.

In a distance-based model the notion of distance provides a “degree” of similarity. In degree-based accounts of gradability the meaning of the comparative, say, taller, is given by comparing degrees—a is taller than b iff a’s height exceeds b’s height. The positive, tall, is defined on top of the comparative by making use of a threshold provided by a comparison class (e.g. Bierwisch 1987; Kennedy 1999)—a is tall iff a’s height exceeds the threshold of the relevant comparison class.Footnote 21

The comparative of ähnlich/similar can be straightforwardly defined in distance-based models via the notion of distance (see, e.g., the comparative semantics for resemble in Meier 2009). The problem would be the positive. It is hard to imagine a way to define a predicate similar on the basis of the comparative, because there is no principled way to determine the threshold—what would be a plausible distance for two tables to count as similar?

The other type of Cognitive Science models of similarity are feature-based ones, most prominently Tversky’s (1977) contrast model. Tversky argued that there are empirical findings in conflict with the basic axioms of metric distance functionsFootnote 22: (a) minimality is problematic in view of results concerning the identification probability for identical stimuli, (b) symmetry is apparently false—the judged similarity of North Korea to Red China exceeds the judged similarity of Red China to North Korea—and (c) triangle inequality is hardly compelling—Jamaica is similar to Cuba (geographical proximity) and Cuba is similar to Russia (political affinity) but Jamaica and Russia are not similar at all.Footnote 23

In view of these issues Tversky claimed that “… the assessment of similarity between stimuli may better be described as comparison of features rather than as the computation of metric distance between points” (p. 328). He proposed a model in which similarity between two objects is computed on the basis of common and distinctive features: Similarity of two objects increases with an increase of common features and/or a decrease of distinctive ones.Footnote 24 This idea is modelled by a function S taking weighted sums of the feature sets A and B of objects a and b to an interval scale such that sim(a, b) ≤ sim(c, d) iff S(a, b) ≤ S(c, d), where S(a, b) = θf(A ∩ B) − αf(A − B) − βf(B − A).Footnote 25
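As a rough illustration of how a contrast-style score behaves, the following Python sketch computes a weighted difference of common and distinctive features, θ·f(A ∩ B) − α·f(A − B) − β·f(B − A). The function name `contrast_similarity`, the abstract feature labels, the weights, and the choice of set cardinality for f are assumptions made for this example only; Tversky's f is a general salience measure, not necessarily a count.

```python
from typing import Callable, Iterable, Set


def contrast_similarity(
    a_features: Iterable[str],
    b_features: Iterable[str],
    theta: float = 1.0,
    alpha: float = 0.5,
    beta: float = 0.5,
    f: Callable[[Set[str]], float] = len,  # assumption: f counts features
) -> float:
    """Score = theta*f(A & B) - alpha*f(A - B) - beta*f(B - A)."""
    A, B = set(a_features), set(b_features)
    return theta * f(A & B) - alpha * f(A - B) - beta * f(B - A)


# Hypothetical feature sets: the "variant" has fewer features than the
# "prototype", and shares all of them with it.
variant = {"f1", "f2"}
prototype = {"f1", "f2", "f3", "f4"}

s_vp = contrast_similarity(variant, prototype, alpha=0.8, beta=0.2)
s_pv = contrast_similarity(prototype, variant, alpha=0.8, beta=0.2)

# With alpha > beta, the subject's distinctive features weigh more,
# so the score is asymmetric.
print(s_vp > s_pv)  # True
```

With unequal weights the score of the variant relative to the prototype exceeds the score in the opposite direction, which is the kind of asymmetry the symmetry axiom of metric distance cannot express. With α = β the score is symmetric.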

As before in distance-based models, the notion of similarity in Tversky’s feature-based model corresponds to a “degree” of similarity, thereby facilitating comparative statements. And as before, it is hard to imagine a way to define a predicate similar on the basis of the comparative because there is no principled way to determine the threshold.

The account of similarity proposed in this paper is feature-based. But instead of summing up common and distinctive features it makes use of dimensions and of classifiers determining whether values on these dimensions count as distinct. Similarity is defined in this account as indistinguishability with respect to given dimensions and classifiers: Two objects are similar if relative to the relevant dimensions and classifiers they are indistinguishable (see Appendix). In this account, the positive form similar is given, and the comparative form, more-similar, has to be defined on the basis of the positive.

In addition to degree-based accounts, there are so-called vague-predicate accounts of gradability, most prominently Klein (1980). In the latter, the comparative is defined on the basis of the positive form by making use of different interpretation contexts, i.e. (tripartite) partitions of the domain determining the extension of predicates. For example, a is taller than b is true if there is an interpretation context such that a counts as tall while b does not. The pros and cons of the two approaches have been the topic of a longstanding debate. One core issue is that degree semantics presupposes degrees, which are natural with adjectives like tall and old, since these adjectives are associated with units of measurement. But what would be degrees in the case of multidimensional adjectives like skillful and good and ähnlich/similar? If you think of multidimensional adjectives as spanning a multidimensional space, points in this space may be considered as degrees. But since points in a multidimensional space lack a natural order, some extra order has to be imposed (as, e.g., in Sassoon 2013, see footnote 10). This seems to suggest that in the case of multidimensional adjectives, vague-predicate approaches are more natural.
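The Kleinian definition of the comparative can be sketched in a few lines of Python. This is a simplified illustration, not Klein's formal system: interpretation contexts are reduced to bipartite extensions of tall (the tripartite partition with an extension gap is omitted), the individuals and their heights are hypothetical, and the numeric heights are used only to generate contexts that respect the ordering on the domain.

```python
from typing import Dict, Iterator, Set

# Hypothetical domain; the numbers only encode the ordering by height.
HEIGHTS: Dict[str, int] = {"anna": 180, "ben": 170, "carl": 170}


def contexts(domain: Dict[str, int]) -> Iterator[Set[str]]:
    """Yield extensions of 'tall' that are consistent with the height order."""
    for cut in sorted(set(domain.values())):
        yield {x for x, h in domain.items() if h >= cut}


def taller(a: str, b: str, domain: Dict[str, int]) -> bool:
    """a is taller than b iff some context counts a as tall but not b."""
    return any(a in tall and b not in tall for tall in contexts(domain))


print(taller("anna", "ben", HEIGHTS))  # True
print(taller("ben", "anna", HEIGHTS))  # False
print(taller("ben", "carl", HEIGHTS))  # False
```

The comparative is derived entirely from the positive form: no degree is ever compared directly inside `taller`, only membership in the extension of tall across contexts.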

We adapt the idea of vague-predicate approaches by making use of representations of different granularity. Less granular representations have less discriminating capacity (pace dimensions and classifiers), and the lower the discriminating capacity of a representation is, the more items are similar, i.e. indistinguishable. Since the basic predicate similar is defined relative to a representation, the comparative will also be relative to a representation. We define the comparative in the following way:

Two items a and b are more similar than two items c and d in a representation \( {\mathcal{F}} \) if and only if there is a less granular representation \( {\mathcal{F}}^{{\prime }} \) such that a and b are similar in \( {\mathcal{F}}^{{\prime }} \) while c and d are not (Appendix, Definition 6, see also the remark on lens resolution at the end of Sect. 4.1).
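A minimal Python sketch may help to see how this definition works. It rests on several simplifying assumptions that are not part of the framework: the attribute space has a single dimension (height in cm), a representation is identified with the cell width of its grid, less granular representations are obtained by doubling the cell width (so that every coarse cell is a disjunction of finer cells), and only a few coarser grids are tried. All names and values are hypothetical.

```python
from typing import Dict, List

# Hypothetical one-dimensional attribute space: height in cm.
HEIGHT: Dict[str, int] = {"a": 52, "b": 58, "c": 52, "d": 75}


def sim(x: str, y: str, width: int) -> bool:
    """x and y are similar iff their heights fall into the same grid cell."""
    return HEIGHT[x] // width == HEIGHT[y] // width


def coarser(width: int, steps: int = 4) -> List[int]:
    """Strictly less granular grids: each cell is a union of finer cells."""
    return [width * 2 ** k for k in range(1, steps + 1)]


def more_similar(a: str, b: str, c: str, d: str, width: int) -> bool:
    """True iff some coarser grid makes a, b similar while c, d are not."""
    return any(sim(a, b, w) and not sim(c, d, w) for w in coarser(width))


print(sim("a", "b", 5))                     # False
print(more_similar("a", "b", "c", "d", 5))  # True
print(more_similar("c", "d", "a", "b", 5))  # False
```

In the example, a and b are not similar at the original granularity, but a grid with cells of width 10 renders them indistinguishable while still separating c from d. No coarser grid does the reverse, so the comparative holds in one direction only.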

Comparing this account to the Kleinian vague-predicate account, there are two points to be noted: First, one major characteristic of the Kleinian account is the elimination of degrees. However, the representations employed in defining a comparative of the similarity predicate include points in attribute spaces, which are in some sense analogous to degrees, thereby raising the question of why, in the similarity-based account, degree-like entities still play a role.Footnote 26 The answer is straightforward: Klein assumes predicates denoted by the positive forms, e.g. tall, to be given. The similar relation, in contrast, is not assumed to be given, but instead defined via representations. So points in attribute spaces are already required when defining the predicate denoted by the positive forms ähnlich/similar, independent of the definition of the comparative.

On a related issue, while Klein’s account presupposes a natural order on the items in the domain, e.g., w.r.t. height, there is no natural order of similarity—being similar is in general relative to a representation. The requirement for Kleinian interpretation contexts to be consistent with the order on the domain can be seen as a grounding requirement: Interpretations must comply with the given structure of the world. Representations are the counterpart to interpretation contexts, raising the question of whether there is a grounding requirement for representations. In fact, there is such a requirement built into the similarity framework by means of a consistency constraint: Classifiers have to be consistent with the results of the predicates they correspond to (Appendix, Definition 2).

So from a broader perspective, both representations and Kleinian interpretation contexts are grounded in factual matters. The Kleinian account directly refers to orderings in the domain—this is why interpretation contexts need not themselves be ordered. In the similarity account, representations have to be ordered, thereby lifting the Kleinian order requirement to the level of representations.

Let us finally come to the question why the ähnlich/similar variety of similarity expressions is gradable while neither so/such nor gleich/same are. It turns out that the explanation is straightforward, in both cases referring to the need of a less granular representation in defining the comparative.

In the case of so/such, representations other than the actual one are inaccessible because so/such are demonstratives instead of content words and thus have to be evaluated in the actual context. Since representations are clearly part of the context, they are part of what cannot be shifted in the case of demonstratives.

In the case of gleich/same, maximal discriminating capacity is required—type identity entails indistinguishability in any representation spanned by the N-related dimensions, token identity entails indistinguishability in any representation whatsoever. In either case, defining a comparative making use of less granular representations is ruled out.

5 Conclusion

In this paper, three types of expressions were compared that express similarity in some sense—so/such, ähnlich/similar and gleich/same—starting from the observation that ähnlich/similar are gradable but neither so/such nor gleich/same are. Their semantics was compared on the basis of a common similarity relation revealing differences in, e.g., the selection of dimensions of comparison and the status of reflexive pairs. The similarity relation is spelled out as indistinguishability in a mathematically precise framework of representations combining multi-dimensional attribute spaces with classification functions. A predicate more-similar was defined in a Kleinian style making use of representations of varying granularity. The definition predicts gradability of ähnlich/similar but not of so/such and gleich/same.

The paper provides a semantic analysis of three closely related types of expressions which have, if at all, been considered only in isolation. Moreover, it can be seen as a contribution to a long-standing debate on sameness and indistinguishability in natural language (see, e.g., Nunberg 1984, 2004; Lasersohn 2000; Barker 2010).

Future research will extend the analysis to include demonstratives like dieser/this, the notorious contrast between German derselbe and der gleiche and the contrast between English same and identical, and also include expressions of difference.