Introduction

Most of the evidence in cognitive psychology comes from speakers of English and members of other Western, Educated, Industrialized, Rich, and Democratic (WEIRD) groups. In this study, we examined the relationship between language and cognition in speakers of English and speakers of three East Asian languages: Hmong, Japanese, and Mandarin. We focused on these language groups because of their use of numeral classifiers, a part of speech that these languages require when a noun occurs with a numeral (Li & Thompson, 1981). For instance, to say three cows in Mandarin, one has to say three tou (CLASSIFIER) cows. The choice of classifier is thought to depend on semantic features of the noun, such as whether the noun refers to an animal (Komatsu, 2018). In Experiment 1, we examined how speakers of these languages used classifiers to organize nouns for solid objects. In Experiment 2, we asked whether these patterns of classifier use are reflected in perceived similarity.

A great deal is known about how categories are organized and learned by English speakers and people from other Western cultures (for more comprehensive reviews, see Murphy, 2002; Rogers & McClelland, 2004; Smith & Medin, 1981). Less is known about how people from other cultures who speak other languages organize a large set of objects. One reason for this empirical gap may be long-standing viewpoints that emphasize “universals” in cognition (e.g., Carey, 2009; Chomsky, 1965; Fodor, 1983; Plato (Bluck, 1961, translation)). The view that has dominated Western thinking over the last 2,000 years is that infants all over the world are born with a universal set of concepts that constrain how they process information. This “innate ideas” view can be traced to the Ancient Greeks. For instance, in Meno’s Paradox, Plato argued that what appears to be new learning is mere “recollection,” because any knowledge about a concept requires prior knowledge about what that concept is (Plato; 1961 translation by Bluck). In the mid-1960s, Chomsky (1965) used a similar idea, called Universal Grammar, to explain how children acquire the world’s varied languages quickly and uniformly. On this view, infants are born with a set of parameters that characterize all possible language rules, and acquisition involves recognizing, or “setting,” the parameters of the language to which infants are exposed on the basis of a few examples. Fodor (1983) further argued that the mind is made up of various such “modules,” like language, that function independently of each other and of a universal cognitive system.

More recent theories of universal cognition have reduced the number of innate concepts to a smaller set known as “core knowledge” (Carey, 2009; Kinzler & Spelke, 2007; Spelke, 2000). These views propose that all infants are born with innate capacities for processing perceptual information about space (Hermer & Spelke, 1994, 1996; Spelke et al., 2010), numbers (Baillargeon & Carey, 2012; Feigenson et al., 2004), and solid objects (Strickland, 2017). Evidence that very young infants have sophisticated abilities is often cited in support of these views (e.g., Baillargeon et al., 1985); however, studies showing that infants also have sophisticated learning capacities offer a counterpoint and illustrate how cultural experiences may lead to similarities and differences across groups (e.g., Saffran et al., 1996). Therefore, even in the case of documented universals, the notion that they are driven by pre-existing innate concepts is not necessarily warranted. Furthermore, evidence suggests that adults sometimes categorize the same objects in different ways (e.g., McCloskey & Glucksberg, 1978; Yearsley et al., 2022). Such flexible and variable patterns of categorization raise the possibility that cognition may not be completely universal.

In contrast to views that focus on innate constraints and universals, other perspectives point to language as a source of organizational structure for thought. The best-known proponent of the view that language affects cognition is Benjamin Whorf. Whorf’s (1956) theory includes two elements: (1) languages vary significantly in their semantic systems, and (2) the semantic system of a language affects the cognition of its speakers. From these two elements, it follows that speakers of different languages will think differently. Whorf’s proposal was initially dismissed because it lacked empirical evidence. Today, however, evidence suggests language-driven differences in a variety of domains, including space (e.g., Yun & Choi, 2018), time (Boroditsky, 2000), number (e.g., Miller et al., 2000), color (e.g., Roberson et al., 2005), physical objects (Kuo & Sera, 2009; Sera et al., 2013), gender (Sera et al., 2002), and olfaction (Majid & Levinson, 2007; Majid et al., 2018).

Differences between English speakers and speakers of East Asian languages in spatial language (e.g., Choi & Gopnik, 1995), number words (e.g., Miller et al., 2000), and numeral classifiers (e.g., Kuo & Sera, 2009) have been linked to cognitive differences between speakers of these languages. Most relevant to the current work are studies that point to the role of numeral classifiers in object categorization (Lucy, 1996; Kuo & Sera, 2009; Sera et al., 2013). In Mandarin Chinese, for instance, some classifiers are shape-based and are used with nouns that refer to both living and nonliving things (Tai, 1994). Accordingly, studies have shown that speakers of Mandarin rely more heavily on shape than English speakers when classifying objects (Kuo & Sera, 2009; Sera et al., 2013). For example, when shown a picture of a snake and asked whether a rope or a frog is more similar to it, speakers of Mandarin are more likely than English speakers to choose the rope. The current work seeks to extend the evidence on classifiers and categorization by examining a broader set of classifiers and categories than past work has examined.

It should be noted, however, that not all studies that have examined the relationship between language and cognition have found language-driven effects. For instance, Chinese and Japanese speakers use more verbs than English speakers (Choi & Gopnik, 1995; Imai et al., 2008; Tardif, 1996). Accordingly, children learning English have been shown to have a higher proportion of nouns in their early vocabularies than children learning Chinese or Korean (e.g., Choi & Gopnik, 1995; Tardif, 1996). Children learning English also associate a novel word with a novel object at a younger age than they associate it with a novel action. When studying the ages at which children learning Japanese, Chinese, and English extend novel words to objects versus actions, Imai et al. (2008) found that all of the youngest children extended novel words to objects. They further found that Japanese- and English-learning children extended words most similarly to each other and differently from the children learning Chinese even though the Japanese and Chinese children’s language input was more similar in terms of the proportion of nouns and verbs in their input. Thus, they attributed the differences found to other cultural differences between the groups.

Another reason to compare how adults from Western cultures, such as English speakers, and adults from East Asian cultures use language to categorize solid objects is that evidence indicates differences between these groups in holistic versus analytic processing. Holistic processing takes in whole scenes or non-decomposed objects, whereas analytic processing decomposes scenes into their parts and objects into their component features. Holistic and analytic processing relate to categorization as follows. Taxonomic categorization, or grouping objects based on shared attributes, requires analytic processing, whereas thematic categorization, or grouping objects based on shared uses or settings, involves more holistic processing. For example, when determining whether a person is more similar to a dog or to a sweater, an analytic strategy might yield that the person and the dog are more similar because both are alive, whereas a holistic strategy might yield similarity between the person and the sweater because sweaters are usually seen being worn by people.

Studies by Nisbett and colleagues provide evidence of cross-cultural differences in holistic and analytic processing between European Americans and East Asians (Boduroglu et al., 2009; Chua et al., 2005; Nisbett & Masuda, 2003). They found that, compared with European Americans, East Asians pay more attention to the context of a scene than to its component objects. This suggests that East Asians may tend toward holistic processing, which is more likely to result in thematic categories. In contrast, European Americans tend to divide scenes into objects and their parts, which yields more taxonomic groups. Indeed, when Ji et al. (2004) asked East Asian and European American adults to select which two of three words (e.g., monkeys, pandas, and bananas) were most similar, East Asians were more likely to categorize the words thematically (e.g., saying that monkeys were more like bananas than pandas), whereas European Americans were more likely to categorize them taxonomically (i.e., saying that monkeys and pandas were more alike). It is not clear why these differences in holistic versus analytic processing have emerged between Eastern and Western groups. As an explanation, de Oliveira and Nisbett (2017) proposed that adults from Eastern and Western cultures operate under collectivist and individualistic frameworks of thought, respectively, which affect many aspects of day-to-day life, including cognition. Recently, Kitayama et al. (2019) offered evidence of a correlation between individualistic/collectivist cultural frameworks and analytic versus holistic processing. The differences might also be due to the aforementioned tendency of East Asian adults to talk about actions rather than objects, at least when talking to children. Talking about actions and events may emphasize holistic and thematic relations among items, whereas talking about objects may highlight dimensional differences and similarities. However, we know of no evidence on this point.

The current study

In short, there are several factors that affect how adults categorize objects. Innate capacities, universally shared experiences, differences in language, and other cultural factors may all play roles. Experiment 1 documented similarities and differences among speakers of Mandarin, Japanese, and Hmong in how they use classifiers to group nouns. Experiment 2 examined whether the patterns of classifier use reflected the speakers’ perception of the similarity between the objects. Similarities among the groups would point to universal divisions in solid object categorization. If differences emerge that are consistent with the language groupings, those differences may be attributable to differences in language; however, if differences emerge that are inconsistent with the language groupings, those differences are likely due to other cultural factors.

Experiment 1

Although there is a large literature on classifiers, there is no consensus on how they are used. Allen (1977) studied how classifiers are used in more than 50 languages and concluded that semantic features determine classifier choice (see also Croft, 1994). Other linguistic analyses, however, have proposed that morpho-syntactic features drive the choices. For example, Greenberg (1972) noted that classifier languages typically lack plural markers and that classifiers function to individuate items for pluralization (see also Cheng & Sybesma, 1998; Chierchia, 1998; Zhang, 2007). These differing characterizations suggest that choosing a classifier to use with a noun may not be based on the noun’s semantic properties. Therefore, an important first step to understanding the relation between classifiers and categorization involves documenting the degree to which speakers of a classifier language agree on classifier use. If speakers show a large amount of agreement, we could use that evidence to make predictions about how speakers of the language categorize solid objects. If those same groupings emerge in a different task, it would offer evidence that classifiers affect perceived similarity between objects. Alternatively, if other differences emerge among the groups, they would be attributable to other factors.

Method

Participants

A total of 34 adults participated: 12 in the Mandarin-speaking group (seven females), 10 in the Japanese-speaking group (six females), and 12 in the Hmong-speaking group (seven females). The Mandarin speakers were visiting graduate students from a national university in China. Their areas of birth ranged from Northern China (e.g., Beijing) to the Central Western region (e.g., Hubei), Eastern China (e.g., Shandong), and the Southwestern region (e.g., Chongqing). Despite their varied regional origins, all were raised speaking Mandarin Chinese as a native language. The Japanese speakers all lived in the Minneapolis-St. Paul community at the time of testing but originally came from Tokyo or Osaka, Japan. The Hmong speakers were also members of the Minneapolis-St. Paul Hmong community and were originally from Laos or Thailand.

Materials

The stimuli for all three language groups were the 135 nouns listed in Appendix A (Tables 2, 3, 4, and 5), which also gives their translations in Hmong, Japanese, and Mandarin. We operationalized nouns referring to “solid objects” as nouns referring to objects with shapes. Accordingly, most of the nouns (122 of them) came from Samuelson and Smith’s (1999) list of nouns that English-speaking adults rated as referring to objects with a shape. We excluded nouns from that list that referred to body parts (e.g., arm, ear, eye), that did not have equivalents in the languages under study (e.g., cheerios, crib), or that were redundant with other nouns (e.g., firetruck and rocking chair are hyponyms of truck and chair). In addition, we added 13 nouns referring to people, other living things, and common objects from Wepman and Hass’s (1969) list of the words most frequently spoken by children: boy, girl, woman, man, flower, tree, snake, fish, river, book, picture, rope, and stone. Two of the nouns were excluded in Hmong because they do not function as independent lexical items. Specifically, the Hmong noun for book is an underspecified expression referring to something like paper, which requires a classifier to convey the concept of book. Similarly, the Hmong noun for river by itself means water and requires a classifier to convey the concept of river.

Procedure

Participants were tested orally in their native language by a native speaker. The task was introduced as follows (English translation):

We are studying the way people say things in different languages. We are going to give you a word, and we want you to give us the counting word that comes to mind when you hear it. Here is one example: “When I hear apple, the counter that comes to mind is ge (or ko in Japanese, or lub in Hmong). In Chinese (or Japanese or Hmong), I would say, ‘One ge/ko/lub apple.’” Here’s one for practice: How would you say, “One ___ chicken”?

If participants did not supply a classifier, the experimenter followed up by saying, “In Chinese, I would say one zhi chicken.” The participant and experimenter went over these two examples until the participant understood that they were to state the classifier they would use with each noun. The task took approximately 1 hour to complete, and each participant was paid $10.00 for their participation.

Results and discussion

Because our goal in Experiment 2 was to use the most agreed-upon divisions made by classifier use, in this section we focus on the classifiers that each group used most frequently and consistently. Appendix A contains detailed results from each language group for all the nouns studied. Figures 1, 2, and 3 show, with the nouns in English translation, the noun groups that emerged from the classifier use of speakers of Mandarin, Japanese, and Hmong, respectively. Each circle in the figures represents one classifier, and overlapping circles indicate nouns that were used with more than one classifier. The dividing line and the labels “living” and “non-living things” in Figures 1, 2, and 3 were added by the researchers.

Fig. 1

The noun groups that emerged from Mandarin speakers’ use of classifiers. *The spoken form zhi is written with different characters

Fig. 2

The noun groups that emerged from Japanese speakers’ use of classifiers

Fig. 3

The noun groups that emerged from Hmong speakers’ use of classifiers

Mandarin results

Eight different classifiers accounted for 77.4% of Mandarin classifier use: zhi (24.5%), ge (21.6%), tiao (10.4%), ba (7.4%), liang (4.3%), shuang (3.5%), tou (3.5%), and ke (2.7%). Nouns within a single circle in Fig. 1 elicited the same classifier at least 75% of the time. Nouns in two circles elicited each of two different classifiers between 25% and 75% of the time. As depicted in Fig. 1, Mandarin speakers used ge and tiao with nouns that referred to both living and non-living things. Spoken forms of zhi also appeared with nouns that referred to living and non-living things; however, these forms are homonyms written with different Chinese characters.
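The circle-assignment rule described above can be expressed as a small function. This is a minimal sketch, assuming each noun’s responses are stored as a list with one classifier per participant; the function names and the placeholder labels CL1 to CL3 are hypothetical, and only the 75% and 25% cutoffs come from the text.

```python
from collections import Counter

def classifier_shares(responses):
    """Proportion of participants who produced each classifier for one noun."""
    counts = Counter(responses)
    n = len(responses)
    return {cl: count / n for cl, count in counts.items()}

def overall_use(responses_by_noun):
    """Percentage of all responses (pooled across nouns) taken by each classifier."""
    pooled = Counter(cl for responses in responses_by_noun.values() for cl in responses)
    total = sum(pooled.values())
    return {cl: 100 * count / total for cl, count in pooled.items()}

def circles_for_noun(responses, single=0.75, lower=0.25):
    """Return the classifier circle(s) in which a noun is drawn.

    One circle if a single classifier reaches `single`; otherwise every
    classifier whose share falls in [lower, single).
    """
    shares = classifier_shares(responses)
    top_cl, top_share = max(shares.items(), key=lambda kv: kv[1])
    if top_share >= single:
        return [top_cl]
    return sorted(cl for cl, share in shares.items() if lower <= share < single)

# Hypothetical responses from 12 participants (placeholders, not study data).
responses_by_noun = {
    "noun_a": ["CL1"] * 10 + ["CL2"] * 2,
    "noun_b": ["CL1"] * 6 + ["CL3"] * 6,
}
for noun, responses in responses_by_noun.items():
    print(noun, circles_for_noun(responses))
```

With these placeholder responses, noun_a falls in a single circle and noun_b in two overlapping circles; a noun whose responses are spread so thinly that no classifier reaches 25% would return an empty list.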

Japanese results

In Japanese, the ten most frequently used classifiers together accounted for over 95% of classifier use: hiki/piki (19.0%), tsu (16.4%), hon/pon (16.0%), ko (10.6%), dai (10.1%), mai (8.7%), tou (4.1%), soku (3.7%), wa (3.5%), and nin/ri (3.2%). Each of the remaining 19 classifiers was used less than 1% of the time. Figure 2 illustrates the pattern of classifier use in Japanese. Nouns that appear inside only one circle elicited a single classifier 87% or more of the time. Nouns that appear in more than one circle elicited each of those classifiers between 14% and 86% of the time.

The overlap between the circles in Fig. 2 also captures set and subset relations in Japanese classifier use. For example, of the 37 nouns that refer to non-human animals, hiki/piki was used at least once with each noun and 100% of the time with 12 of these nouns. Hiki/piki was never elicited by nouns that referred to non-living things. However, most of the participants used wa with ten of the nouns that also elicited hiki/piki. All the nouns that elicited wa referred to animals with wings, most of them birds, and every noun that elicited wa also elicited hiki/piki. An analogous pattern was observed with hiki/piki and tou. Therefore, hiki/piki seems to be a general classifier for animals, with wa and tou used for non-overlapping subsets. We observed an analogous pattern in the classifiers used with nouns that referred to non-living things: tsu was used with a broad set of nouns that also took other classifiers. Among the nouns that refer to living things, nouns that referred to people elicited ri/nin at least 90% of the time. Thus, Japanese speakers make a sharp distinction between humans and animals in their classifier use.
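The set and subset reasoning above (every noun that elicited wa also elicited hiki/piki, and wa and tou cover non-overlapping nouns) can be checked mechanically. This is a minimal sketch, assuming the data are summarized as a mapping from each classifier to the set of nouns that elicited it at least once; the function names are hypothetical, and the noun sets below are toy data for illustration, not the study’s responses.

```python
def subset_relations(nouns_by_classifier):
    """List (general, specific) pairs where every noun that elicited `specific`
    also elicited `general`, and `general` covers additional nouns."""
    relations = []
    for general, general_nouns in nouns_by_classifier.items():
        for specific, specific_nouns in nouns_by_classifier.items():
            # `<` on sets is a proper-subset test.
            if general != specific and specific_nouns and specific_nouns < general_nouns:
                relations.append((general, specific))
    return sorted(relations)

def non_overlapping(nouns_by_classifier, a, b):
    """True if no noun elicited both classifier a and classifier b."""
    return nouns_by_classifier[a].isdisjoint(nouns_by_classifier[b])

# Toy data for illustration only; these noun sets are NOT the study's responses.
toy = {
    "hiki/piki": {"robin", "duck", "chicken", "cow", "dog", "fish"},
    "wa": {"robin", "duck", "chicken"},
    "tou": {"cow"},
}
print(subset_relations(toy))
print(non_overlapping(toy, "wa", "tou"))
```

In this toy input, both wa and tou come out as proper subsets of hiki/piki and do not overlap with each other, which is the hierarchical pattern the paragraph describes.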

Hmong results

The Hmong speakers used 12 classifiers with the nouns. The two most frequently used classifiers, lub and tus, together accounted for 73.1% of classifier use. Other elicited classifiers included rab (4.5%), txoj (3.9%), daim (2.9%), and txhais (2.7%). Figure 3 depicts the use of Hmong classifiers. Nouns within a single circle elicited the same classifier at least 75% of the time. Nouns in two circles elicited each of two different classifiers between 25% and 75% of the time. The largest group of nouns that elicited a single classifier was used with lub, and tus was the second most commonly used classifier. Most relevant to the current study is the finding that Hmong speakers use one classifier, tus, with nouns that refer to humans and to (non-human) animals.

Similarities across the languages

Our findings showed considerable agreement and systematicity in classifier use with nouns referring to solid objects. We examined the groupings common to the languages by focusing on the nouns used with the most commonly used classifiers in each language. Table 1 shows the classifiers that were used with overlapping nouns across the languages; the category labels in Table 1 were provided by the researchers. If overlap across the languages in classifier use reflects similarities (and points to universals) across the groups, we would expect the following categories to emerge for all groups: (1) animals; (2) 3-dimensional non-living objects; (3) artifacts with handles; (4) long, flexible objects; (5) large four-legged animals; and (6) large machines.

Table 1 Noun groups that emerged across all three language groups through their use of classifiers

Differences across languages

The groups differed in their use of classifiers in a variety of ways. Every circled group of nouns in Figs. 1, 2, and 3 could lead to a predicted difference among the groups in the perceived similarity of objects. In the current work, we focus on the larger differences. First, the groups differed in whether their classifier use reflected a hierarchical structure: Japanese speakers’ classifier use indicated a hierarchical structure within both living and non-living things, whereas Mandarin and Hmong speakers’ use did not. Second, the groups differed in the noun groups that emerged for humans, animals, and non-living things. Hmong speakers used one classifier for nouns that referred to humans and other animals, whereas Japanese and Mandarin speakers used different classifiers for nouns that referred to animals and to humans. Third, the groups varied in whether they used the same classifiers for both living and non-living things: Hmong and Mandarin speakers often grouped nouns that referred to animals and humans with nouns for non-living things, whereas Japanese speakers rarely did. In Experiment 2, we explored the following two ways that classifiers might affect the structure of object categories and lead to differences between the language groups:

  1) If using classifiers hierarchically leads speakers to treat some objects within a category as more similar to each other, Japanese speakers should view certain objects as more similar to each other than speakers of the other two groups do. For example, they should view birds and four-legged mammals as more similar to each other than speakers of Hmong or Mandarin do.

  2) With respect to the categories of humans, animals, and non-living things, Hmong speakers should view humans and animals as more similar to each other than speakers of Japanese or Mandarin do, and Hmong and Mandarin speakers should view humans and non-living things as more similar to each other than Japanese speakers do.

If other differences among the groups emerge in categorization, they would not be attributable to the patterns of classifier use that we observed in Experiment 1.

Experiment 2

Do the patterns of classifier use that we observed in Experiment 1 affect how speakers of these languages perceive similarity among solid objects? A subset of 39 pictures depicting a broad set of categories from the nouns used in Experiment 1 served as the stimuli for Experiment 2. Participants rated the similarity of all 741 pairs that can be formed from the 39 drawings. We used a similarity-rating task following the procedures of Rips, Shoben, and Smith (1973); findings from such tasks have been replicated with other tasks and methods (Smith & Medin, 1981). We chose similarity ratings because most models of categorization rely on similarity (see e.g., Goldstone, 1994; Nosofsky, 1986; Rogers & McClelland, 2004), and similarity ratings have been shown to be good predictors of categorization for the types of items used in this study – natural kinds and artifacts (e.g., Rosch & Mervis, 1975; Weber et al., 2009). Without being grounded in similarity, categories would not support inferences within a kind, and considerable evidence indicates that even young children make powerful inferences about similar kinds of objects (e.g., Gelman & Markman, 1986; Sloutsky & Fisher, 2004). We also included a group of English speakers to serve as a reference point for the previous literature on categorization. In our instructions, we avoided using the nouns and classifiers that referred to the objects so that participants’ attention would not be drawn to language, and participants were free to classify the objects on any dimension or combination of dimensions. These strategies differ from past work that has used words as stimuli (e.g., Cole, 1971) or a small set of objects that are similar on a single dimension such as shape (e.g., Kuo & Sera, 2009; Sera et al., 2013).
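The pair count follows from the number of unordered pairs of distinct items, n(n − 1)/2, with n = 39:

```latex
\binom{39}{2} = \frac{39 \times 38}{2} = \frac{1482}{2} = 741
```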

We began our data analyses with multi-dimensional scaling (MDS) and clustering techniques to explore the perceived similarity and organizational structure of the items. MDS takes the dissimilarity between each pair of objects and places all of the objects in a multidimensional space in which smaller distances indicate greater similarity. These techniques have previously been used to study the relationship between the organization of spatial terms and cognition (Manning et al., 2002; Talmy, 1983; Yun & Choi, 2018), as well as several other lexical and cognitive domains, such as containers (Malt et al., 1999). Our strategy closely follows that of Malt et al. (1999), who used the same approach to examine whether cross-linguistic differences in the names for containers influence speakers’ perception of the similarity of the containers. Malt et al. (1999) found similar patterns of container categorization despite the differences in naming. Yun and Choi (2018) also used MDS to examine English and Korean speakers’ use of prepositions and verbs relating to fit and containment, and they found differences between the groups in the organizational structure of spatial relations. Finally, we used parametric techniques (ANOVAs) to determine whether differences that emerged from the MDS analyses were statistically reliable.
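To illustrate how a matrix of dissimilarities becomes a set of coordinates, the sketch below uses classical (Torgerson) MDS. This is a different, closed-form technique from the iterative stress minimization performed by PROXSCAL, which the study actually used; it is shown only as an illustration, assumes NumPy is available, and the function names are hypothetical.

```python
import numpy as np

def classical_mds(dissim, n_dims=2):
    """Classical (Torgerson) MDS: place items so that their Euclidean
    distances approximate the given dissimilarities."""
    d = np.asarray(dissim, dtype=float)
    n = d.shape[0]
    # Double-center the squared dissimilarities: B = -1/2 * J D^2 J.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # The largest eigenpairs of the symmetric matrix B give the coordinates.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_dims]
    vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(vals)

def pairwise_distances(coords):
    """Euclidean distance between every pair of rows in `coords`."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

When the dissimilarities are exact Euclidean distances among points in two dimensions, the 2D solution reproduces them up to rotation and reflection. With real similarity ratings, the fit is only approximate, which is why a fit measure such as stress is needed to compare solutions of different dimensionality.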

Method

Participants

Adult native speakers of English, Japanese, Mandarin, and Hmong in the Twin Cities participated in the study (N = 64; n = 16 per group). This number of participants was comparable to the numbers used in past work with MDS (e.g., Rips et al., 1973). All participants were students at the University of Minnesota, were tested in English, and were paid $10 for their participation. The Japanese and Mandarin speakers were international students visiting or attending the University of Minnesota. The English and Hmong speakers resided in the USA.

Materials

Following the work of Rogers et al. (2004) on category structure, which was based on a corpus analysis, we selected pictures depicting 36 nouns, three from each of 12 categories: (1) land animals (dog, cow, monkey); (2) birds (robin, duck, chicken); (3) water animals (fish, snake, frog); (4) people (boy, woman, man); (5) trees (pine, oak, palm); (6) flowers (rose, daisy, tulip); (7) hand-tools (shovel, hammer, scissors); (8) fruit (apple, orange, banana); (9) vehicles (car, bus, airplane); (10) simple artifacts (rope, box, comb); (11) household items (key, cup, chair); and (12) clothing (dress, pants, sweater). We also added pictures depicting celestial bodies (sun, moon, and cloud), based on reports that these items may be treated as animate because of their apparent movement (Piaget, 1954). Thirty-five of the pictures depicted nouns from Experiment 1. Figure 4 shows reduced copies of the drawings used.

Fig. 4

Reduced copies of the stimulus drawings used in Experiment 2

Procedure

Participants rated all unique pairs of the 39 pictures on a 7-point scale (1 = very similar; 7 = not similar at all), which yielded 741 dissimilarity judgments from each participant. Each participant was shown all the pictured objects before making their ratings. All speakers were instructed in English. They were asked to pick the two pictures they thought were most similar and assign a value of 1 to that pair, and to choose the two that were most different and assign a value of 7 to that pair. These anchoring pairs remained in view throughout the experiment as participants made their similarity judgments. The pairs were presented on a computer screen in 16 random orders that were matched across the four language groups. Each session lasted approximately 45 min. Participants were identified as reversing the scale if their ratings on the computerized task did not match the ratings of the pairs they had picked as most similar or most different.
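One way to produce 16 random presentation orders that are identical across language groups is to shuffle the full list of pairs with a seeded random generator and reuse the same orders for every group. This is a sketch under stated assumptions: the seed, the function names, and the rule that the k-th participant in each group receives the k-th order are all hypothetical, since the text says only that the orders were matched across groups.

```python
import itertools
import random

def make_pair_orders(n_items=39, n_orders=16, seed=2024):
    """Build n_orders random presentation orders of all unordered item pairs.

    A fixed (hypothetical) seed makes the orders reproducible, so the same
    list of orders can be reused for every language group.
    """
    pairs = list(itertools.combinations(range(n_items), 2))
    rng = random.Random(seed)
    orders = []
    for _ in range(n_orders):
        order = pairs[:]  # copy so each shuffle starts from the full pair list
        rng.shuffle(order)
        orders.append(order)
    return orders

def order_for(orders, participant_index):
    """Hypothetical matching rule: participant k in every group gets order k."""
    return orders[participant_index % len(orders)]

orders = make_pair_orders()
print(len(orders), len(orders[0]))  # 16 741
```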

Results and discussion

We first used MDS to obtain a general picture of each group’s similarity ratings of the items, and hierarchical clustering techniques to identify the categories suggested by those ratings. We then used ANOVAs to determine whether any of the differences that emerged among the groups were statistically reliable.

Multi-dimensional scaling

Each participant’s 741 ratings yielded a proximity matrix containing the rated dissimilarity between each pair of the 39 items. We conducted a reliability analysis on each language group’s data to assess how consistent the ratings were across participants within the group. The reliability statistics (Cronbach’s alpha) for speakers of English, Mandarin, Japanese, and Hmong were .997, .997, .995, and .992, respectively. Thus, the data within each language group were consistent enough to justify averaging, so we averaged the proximity matrices across the participants in each language group, which yielded one proximity matrix per group. The data from four participants (two Hmong speakers, one English speaker, and one Mandarin speaker) who reversed the scale during testing were not included in these analyses. We then used MDS to derive the distances between the objects from each language group’s proximity matrix. The MDS analyses were conducted with the PROXSCAL procedure in SPSS (ratio option). PROXSCAL bases its analyses on the squared Euclidean distances among a set of objects in multi-dimensional space; for additional information about PROXSCAL in SPSS, see Commandeur and Heiser (1993). We used exactly the same program and procedures to calculate one-dimensional (1D), two-dimensional (2D), three-dimensional (3D), and four-dimensional (4D) solutions for each group. Two criteria are typically used to select the appropriate dimensionality: (1) the stress of the solutions and (2) ease of interpretation. The stress of a multi-dimensional scaling solution measures the badness of fit between the solution and the proximity ratings, so lower stress indicates a better fit (Kruskal, 1964). When using stress as a guide to the appropriate dimensionality, researchers first plot stress against the number of dimensions (see Fig. 5 below). Then, they look for the “elbow” in the graph: the point beyond which adding dimensions yields only small further reductions in stress.
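The reliability check and the averaging step can be sketched as follows. This assumes, hypothetically, that each group’s data are arranged as a table with one row per pair (741 rows) and one column per participant, and that alpha treats participants as the scale “items” and pairs as cases; the function names are ours, and the options of SPSS’s reliability procedure may differ.

```python
from statistics import mean, variance

def cronbach_alpha(ratings):
    """Cronbach's alpha for a pairs-by-participants table.

    ratings[p][j] is participant j's rating of pair p. Participants play the
    role of scale items, and pairs play the role of cases.
    """
    k = len(ratings[0])                      # number of participants
    columns = list(zip(*ratings))            # one column per participant
    sum_item_var = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum_item_var / total_var)

def average_ratings(ratings, excluded=()):
    """Group-mean rating for each pair, skipping excluded participants
    (for example, participants who reversed the scale)."""
    excluded = set(excluded)
    keep = [j for j in range(len(ratings[0])) if j not in excluded]
    return [mean(row[j] for j in keep) for row in ratings]
```

Alpha approaches 1 as participants’ ratings rise and fall together across pairs, which is the sense in which values near .99 justify collapsing each group into a single averaged proximity matrix.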

Fig. 5

The stress of one- (1D), two- (2D), three- (3D), and four-dimensional (4D) solutions for each group

We selected the 2D solutions because they yielded the largest reduction in stress for all of the groups and because they were easier to interpret than the 3D and 4D solutions (see Appendix B, Fig. 9, for more information about the 3D solution). The 2D solutions, shown in Fig. 6, accounted for 90–93% of the dispersion across all groups.
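The fit measure and the elbow rule can be sketched in plain Python. This illustration uses Kruskal’s stress-1 with a ratio transformation (disparities proportional to the dissimilarities); PROXSCAL reports several fit measures, and the text does not say which stress value is plotted, so the numbers need not match. The elbow is operationalized here, as an assumption, as the largest second difference of the stress curve; the text describes the elbow only visually. Function names are hypothetical.

```python
import math

def kruskal_stress1(dissim, coords):
    """Kruskal's stress-1 for a ratio MDS solution.

    dissim: dict mapping (i, j) with i < j to a dissimilarity.
    coords: sequence of points, one per item, from an MDS solution.
    """
    pairs = sorted(dissim)
    delta = [dissim[p] for p in pairs]
    dist = [math.dist(coords[i], coords[j]) for i, j in pairs]
    # Ratio transformation: disparities are b * delta, with b fit by least squares.
    b = sum(x * d for x, d in zip(delta, dist)) / sum(x * x for x in delta)
    residual = sum((b * x - d) ** 2 for x, d in zip(delta, dist))
    return math.sqrt(residual / sum(d * d for d in dist))

def elbow_dimension(stress_by_dim):
    """Pick the dimensionality at the sharpest bend of the stress curve.

    stress_by_dim[0] is the 1D stress, stress_by_dim[1] the 2D stress, and so on.
    The bend at dimension k is the second difference s(k-1) - 2 s(k) + s(k+1).
    """
    s = stress_by_dim
    bends = {k + 1: s[k - 1] - 2 * s[k] + s[k + 1] for k in range(1, len(s) - 1)}
    return max(bends, key=bends.get)
```

A perfectly fitting configuration has stress-1 of zero, and a curve that drops steeply from 1D to 2D and then flattens has its sharpest bend at two dimensions.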

Fig. 6

The two-dimensional (2D) multi-dimensional scaling solutions for each group. The categories are depicted by colored circles as follows: red = humans, yellow = animals, green = plants, light blue = celestial bodies, dark blue = artifacts, purple = fruit

Dimension 1

Three of the four groups (speakers of English, Mandarin, and Japanese) divided the items into natural kinds and artifacts along one (the horizontal) axis, with drawings depicting boy, man, woman, fish, dog, chicken, frog, snake, cow, robin, duck, banana, orange, apple, rose, daisy, tulip, palm, oak, cloud, moon, and sun on one side of the axis, and dress, pants, sweater, cup, comb, chair, key, hammer, scissors, rope, shovel, box, car, bus, and airplane on the other side. Dimension 1 was similar for the Hmong speakers, except that they appeared to rate man, woman, and boy as more like the artifacts, suggesting a more thematic, spatial, or shape-based (vs. taxonomic) interpretation of Dimension 1. From visual inspection of the MDS solutions alone, it is not clear whether speakers of Japanese, English, and Mandarin rated humans as more like artifacts or more like animals, or whether the Hmong speakers’ ratings differ significantly from those of the other groups. We followed up on these issues with the cluster analyses and ANOVAs reported below.

Dimension 2

The interpretation of Dimension 2 was less clear. It may reflect either the typical spatial distance from humans, or biological similarity to humans. Within artifacts, items most proximate to people (e.g., pants, comb) were on one end, tools used with other objects were in the middle (e.g., scissors, shovel), and vehicles (e.g., airplane) were at the other end. Within natural kinds, people (e.g., woman, boy) and animals (e.g., fish, dog) were at one end, plants (e.g., oak, tulip) were in the middle, and celestial bodies (sun, moon) were at the other end.

Cluster analyses

We then used hierarchical cluster analyses (following Shepard, 1988) to explore the categories that emerged from the similarity ratings and to examine whether the hierarchical pattern of classifier use by Japanese speakers in Experiment 1 corresponded to tighter clusters of certain categories. We entered the 2D coordinates for each item from the MDS analyses into hierarchical cluster analyses in SPSS. The hierarchy was derived with an agglomerative procedure that starts with each item in its own cluster. The dissimilarities between items (i.e., the Euclidean distances between their coordinates) are used first to merge the items that are most like each other into first-order clusters and then to combine clusters step by step until all items are joined under a single cluster. This commonly used algorithm yields a hierarchical structure of the items called a dendrogram. We obtained a dendrogram for each group of participants (see Fig. 7). The item names in the dendrograms were provided by the researchers.
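The agglomerative procedure described above can be sketched in a few lines of Python. The paragraph does not name the linkage rule, so this sketch assumes average (between-groups) linkage, one common choice in which the distance between two clusters is the mean distance over all cross-cluster item pairs. The function name is hypothetical, and any coordinates passed to it in an example are invented rather than taken from the MDS solutions.

```python
import math
from itertools import combinations


def average_linkage(points, labels):
    """Agglomerative clustering with average (between-groups) linkage.

    points: list of 2D coordinates (e.g., from an MDS solution); labels: item names.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples, where
    each cluster is a frozenset of labels; the history defines the dendrogram.
    """
    # Start with every item in its own cluster.
    clusters = [frozenset([label]) for label in labels]
    pos = dict(zip(labels, points))

    def cluster_distance(a, b):
        # Mean Euclidean distance over all cross-cluster item pairs.
        total = sum(math.dist(pos[x], pos[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    merges = []
    while len(clusters) > 1:
        # Merge the two closest clusters, then repeat until one cluster remains.
        a, b = min(combinations(clusters, 2), key=lambda ab: cluster_distance(*ab))
        merges.append((a, b, cluster_distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(a | b)
    return merges
```

The earliest merges correspond to the first-order clusters discussed below, and later merges build the higher-order categories.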

Fig. 7

A: The dendrograms showing the hierarchical structure that emerged from the cluster analyses for Japanese and Hmong speakers. The labels for the pictured items were provided by the researchers. B: The dendrograms showing the hierarchical structure that emerged from the cluster analyses for Mandarin and English speakers. The labels for the pictured items were provided by the researchers

Inspection of the dendrograms revealed that all four groups classified many of the objects into a broad category of natural kinds that included animals (chicken, duck, robin, cow, dog, snake, bird, monkey, fish), celestial objects (sun, moon, clouds), plants (flowers and trees), and fruit (apple, orange, banana). All groups subdivided flowers (daisy, rose, and tulip) and trees (oak, pine, and palm) similarly. All groups also classified many objects into a broad category of artifacts that included hand tools (hammer, scissors, key, and shovel), vehicles (car, bus, airplane), and personal items (pants, dress, sweater, comb, and cup). Several of the noun classes that emerged from Experiment 1 (see Table 1) also emerged from the cluster analyses. The dendrograms from all groups contained a second-order category of animals. However, the subdivisions within animals (i.e., the first-order categories) did not match classifier use in any of the languages documented in Experiment 1. Within artifacts, all groups produced a category of artifacts with handles, which we also observed in classifier use in Experiment 1, and a vehicle category, which corresponds to the class of large machines. Overall, the findings from the cluster analyses line up with the similarities in classifier use that we found in Experiment 1. In contrast, the differences in hierarchical classifier use found in Experiment 1 did not line up with the results of the cluster analyses: birds and four-legged animals did not emerge as first-order clusters in the ratings of Japanese speakers or of speakers of the other languages.

With respect to our second prediction from Experiment 1, that speakers of Hmong would view humans and animals as more similar to each other than speakers of Japanese or Mandarin would, we found just the opposite. The structure derived from the Hmong speakers’ ratings differed from those of the other language groups: humans (man, woman, boy) clustered first as a sister node to personal items and later under the broad category of artifacts. For speakers of Japanese, English, and Mandarin, humans clustered with other animals (chicken, duck, robin, etc.) and with other natural kinds, with only minor deviations. This pattern of clustering was the opposite of what the patterns of classifier use found in Experiment 1 predicted.

ANOVAs

We next used ANOVAs on the original ratings to determine whether the differences that emerged from the MDS solutions were statistically reliable, focusing on the categories of humans, animals, and artifacts. For each participant, we averaged the ratings between each item in the human category (man, woman, and boy) and each item in the animal category (chicken, cow, dog, etc.), and separately the ratings between each item in the human category and each item in the artifact category (airplane, bus, box, etc.). These averages, of 27 ratings (3 humans × 9 animals) for the humans-versus-animals contrast and 45 ratings (3 humans × 15 artifacts) for the humans-versus-artifacts contrast, were entered into a two-way mixed-design ANOVA with Contrast (human vs. animal or human vs. artifact) as a within-subjects factor and Language (Japanese, English, Mandarin, and Hmong) as a between-subjects factor. The ANOVA yielded reliable main effects of Contrast (F(1, 56) = 68.06, p < .001, η² = .549) and Language (F(3, 56) = 5.701, p < .01, η² = .234) and a significant Language × Contrast interaction (F(3, 56) = 3.987, p < .05, η² = .176). The main effect of Contrast indicated that all groups rated humans and animals as more similar to each other than humans and artifacts (mean dissimilarity ratings of 4.48 vs. 5.57). The main effect of Language indicated that speakers of Hmong rated both contrasts as more dissimilar than did speakers of English and Japanese (p = .017 and p = .001, respectively, by Tukey tests). No other effects of Language were reliable.
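A minimal sketch of the per-participant averaging, assuming each participant’s ratings are stored in a dictionary keyed by unordered item pairs (a hypothetical layout). The category lists are passed in as arguments rather than reproduced here; with 3 humans, 9 animals, and 15 artifacts the function averages 27 and 45 ratings, matching the counts above.

```python
from itertools import product
from statistics import mean


def contrast_means(ratings, humans, animals, artifacts):
    """One participant's mean rating for each contrast entered into the ANOVA.

    ratings: dict mapping frozenset({item_a, item_b}) to that participant's
    dissimilarity rating for the pair (a hypothetical storage layout).
    Returns (human-animal mean, human-artifact mean,
             number of human-animal pairs, number of human-artifact pairs).
    """
    # Every human paired with every animal, and every human with every artifact.
    human_animal = [ratings[frozenset(pair)] for pair in product(humans, animals)]
    human_artifact = [ratings[frozenset(pair)] for pair in product(humans, artifacts)]
    return mean(human_animal), mean(human_artifact), len(human_animal), len(human_artifact)
```

The two means from each participant are the two levels of Contrast; Language enters as the grouping of participants.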

The ratings underlying the significant Language × Contrast interaction appear in Fig. 8 and were examined further with two one-way ANOVAs, one on the ratings of humans versus animals and the other on the ratings of humans versus artifacts. The ANOVA on the humans-versus-animals ratings revealed a significant effect of Language (F(3, 56) = 6.548, p < .001, η² = .260): speakers of Japanese rated animals and humans as more similar than speakers of Hmong did (p < .001, Tukey HSD), and speakers of English also rated animals and humans as more similar than speakers of Hmong did (p = .01, Tukey HSD). The Mandarin speakers’ ratings of the similarity between humans and animals did not differ reliably from those of any other group and fell between those of the Hmong and English speakers. No reliable effect of Language emerged from the ANOVA on the ratings of humans versus artifacts (F(3, 56) = 2.7, p < .1, η² = .126). We conducted two additional ANOVAs, one on the ratings of humans versus proximal artifacts (including clothing) and one on the ratings of humans versus distal artifacts (excluding clothing). Neither yielded any reliable effects.
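The reported effect sizes can be checked against the F ratios and their degrees of freedom. The values appear consistent with partial η², computed as η²ₚ = F·df_effect / (F·df_effect + df_error); the sketch below assumes that formula (the function name is hypothetical) and recomputes each reported value.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)


# (F, df_effect, df_error, reported effect size) from the ANOVAs reported above.
REPORTED = {
    "Contrast": (68.06, 1, 56, 0.549),
    "Language": (5.701, 3, 56, 0.234),
    "Language x Contrast": (3.987, 3, 56, 0.176),
    "Language, humans vs. animals": (6.548, 3, 56, 0.260),
    "Language, humans vs. artifacts": (2.7, 3, 56, 0.126),
}

for effect, (f, df1, df2, reported) in REPORTED.items():
    recomputed = partial_eta_squared(f, df1, df2)
    print(f"{effect}: recomputed {recomputed:.3f}, reported {reported:.3f}")
```

Each recomputed value rounds to the reported one at three decimals, which supports reading the reported η² values as partial η².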

Fig. 8

The average dissimilarity rating of humans vs. animals and humans vs. artifacts by each language group. The mean of the ratings per group appears at the top of each bar in the graph

Thus, the ANOVAs on the original ratings partially confirmed the results from the MDS solutions and cluster analyses: speakers of Japanese and English view humans and animals as more similar to each other than do speakers of Hmong. They did not confirm the MDS result that Hmong speakers viewed humans and artifacts as more similar to each other than did the other groups. Because the variance of the ratings is comparable across the groups (and the reliability of the data within and across the groups is high and comparable, based on the Cronbach’s alpha statistics reported previously), we believe that the constraints on item placement inherent to MDS are likely responsible for the discrepancy. We therefore take the results from the original ratings as more accurate.

To summarize, the findings from Experiment 2 revealed similar organizations of solid objects by speakers of the four languages. The differences that emerged did not appear to follow the patterns of classifier use documented in Experiment 1. Nor did they follow the groups’ country of residence: Hmong speakers (residing in the USA) differed significantly both from speakers of English (also residing in the USA) and from speakers of Japanese (not residing in the USA). Overall, the findings support a more universal than relativistic organization of solid objects.

General discussion

It is important to examine the universality of object similarity because of its fundamental role in theories and models of cognition. Our findings extend existing evidence on the organization of solid objects in several ways. Experiment 1 documented similarities and differences among speakers of different classifier languages in how they grouped nouns through classifier use. In Experiment 2, we explored whether the patterns of classifier use found in Experiment 1 corresponded to similarity ratings of pictured objects. We asked whether the common groups that emerged from Experiment 1 (animals; 3D non-living objects; artifacts with handles; long, flexible objects; large four-legged animals; and large machines) would emerge from the similarity ratings of Experiment 2. We also explored two differences among the groups predicted by their patterns of classifier use: (1) that hierarchical use of classifiers by Japanese speakers would yield more closely related categories of birds and four-legged animals in their ratings than in those of speakers of the other languages; and (2) that Hmong speakers would view humans and animals as more similar to each other than speakers of Japanese and Mandarin would.

Overall, our findings offer evidence of a widely shared organization of solid objects among a diverse group of participants from different cultures who speak different languages. All four groups classified many of the objects into a broad category of natural kinds that included animals, celestial objects, plants (flowers and trees), and fruit. All groups also classified objects into a category of artifacts that included hand tools, personal items, and vehicles, which suggests that these are important subdivisions within artifacts across different cultures. The reason for these groupings is not clear. One possibility is that plants and animals are “crying out” to be grouped together (Berlin, 1992; Malt & Majid, 2013). Perhaps the shared “core” cognitive capacities that are thought to give rise to the category of “solid object” also give rise to the subdivisions of plants and non-human animals. Perceived similarities in objects’ shapes, colors, sizes, and component parts (e.g., facial features, handles, and wheels) may link items within the categories of plants, animals, and artifacts. Shared experiences in science courses (all our participants were university-educated and resided in urban areas), in farming, or in observing plants and animals in nature may also give rise to these categories. Similarly, shared experiences with people, their possessions, and other animals probably give rise to the finding that humans anchor both the natural-kind and the artifact categories. Shared experiences are the likely source of these anchoring effects, since such effects have not been observed universally (e.g., Medin et al., 2010). Regardless of their cause, however, the common patterns of categorization found in Experiment 2 place limits on the impact that unique experiences, such as language, may have on cognition. Indeed, the differences that we found in Experiment 2 did not follow the patterns of classifier use documented in Experiment 1.

Another aspect of our results that extends previous work is the finding that humans appear to anchor both the natural-kind and the artifact categories. In past work, Carey (1985) and Medin et al. (2010) found that humans often serve as the prototype for early concepts of animals: properties taught about humans were more likely to be attributed to other animals than properties taught about another animal. Using a similarity-rating task, our work offers converging evidence that a trace of this tendency remains in adults. In the natural-kinds category, humans were at one end, with animals closest to them, followed by plants and then celestial bodies. For natural kinds, biological properties (and their absence) seem to underlie this ordering. For artifacts, we found an analogous trend: items that are typically encountered near humans, such as personal items, were judged to be more closely related to humans, and items that are encountered farther away, such as airplanes, were judged to be more distantly related. For artifacts, a thematic dimension of spatial proximity seems to underlie the judgments. However, the organization of artifacts may instead reflect similarity to humans in shape, since articles of clothing, which were judged most similar to humans, may be viewed as more similar in shape to humans than other artifacts are. Discovering the bases of the similarity of humans to artifacts awaits additional work. Regardless, our finding that humans anchor the category of artifacts is, to our knowledge, a new contribution to the literature.

We did find a critical difference between the groups in their categorization of humans: speakers of Japanese and English rated humans and animals as more similar to each other than Hmong speakers did. One possible reason for this difference is the difference in classifier use among speakers of the East Asian languages. As previously mentioned, speakers of Hmong use one classifier, tus, with nouns that refer to animals and humans, whereas speakers of Japanese use different classifiers for humans (ri/nin) and animals (hiki/piki). These patterns of classifier use predict that speakers of Hmong should judge humans and animals as more similar to each other than speakers of Japanese do. Our findings are the exact opposite of that prediction: speakers of Japanese rated humans and animals as more similar than speakers of Hmong did. English speakers’ judgments were most similar to those of Japanese speakers, and Mandarin speakers’ judgments fell between those of Hmong and English speakers. These findings parallel those of Imai et al. (2008), who found that despite linguistic similarities among East Asian languages, the judgments of Japanese speakers were most closely aligned with those of English speakers. They also resemble those of Malt et al. (1999), who found that speakers of different languages categorized one kind of physical object, containers, similarly despite differences in how they named them. Our findings extend those of Malt et al. (1999) by showing similar categorization despite language differences across a larger set of physical objects. When patterns of categorization show an organizational structure different from what certain aspects of language predict, we can begin to tease apart the impact of language from that of other cultural factors.

Having ruled out differences in classifier use as the source of the differences we found, we briefly turn to the role of other cultural factors. Considerable work in child development has shown that many preschoolers do not view humans as animals (Herrmann et al., 2012; Winkler-Rhoades et al., 2010). Researchers have begun to investigate some of the cultural and social experiences that promote concepts linking humans with other animals. Recent work points to roles for experience with animals (DeLoache et al., 2011) and for reading stories with anthropomorphic characters (Ganea et al., 2011; Ganea et al., 2014). The same factors may be at play in the differences that we observed among adults; such practices may be more common in the USA and Japan than in China and the neighboring regions where Hmong is traditionally spoken. However, the universality of such concepts, and of the cultural practices that promote links between humans and animals, has not been widely studied.

There are several ways to reconcile our current findings of no effects of language on cognition with previous findings showing such effects (e.g., Kuo & Sera, 2009). One possibility is that previous work, which used fewer items differing on only a few dimensions, encouraged classification by dimensional values such as shape similarity (e.g., Sera et al., 2013). Using a broad range of objects that differ on many dimensions has been shown to encourage ratings by overall similarity in adults (Smith & Kemler, 1984), and the large set of objects used in the current study may have reduced reliance on specific dimensional values in the similarity ratings. Another possibility is that for effects of language to emerge, the language contrasts have to line up conceptually in certain ways. For example, Sera et al. (2002) found effects of grammatical gender on categorization only when the grammatical gender system lined up with the natural gender system. Perhaps a larger number of classifiers must be used hierarchically for such effects to emerge with solid objects, and a more focused experiment on the role of linguistic hierarchical structures in similarity may reveal reliable effects. It is also possible that the effects of classifiers are minimal for the solid objects that we studied but more likely to emerge for nouns that refer to abstract entities (e.g., idea, story), for which perceptual features cannot “cry out” as a basis for similarity ratings. Finally, the classifier uses we documented in Experiment 1 may reflect semantic structure less than other pressures on classifier use in language, such as pluralization (see Chierchia, 1998, for a discussion along these lines). These issues await future work.

In conclusion, our findings offer new evidence on the important subdivisions within solid objects made by adult speakers of different languages raised in different cultures. They are among the first to offer quantitative evidence of shared patterns of categorization of a large set of solid objects by diverse groups of participants. The similarities observed support the idea of universal categorization patterns and identify what those categories might be. It is not clear whether these similarities result from innate “core” cognition or from shared experiences. We also found a difference among the groups in how they categorized humans, indicating that non-universal factors also play a role. Importantly, this difference is not consistent with the idea that numeral classifiers are responsible for it. In this way, we have begun to separate the effects of language on cognition from those of other cultural factors. The findings provide a solid reference point toward a more complete picture of human categorization and the factors that affect it.