It has long been recognized that individuals from different cultures can vary in how they make sense of the world. But because all humans share much the same neural hardware, it has usually been assumed that basic perceptual and cognitive processes are largely the same, with cultural differences manifesting primarily in high-level knowledge and beliefs. However, cultural psychologists have recently started questioning the assumption of a “universal architecture,” arguing that culture may affect not only our interpretation of the world, but also our perceptual and cognitive modes—that is, the ways that we process information. According to this view, differences in cultural background could well result in considerable differences in people’s basic experiences of the world itself (Miyamoto & Wilken, 2013; Norenzayan, Choi, & Peng, 2007).

In this study we investigated how perceptual mode can be affected by the environmental contexts of individuals, including their patterns of interactions with others, their physical surroundings, and the forms of written language they use. In particular, we examined malleability—that is, whether, having acquired one mode, it is possible to change it or acquire others. Our results support the proposal that the perceptual modes used by observers are determined at least in part by their environments. Moreover, when an observer moves to a new environment, new modes can be deployed for at least some perceptual processes, even as an adult.

Cultural influences on perception and cognition

The influence of culture on perception and cognition was discovered over a century ago, in studies showing that the strength of several visual illusions depended on the cultural background of the observer (see, e.g., R. E. Nisbett & Masuda, 2003; R. E. Nisbett, Peng, Choi, & Norenzayan, 2001). Over time, such differences were also encountered in higher-level aspects of cognition; for example, cultural background was discovered to influence psychological constructs such as self-esteem (Markus & Kitayama, 1991). Later work showed that it could also influence self-construal (Heine, 2005), orientation toward others (Heine, Takemoto, Moskalenko, Lasaleta, & Henrich, 2008), and the manifestation of stable personality traits such as extroversion and introversion (Allik & McCrae, 2004).

R. Nisbett (2003) argued that many of these differences could be explained by factors that were relatively general. One was an emphasis by Westerners on units in isolation from context, reflecting an analytic mode, which focuses on the extraction and manipulation of key items. East Asians, in contrast, appeared to prefer a holistic mode, which emphasizes relationships to other units and to overall context, placing greater emphasis on the “big picture.” Thus, for example, Westerners are often thought to be better than East Asians at recognizing objects that reappear with a background other than the one originally present (Masuda & Nisbett, 2001); in contrast, East Asians are often believed to be better at tasks in which a holistic approach would be more suitable, such as recalling contextual information about the background in an image.

These differences also appear in the allocation of visual attention. For example, Boduroglu, Shah, and Nisbett (2009) compared the abilities of East Asians and Westerners (Americans) to detect a change in the color of a square in two successively presented arrays, with the second appearance of the array being a simple expansion or contraction of the first. East Asians were better than Westerners at detecting changes when the second display was an expansion, but worse when it was a contraction. They were also relatively poor at detecting changes in the center of the display. This suggested that East Asians have a mode of attentional allocation in which spatial extent is relatively broad (Boduroglu et al., 2009). A similar pattern was found in the perception of Navon figures, which contain structure at both the local and global levels; here, East Asians were more sensitive to global structure than were Westerners (McKone et al., 2010). This tendency is evident even in early childhood (Imada, Carlson, & Itakura, 2013). Moreover, it also appears to mediate other differences between East Asians and Westerners, such as the ability to shift set in executive functions (Imada et al., 2013). Interestingly, the Himba (a nomadic people in rural Namibia) have an attentional focus even narrower than that of Westerners, and are relatively unaffected by irrelevant items located just outside the area needed for the task (de Fockert, Caparos, Linnell, & Davidoff, 2011). Differences of this kind have also begun to receive support from neuroimaging studies (Hedden, Ketay, Aron, Markus, & Gabrieli, 2008; Park & Huang, 2010).

Cultural influences have even shown up in visual search, in which observers must determine as quickly as possible whether a unique target item is present among a set of nontarget (or distractor) items in a given display. The search rate (time needed per element) is believed to reflect the speed of an attentional “spotlight” that travels from item to item, binding together the shape elements (or visual features) of each item into something more coherent (see, e.g., Treisman & Gelade, 1980; Wolfe, 1994). For many conditions, Westerners show a strong search asymmetry: Search for a long line among short ones, for example, is faster than search the other way around (Treisman & Gormican, 1988). East Asians, in contrast, exhibit no asymmetry for these stimuli (Ueda et al., 2016).

Underlying factors

The effects of culture described above raise an immediate question: What is their cause? Individuals from different cultures differ in many ways—for example, speaking different languages, living in different urban or rural geographies, and encountering different styles of art. These differences also include the unique norms, histories, traditions, social customs, patterns of interactions, and writing systems that differentiate cultures, not to mention possible effects of genetic structure. All of these factors (and more) are potentially relevant with regard to perceptual mode.

How might we categorize the possible factors involved? A first step is to separate factors extrinsic to the observer (e.g., environmental factors) from those that are intrinsic (e.g., genetic structure). Subdividing the environmental factors, a relatively natural division is via the particular aspects involved: the natural environment (e.g., weather, geographic features), the artifactual environment (e.g., typical architecture, density of urban areas), and the cultural environment (including both spoken and written forms of the language used). Another perspective would be the visual aspects of all these, which constitute the visual environment.

The effects of cultural environment have already received a degree of attention. A well-known example is the Sapir–Whorf hypothesis, which posits that a given language can restrict or encourage the expression of various cognitive and emotional processes (see, e.g., Heine, 2008). The strongest version of this hypothesis—that language determines what we think—has been challenged by studies on the Dani tribe of New Guinea, who had no difficulties learning the English words for many colors, despite having only two words for color in their own language (Boroditsky, 2001). But a weaker form of this hypothesis—that language can influence how we think and perceive the world—still remains viable (Boroditsky, 2001).

Evidence also suggests that perceptual mode can be influenced by visual environment. For example, East Asians tend to have saccades of smaller size and fixations of shorter duration than Westerners, possibly indicating a more holistic mode that reflects the visual structure of their surroundings (Rayner, Li, Williams, Cave, & Well, 2007). This is consistent with the finding that the Himba, who live in a desert relatively devoid of clutter, exhibit a perceptual mode even more analytical than that of Westerners (Caparos et al., 2012; de Fockert et al., 2011).

An interesting question in this regard is whether perceptual mode could change in response to changes in the visual environment. There is some evidence in favor of this possibility. Miyamoto, Nisbett, and Masuda (2006) measured the information search performance of Japanese and American participants who viewed images of American and Japanese cities of comparable sizes and population densities. All observers adopted a holistic search pattern when viewing images of Japanese cities, but an analytic search pattern when viewing American cities (Miyamoto et al., 2006). Miyamoto et al. argued that when the visual environment is chaotic and ambiguous (an environment commonly encountered in many East Asian countries), a holistic mode tends to be more useful, whereas when the visual environment is more orderly and predictable (a situation typical of urban areas in North America), an analytic mode is better. A similar cross-cultural variation exists in the susceptibility of observers to the Müller-Lyer illusion, likely for similar reasons (McCauley & Henrich, 2006).

Acculturation

To determine the extent to which the development or deployment of different perceptual modes is affected by the environment, a useful perspective is the behavior of individuals who migrate to a country with a different culture. Such individuals typically undergo a process of acculturation, a set of changes on various psychological dimensions caused by the move from their original (heritage) culture to a new (host) culture (Heine, 2008). If environmental factors continue to influence the development or deployment of perceptual modes over their lifespan, and if genetic factors are relatively weak, the perceptual experience of such individuals might change noticeably in response to their new environment, even for the same stimuli (Footnote 1).

Although acculturation is not completely understood, cultural psychologists have mapped out several of its components. For example, Heine and Lehman (2004) investigated the self-esteem of adult Japanese and American immigrants. American culture promotes a strong sense of self and high self-esteem, whereas Japanese culture encourages conformity to group norms and critical self-reflection. Within 1 year of arrival in their host environment, Japanese immigrants developed a level of self-esteem similar to that of Americans, whereas the self-esteem of American immigrants became more like that of the Japanese, suggesting that self-esteem can change with cultural environment, even for adults. It also suggests that the basis of this change is largely independent of factors such as personality and social situation.

Changes induced by acculturation are not limited to high-level constructs. Kitayama, Duffy, Kawamura, and Larsen (2003) compared East Asians and Americans on the framed-line test. Here, participants were shown a line within a frame and then asked to draw it in a new, blank frame; the line could have either the same length as the original (absolute) or the same ratio (relative). As compared to East Asians, Americans performed better on the absolute task and worse on the relative one; East Asians showed the opposite pattern (see also Ji, Peng, & Nisbett, 2000). However, Japanese who had been in the United States for less than four months and Americans in Japan for a similar amount of time performed much like individuals in their host culture (Kitayama et al., 2003). Similarly, Himba who migrated to (Westernized) urban areas eventually developed a more holistic perceptual mode, with some manifestations of this (e.g., experiencing the Ebbinghaus illusion) attaining a strength comparable to that among Westerners (Caparos et al., 2012).

Scope of the study

The results above suggest that the perceptual mode used by an individual depends at least in part on their environment, and may under some circumstances change in response to a new one. In what follows, we investigate the extent to which this is true of visual attention and visual memory.

To maximize the connection with previous work, we focused on East Asian immigrants who have lived in a Western country (Canada) for different amounts of time. We also used two perceptual tasks in which cultural differences have been found, but that involve stimuli with minimal culture-specific meaning. To examine visual attention, we used visual search, in which an observer must detect as quickly as possible the presence of a given target item (a unique color or shape, say) among a set of nontargets, or distractors. For some tasks, Westerners show a strong asymmetry (e.g., search for a long line among short lines is faster than for short among long), whereas East Asians exhibit no such asymmetry (Ueda et al., 2016). To examine visual memory, we used object recognition, which tests visual long-term memory for images of objects in particular backgrounds. East Asians have been reported as being worse than Westerners at recognizing objects that reappear in a background that is different from the original one, while being better at recalling properties of the background itself, perhaps because of a greater reliance on the holistic mode (Masuda & Nisbett, 2001). These tasks can let us determine the extent to which the two groups of observers differ in their general analytic/holistic processing strategies, and the extent to which these differences correlate in different perceptual processes.

General method

Each observer was given two tasks—visual search and object recognition—that were intended to assess the modes used for visual attention and long-term visual memory, respectively. For both tasks, the monitor screen had a visual angle of 24° vertically × 32° horizontally, with a refresh rate of 112 Hz. Participants were seated 57 cm from the screen. Before providing data for any task, participants completed a set of practice trials identical to those used in the experiment, with these continuing until error rates were below 5 %. The order of the tasks was counterbalanced. To reduce interference between the tasks, participants took a 10-min break between them, during which they filled out a brief form.
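As a rough check on the viewing geometry described above, the following sketch converts a visual angle into a physical extent on the screen. It assumes a flat screen viewed head-on from 57 cm; the function name `extent_cm` is hypothetical and is not part of the original method.

```python
import math

def extent_cm(angle_deg, distance_cm=57.0):
    """Physical extent (cm) that subtends `angle_deg` at `distance_cm`.

    Assumes a flat surface centered on, and perpendicular to, the line of sight.
    """
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

# At 57 cm, 1 degree of visual angle corresponds to roughly 1 cm on the screen.
one_degree = extent_cm(1.0)
screen_width = extent_cm(32.0)   # horizontal extent of the display
screen_height = extent_cm(24.0)  # vertical extent of the display
```

Under these assumptions, a viewing distance of 57 cm makes 1° correspond to about 1 cm, so stimulus sizes given in degrees map almost directly onto centimeters for small angles.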

After completing both tasks, observers were given two questionnaires: one about the individual’s cultural attitudes (see Appendix A) and the other about more detailed demographic information (see Appendix B). These were given at the end of the experiments, to avoid priming any potential culture-specific responses. The demographic questionnaire probed factual information (e.g., birth country, first language), whereas the acculturation test assessed subjective attitudes. Because the latter appeared more susceptible to bias, participants completed the acculturation questionnaire first.

The instructions for the tasks and questionnaires were always in written form. For some observers, these were in English; for others, these were in their native (heritage) language, either Japanese (Footnote 2) or Chinese. The language of instruction was not randomly assigned (as we describe in detail below). The instructions and questionnaires in non-English languages were back-translated by two research assistants fluent in both English and the heritage language. The instructions for the visual search task were printed on paper; the instructions for the object recognition task were shown onscreen. The entire experiment lasted about 1 h, and participants were paid ten Canadian dollars for their time.

Participants

Three groups of participants were recruited. Group R consisted of East Asians who had recently arrived in Canada, that is, within the past 2 years (Footnote 3). Members of this group were required to have been born in China or Japan, with all parents and grandparents being Chinese or Japanese, and not to have traveled outside their home country for longer than four weeks at a time prior to immigrating. This group contained 46 people whose average age was 20.4 years.

Roughly half (n = 19, average age = 20.2 years; 13 females, six males) of Group R was recruited from an on-campus Japanese exchange program. These participants were unable to read English, and so were administered instructions in Japanese. The remaining participants (n = 27, average age = 20.6 years; 16 female, ten male) were recruited from an on-campus guest house for international students; they understood and were administered English instructions. Because of possible differences due to the different languages of instruction (Ji, Zhang, & Nisbett, 2004), we split Group R into two subgroups: R-h, whose participants received Japanese (heritage) instructions, and R-e, whose participants received English ones. Apart from the language of instruction and average time in Canada (1.8 months vs. 1–24 months), Groups R-e and R-h fit the same criteria.

A second group (Group W) was formed of Caucasians who had been born and raised in Canada, had never become fluent in a language other than English, whose parents and grandparents were Caucasian and had been born in Canada, and who had never traveled outside Canada for more than four weeks at a time. This group was considered the Western counterpart of Group R. It contained 30 people whose average age was 21.7 years, with eight females and 15 males (seven of the participants did not record their gender).

To examine the effects of acculturation, a final group of participants (Group L) was formed of East Asians who had lived in Canada for a relatively long time—2 years or more. As in the case of Group R, the members of this group were required to have been born in China or Japan, with all parents and grandparents being Chinese or Japanese. The amount of time these individuals had lived in Canada ranged from 29 to 312 months, with an average of 160 months. This group consisted of 88 people. As in the case of Group R, these were split into two subgroups. Most participants (n = 60; 35 female, 22 male, and three who did not record their gender) were given instructions in English; these formed group L-e, a counterpart to group R-e. Another group of long-time immigrants (all Chinese) were given instructions in written Chinese. This group, L-h, had 28 people (20 female, four male, and four who did not record their gender).

Note that the language of instruction was not randomly assigned, but was determined by the competencies of the participants: For example, Group L-h was formed from long-term immigrants who could read their heritage language, whereas L-e was formed without regard to participants’ competencies in their heritage language. The lack of Chinese immigrants in Group R-h created a similar situation. Owing to the existence of these potentially confounding factors, any substantial differences between the -h and -e groups can only be suggestive with regard to underlying causes.

All participants were recruited online from the University of British Columbia student population, except for the 19 participants of Group R-h, who were contacted via personal e-mail.

Visual search

This task was generated using the VSearch software (Enns, Ochs, & Rensink, 1990). Observers were given two search conditions: a long among short items, and a short among long items. The target was present on a randomly selected half of the trials in each condition. We replicated the setup of Ueda et al. (2016), who followed a relatively common design. In it, observers viewed displays consisting of an array of three, six, or 12 items, with density kept constant. Each item was a white line, with short lines being of length 0.8° and long lines of length 1.0°. These were displayed on a black background, with the maximal extent of the array of items being 6.0° × 5.0° (Fig. 1).

Fig. 1 Example of stimuli for the visual search task. In one of the conditions (left), participants searched for a long among short lines. In the other (right), participants searched for a short among long lines.

The search task consisted of 300 trials: 150 in the long-target condition and 150 in the short-target condition. The conditions were blocked and their order counterbalanced. Each condition was split into five blocks of 30 trials each, with participants being given an opportunity to rest between blocks. Participants were given written instructions, followed by sets of 25 practice trials. Practice sets were administered until the error rate was less than 5 %; participants typically achieved this on their first set.

Each trial began with a fixation display (a white cross in the middle of the black screen) followed by the search display, which remained on until the participant responded. Participants pressed one key for “present” and another for “absent,” with the mapping of the keys counterbalanced across the participants. Participants were shown feedback after each response: a “+” sign for correct and a “–” sign for incorrect.

Object recognition

Object recognition was assessed by the “foreground–background” task of Chua, Boland, and Nisbett (2005). All images contained a “focal object” placed into a preexisting background using a graphics editor. We used the same stimulus set and presentation order described by Chua et al. Focal objects fell into one of two categories (animals or nonliving things); following Chua et al., 20 images of animals and 16 of nonliving things were used. In all stimuli tested, the backgrounds were taken from separate photographs and featured a collection of East Asian and Western scenes; 36 (18 of each type of scene) were used. Each test image occupied about 70 % of the screen and was presented on a black background (Fig. 2).

Fig. 2 Example image for the object recognition task. Here, a focal object (deer) is placed in a forest background.

The object recognition task itself consisted of two phases. In Phase 1 (study), observers were presented with a sequence of 36 images and asked how much they liked each image, with no instructions to focus on either the focal object or the background. Judgments were given on a 7-point Likert scale (1 corresponding to the least-liked image, 7 to the most-liked); responses were entered via keyboard. Each image was 16° × 22° in extent (Footnote 4) and was displayed for 3,000 ms, after which it was replaced by a white screen of equal size with a black fixation cross in the center. Participants entered their response only after the image had disappeared; that is, they always viewed each image for 3,000 ms.

After completing this phase, participants were given a short “distractor” task intended to interfere with any conscious recollection strategies. Similar to the distractor task used by Chua et al. (2005), participants were asked to count backward out loud from 100 by steps of 5. After finishing this, they proceeded to Phase 2.

In Phase 2 (recognition), participants were shown 72 images, one at a time. Again following Chua et al. (2005), 36 of these contained images of the focal objects shown in Phase 1, half in their original backgrounds and half in new backgrounds not previously seen. The other 36 images contained new focal objects—half in the backgrounds present in Phase 1, and half in new backgrounds. For each image, participants reported whether they had seen the focal object before. The image was replaced by a fixation screen once either the participant had responded or 3,000 ms had elapsed. The next trial started after the participant had entered a response. Following Chua et al., all instructions were given onscreen, and participants were not given any feedback after making a response.
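The composition of the Phase 2 image set can be laid out as a 2 × 2 design (old vs. new focal object, crossed with old vs. new background). The sketch below only enumerates the cells and the correct answer for each; the names are hypothetical, and it does not reproduce the fixed presentation order taken from Chua et al. (2005).

```python
from itertools import product

IMAGES_PER_CELL = 18  # 72 test images spread evenly over 4 cells

def build_recognition_design():
    """Return one dict per Phase 2 image (no ordering or randomization)."""
    trials = []
    # "old" = shown in Phase 1; "new" = not shown in Phase 1.
    for focal_object, background in product(("old", "new"), repeat=2):
        for _ in range(IMAGES_PER_CELL):
            trials.append({
                "focal_object": focal_object,
                "background": background,
                # Participants judge the focal object only, so the correct
                # answer depends on the object and not on the background.
                "correct_response": "seen" if focal_object == "old" else "not seen",
            })
    return trials

design = build_recognition_design()
```

Laid out this way, the critical comparison is between old objects in old backgrounds and old objects in new backgrounds, since both require a "seen" response.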

Questionnaire 1: Modified Vancouver acculturation index

After completing both perceptual tasks, participants were given a modified version of the Vancouver Acculturation Index (VAI), a common measure of the degree of acculturation of an individual. (For details, see Appendix A.) All participants were given the questionnaire in the language used for their task instructions.

The VAI is based on a bidimensional model in which changes in cultural orientation toward heritage and host cultures can occur independently. Individuals can therefore identify with both their heritage and host cultures to varying degrees. As individuals continue to immerse themselves in the host culture, they gain cultural fluency and assimilate to their new social and cultural environment, potentially independently of their affiliation to their heritage culture (Ryder, Alden, & Paulhus, 2000). As such, the VAI provides a snapshot of how an individual feels about his or her cultural immersion and acknowledges the potential for individuals to have bicultural identities.

Questionnaire 2: Demographic information

After completing the modified VAI, participants were given a custom-designed questionnaire that requested more detailed demographic information. (See Appendix B.) This was also given in the same language used for the task instructions. After completing the questionnaire, participants completed a standard sign-in form, which also asked for the number of months of their stay in Canada.

This questionnaire was intended to measure factors such as cultural immersion, language use, and breadth of exposure to various visual environments. Participants provided information on their country of origin, language proficiency, and the location and duration of travels outside Canada. They also indicated their gender, age, percentage of their life spent inside Canada, and—if they were immigrants—age of entry into Canada. The questionnaire also included the country and city of their birth, their parents’ countries and cities of birth, and time spent traveling. It also asked participants to indicate which countries other than Canada they had visited, and for how long.

Time abroad was intended to capture potential exposure to different visual environments independently of cultural competencies. Exposure to such environments might influence the flexibility with which observers could develop or switch perceptual modes. The length of time and location away from Canada were deemed influential if the participant had traveled for more than four weeks, because significant and measurable changes to perception can occur beyond this length of time (Leung, Maddux, Galinsky, & Chiu, 2008).

Results

Visual search

The primary measures of interest were response time (RT) and error rate. The RTs for each observer were log-transformed, and the resulting averages were used as the basis for the search slope (the average time needed per item in the display) and baseline (time needed for the first item). The search slope was calculated by regressing the average RTs for each set size (3, 6, 12) against the set size itself. The baseline was calculated by taking the intercept of the regression line with the y-axis (corresponding to the setup time for processing) and adding the search slope (corresponding to the time needed for a set size of 1). We omitted data from participants whose search slopes deviated more than three standard units from their group’s mean. Applying this standard, we excluded the data from two participants (one each from the R-h and R-e groups).
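The slope and baseline computation described above can be sketched as follows. This is a minimal illustration rather than the analysis code actually used: it assumes that the log-transformed RTs are averaged and then mapped back to milliseconds before regression, and all function names are hypothetical.

```python
import math
from statistics import mean, stdev

def geometric_mean_rt(rts_ms):
    """Average RTs in log space, then map back to milliseconds (assumed)."""
    return math.exp(mean(math.log(rt) for rt in rts_ms))

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def slope_and_baseline(rts_by_set_size):
    """rts_by_set_size maps each set size (3, 6, 12) to a list of RTs in ms."""
    set_sizes = sorted(rts_by_set_size)
    mean_rts = [geometric_mean_rt(rts_by_set_size[n]) for n in set_sizes]
    slope, intercept = fit_line(set_sizes, mean_rts)
    baseline = intercept + slope  # predicted RT for a set size of 1
    return slope, baseline

def keep_within_three_sd(slopes):
    """Indices of observers whose slope lies within 3 SD of the group mean."""
    m, s = mean(slopes), stdev(slopes)
    return [i for i, v in enumerate(slopes) if abs(v - m) <= 3 * s]

# Hypothetical observer whose RT grows by exactly 50 ms per item.
observer = {3: [1050.0, 1050.0], 6: [1200.0, 1200.0], 12: [1500.0, 1500.0]}
slope, baseline = slope_and_baseline(observer)  # about 50 ms/item and 950 ms
```

The exclusion helper treats "three standard units" as three sample standard deviations from the group mean, which is also an assumption.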

The results for the various groups are shown in Fig. 3. In all cases, the analysis involved target-present trials only; target-absent slopes were highly variable (likely indicating large differences in high-level search strategies), and so were omitted from the analysis. Since we found no significant differences in the error rates for any group [F(4, 159) = 1.6, p > .19], these were omitted from the subsequent analysis.

Fig. 3 Response times and error rates for visual search. Averages are shown for each of the five groups of participants. Lines show the average response times, and bars the average error rates. Error bars indicate standard errors of the means. R-e, recent immigrants with English-language instructions; R-h, recent immigrants with heritage-language instructions; L-e, long-term immigrants with English-language instructions; L-h, long-term immigrants with heritage-language instructions; W, Westerners with English-language instructions.

Search asymmetry was assessed in two ways. The first was via a within-subjects t test of the slopes between the short- and long-target conditions for each group. Here, “asymmetry” was regarded as categorical: It existed if and only if the slopes differed reliably. The second, more quantitative measure of asymmetry was the slope ratio, the ratio of the short- to the long-target slopes (cf. Treisman & Souther, 1985). These ratios were log-transformed to obtain a reasonably normal distribution; analysis of the slope ratios was done on these transformed values.
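Both asymmetry measures can be illustrated with a short sketch. The data below are invented, the function names are hypothetical, and only the test statistic is computed (a p value would require a t distribution from a statistics package).

```python
import math
from statistics import mean, stdev

def paired_t(short_slopes, long_slopes):
    """Within-subjects t statistic and degrees of freedom for short vs. long slopes."""
    diffs = [s - l for s, l in zip(short_slopes, long_slopes)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

def log_slope_ratios(short_slopes, long_slopes):
    """Log of each observer's short/long slope ratio.

    Requires positive slopes; a zero or negative slope has no defined log ratio.
    """
    return [math.log(s / l) for s, l in zip(short_slopes, long_slopes)]

# Invented slopes (ms/item) for four observers.
short = [60.0, 55.0, 70.0, 65.0]
long_ = [40.0, 45.0, 50.0, 35.0]

t_value, df = paired_t(short, long_)
log_ratios = log_slope_ratios(short, long_)
# Back-transforming the mean log ratio gives a geometric-mean slope ratio.
typical_ratio = math.exp(mean(log_ratios))
```

A ratio of 1 (log ratio of 0) corresponds to no asymmetry, and larger values indicate relatively slower search for short targets.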

The results are shown in Table 1. Replicating the results of Ueda et al. (2016), Group R-e showed no significant asymmetry (short = 56 ms/item, long = 50 ms/item; p > .2, ratio = 1.29), and Group R-h also showed no significant asymmetry (short = 47 ms/item, long = 42 ms/item; p > .4, ratio = 1.33). Although both groups tended to have slightly higher baselines for short targets, these tendencies were not significant, either for Group R-e (short = 947 ms, long = 904 ms) or Group R-h (short = 1,025 ms, long = 943 ms; both ps > .35). More generally, no significant differences between the groups were apparent in either the slopes (both ps > .17) or baselines (both ps > .4), suggesting no effect of language of instruction for East Asians with only a limited exposure to a Western environment. For most purposes, therefore, we combined these into a single group (R). This combined group (n = 44) still showed no significant asymmetry in slopes (short = 52 ms/item, long = 46 ms/item; p > .12, ratio = 1.3) or baselines (short = 979 ms, long = 922 ms; p > .1).

Table 1 Results for visual search

Meanwhile, replicating previous studies of Westerners (e.g., Treisman & Gormican, 1988), Group W (the Western counterpart of Group R) showed an asymmetry that was quite strong: Differences were found in both slopes (short = 53 ms/item, long = 39 ms/item) [t(29) = 3.5, p < .002, d = 0.66, ratio = 1.81] and baselines (short = 864 ms, long = 785 ms) [t(29) = 2.4, p < .03, d = 0.46].

Interestingly, group L-e (long-term immigrants given the instructions in English) also showed a strong asymmetry, with search being much faster for long targets than for short (short = 58 ms/item, long = 37 ms/item) [t(59) = 6.4, p < .0001, d = 0.81, ratio = 2.05]. Baselines also tended to differ (short = 894 ms, long = 847 ms) [t(59) = 2.03, p < .05], though this effect was relatively small (d = 0.26). Post-hoc comparisons using Tukey’s HSD showed that the L-e slope ratio did not differ from that of Group W (2.05 vs. 1.81, p > .4). Indeed, Group L-e did not differ significantly from Group W in either slopes [short: t(59, 29) = 0.52, p > .41; long: t(59, 29) = 0.28, p > .67] or baselines [short: t(59, 29) = 0.48, p > .56; long: t(59, 29) = 0.94, p > .16]. In other words, performance in long-term immigrants was much the same as that of native-born Westerners.

In contrast, post-hoc comparisons using Tukey’s HSD confirmed that the slope ratios of Group L-e differed from those of Group R (p < .02). To analyze this in greater detail, we conducted a two-way mixed analysis of variance (ANOVA) with Group (L-e, R; i.e., time in Canada) and Target Length as factors. This revealed a main effect of target length [F(1, 102) = 25.2, p < .0001], no main effect of time in Canada (p > .55), but an interaction between target length and time in Canada [F(1, 102) = 7.9, p < .007]: Namely, only Group L-e was faster at searching for long than for short targets (p < .05), whereas Group R was equally slow at searching for long and short targets (p > .05). Thus, although length of time in Canada did not affect overall search speeds, it did affect the differences: Participants who had lived in Canada longer showed a greater difference in speed (and in particular, were faster for long targets) than recent arrivals.

Meanwhile, Group L-h (long-term immigrants given the instructions in their heritage language) exhibited a pattern largely similar to that for Group R. For observers in Group L-h, search slopes did not differ significantly (short = 43 ms/item, long = 40 ms/item; p > .5, ratio = 1.2), nor was there a significant difference in baselines (short = 899 ms, long = 836 ms; p = .08). A post-hoc two-factor ANOVA of Groups L-h and L-e showed no main effect of instruction group [F(1, 86) = 2.6, p > .1], but did show a strong interaction between instruction group and target length [F(1, 86) = 8.07, p < .006]: In particular, Group L-h had shallower short-target slopes than did Group L-e [t(86) = 1.9, p < .05, d = 0.62], whereas long-target slopes did not significantly differ (p > .05). In summary, then, instructions in their heritage language appeared to cause these observers to use a perceptual mode that resulted in relatively little search asymmetry, with search being faster overall than for Group L-e.

Although the proportions of male to female participants were roughly similar overall, the proportions did differ between groups (p < .02). To rule out the possibility that these differences were responsible for the effects found, we tested whether gender had any effect on the slope ratios. No significant effect was found (p > .39), making it unlikely that these imbalances affected our results.

Correlations with questionnaire responses

To determine whether search performance depends on nonvisual environmental factors (in particular, ones involving culture), we examined the extent to which the degree of asymmetry (more precisely, the slope ratio) correlated with group and individual responses on the questionnaires. The first of these measures (VAI) assessed two independent dimensions (affinities to the heritage and host cultures), each based on several questions involving a Likert scale ranging from 1 (strongly disagree) to 9 (strongly agree). Differences in participants’ raw VAI responses can be dampened by a bias to default to the “neutral” response. To minimize dampening effects from participants who did not respond strongly to some questions, we calculated for each question the “median-split score,” that is, the percentage of responses from each participant that were greater than 5. Although the median-split score is a coarser measure than the raw responses, neutral responses do not dampen it. As such, it was a more sensitive measure than the means for group and individual polarizations toward the heritage versus host cultures. (See Appendix C.) The second source of information was the demographic questionnaire, which described the life history of the participant, especially with regard to exposure to different cultures. (For details, see Appendix D.) Westerners were exempted from these analyses. In addition, several participants (three from Group L-h, one from L-e, one from R-h, and one from R-e) turned in questionnaires that were blank or otherwise incorrectly completed. The data from these participants were excluded.
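The median-split score described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the choice to score one participant's answers on one subscale at a time, and the example responses are all hypothetical; only the 1 to 9 scale and the "greater than 5" criterion come from the text.

```python
def median_split_score(responses, midpoint=5):
    """Percentage of Likert responses strictly above the scale midpoint."""
    if not responses:
        raise ValueError("no responses to score")
    above = sum(1 for r in responses if r > midpoint)
    return 100.0 * above / len(responses)


# Hypothetical answers from one participant on a 1-9 scale.
heritage_items = [7, 5, 8, 6, 5, 9, 5, 7, 6, 5]
host_items = [5, 5, 6, 4, 5, 5, 7, 5, 3, 5]

heritage_score = median_split_score(heritage_items)
host_score = median_split_score(host_items)
print(heritage_score, host_score)
```

Because a neutral answer of 5 counts the same as any other answer at or below the midpoint, a participant who defaults to 5 on many items does not pull the score toward the middle in the way that a mean would.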

For the VAI, we calculated the average median-split percentages on the heritage and Western subscales for each group. Scores are shown in Table 2.

Table 2 Averages and standard errors of the median-split scores of individuals in each Asian group for the Heritage (Asian) and Host (Western) culture affinity subscales of the Vancouver Acculturation Index

The performance of East Asians on the framed-line task appears to depend on their level of self-identification with their heritage culture (Hedden et al., 2008). Such self-identification could capture a host of factors that might influence visual search, such as susceptibility to visual or linguistic (i.e., the language of instruction) primes. In accord with the findings of Hedden et al., VAI responses at the group level were consistent with some of the patterns in visual search. Recent arrivals had relatively symmetric search and also indicated greater self-identification with their heritage than with Western culture. On the other hand, although the L-h and L-e groups differed in search, the L-h group did not show a significantly greater identification with their heritage than with Western culture, suggesting that these factors may not have played a large role in visual search.

We also calculated for each East Asian group the Pearson correlations between the slope ratios and the various quantities on the two questionnaires. This was also done for all East Asians pooled together. (We refer to this pooled group as Group A.) Almost none of our predictors had a significant effect. The only exceptions were the North American subscale and the number of months in Canada. When controlling for months in Canada, greater North American subscale scores were weakly correlated with faster search for long lines (r = –.18, p < .04). Meanwhile, when controlling for scores on the North American subscale, a positive relationship was found between slope ratio and months in Canada (r = .25, p < .004); this essentially restates our main finding that the slope ratio depended on length of time in Canada.
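One common way to "control for" a third variable in a correlation is the first-order partial correlation, computed from the three pairwise Pearson coefficients. The text does not state which procedure was used, so the sketch below is an illustration of that general technique rather than a reconstruction of the analysis. The variable names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y with z partialled out of both."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical values for six participants.
slope_ratio = [1.0, 1.2, 1.6, 1.9, 2.1, 1.4]
months_in_canada = [4, 10, 30, 70, 110, 18]
host_subscale = [20.0, 30.0, 50.0, 40.0, 70.0, 60.0]

r_partial = partial_r(slope_ratio, months_in_canada, host_subscale)
print(round(r_partial, 3))
```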

The absence of a relationship between search performance and responses to these questionnaires (other than length of time in Canada), coupled with the observation that group-level differences in cultural affinity failed to differentiate the L-h and L-e groups, suggests that socio-cultural identity plays at most a minor role in visual search asymmetry.

Finally, to get an approximate estimate of how search asymmetry is affected by the amount of time spent in a new environment, a simple post-hoc analysis was carried out on the East Asians given instructions in English. This group was split into four quartiles according to time spent in Canada, with the average of each quartile taken as its value. The results are shown in Fig. 4. As is clear from this figure, search asymmetry is low during the first year or so in Canada, after which it climbs to a level similar to that of native-born Westerners and stays at that level.

Fig. 4 Degree of search asymmetry as a function of time in Canada. Mean slope ratios are shown for each quartile; error bars describe standard errors of the means. For the first year or so in Canada, the degree of asymmetry is low, after which it reaches and stays at a level comparable to that of native-born Westerners (the dashed line represents the mean; the gray area represents the standard error of the mean for Group W)
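The quartile analysis summarized in Fig. 4 can be sketched roughly as follows: participants are sorted by time in Canada, cut into four bins of near-equal size, and each bin is summarized by its mean time, mean slope ratio, and standard error. How ties and uneven bin sizes were handled is not stated in the text, so the binning rule here is an assumption, and the function name and data are hypothetical.

```python
import math
from statistics import mean, stdev


def quartile_summary(months, ratios):
    """Return (mean months, mean ratio, SEM of ratio) for four bins
    formed by sorting participants on months in the new environment."""
    if len(months) != len(ratios) or len(months) < 4:
        raise ValueError("need matched lists with at least 4 participants")
    pairs = sorted(zip(months, ratios))
    n = len(pairs)
    summary = []
    for q in range(4):
        chunk = pairs[q * n // 4:(q + 1) * n // 4]
        m = [p[0] for p in chunk]
        r = [p[1] for p in chunk]
        # SEM is undefined for a single observation.
        sem = stdev(r) / math.sqrt(len(r)) if len(r) > 1 else float("nan")
        summary.append((mean(m), mean(r), sem))
    return summary


# Hypothetical data: months in Canada and slope ratio for 8 participants.
months = [3, 60, 14, 120, 8, 36, 24, 96]
ratios = [1.1, 2.0, 1.3, 2.1, 1.0, 1.9, 1.8, 2.2]

summary = quartile_summary(months, ratios)
for mean_months, mean_ratio, sem in summary:
    print(mean_months, round(mean_ratio, 2), round(sem, 3))
```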

Object recognition

Performance on this task was assessed by comparing two measures: (i) the accuracy of recognition when a focal object reappears in the background in which it was originally seen, and (ii) the accuracy when it appears in a new background (referred to as “old” and “new,” respectively). The degree of holism is reflected in the extent to which these measures differ: Worse performance for new backgrounds indicates a more holistic mode (Chua et al., 2005). Similar to the case for search asymmetry, holism was assessed in two different ways. First, it was treated as a categorical factor, with error rates in the two conditions being compared via within-subjects t tests. Next, the degree of holism was assessed, using the recognition ratio (error rate for new backgrounds/error rate for old ones). Owing to random experimenter error, 11 participants (three in Group R-h, two in R-e, two in L-e, one in L-h, and three in W) left the study room before completing the object recognition task. Because these errors were distributed uniformly over all five groups, they reduced the degrees of freedom but not the internal validity of our analyses of the groups’ object recognition performance.
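The two ways of assessing holism can be sketched as follows: a within-subjects t test on each participant's error rates for old versus new backgrounds, and a recognition ratio. Several details are assumptions. The text does not say whether the ratio was computed per participant or from group means (the sketch uses group means), nor which formula was used for Cohen's d (the sketch uses the mean difference divided by the standard deviation of the differences). Function names and data are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(old_errors, new_errors):
    """Within-subjects t test on error rates (new minus old)."""
    diffs = [n - o for o, n in zip(old_errors, new_errors)]
    n = len(diffs)
    sd = stdev(diffs)
    t = mean(diffs) / (sd / math.sqrt(n))
    d = mean(diffs) / sd  # one convention for a paired-design effect size
    return {"t": t, "df": n - 1, "d": d}


def recognition_ratio(old_errors, new_errors):
    """Error rate for new backgrounds divided by that for old ones."""
    return mean(new_errors) / mean(old_errors)


# Hypothetical error rates (proportions) for five participants.
old = [0.20, 0.25, 0.30, 0.15, 0.35]
new = [0.30, 0.40, 0.35, 0.30, 0.45]

stats = paired_t(old, new)
ratio = recognition_ratio(old, new)
print(stats, round(ratio, 2))
```

A ratio above 1 indicates worse recognition when the background changes, which is the signature of a more holistic mode in this task.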

The results are shown in Fig. 5, and detailed comparisons are in Table 3. Consistent with Chua et al. (2005), all East Asian observers exhibited a clear holistic mode, performing measurably worse for new than for old backgrounds. Interestingly, participants instructed in their heritage language (Groups R-h and L-h) performed no better on the object recognition task than those given instructions in English. Indeed, the recognition ratios for all groups—including Group W—did not differ significantly from each other (one-way ANOVA, p > .57). In other words, Group W (Westerners who had lived in Vancouver most of their lives) had about the same degree of holism (old = 22 %, new = 35 %; t(39) = 3.3, p < .003, d = 0.83, ratio = 2.1) as Group A (old = 28 %, new = 43 %; t(125) = 7.2, p < .0001, d = 0.69, ratio = 2.7).

Fig. 5 Error rates for object recognition. The average rates are shown for each of the five groups. Error bars indicate standard errors of the means

Table 3 Results for object recognition

Westerners appeared to have lower error rates on the task than did East Asians. If our Westerners had an analytical processing style, East Asians should have shown a greater difference in performance between the old and new backgrounds than Westerners. To assess this possibility, we conducted a two-way mixed ANOVA comparing the error rates for old and new backgrounds between the Westerners and the (pooled) Asian group. The test confirmed the main effect of group, with Westerners performing better than East Asians overall [F(1, 174.7) = 8.34, p < .005; see Footnote 5], and the main effect of background, with new backgrounds yielding higher error rates than old backgrounds [F(1, 174.7) = 28.5, p < .0001]. Crucially, the test failed to find an interaction between background and group [F(1, 174.7) = 0.23, p > .63]: Both groups appeared equally affected by the switch from an old to a new background, consistent with a common, holistic processing style. The RTs of Groups W and A also did not differ significantly (W = 1,850.6, A = 1,845.3; p > .65). In any event, given the lack of difference in relative error rates between the two groups, it is unlikely that speed–accuracy trade-offs can account for this pattern of results.

It is not clear why the native Vancouverites tested here (Group W) performed differently from the Midwesterners tested by Chua et al. (2005). Assuming that this was not due to sampling error or to the slightly smaller images used in our experiments, it may be that a more holistic mode is induced by living in the high visual density of the Vancouver urban area. It may also be that Vancouver has enough of an East Asian influence in its visual environment or social culture to induce all of its inhabitants to adopt a more holistic visual mode. Settling this issue must await future studies. In any event, it is worth pointing out that since the Westerners tested here used a holistic mode in visual recognition similar to that of East Asians, the analytic/holistic distinction (at least, as applied to visual memory) is unlikely to account for the differences in search asymmetry encountered in Experiment 1.

Correlations with search

To further investigate possible connections between visual search and visual recognition, we calculated Pearson correlations between the recognition ratios and slope ratios (as well as between the slopes for short and long targets). This was done for each of the East Asian groups, as well as for all observers pooled together. No correlations were significant (all ps > .07). This result accords with the conclusion above that analytic/holistic differences in object recognition are unlikely to explain the differences in search found here.

Correlations with questionnaire responses

Finally, for each of the four non-Western groups, we calculated Pearson correlations between the recognition ratios and various quantities from the questionnaire data, such as months in Canada and the median-split scores on the two VAI dimensions. This was also done for these four groups pooled together.

As in the case of search asymmetry, almost none of our predictors had a significant effect. The only exception was a modest (r = .19, p < .04) inverse relationship between holism and the participant’s median-split score on the North American VAI dimension. Pooling across all East Asians, observers who were more strongly oriented to North America tended to be somewhat less holistic, regardless of how they stood on the Heritage subscale. In general, however, cultural factors appeared to play a relatively weak role in visual recognition, consistent with performance on this task being largely the same for all cultural groups tested.

General discussion

The results of the experiments here indicate that environmental factors can strongly affect the perceptual mode deployed by an observer. Our results replicated previous findings that observers raised in a Western environment show a strong asymmetry in visual search for length (Treisman & Gormican, 1988), whereas observers raised in an East Asian environment have no significant asymmetry on this task (Ueda et al., 2016). In addition, our results showed that East Asians who live in a Western environment for more than a few years become as asymmetric as native Westerners. They also showed that such observers lose this asymmetry when given written instructions in their heritage language.

No significant correlations were found between asymmetry and various measures of cultural identity, suggesting that the factors responsible were likely not cultural per se. Instead, they may have involved physical factors, such as the density of buildings or the nature of the written language. Our results also indicated that—contrary to previous findings (Chua et al., 2005)—East Asians and Westerners can be equally holistic in object recognition, so that differences in search asymmetry were unlikely to stem from differences in general modes of analytic/holistic processing (Miyamoto, Nisbett, & Masuda, 2006).

These results have several implications. First, they support the thesis that the environment (and likely, the visual environment) can affect the perceptual modes available to an observer, and that this can occur even for adults. They also indicate that a new environment can lead to the development (or at least, the deployment) of a new attentional mode in as little as 2 years. This conclusion is broadly consistent with the results of other studies, such as those on the rural Himba, who were found to become more susceptible to the Ebbinghaus illusion when they continually visited an urban environment (Caparos et al., 2012).

Our results also have implications for current models of visual search. To begin with, none of these models (e.g., Treisman & Gelade, 1980; Wolfe, 2007) can easily account for different degrees of search asymmetry in observers from different cultures; they must be expanded to account for the existence of different perceptual modes. Moreover, they must explain not only when asymmetry exists, but also when it does not; environmental factors will almost certainly play a role here. And the resulting architecture must be flexible enough to enable different modes to develop (or at least, be deployed) in response to changes in the environment.

Our finding that observers who behave differently on visual search do not differ on object recognition also suggests that a single analytic/holistic mode cannot apply to both visual attention and visual memory. (Observers might, however, behave holistically with regard to visual memory and analytically with regard to visual attention.) Beyond this, however, caution is required: The analytic/holistic distinction, for example, may not even apply to visual search. The latter view is consistent with the proposal of Caparos et al. (2012) that not all differences in perceptual mode have their origins in the analytic/holistic distinction. It is also consistent with the suggestion by Ueda et al. (2016) that cultural differences in search are due to differences in the ways that properties are encoded, rather than differences in the use of analytic or holistic processing.

The suggestion of Ueda et al. (2016) was based on their finding of different patterns of asymmetry for different visual stimuli. However, a holistic perceptual mode might still account for the lack of asymmetry found here. This mode creates a preference for relative over absolute judgments; for example, Japanese observers appear to be more accurate than Westerners at determining the relative length of a line within a frame (Kitayama et al., 2003). As such, judging the length of a line in relation to its frame might be a natural way for East Asians to handle a search task. Given that some models of search view asymmetry as arising from the relationship of the target item to the background (e.g., Rosenholtz, 2001), a holistic mode might therefore include the length of the monitor frame in the calculations of item lengths, weakening the relative distinctiveness of the targets, and thus reducing the asymmetry.

Another challenge for current models is the finding of a significant difference in the performance of long-time East Asian immigrants, depending on the language of instruction. Because we did not randomly assign bilingual immigrants to the two conditions, we cannot unequivocally claim that instructional language was the key factor. But it does raise the question of what could have led to the difference in performance. If it were the language of instruction, this would suggest that the use of a language can “trigger” an alternate perceptual mode. More generally, it suggests that adults might host multiple perceptual modes, with the particular one used being triggered by appropriate environmental conditions (e.g., the appearance of the text in the instructions). This possibility would raise a set of interesting issues concerning the nature of triggering—for example, what kinds of stimuli can trigger a mode, and what else is brought about when this happens?

Such a research program could usefully interact with work in cultural psychology. Two theories are currently dominant regarding how bicultural individuals make sense of and respond to their cultural world. The first is frame switching: Individuals have two sets of cultural schemas/modes and act according to the particular one in effect. The second is blending, in which individuals blend or mix both modes; the result is a hybrid that incorporates the ways of being, knowing, and perceiving of both cultures (Heine, 2008). Cultural psychologists are uncertain which of these is more likely to be the case. The results here support at least some degree of frame switching, with the reading of a particular written language perhaps being a trigger for the switch.

This latter possibility is supported by the finding that biracial individuals primed with their White identity show a visual search advantage similar to that of Whites when identifying White faces, and a similar advantage for Black faces when primed with Black identity (Chiao, Heck, Nakayama, & Ambady, 2006). This suggests that the cognitive flexibility resulting from navigating two cultural worlds—like that of the long-term immigrants studied here—could enable visual search modes to change depending on the context or environment.

Meanwhile, given that we found (in apparent contradiction to the results of Chua et al., 2005) visual memory to be equally holistic for East Asians and Westerners, what can be said about their underlying mechanisms? The holistic visual memory of East Asians has sometimes been explained with reference to the visual qualities of East Asian city scenes; for instance, photographs of Japanese scenes tend to be rated as more “complex” and “ambiguous” than Western ones, with more “foreground objects” and “invisible portions” (Masuda, Gonzalez, Kwan, & Nisbett, 2008). It may be that many Westerners in Vancouver are exposed to East Asian media, video games, and food, and have East Asian friends, which may have created a high degree of Asian influence. Or it may simply be that the look (or just the density) of Vancouver is sufficiently close to that of many East Asian cities, with its inhabitants adapting accordingly. In any event, the finding of differences in search asymmetry supports the thesis that visual memory and visual attention have different susceptibilities to environmental factors.

Considerations of this kind indicate that the nature of visual environments needs to be better understood. One potential project would be to assess exactly how photographs of East Asian and Western scenes differ with respect to various factors. Indeed, it would be interesting to determine the extent to which individual cities—in both the East and the West—vary in their visual environments, and the extent to which this affects the perceptual modes of their inhabitants. It would be especially interesting to know how properties such as clutter and density might influence the formation (or, at least, the deployment) of a perceptual mode. What is needed for this is a better characterization of the visual properties of scenes relevant for human perception. The work of Yu, Samaras, and Zelinsky (2014) on visual clutter is an example of how such issues might be approached in a more rigorous way.

At a more general level, our results also suggest that researchers would benefit from knowing their participants’ cultural background, and taking the relevant factors into account when analyzing performance. Even in controlled laboratory settings, for example, a relatively small difference in environment could trigger a particular mode, affecting such things as the control of visual attention.

More generally yet, our results suggest that interesting opportunities can arise in collaborations between vision and cultural psychologists. In particular, they suggest at least seven potential research directions for future work:

  1. Which environmental factors influence visual search?

  2. Which environmental factors influence object recognition?

  3. Are there constraints on the degrees to which visual search and object recognition can both be analytic or holistic? (For example, can an individual have an analytic mode for visual attention and an analytic mode for visual memory as well?)

  4. Do the visual environments of Westerners and East Asians differ with regard to low-level visual factors such as density or clutter?

  5. How might these factors impact visual attention and visual memory in general?

  6. Can individuals develop multiple perceptual modes? If so, how many are possible?

  7. Can modes be switched by changing the language of the instructions? How do such triggers depend on the culture of the observer?

In summary, then, our results show that, far from being a source of “noise” in the data, cultural variability can be an important source of insight into various issues. From the point of view of vision science, it can help us explore the nature and number of modes that can be held by an observer, as well as the mechanisms through which these modes are switched. From the view of cultural psychology, such perceptual differences may be an interesting—and potentially more reliable—way to measure aspects of cultural identity beyond those available via self-report. As such, the results of this study suggest that collaboration between cultural psychologists and vision scientists can be quite productive and may have a considerable future ahead of it.