Complexity has been an object of study in many fields, including visual perception (Oliva, Mack, Shrestha, & Peeper, 2004; Palumbo, Ogden, Makin, & Bertamini, 2014; Yoon, Lim, & Ji, 2015), auditory perception (Eerola, Himberg, Toiviainen, & Louhivuori, 2006; Hannon, Soley, & Ullal, 2012; North & Hargreaves, 1995), and memory (Alvarez & Cavanagh, 2004; Chen, Li, & Liu, 2017; Halford, Wilson, & Phillips, 1998; Kemps, 1999). Researchers have long tried to quantify complexity to either measure or control its effects (Attneave, 1957; Forsythe, Mulhern, & Sawey, 2008; French, 1954; Jakesch & Leder, 2015; Purchase, Freeman, & Hamer, 2012). However, since complexity appears in so many contexts, multiple definitions have been put forth and remain controversial (Edmonds, 1995; Forsythe, 2009; Nadal, Munar, Marty, & Cela-Conde, 2010; Rao & Lohse, 1993; Xing & Manning, 2005). For example, Snodgrass and Vanderwart (1980) defined complexity as the amount of detail or intricacy of lines in a picture, while Heaps and Handel (1999) defined complexity as “the degree of difficulty in giving a verbal description” (p. 303).

Regardless of these differences in how complexity is defined, most would agree that complexity affects behavior (e.g., McDougall, De Bruijn, & Curry, 2000; Rock, Halpern, & Clayton, 1972; Rosenholtz, Li, & Nakano, 2007; Sweller, 2010; Tuch, Bargas-Avila, Opwis, & Wilhelm, 2009). In particular, stimuli of higher complexity typically lead to longer processing times and worse performance on tasks, including diminished memory span (Alvarez & Cavanagh, 2004; Eng, Chen, & Jiang, 2005; Song & Jiang, 2006). Higher complexity stimuli are often considered to be an additional strain on working memory, consuming more working memory resources to process and manipulate the information (Liu, Chen, Liu, & Fu, 2012; Luria, Sessa, Gotler, Jolicoeur, & Dell’Acqua, 2010).

Although these results do not seem controversial, recent empirical studies (Reder, Liu, Keinath, & Popov, 2016; Shen, Popov, Delahay, & Reder, 2018) from our lab suggest that the effects of complexity on memory disappear with familiarization. For example, in one study (Reder et al., 2016), subjects who were previously unfamiliar with Chinese characters were trained to recognize these characters over a period of weeks using a visual search task with some characters presented 20 times more often than other characters. These characters were randomly assigned to frequency conditions, irrespective of their complexity. After the training sessions, subjects were asked to learn associations of two Chinese characters with one English word. The results showed that associations involving high-frequency characters were better remembered than those involving low-frequency characters. Moreover, subjects also showed better performance with high-frequency characters in a working memory task (N-back task) than with low-frequency characters.

Although those studies did not explicitly manipulate complexity, their results suggest that the effect of complexity might be modulated by whether the information has been both chunked and the chunk practiced enough that it becomes stronger (more familiar). When a stimulus is highly complex and has many parts, the process of chunking results in a reduction in the number of parts, leading to an effective reduction in “subjective” complexity. The notion that it is easier to hold information in working memory when it can be “chunked” into meaningful or organized units has been known for over half a century (e.g., Miller, 1956; Simon, 1974). More recently, we have argued that when these chunks become stronger, they consume less of limited working memory resources, making it easier to complete tasks with familiar chunks (Popov & Reder, 2020; Reder et al., 2016; Reder, Paynter, Diana, Ngiam, & Dickison, 2007; Shen et al., 2018).

While the notion that familiarity modulates the effect of complexity on memory performance seems intuitive, to the best of our knowledge no prior studies have directly examined or demonstrated this relationship. In fact, as reviewed above, the literature tends to define complexity as an absolute quantity based on counting or classifying the presenting features and components (Chikhman, Bondarko, Danilova, Goluzina, & Shelepin, 2012; Donderi, 2006; Forsythe, Sheehy, & Sawey, 2003; García, Badre, & Stasko, 1994; Machado et al., 2015). Therefore, the experiments in this article are intended to test the hypothesis that complexity is, in fact, a relative quantity and that preexisting experience and knowledge affect memory performance on stimuli that differ in objective complexity (e.g., the primitive features or number of strokes, number of radicals).

Here, we compare the performance of two groups of subjects on stimuli that are novel to everyone, but for which the constituent parts of these stimuli vary in familiarity between the two groups. One group has had extensive experience with Chinese characters, and the other group has not. The stimuli are novel pairings for which the constituents are highly familiar for only Chinese subjects. Chinese pseudowords consist of two real Chinese characters and Chinese pseudocharacters consist of two real Chinese radicals. The stimuli to be learned will be unfamiliar for both language groups, but for native English speakers, these novel combinations are unfamiliar both at the level of combination and at the level of the constituent elements. For Chinese speakers, the stimulus cues are also completely novel but the components or constituents of these stimuli are familiar to them, either as characters or radicals. These novel stimuli are studied with different English word response terms that must be recalled when the stimulus is later presented as part of a cued-recall task.

As a control for potential individual differences between the two groups on other dimensions, both groups also study pairs involving Ethiopic pseudowords as cues. Since these stimuli are equally unfamiliar for both groups, this provides a reference for ability to learn novel stimuli.

Experiment 1

Subjects studied novel symbol combinations (never seen before as a pair) along with English words. There were three lists, each with different types of stimulus pairs that served as a unique cue to an arbitrarily assigned English word. After studying a list, subjects were given a cued-recall task that required them to try to recall the English word that had been studied with the current probe stimulus. After subjects attempted to recall each cue on the current list, another test list was presented until all lists had been tested.

Method

Subjects

Fifty-one U.S. college students from Carnegie Mellon University participated in Experiment 1 for partial course credit and an additional bonus up to $5, depending on performance (accuracy) on the task. Thirty-four subjects were native English speakers with no prior experience with the Chinese language, except through casual exposure. There were 17 native Chinese speakers, fluent in Mandarin, raised in China, and educated there at least through high school. These subjects were also sufficiently fluent in English to matriculate at a good American university. One Chinese subject was dropped from the experiment because his performance was two standard deviations below the mean of the remaining subjects. This left 16 Chinese subjects and 34 English subjects.

Design and materials

This experiment used a 3 (stimulus types) × 2 (native languages) mixed design such that both groups of subjects were exposed to all three stimulus types: Ethiopic pseudowords, Chinese pseudocharacters, and Chinese pseudowords. An Ethiopic pseudoword consisted of two Ethiopic characters, while a Chinese pseudocharacter consisted of two Chinese radicals randomly combined to form a novel character never seen by either subject group. Chinese pseudowords consisted of two randomly combined Chinese characters with the constraint that they did not inadvertently form a real word in Chinese (see Fig. 1). Note that each Chinese pseudoword contained two characters, and each Chinese character contained at least two radicals such that Chinese pseudowords contained at least twice the number of strokes as a pseudocharacter. Therefore, if complexity of Chinese pseudowords and Chinese pseudocharacters is defined by the number of strokes contained in the cue, Chinese pseudowords are at least twice as complex as Chinese pseudocharacters.

Fig. 1. Examples of Ethiopic pseudoword, Chinese pseudocharacter, and Chinese pseudoword symbol pairs

The design involved a total of 16 Ethiopic characters, 16 Chinese characters and 16 Chinese radicals. As noted above, each list was comprised of only one of the three types of stimuli. For lists involving Chinese pseudocharacters, each radical was used twice in two different symbol pairings, and each pair combination was studied with a different English word. Likewise, each character in the Chinese pseudoword condition and each Ethiopic character in the Ethiopic pseudoword condition were also in two different combinations within a list, with each pairing associated with a different English word. By using each symbol in two different pairings within a list, subjects were forced to memorize both elements of the pair (not just the left or right symbol of a pair) that was associated with a unique English word. Given that each character/radical was repeated exactly twice in each list and bound with a different character/radical for the other association, each character or radical of a pair should contribute equally to the learning of the association. Each list contained exactly 16 cues associated with a different English word.
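One way to satisfy the constraint described above (16 symbols, 16 cues, every symbol in exactly two different pairings) is to pair each symbol with its neighbor in a shuffled ring. The sketch below is only an illustration: the symbol labels are hypothetical placeholders, and the property that each symbol appears once on the left and once on the right is an assumption of this sketch, since the text does not specify how positions were assigned.

```python
import random


def make_pairs(symbols, rng):
    """Pair each symbol with its neighbor in a shuffled ring.

    Every symbol appears in exactly two pairs with two different partners:
    once on the left and once on the right (an assumption of this sketch).
    """
    if len(symbols) < 3:
        raise ValueError("need at least 3 symbols so the two partners differ")
    ring = list(symbols)
    rng.shuffle(ring)
    return [(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring))]


# Hypothetical placeholder labels standing in for the 16 symbols of one list.
symbols = [f"S{i:02d}" for i in range(16)]
pairs = make_pairs(symbols, random.Random(0))
```

Because neither symbol of a cue is unique to that cue, a subject cannot retrieve the English word from one element alone, which is the purpose of the repetition.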

All Chinese characters and radicals were chosen from Levels 1 and 2 (i.e., medium-frequency and high-frequency characters) of the Standard List of Common Characters in Modern Chinese (State Language Work Committee, 1988), with between six and 12 strokes for characters (mean = 8.9) and between two and five strokes for radicals (mean = 3.8) and displayed in Song typeface. All the Ethiopic characters were chosen from online Unicode Entity Codes for Ethiopic Language (http://www.personal.psu.edu/ejp10/symbolcodes/bylanguage/ethiopicchart.html). Forty-eight English words were randomly assigned to those symbol pairs for each subject, forming 48 pair combinations. These English words were chosen from the MRC Psycholinguistic Database, had familiarity ratings of 600 or higher, and had a word length between three and six letters (http://websites.psychology.uwa.edu.au/school/MRCDatabase/uwa_mrc.htm). The size of each character/radical on the screen was 130 × 130 pixels, while the screen resolution was 1,280 × 800. The viewing distance was approximately 50 cm.
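As a rough check on the stroke-based definition of complexity, the reported mean stroke counts can be combined into an expected stroke count per cue. This is a back-of-the-envelope sketch that uses only the two means given above; it assumes a cue's stroke count is simply the sum over its two elements.

```python
# Mean stroke counts reported for the Experiment 1 materials.
MEAN_STROKES_CHARACTER = 8.9
MEAN_STROKES_RADICAL = 3.8
ELEMENTS_PER_CUE = 2  # each cue combines two characters or two radicals

# Assumption: strokes per cue = sum of the strokes of its two elements.
pseudoword_strokes = ELEMENTS_PER_CUE * MEAN_STROKES_CHARACTER
pseudocharacter_strokes = ELEMENTS_PER_CUE * MEAN_STROKES_RADICAL
ratio = pseudoword_strokes / pseudocharacter_strokes

print(f"pseudoword: about {pseudoword_strokes:.1f} strokes per cue")
print(f"pseudocharacter: about {pseudocharacter_strokes:.1f} strokes per cue")
print(f"ratio: about {ratio:.2f}")
```

On these means, an average pseudoword cue has a little more than twice as many strokes as an average pseudocharacter cue.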

Procedure

Subjects repeatedly studied the same 48 pair combinations across six rounds to test learning. The order of the three lists was the same for each round, but the order of the 16 pair combinations in each study list and test list was randomized for each subject. Figure 2 illustrates an example round, consisting of study and test phase for each of the three stimulus types. During study trials, each symbol pair was shown along with its English word for 3 seconds before the next pair combination automatically appeared. After studying all 16 pair combinations in a list, test trials immediately began for the items on that list. The subject was cued with one of the studied symbol pairs and prompted to enter its associated English word. The 16 possible English words for that list were displayed on the left and right sides of the screen in a standardized layout, in random order across rounds. There was no time limit for responses. After entering the answer, subjects received visual feedback that indicated whether their response was correct or not. This continued until all 16 of the studied pair combinations for that list were tested. There were six rounds total for each of the three types of stimulus lists, and there was a 1-minute break between each round. The entire experiment lasted approximately 1 hour.
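The round structure described above can be sketched as a simple schedule generator. The particular list order and the choice to reshuffle the pairs on every round are assumptions of this sketch; the text states only that the list order was constant across rounds and that pair order within each study and test list was randomized.

```python
import random

# Assumed order; the text says only that the order was the same on every round.
LIST_ORDER = ["ethiopic_pseudoword", "chinese_pseudocharacter", "chinese_pseudoword"]
N_ROUNDS = 6
PAIRS_PER_LIST = 16
STUDY_SECONDS = 3


def build_schedule(rng):
    """Return one study-test block per list per round, with shuffled pair orders."""
    schedule = []
    for round_no in range(1, N_ROUNDS + 1):
        for list_name in LIST_ORDER:
            study_order = list(range(PAIRS_PER_LIST))
            test_order = list(range(PAIRS_PER_LIST))
            rng.shuffle(study_order)  # assumption: reshuffled on every round
            rng.shuffle(test_order)
            schedule.append({
                "round": round_no,
                "list": list_name,
                "study_order": study_order,
                "test_order": test_order,
            })
    return schedule


schedule = build_schedule(random.Random(1))
study_trials = sum(len(block["study_order"]) for block in schedule)
study_minutes = study_trials * STUDY_SECONDS / 60
```

Under these assumptions a subject sees 288 study trials, which at 3 seconds each amounts to 14.4 minutes of study time; the untimed test trials and breaks account for the rest of the session.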

Fig. 2. Example round consisting of a study and test phase for each of the three stimulus types. There were six rounds total of the three types of stimulus lists

Results and discussion

To ensure that the sample size achieved adequate power, a power analysis was conducted using the “pwr” package in R (Champely, 2016; Cohen, 1988). Because each subject had six rounds of study and test trials, we assumed a large effect size (i.e., ηp² = .14; Cohen, 1992). The achieved power was .898, with α = .05.
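Power routines for ANOVA-type designs in the pwr package (for example, `pwr.anova.test` and `pwr.f2.test`) take Cohen's f or f² rather than partial eta squared, so the assumed effect size has to be converted first. The sketch below shows only that conversion; which pwr function and which degrees of freedom were used is not stated in the text, so the power values themselves are not reproduced here.

```python
import math


def eta_sq_to_f_squared(partial_eta_sq):
    """Convert partial eta squared to Cohen's f squared."""
    if not 0 <= partial_eta_sq < 1:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return partial_eta_sq / (1 - partial_eta_sq)


f_squared = eta_sq_to_f_squared(0.14)
cohens_f = math.sqrt(f_squared)
print(f"f^2 = {f_squared:.3f}, f = {cohens_f:.3f}")
```

The resulting f of about 0.40 corresponds to the conventional benchmark for a large effect.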

We analyzed the accuracy data via logistic mixed-effects regressions (Baayen, Davidson, & Bates, 2008; Jaeger, 2008). Figure 3 shows the proportion correct on the cued-recall tests as a function of language group and stimulus type of cue for each round. The main effect of round was significant, ΔAIC = −2,008, LLR χ²(1) = 2,018.6, p < .001, such that accuracy increased from Round 1 to Round 6. Because we are not interested in the role of practice of the pairs (round), further analyses focus on the interaction between language group and stimulus type. There was a significant interaction between stimulus type and language group, ΔAIC = −166, LLR χ²(1) = 169.412, p < .001, such that Chinese speakers performed comparably in the Chinese pseudocharacter condition, the Chinese pseudoword condition, and the Ethiopic condition, ΔAIC = 2.7, LLR χ²(1) = 1.297, p = .523. In contrast, performance for English speakers differed significantly depending on stimulus condition, ΔAIC = −493, LLR χ²(1) = 496.65, p < .001. Native English speakers performed best in the Ethiopic condition, slightly worse in the Chinese pseudocharacter condition, and worst in the Chinese pseudoword condition. In summary, this pattern suggests that, although a Chinese pseudoword contains two characters and therefore at least twice as many radicals as a pseudocharacter (i.e., it is “objectively” more complex), the effect of complexity is moderated by familiarity for Chinese speakers.
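For readers less familiar with the two statistics reported for each effect, the sketch below shows how they relate when a fuller model is compared with a nested reduced model: the likelihood-ratio statistic is referred to a chi-square distribution, and the AIC difference is that statistic offset by twice the number of added parameters. The function names are illustrative, the number of added parameters is an input because the nested models are not described in detail, and the closed-form p values cover only one or two degrees of freedom.

```python
import math


def delta_aic(llr_chi2, added_params):
    """AIC(full) - AIC(reduced) for nested models; negative favors the full model.

    Follows from AIC = 2k - 2*logLik and LLR = 2*(logLik_full - logLik_reduced).
    """
    return 2 * added_params - llr_chi2


def chi2_p_value(llr_chi2, df):
    """Upper-tail chi-square probability; closed forms for df = 1 or 2 only."""
    if llr_chi2 < 0:
        raise ValueError("a likelihood-ratio statistic cannot be negative")
    if df == 1:
        return math.erfc(math.sqrt(llr_chi2 / 2))
    if df == 2:
        return math.exp(-llr_chi2 / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Familiar .05 critical values, used here purely as a sanity check.
p_df1 = chi2_p_value(3.841, 1)
p_df2 = chi2_p_value(5.991, 2)
example_delta = delta_aic(10.0, 1)  # hypothetical statistic, one added parameter
```

A large likelihood-ratio statistic therefore yields a strongly negative ΔAIC, whereas a statistic smaller than twice the number of added parameters yields a positive ΔAIC, indicating that the simpler model is preferred.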

Fig. 3. Mean performance of the two language groups over six rounds of practice. The rounds of study–test learning are shown on the x-axis with separate plots for each of the three stimulus types. Error bars represent 95% confidence intervals

We would like to conclude that complexity effects on learning are modulated by familiarity with the stimuli; however, there is an alternative explanation for these results. Specifically, it could be that Chinese pseudowords were not subjectively less complex for Chinese speakers than for English speakers, but rather that pseudowords were pronounceable, giving them a memorization advantage. That is, it would be easier to subvocally rehearse pseudowords than pseudocharacters. In general, memory researchers agree that rehearsal facilitates learning (e.g., Atkinson & Shiffrin, 1968; but see Lewandowsky & Oberauer, 2015, for an alternative perspective). Since real Chinese characters, unlike pseudocharacters, are pronounceable, native Chinese speakers could more easily pronounce the two chunks of the pseudowords compared with the two chunks of the pseudocharacters. This subvocal rehearsal advantage might compensate for any advantage of fewer strokes for the pseudocharacters. To assess the merit of this alternative explanation, Experiment 2 was designed to eliminate the rehearsal advantage for pseudowords over pseudocharacters.

Experiment 2

A plausible conclusion of Experiment 1 was that complexity in terms of number of strokes did not matter when the stimuli were highly familiar and unitized into chunks. However, that finding was subject to a potential artifact of a rehearsal advantage for Chinese pseudowords compared with pseudocharacters. To determine the plausibility of the alternative explanation, we used stimuli that effectively removed any possible rehearsal advantage. This method involved using only homophonous Chinese pseudowords. If all stimuli have the same pronunciation, subjects cannot use rehearsal to remember which Chinese pseudoword was associated with a given English word. We only used native Chinese speakers as subjects since the rehearsal advantage does not apply to non-Chinese speakers. Apart from the change in some of the stimuli and only using Chinese speakers, Experiment 2 was the same as Experiment 1.

Method

Subjects

A new group of 16 U.S. college students from Carnegie Mellon University and the University of Pittsburgh participated in this study. As defined in Experiment 1, all subjects were native Chinese speakers raised in China and educated in China at least through high school. They all also spoke English. In exchange for participation, subjects received a payment between $10 and $14, depending on performance (accuracy) on the task.

Design and materials

The Ethiopic pseudowords and Chinese pseudocharacters used in Experiment 2 were the same as those of Experiment 1. However, we selected 16 new homophonous Chinese characters; eight of them were all pronounced “zhi,” and the other eight were all pronounced “shu.” The 16 Chinese pseudowords in Experiment 1 were replaced with 16 new Chinese pseudowords that used homophones for the left and right characters such that all Chinese pseudowords would be pronounced in the same way, “zhi shu” (see Table 1). Although the pronunciation was identical, the characters differed for each pair. As in Experiment 1, each character was used in two different symbol pairings so that subjects were forced to memorize both elements of the pair, not just the left or right character of the pseudoword pair. All the homophonous Chinese characters were chosen from Levels 1 and 2 (i.e., medium-frequency and high-frequency characters) of the Standard List of Common Characters in Modern Chinese (State Language Work Committee, 1988), with stroke counts between seven and 13 (mean = 9.8). They were displayed in Song typeface.

Table 1 The list of 16 homophonous Chinese pseudowords (all pronounced “zhi shu”)

Procedure

The procedure for Experiment 2 was the same as Experiment 1.

Results and discussion

A power analysis, using the “pwr” package in R (Champely, 2016; Cohen, 1988), indicated that the achieved power was .942, with α set at .05 and a large effect size assumed (i.e., ηp² = .14; Cohen, 1992).

Figure 4 plots the accuracy by round in Experiment 2 for the three types of stimuli. As in Experiment 1, there is a main effect of round, ΔAIC = −1,041, LLR χ²(1) = 1,050.991, p < .001, such that performance improved over the six rounds. Importantly, native Chinese speakers did not show significant differences in performance among stimulus types, ΔAIC = 2.7, LLR χ²(1) = 1.297, p = .522.

Fig. 4. Mean performance of native Chinese speakers in Experiment 2 over six rounds. Error bars represent 95% confidence intervals

The goal of this experiment was to rule out the alternative explanation for the finding that complexity did not affect performance for Chinese speakers. That is, did pseudoword cues produce equivalent performance because they could be more easily rehearsed than the less complex stimuli and thereby offset any disadvantage from their greater number of features? The results of Experiment 2 ruled out that explanation because cued-recall performance was still as good for pseudowords when rehearsal could not benefit Chinese speakers. When all the pseudowords are pronounced the same, rehearsal cannot help a subject remember the appropriate English response term to a given character pair.

While Experiment 2 provided additional support for the conclusion that complexity should not be defined by the number of visual elements in a stimulus, and instead should take into account the user’s familiarity with the stimuli to be processed, the results do raise other questions. Zhang and Simon (1985) found that performance with homophones was equivalent to performance with radicals, which also could not be pronounced, and we likewise found the expected equivalence. On the other hand, their study showed a memory advantage for items where rehearsal could help performance (i.e., nonhomophones). Comparing across studies, it is not clear whether there would be a residual advantage for Chinese speakers for items that could be pronounced and rehearsed. Experiment 2 did not contrast, within subjects or within the same experiment, pseudowords that had distinct pronunciations with pseudowords that were homophones. Experiment 3 was designed to compare these conditions in a within-subjects design to determine whether (a) we can demonstrate an effect of rehearsal advantage at the same time that we show the absence of a complexity disadvantage for Chinese subjects, and (b) the same set of Chinese stimulus cues generate opposite patterns for native English speakers compared with native Chinese speakers based on prior experience. In other words, can we demonstrate that Chinese subjects still are unaffected by “objective complexity” when rehearsal advantage is removed, while non-Chinese subjects are strongly affected by this objective complexity factor?

Experiment 3

In Experiment 3, we crossed the two within-subjects factors (homophonous vs. nonhomophonous characters and high vs. low complexity, as defined by number of strokes) with language group, creating a mixed design. For native Chinese speakers, we expected the first factor (pronunciation), but not the complexity factor (number of strokes), to affect performance. We expected the opposite pattern for native English speakers, who were unfamiliar with Chinese: Homophones should not matter, since these subjects do not know the characters’ pronunciation, but the complexity factor should again strongly affect performance.

Method

Subjects

Subjects were 41 college students recruited from Carnegie Mellon University and the University of Pittsburgh. As defined in Experiment 1, 21 subjects were native English speakers, with no training in Chinese, and 20 were native Chinese speakers who were all raised in China, educated in China at least through high school, and also spoke English. Two subjects were dropped from the experiment because one English subject did not complete the study and another English subject’s performance was at chance. This left 20 Chinese subjects and 19 English subjects. In exchange for participation, subjects received a payment between $8 and $16, depending on performance on the task.

Design and materials

We used a mixed design, with the between-subjects factor being native language: Chinese versus English. The two within-subjects factors were complexity (high vs. low) and whether the pronunciation of the pair of Chinese characters was identical for each pair in a list (homophonous) or unique. This design yielded four conditions for the pseudoword lists: HD for high-complexity with different pronunciation; HS for high-complexity with the same pronunciation; LD for low-complexity with different pronunciation; and LS for low-complexity with the same pronunciation. In addition to these four stimulus conditions, as in Experiment 1, we included a fifth stimulus type, Ethiopic pseudowords (referred to as PE).

We selected 144 Chinese characters that were not used in Experiments 1 and 2, with 36 characters for each condition. Complexity was defined by the number of strokes for a given character, determined using criteria specified in the online Xinhua Dictionary (https://zd.diyifanwen.com/zidian/bh/). Low-complexity characters were defined as having no more than six strokes (mean = 4.72), and high-complexity characters were selected to have at least 10 strokes (mean = 12.08). Within the 36 characters for each condition, there were four subgroups of nine characters each. For the homophonous conditions (HS and LS), all nine characters in a subgroup had the same pronunciation (e.g., “shi,” “zhi,” “jian”). For each condition, instead of using the same 16 pseudowords for all subjects as in Experiment 1, 12 characters were randomly selected to generate 12 unique pseudowords for each subject, six of which came from one subgroup and the other six from a separate subgroup. The randomized assignment of characters to each condition for each subject enabled us to further control for potential confounds arising from properties of the characters, such as frequency and meaning. As in Experiments 1 and 2, each character was used twice in two different symbol pairings so that subjects were forced to memorize both elements of the pair, not just the left or right character. All 144 Chinese characters were chosen from Levels 1 and 2 (i.e., medium-frequency and high-frequency characters) of the Standard List of Common Characters in Modern Chinese (State Language Work Committee, 1988). One hundred and twenty English words were randomly assigned to those symbol pairs for each subject, forming 120 pair combinations. All of the English words were chosen from the MRC Psycholinguistic Database, had familiarity ratings of 600 or higher, and had a word length between three and six letters (http://websites.psychology.uwa.edu.au/school/MRCDatabase/uwa_mrc.htm). All the Chinese characters were set in Kaishu typeface. The size of each character on the screen was 130 × 130 pixels, and the screen resolution was 1,280 × 800. The viewing distance was approximately 50 cm.
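The per-subject sampling described above can be sketched as follows. The character identifiers are hypothetical placeholders, and two details are assumptions of this sketch rather than facts from the text: that the six characters from one subgroup always occupy the left position and the six from the other subgroup the right position, and that the two pairings of each character are formed by a simple offset.

```python
import random


def sample_condition(subgroups, rng, per_subgroup=6):
    """Draw characters from two subgroups and build the pseudowords for one list.

    Each selected character appears in exactly two pseudowords, each time with
    a different partner.
    """
    left_group, right_group = rng.sample(subgroups, 2)
    left = rng.sample(left_group, per_subgroup)
    right = rng.sample(right_group, per_subgroup)
    # Assumption: left character i is paired with right characters i and i + 1.
    return [(left[i], right[(i + k) % per_subgroup])
            for i in range(per_subgroup) for k in (0, 1)]


# Hypothetical placeholder IDs: 4 subgroups x 9 characters = 36 per condition.
subgroups = [[f"g{g}c{c}" for c in range(9)] for g in range(4)]
pseudowords = sample_condition(subgroups, random.Random(2))
```

In a homophonous condition, drawing all left characters from one subgroup and all right characters from another would give every pseudoword on the list the same two-syllable pronunciation, as in Experiment 2.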

Procedure

Experiment 3 used the same procedure described in Experiment 1, with a few modifications, as noted here. Subjects studied five different lists, corresponding to the five different conditions denoted in Fig. 5. There were three rounds of study, followed by test for each of the five lists. The five lists were presented in a random order that changed for each subject on each round. After performing all study–test phases for the five conditions, the entire experiment was repeated with a totally different set of stimuli representing the same conditions. We refer to these as Block 1 and Block 2, with each block consisting of 15 study–test lists. There was a 1-minute break between each round and a 5-minute break between the two blocks. The entire experiment lasted approximately 70 minutes.
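The counts in this design can be cross-checked with a few lines of arithmetic. The sketch assumes 12 cues per list in every condition, including the Ethiopic lists, which is consistent with the 120 English words reported in the materials.

```python
CONDITIONS = ["PE", "HS", "LS", "HD", "LD"]
ROUNDS_PER_BLOCK = 3
BLOCKS = 2
PAIRS_PER_LIST = 12  # assumed to hold for all five conditions

# Each condition's list is studied and tested once per round.
lists_per_block = len(CONDITIONS) * ROUNDS_PER_BLOCK
# Each block uses a different set of stimuli, so pairs are not shared across blocks.
unique_pairs = len(CONDITIONS) * PAIRS_PER_LIST * BLOCKS
test_trials = lists_per_block * BLOCKS * PAIRS_PER_LIST

print(lists_per_block, unique_pairs, test_trials)
```

Under this assumption the design yields 15 study–test lists per block and 120 unique cue–word pairs per subject, matching the figures given in the text.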

Fig. 5. Example round consisting of a study and test phase for each of the five stimulus types. Altogether, there were two blocks, each consisting of three rounds

Results and discussion

A power analysis using the “pwr” package in R (Champely, 2016; Cohen, 1988) indicated that the achieved power was .864, with α set at .05, and a large effect size assumed (i.e., ηp² = .14; Cohen, 1992).

Figure 6 plots the mean performance for the cued-recall tests as a function of language group and stimulus type for each round after collapsing over Block 1 and Block 2. Performance for each of the five stimulus types is plotted in the left panel for native Chinese speakers and in the right panel for native English speakers. The results showed a significant main effect of round, ΔAIC = −738, LLR χ²(1) = 740.16, p < .001, such that both groups performed more accurately in later rounds, regardless of stimulus type, as the combinations became more familiar. Because we are primarily interested in the interaction effects between pronunciation and complexity, further analyses did not include the round factor. Consistent with Experiment 1, Chinese speakers performed better in the Chinese conditions than English speakers, ΔAIC = −4, LLR χ²(1) = 6.627, p = .01. In contrast, English speakers showed better performance in the Ethiopic control condition than Chinese speakers at a trend level, ΔAIC = −1, LLR χ²(1) = 3.011, p = .083.

Fig. 6. Mean performance of two groups in Experiment 3 after collapsing over Block 1 and Block 2. Left panel plots accuracy for native Chinese speakers, and right panel plots accuracy for native English speakers. PE = Ethiopic pseudowords; HS = high-complexity, same pronunciation; LS = low-complexity, same pronunciation; HD = high-complexity, different pronunciation; LD = low-complexity, different pronunciation. Error bars represent 95% confidence intervals

Figure 7 shows the cued-recall performance of the two native language groups when the cues are Chinese characters, plotted as a function of pronunciation (x-axis) and complexity (separate lines for the two levels). Also plotted on the graph is performance for the Ethiopic pseudoword cues (triangles), which served as baseline stimuli with which neither group was familiar. First, we report the analyses that exclude the Ethiopic stimuli. As expected, there was a significant interaction between complexity and language group, ΔAIC = −32, LLR χ²(1) = 33.46, p < .001, such that native English speakers were more accurate for less complex cues, ΔAIC = −42.9, LLR χ²(1) = 44.863, p < .001, while Chinese speakers showed no effect of complexity, ΔAIC = 1.2, LLR χ²(1) = .851, p = .356. There was also a significant interaction between pronunciation and language group, ΔAIC = −19, LLR χ²(1) = 20.842, p < .001, such that Chinese speakers showed a difference in performance based on the pronunciation of the cues (performance was worse when the pronunciations were the same), ΔAIC = −14.5, LLR χ²(1) = 16.522, p < .001; in contrast, English speakers showed no effect of whether Chinese characters were homophones, ΔAIC = 1.8, LLR χ²(1) = .1416, p = .707. There was no significant two-way interaction between complexity and pronunciation on accuracy, ΔAIC = 1, LLR χ²(1) = 1.049, p = .306, nor a three-way interaction between language group, complexity, and pronunciation, ΔAIC = 2, LLR χ²(1) = .303, p = .582.

Fig. 7. Mean recall accuracy of English words as a function of the type of cues paired with words on a given list for the two native language groups. Error bars represent 95% confidence intervals

To summarize, native English speakers’ performance was affected by the complexity of the cue stimuli, but not by whether the pronunciation was the same. Native Chinese speakers showed the opposite pattern: They were affected by whether the different cues shared the same pronunciation, but were unaffected by the complexity (number of strokes) of the stimuli. These results provide additional support for the results from Experiment 1: Complexity of Chinese characters only affects performance for those unfamiliar with Chinese characters.

The fact that native Chinese speakers were adversely affected when the stimuli were homophones while native English speakers were unaffected rules out the possibility that the weaker effect of complexity for native Chinese speakers was due to a ceiling effect caused by being better subjects. In fact, Chinese speakers did no better, and arguably worse, when the cues were from a third language unknown to either group (Ethiopic stimuli). In the General Discussion, we provide an explanation for why native English speakers did slightly better when the cues were from a language unfamiliar to both groups.

General discussion

It has long been known that high-complexity stimuli are harder to process, remember, and reproduce from memory than are low-complexity stimuli (Alvarez & Cavanagh, 2004; Attneave, 1957; Bradley, Hamby, Löw, & Lang, 2007; Eng et al., 2005; Song & Jiang, 2006). However, one limitation of the literature is that complexity is commonly regarded as an absolute quantity that is defined by the number of features and components, while subjective factors that modulate the effect of complexity, such as preexisting knowledge of the stimuli, have been neglected. Here, our results demonstrate that objective complexity does not tell the whole story. Our results provide cases in which objective complexity (here, defined as number of strokes) failed to make a difference. Moreover, whether or not one finds an effect of complexity on learning and memory is modulated by the familiarity of the stimuli. Specifically, the effects of complexity exist only when subjects are unfamiliar with the stimuli, and once the stimuli are highly familiar to subjects, the effects of complexity disappear completely. In Experiment 1, we found evidence that the effects of complexity on learning were eliminated when the stimuli were highly familiar (for native Chinese speakers), but not for those unfamiliar with the stimuli. Experiment 2 replicated the lack of a complexity effect for native Chinese speakers while ruling out the potential confound that more complex stimuli only appeared as easy as simpler stimuli because they could be vocalized to aid rehearsal. Experiment 3 provided a demonstration that the same set of Chinese stimulus cues could generate opposite patterns of difficulty for native English speakers compared with native Chinese speakers based on prior exposure to the stimuli. Specifically, Chinese speakers’ performance suffered when the cues were different pseudowords that were all pronounced the same way; however, their performance was unaffected by the complexity of the stimuli (as defined by number of strokes). Conversely, native English speakers’ performance suffered when the stimuli were more complex but was unaffected by the pronunciation of the stimuli.

It is no surprise that subjects who do not speak Chinese would not be affected by whether Chinese characters share the same pronunciation, since English speakers could not pronounce them in the first place. Nevertheless, it is useful to demonstrate that English speakers are less affected by one of the dimensions of the Chinese stimuli than are Chinese speakers given that they are more affected by another dimension of the stimuli. Although it may seem obvious that English speakers would be unaffected by features of the stimuli that they cannot perceive (the sounds of the characters), it is not obvious that Chinese speakers should be less affected by complexity than English speakers. In the latter case, all subjects can perceive the visual differences among stimuli. In fact, Chinese speakers’ performance was virtually unaffected when the stimulus cues were more complex (had more strokes), whereas English speakers performed considerably worse when the stimuli were more complex.

One explanation for the difference in effects of complexity for the two groups is offered by the chunking theory of Miller (1956) and Simon (1974). Chunking theory posits that when information can be grouped or chunked into fewer units, the resulting stimuli are more easily processed. For example, C-A-T can be considered as three separate chunks consisting of three different letters (with each letter consisting of features such as lines and parts of circles). Alternatively, CAT can be considered as one chunk that has letters as constituent parts and refers to a feline household pet. It seems reasonable to assume that, with time, native Chinese speakers likewise group the features of a character into chunks.

While the work of Simon and Miller demonstrated that more stimuli can be recalled when the information can be chunked, it does not explain why English speakers performed better in the Ethiopic condition than did Chinese speakers (Footnote 1), given that Ethiopic pseudowords were equivalently novel to both groups. More recent work (e.g., Reder et al., 2016; Reder et al., 2007; Shen et al., 2018) has extended this theory and shown that more familiar chunks are easier to combine (into, for example, paired associates) and that pairs of strong chunks are also easier to associate with an additional arbitrary stimulus. According to this theory, the strength of the chunks reflects their familiarity. Furthermore, as chunks become stronger, they deplete fewer working memory resources, allowing more to be devoted to binding stimuli together in long-term memory.

According to this elaboration of the chunking theory (see also Popov & Reder, 2020), the native English speakers have stronger/more familiar chunks for the English response terms, given their greater experience with English words. That means that there would be slightly more working memory resources available for the binding process for native English speakers when the cues were equivalent in familiarity, as was the case when the cues were Ethiopic characters. That is, while both groups expend working memory resources to encode the Ethiopic characters and then to associate them with response terms, the processing and binding of the English response terms would consume less of the remaining resources for native English speakers, giving them a slight advantage in binding (Footnote 2).
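The resource argument above can be made concrete with a toy calculation. The sketch below is purely illustrative: the cost function, the capacity value, and the strength values are all assumptions chosen for the example, and it is not the formal model specified by Popov and Reder (2020). It only shows the direction of the predicted difference when cues are equally novel but response terms differ in strength.

```python
# Toy illustration only: the cost function and every number here are
# assumptions made for this example, not the formal model of Popov & Reder (2020).

def encoding_cost(strength: float, base_cost: float = 1.0) -> float:
    """Hypothetical resource cost of encoding one chunk; stronger chunks cost less."""
    return base_cost / (1.0 + strength)


def binding_resources(cue_strength: float,
                      response_strength: float,
                      capacity: float = 2.0) -> float:
    """Resources left for binding after encoding the cue and the response term."""
    spent = encoding_cost(cue_strength) + encoding_cost(response_strength)
    return max(capacity - spent, 0.0)


# Ethiopic cues are assumed equally novel for both groups (strength 0.0).
# English response words are assumed stronger for native English speakers.
english_group = binding_resources(cue_strength=0.0, response_strength=4.0)
chinese_group = binding_resources(cue_strength=0.0, response_strength=2.0)

print(f"English speakers, resources left for binding: {english_group:.3f}")
print(f"Chinese speakers, resources left for binding: {chinese_group:.3f}")
```

Under these assumed values, the group with the stronger response chunks retains more resources for binding, which mirrors the slight advantage described for native English speakers in the Ethiopic condition. Raising `cue_strength` in the same function likewise leaves more resources for binding, which is the sense in which familiarity offsets complexity in this account.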

More generally, the finding that familiarity acts as a reduction in complexity can be extended and used as an explanation for several other results. For example, there is an effect of number of syllables on working memory for pseudowords and low-frequency words, but not for high-frequency words, even when controlling for similarity and letter count (Ferrand, 2000; Ferrand & New, 2003; New, Ferrand, Pallier, & Brysbaert, 2006). This result may not seem intuitive and the cause is still debated (Juphard, Carbonnel, & Valdois, 2004), but it is predicted by our elaboration of the chunking theory. In that case, syllable count is the measure of complexity. Those effects are present for stimuli that are either novel or unfamiliar because they place greater demands on the limited resources of working memory; however, the effects disappear when stimuli are familiar and working memory resources are more plentiful (see Popov & Reder, 2020, for formal specifications of the theory).

Other researchers have also reported situations in which familiarity with the stimuli removes the effects of complexity (Bethell-Fox & Shepard, 1988; Qian, Reinking, & Yang, 1994; Su & Samuels, 2010; Sun, Zimmer, & Fu, 2011). In addition, many studies on norms for pictures and icons have found a negative correlation between ratings of familiarity and ratings of complexity (Bonin, Peereman, Malardier, Méot, & Chalard, 2003; Cycowicz, Friedman, Rothstein, & Snodgrass, 1997; McDougall, Curry, & De Bruijn, 1999; Rossion & Pourtois, 2004; Sirois, Kremin, & Cohen, 2006; Snodgrass & Vanderwart, 1980), suggesting that the reduction of complexity is found in both performance and perception.

One potential concern with the current study is that the Chinese characters that formed pseudowords have semantic meanings in addition to pronunciations. One might wonder whether the meanings of the characters facilitated the memory for pseudowords even though the effect of pronunciation was controlled. However, according to a classic study by Zhang and Simon (1985), the effect of different meanings of Chinese homophones was very small compared with the problem of shared pronunciation. In addition, we further controlled the potential confounding of meaning of Chinese pseudowords by randomly combining two Chinese characters with the constraint that they did not inadvertently form a real word in Chinese. Thus, it would have been hard for subjects to form semantic representations of two randomly combined characters during the short presentation time (i.e., 3 seconds) for each pair.

In summary, the current study provides evidence that the effect of complexity on learning paired associates is modulated by familiarity with the stimuli. Moreover, the results support the argument that complexity is not an absolute property based on the number of visual elements, but rather is a relative property affected by one’s prior knowledge of the stimuli. Once the stimuli are highly familiar, the effects of complexity go away. In addition, by including the Ethiopic pseudowords, our findings also support the theory that the ability to learn novel associations among stimuli is affected by the strength, as well as the number, of chunks involved in the association.