A classic result in cognitive psychology is that experts have an excellent memory for meaningful material taken from their domain of expertise, even when this material is presented only briefly. This result was originally uncovered by De Groot’s (1965) and Chase and Simon’s (1973) studies of chess players, and was later replicated in many domains, including sports, science, engineering, and games (see Ericsson, Charness, Feltovich, & Hoffman, 2006; Gobet, 2015, for overviews). Experts’ superiority has often been explained by the acquisition of high-level knowledge structures (Chi, Feltovich, & Glaser, 1981; Cooke, Atlas, Lane, & Berger, 1993; Holding & Pfau, 1985; Kalyuga, Ayres, Chandler, & Sweller, 2003; Patel & Groen, 1986), or by the ability to process information holistically, unlike nonexperts, who have to process it piecemeal or analytically (Curby, Glazek, & Gauthier, 2009; Dreyfus & Dreyfus, 1986; Richler, Wong, & Gauthier, 2011). High-level knowledge structures, such as schemata and verbal concepts, abstract away from the detail of the material to memorize. For example, in chess, a complex position could be summarized by the description “an Italian opening, variation Giuoco Pianissimo, with White’s pressure on the white squares.” With holistic processing, it is assumed that the scene or object being perceived is not decomposed into simpler units but is processed as a unified whole.

In a meta-analysis of 13 studies, Gobet and Simon (1996b) showed that, at least with chess, these explanations were not sufficient to explain experts’ superiority. They found that experts maintained some superiority with random positions, in which any high-level structure had been destroyed. With such positions, experts’ advantage cannot be explained by the use of high-level structures (by construction, these do not exist in random positions) nor by holistic processing (there is no whole to process after the location of pieces has been randomized). Gobet and Simon’s (1996b) result was predicted by computer simulations based on the mechanism of chunking (Gobet & Simon, 1996a). As proposed by Chase and Simon (1973), expertise in chess is acquired by learning, through practice and study, a large number of chunks, which are units of both perception and meaning; in chess, chunks consist of constellations of pieces occurring often together in masters’ games. Experts’ superiority with meaningful material (game positions in chess) is explained by their ability to rapidly identify patterns present on the board, and retrieve chunks from their long-term memory (LTM). As shown by the computer simulations, some patterns still occur, by chance, in random positions; as experts are more likely to notice them due to their large store of chunks, they can maintain some superiority. Importantly, this superiority is not an artefact of the specific kind of randomization used, as proposed by Vicente and Wang (1998), because it is maintained with positions obtained with different methods of randomization (Gobet & Waters, 2003; Waters & Gobet, 2008).

Gobet and Simon’s (1996b) result is important theoretically, as it can readily be explained by theories based on chunking, such as chunking theory (Chase & Simon, 1973) and template theory (Gobet & Simon, 1996c), but not by theories focusing on high-level representations or holistic processing. However, it is unknown whether this result generalizes to other domains of expertise beyond chess. Therefore, the aim of this study was to establish whether experts maintain some memory superiority with random stimuli in different domains of expertise. Support for this hypothesis would strongly corroborate theories based on chunking.

The present meta-analysis

The present meta-analysis aimed to evaluate two predictions of chunk-based theories on the recall of random material: (a) the positive correlation between expertise and performance in recalling domain-specific random material occurs regardless of the particular domain, and thus is not specific to chess, and (b) this skill effect is no more than moderate in size, because randomization heavily reduces the number of meaningful chunks present in the material.

To test these two hypotheses, we carried out a systematic search of articles that used random material with experts and nonexperts, and calculated an overall correlation expressing the relationship between expertise and the ability to recall random material. Then, a moderator analysis was run to evaluate whether the relationship between expertise and recall performance with random material was present in every domain. To evaluate the role of domain as a moderator, the studies were categorized into five different domains: games, music, programming, sports, and others.

The prediction of chunk-based theories applies primarily with short presentation times of less than 8–10 seconds (the time needed to create a new chunk in LTM; Gobet & Simon, 2000; Simon, 1969), where perceptual and short-term memory mechanisms dominate. As exposition time varies with the type of material to recall (e.g., seconds for game positions and music notes, minutes for computer programs), we also ran a moderator analysis to evaluate whether exposition time affected the effect size. Exposition time is positively related to performance on the recall task with randomized chess positions (Gobet & Simon, 2000), but of course both novices and experts can take advantage of prolonged time to use alternative memory mechanisms (e.g., learning new chunks, using semantics) and thus recall more items.

Finally, because several studies reported only the direction of the effect (e.g., experts outperforming novices) without providing data sufficient to calculate an effect size, we also calculated—following the approach adopted in Gobet and Simon (1996b)—the probability of k occurrences of the skill effect out of the total number (n) of cases.

Method

Literature search

In line with the PRISMA statement (Moher, Liberati, Tetzlaff, & Altman, 2009), a systematic search strategy was used to find relevant studies (see Fig. 1 for a summary of the procedure). Using several combinations of the terms recall, random, scrambled, unstructured, shuffled, and meaningless, we searched the ERIC, PsycInfo, Scopus, WorldCat, and ProQuest Dissertations & Theses databases, as well as Google Scholar, to identify all potentially relevant studies. In addition, previous narrative reviews were examined, and we e-mailed researchers in the field (n = 13) asking them for unpublished studies and data. Finally, we performed citation searches for two publications: Chase and Simon (1973) and Gobet and Simon (1996b).

Fig. 1. Flow diagram of the studies considered and ultimately included for the calculation of the binomial probability analysis and the meta-analysis

Inclusion/exclusion criteria

The studies were included according to the following seven criteria:

1. The domain of expertise studied did not entail training memory per se; for example, memory experts using mnemonics (e.g., in the digit-span task) were excluded.

2. A measure of performance in a recall task was collected.

3. Some kind of random material was used.

4. The task was performed by participants with different levels of expertise (e.g., years of practice, categories, or Elo points).

5. Novices were not totally unfamiliar with the material to recall (Footnote 1).

6. The random material was obtained by shuffling all the elements (e.g., chess pieces, lines of programs) of structured material. No partially randomized material was considered.

7. The data presented were sufficient to establish the direction of the effect (e.g., experts better than novices) or, better, to calculate an effect size.

We found 45 studies conducted from January 1973 to March 2015 meeting the above criteria, including 1,401 participants and 55 independent samples. These were included in a binomial distribution analysis. The 24 studies reporting sufficient data to calculate an effect size were included in a meta-analysis, and comprised 903 participants, 28 independent samples, and 28 effect sizes (Footnote 2; see Table 1).

Table 1 Summary of the 28 samples included in the meta-analysis

Effect sizes

As a measure of effect size, we used the correlation between expertise in a domain and performance in recalling random material. Two studies reported a correlation coefficient, which we used directly. When group-level comparisons (e.g., novices vs. experts) were reported (k = 26), we converted Cohen’s ds (Footnote 3) to point-biserial correlations (Schmidt & Hunter, 2015). Artificial dichotomization was corrected for the effect sizes derived from group-level comparisons only in chess studies, because, among the fields considered in the present meta-analysis, only chess has a continuous variable assessing expertise (Elo, 1978).
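As an illustration of the two conversion steps, here is a minimal Python sketch. It assumes equal group sizes and a median split for the dichotomization correction; the actual group proportions in the primary studies may have differed, so this is a sketch of the technique rather than the analysis code.

```python
from math import exp, pi, sqrt

def d_to_r(d, p=0.5):
    """Convert Cohen's d to a point-biserial correlation,
    with p and 1 - p the proportions of the two groups
    (Schmidt & Hunter's conversion formula)."""
    return d / sqrt(d**2 + 1 / (p * (1 - p)))

def correct_dichotomization(r_pb):
    """Undo the attenuation caused by splitting a continuous skill
    measure (e.g., Elo rating) into two groups. Assumes an equal
    (median) split, for which the cut point is c = 0 and the
    attenuation factor is phi(0) / sqrt(.25), about 0.798."""
    ordinate = exp(0.0) / sqrt(2 * pi)  # standard normal density at c = 0
    return r_pb / (ordinate / 0.5)

r_pb = d_to_r(0.8)                       # ~ .37 before correction
r_true = correct_dichotomization(r_pb)   # ~ .47 after correction
```

With d = 0.8 and equal groups, the conversion yields r ≈ .37, and the dichotomization correction raises it to ≈ .47, showing why the correction matters when expertise is measured continuously.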

Moderators

The two potential moderators were as follows:

1. Domain (categorical variable): games, music, programming, sports, and others.

2. Time of exposition (dichotomous variable): whether the time of exposition (in seconds) to the material to recall was more than 8 seconds, or 8 seconds or less.

Results

Meta-analysis

A random-effects model (k = 28) was built to calculate the overall correlation. The overall correlation was \( \overline{r} \) = .41, 95 % CI [.29, .51], p < .001 (see Fig. 2). The degree of heterogeneity between effect sizes was I² = 63.06, suggesting potential moderator effects.
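The pooling can be sketched with a standard random-effects computation on Fisher-z-transformed correlations, using the DerSimonian-Laird estimator for between-study variance. This is a generic sketch of the technique, not the authors' actual analysis code:

```python
from math import atanh, tanh

def random_effects_r(rs, ns):
    """Pool correlations with a DerSimonian-Laird random-effects model.
    Each r is Fisher-z transformed; its within-study variance is
    1 / (n - 3); tau^2 estimates between-study heterogeneity."""
    zs = [atanh(r) for r in rs]
    vs = [1.0 / (n - 3) for n in ns]
    w = [1.0 / v for v in vs]                       # fixed-effect weights
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(rs) - 1)) / c)        # between-study variance
    w_star = [1.0 / (v + tau2) for v in vs]         # random-effects weights
    z_pooled = sum(wi * zi for wi, zi in zip(w_star, zs)) / sum(w_star)
    return tanh(z_pooled)                           # back-transform to r
```

With homogeneous inputs (e.g., three studies all reporting r = .40), tau² is zero and the function simply returns .40; heterogeneous inputs inflate tau², which evens out the study weights.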

Fig. 2. Overall correlation (\( \overline{r} \)) for the skill effect in recalling random material. Correlation coefficients (circles) and 95 % CIs (lines) are displayed for all effects entered into the meta-analysis. The diamond at the bottom represents the meta-analytically weighted correlation coefficient (\( \overline{r} \)). For studies with multiple independent samples, the result of each sample (S1, S2, etc.) is reported separately

Moderator analyses

We ran a moderator analysis to evaluate Domain as a potential moderator. Domain was a marginally significant moderator, Q(4) = 8.69, p = .069, k = 28. The correlations were \( \overline{r} \) = .42, 95 % CI [.25, .56], p < .001, k = 11, for games; \( \overline{r} \) = .69, 95 % CI [.35, .86], p < .001, k = 5, for music; \( \overline{r} \) = .36, 95 % CI [.21, .49], p < .001, k = 6, for programming; \( \overline{r} \) = .30, 95 % CI [-.04, .58], p = .083, k = 3, for sports; and \( \overline{r} \) = .24, 95 % CI [-.09, .52], p = .154, k = 3, for other domains. The music-related correlation was significantly higher than the other four correlations (b = 0.62, z = 2.85, p = .004).

We also performed a moderator analysis to test whether Time of exposition significantly affected the effect sizes. No significant effect was found, Q(1) = 0.89, p = .346, k = 24.

Publication bias

Publication bias occurs when studies with weak or null results, particularly those with small samples, are systematically excluded from the literature. To test whether our results were affected by publication bias, we created a funnel plot depicting the relation between Fisher’s Z and standard error and performed Duval and Tweedie’s (2000) trim-and-fill analysis.

The funnel plot depicting the relationship between standard error and Fisher’s Z value looked asymmetrical. The trim-and-fill analysis showed the presence of publication bias. Eight studies were trimmed, and the estimated overall correlation was \( \overline{r} \) = .29, 95 % CI [.17, .41]. The funnel plot including both the studies in this meta-analysis and the filled-in ones is shown in Fig. 3. Finally, the fail-safe N, that is, the number of missing studies with an effect equal to zero necessary to make the observed effect (\( \overline{r} \) = .41) nonsignificant (p > .05), was calculated and found to be 745.
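Rosenthal's fail-safe N is computed from the studies' standard normal deviates combined via Stouffer's method. The sketch below uses hypothetical z values purely to illustrate the formula; the per-study deviates behind the reported N = 745 are not listed here.

```python
from math import floor

def failsafe_n(study_zs, z_alpha=1.645):
    """Rosenthal's fail-safe N: how many unpublished zero-effect studies
    would be needed to raise the combined (Stouffer) one-tailed p-value
    above .05. study_zs are the per-study standard normal deviates."""
    return floor((sum(study_zs) / z_alpha) ** 2 - len(study_zs))

failsafe_n([2.0, 2.5, 3.0])  # → 17 (hypothetical z values)
```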

Fig. 3. Funnel plot of standard errors and effect sizes (Fisher’s Z). The white circles represent the studies included in the meta-analysis, and the black circles represent the filled-in studies. The white diamond indicates the overall correlation estimated from the studies included in the meta-analysis, and the black diamond indicates the overall correlation estimated by the trim-and-fill analysis

Sensitivity analysis

Two studies included in the meta-analysis presented some methodological issues. As mentioned earlier, Sloboda’s (1976) first experiment included an unspecified number of participants with no music training in the novice group. This condition partly violates one of the inclusion criteria. Also, in Knecht (2003), the novice group did not correctly recall any item (i.e., mean = 0). Although this condition did not violate any of the inclusion criteria, such an unusually poor performance in the novice group might have inflated the effect size.

A sensitivity analysis was thus performed to test the robustness of the results by excluding these two effect sizes. A random-effects model (k = 26) was built to calculate the overall correlation. The overall correlation was \( \overline{r} \) = .39, 95 % CI [.28, .50], p < .001, I² = 64.15. Regarding publication bias, the point estimate was \( \overline{r} \) = .28, 95 % CI [.15, .40], with seven missing effect sizes filled in to the left of the mean.

No significant effect was found for either of the two moderators (p = .116 and p = .420 for Domain and Time of exposition, respectively). The music-related correlation was still significantly higher than the other four correlations (b = 0.74, z = 2.56, p = .010).

Binomial distribution analysis

The 45 studies included 55 experiments; in 49 cases, the experts outperformed the novices (for a list of the articles, see Table 2 in Appendix A and Appendix B). Assuming a binomial distribution with a probability of success (i.e., experts performing better than the novices) of .50, n = 55, and k = 49, the probability of obtaining at least 49 successes out of 55 is p = 9.11 × 10⁻¹⁰.
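This probability is an exact binomial tail and can be checked directly:

```python
from math import comb

def binomial_tail(k, n, p=0.5):
    """Probability of at least k successes in n Bernoulli trials,
    each with success probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 49 of the 55 independent samples showed a skill effect
print(f"{binomial_tail(49, 55):.2e}")  # prints 9.11e-10
```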

Discussion

The results presented in our meta-analysis suggest that experts keep an advantage even when they recall random material; this skill effect is not limited to one specific domain (e.g., chess), but is common to nearly every kind of material, with only sports and “other domains” failing to reach statistical significance. In addition, the overall correlation was significant but no more than moderate (Footnote 4). This outcome corroborates the hypothesis that human memory mechanisms are in part based on small memory structures (such as chunks), which are stored in LTM (Chase & Simon, 1973; Gobet & Simon, 2000). Experts—who have access to many more of these structures than do novices—are more likely to recognize the patterns that accidentally emerge after domain-specific material is randomized. As previously mentioned, theories of expert memory focusing on high-level structures or on holistic processing of stimuli (e.g., Dreyfus & Dreyfus, 1986; Holding & Pfau, 1985) cannot explain this result, because the structures they postulate cannot be used with random stimuli.

Moderator effects

The moderator analysis showed that the skill effect was more than moderate only with musicians (\( \overline{r} \) = .69). At first sight, this seems to be an empirical anomaly. However, as suggested by Gobet and Waters (2003) and Knecht (2003), the skill effect is inversely related to the degree of randomness of the material to recall, and it is reasonable to assume that the music-related materials used in recall tasks had a lower degree of randomness. For example, the task used in Sloboda’s (1976) experiments consisted of recalling only five notes presented on a musical stave, with nine possible positions (five lines and four spaces) for every note. Therefore, the number of possible combinations that could have been obtained by randomizing those musical notes was far smaller than for, say, random chess positions, which usually contained 20–25 pieces placed on 64 possible squares. Thus, the greater skill effect in the domain of music was probably due to the low degree of randomness of the material used in the studies included in the meta-analysis, and not to some other feature specific to the field of music.
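The gap between the two randomization spaces can be made concrete with a back-of-the-envelope calculation. The chess figure below treats all 22 pieces as distinguishable, which overstates the exact count (duplicate piece types would shrink it), but correcting for that would not close a gap of this size:

```python
from math import perm

# Sloboda (1976): five notes, each in one of nine stave positions
music_space = 9 ** 5          # 59,049 possible configurations

# Rough count for a random chess position: ordered placements of
# 22 pieces on 64 squares (duplicate piece types ignored)
chess_space = perm(64, 22)    # roughly 10^38 placements

ratio = chess_space / music_space   # more than 30 orders of magnitude
```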

Finally, the time of exposition of the stimuli exerted no significant influence on the effect sizes. This outcome suggests that no additional memory mechanism—such as encoding new chunks or using semantics—was uniquely used by the experts during the recall of the unstructured material. It is likely that additional time allows both novices and experts to learn new long-term memory chunks (e.g., Gobet & Simon, 2000). Consistent with this hypothesis, the effect was stronger in musicians (\( \overline{r} \) = .69) and game players (\( \overline{r} \) = .42), who were exposed to the stimuli for just a few seconds, than in programmers (\( \overline{r} \) = .36), who had up to 10 minutes.

It is worth noting that the lack of effect for the presentation time moderator is different from what Gobet and Simon (2000) found: with random positions, the slope of recall increase was slightly larger for masters than for candidate masters and Class A players, a result that was accurately simulated by their computer model. However, the differences were small, as indicated by the parameter c in their Tables 1 and 3. Whether this result generalizes to other domains should be investigated in further experiments, where the presentation time of the stimuli is systematically varied. In the current meta-analysis, the presentation time is confounded with domain.

Limitations of the study

The present meta-analysis has four limitations that merit discussion. First, the numbers of studies (N = 24) and participants (N = 903) were relatively small. As a consequence, it was not possible to carry out moderator analyses on variables such as age, gender, or expertise level. However, we note that the skill effect was present in nearly all the studies excluded for not providing enough data to calculate an effect size. Those studies often reported not only that randomization reduced the skill effect in the recall task, but also that experts kept a small advantage over novices when recalling random material, which is in line with our main analysis.

Second, and linked to the first limitation, the presence of publication bias suggests that the overall correlation we calculated (\( \overline{r} \) = .41) is probably an overestimation. Nonetheless, the value estimated by the trim-and-fill analysis (\( \overline{r} \) = .29) is still statistically significant, and both values suggest that the skill effect in recalling random material is significant, but at best moderate, a result consistent with the chunking hypothesis. Moreover, the high number (N = 745) of studies estimated by the fail-safe analysis and the low probability (p = 9.11 × 10⁻¹⁰) estimated by the binomial analysis suggest that the skill effect we found is a genuine result.

Third, the randomization methods varied from domain to domain, most likely out of necessity, as they depend on the structure of a specific domain. In addition, different methods can be used within a single domain. Although this weakness was unavoidable, further research should systematically investigate different methods of randomization within a domain, testing the predictions of a formal model. For example, most studies on chess memory followed Chase and Simon’s method, in which pieces from a game position are randomly reassigned to new squares. Gobet and Waters (2003) and Waters and Gobet (2008) explored different methods, including selecting pieces with the same probability, and used the empirical data to test the predictions of CHREST, a chunk-based model.

Finally, we could not correct for measurement error because only five studies provided reliability coefficients for the recall tasks. Moreover, among the domains considered in our meta-analysis, only chess (to the best of our knowledge) uses a rating system (Elo, 1978), whose reliability coefficient has been calculated (r = .91; Hambrick et al., 2014). In any case, this limitation does not invalidate the main outcome of the meta-analysis, which is that a moderate skill effect in the recall task still remains even with random material, and that this phenomenon applies to nearly every domain considered.

Conclusions

The results presented in this meta-analysis show that a skill effect occurs in recall tasks even when the domain-specific material to recall is unstructured. This outcome lends support to the hypothesis according to which human memory mechanisms are in part based on small memory structures such as chunks. Larger, schema-like structures are gradually built on chunks as a function of the exposure to frequent objects and scenes in the environment (Gobet & Chassy, 2009; Gobet & Simon, 1996c). Conversely, theories of expert memory based only on high-level knowledge structures such as schemata or holistic processing cannot explain a skill effect in recalling random material.

One possible alternative explanation is that experts simply attempt to recall more items (e.g., chess pieces, music notes) than do novices. To test this hypothesis, Charness and Schultetus (1999) analyzed the performance of chess players in the recall task and controlled for errors of commission (pieces placed incorrectly). The results showed that the expert chess players (i.e., Elo rating >1999) still outperformed the group of novices.

Another alternative explanation is that experts outperform novices because of their superior working memory (WM; Meinz & Hambrick, 2010). Because individuals with greater WM capacity are more likely to acquire expertise in their field, the skill effect we observed might be due to experts’ superior ability to retain elements in WM, and not necessarily to the larger number of chunks experts have stored in LTM. Although further research is needed to test this and other alternative explanations, the greater average age of the experts compared with the novices in many of the reviewed studies (e.g., Barfield, 1986; Guerin & Matthews, 1990; Kalakoski & Saariluoma, 2001; Knecht, 2003; Sloboda, 1976) militates against this explanation. The acquisition of expertise is a relatively long process, and thus experts tend to be older than novices (Lehman, 1953). Because WM efficiency decreases as a function of age (Birren & Schaie, 1996), it is unlikely that experts’ advantage at recalling random material is due only to WM ability. Consistent with this hypothesis, a recent meta-analysis (Moxley & Charness, 2013) has shown that performance on the recall of chess positions is negatively associated with age, but positively associated with chess skill. Therefore, experts’ superior ability to recognize small chunks occurring by chance in random material is the most likely explanation.