Introduction

The relevance of category learning for humankind

The ability to accurately classify persons, events, and objects into categories is central to adaptive human behavior (e.g., Ashby & Maddox, 2005). To illustrate, categorizing a wild plant as edible typically leads to behavior different from that following upon categorization as poisonous, with possibly devastating consequences of incorrect classification. Categories also support pivotal processes including comprehension, learning, inference generation, explanations as well as language and communication (for an overview, see Medin & Rips, 2005). Hence, learning new categories can be considered a fundamental skill (Kruschke, 2005). Given the importance of categories for everyday behavior, it is vital to identify groups of people for whom category acquisition poses challenges in order that adequate learning support can be provided. Autistic individuals may be such a group as suggested by theoretical accounts of autism (Mottron et al., 2006; O’Riordan & Plaisted, 2001; Pellicano & Burr, 2012; van Boxtel & Lu, 2013; Van de Cruys et al., 2014). According to ICD-11 criteria (World Health Organization, 2019), autism spectrum disorder is a neurodevelopmental disorder defined by difficulties in social interaction and communication as well as restrictive, repetitive, and inflexible behaviors and interests (for other conceptualizations of autism, see Milton, 2019). In the present article, we report the first comprehensive meta-analysis of category learning in autistic individuals.

Theoretical models of autism and category learning

Several theoretical models predict that autistic individuals differ in category learning from nonautistic individuals: The enhanced discrimination hypothesis (O’Riordan & Plaisted, 2001) postulates that autistic individuals compared with nonautistic individuals have an advanced ability to discriminate between perceptual stimuli. Hence, when learning new categories, autistic individuals can be expected to build categories with more narrow bounds and less variable category members than nonautistic individuals. This could be an advantage when homogeneous categories (e.g., the category of song birds) are to be learned, but problematic when the category of interest is diverse (e.g., the category of vertebrates).

The enhanced perceptual functioning model (Mottron et al., 2006) assumes that autistic persons differ from nonautistic persons through more fine-grained discrimination in low-level perception, superior pattern recognition in medium-level perception, and greater independence of lower-level perception from processes of top-down control. Similar to the enhanced discrimination hypothesis (O’Riordan & Plaisted, 2001), a possible prediction of this model anticipates disadvantages for learning diverse categories in addition to categories whose acquisition requires top-down control. The latter may be exemplified by categories for which perceptual resemblance to in fact conceptually distinct categories is a misleading cue for category membership. For example, whales are perceptually similar to fish, yet belong to the category of mammals.

Furthermore, the HIPPEA (high, inflexible precision of prediction errors in autism; Van de Cruys et al., 2014; see also the predictive coding theory of autism: Pellicano & Burr, 2012; van Boxtel & Lu, 2013) model conceptualizes autism as a condition of exceptional information processing. It is argued that humans navigate the world based on predictions. If prediction errors occur, it is essential to recognize which errors are serious and need to be learned from and which errors are not essential and can or even need to be ignored to facilitate abstract inferences. The HIPPEA model posits that autistic individuals give inflexibly high weight to prediction errors without sufficient distinction between critical and negligible mistakes. Hence, when learning new categories, this oversensitivity to prediction errors might make it difficult to identify stimulus features or relations between features that specify category membership. This should result in general disadvantages in category acquisition for autistic compared with nonautistic individuals. Overall, difficulties in acquiring at least some types of categories are proposed by the above-mentioned models of autism (for an overview of further theoretical approaches, see Mercado et al., 2020). To decide whether this assumption is warranted, a thorough review of the empirical evidence would be desirable.

Previous reviews of category learning in autistic individuals and open questions

Three review articles relevant to the category learning skills of autistic individuals have been published in recent years: First, Patry and Horn (2019) summarized the evidence regarding prototype formation, categorization, and schema development in autistic individuals based on a systematic review of 23 studies published between 1980 and 2018. Second, Mercado et al. (2020) focused their narrative review on autism-related acquisition of perceptual categories and how it affects cognitive development and social symptoms of autism. Third, Vanpaemel and Bayer (2021) suggested in a narrative review of prototype-based category learning in autistic individuals that the heterogeneity in findings could be related to differences in research methodology.

Even though all of these reviews (Mercado et al., 2020; Patry & Horn, 2019; Vanpaemel & Bayer, 2021) make an important contribution to research on category learning in autistic individuals, a comprehensive and quantitative overview of this topic is still lacking. This is because the reviews either focused on how one type of mental representation is built up, in particular by looking at prototype formation (see Patry & Horn, 2019; Vanpaemel & Bayer, 2021), or because a single type of categories, namely perceptual categories, was taken into account (see Mercado et al., 2020). A thorough synthesis of the literature should ideally cover all kinds of mental representations, types of categories, and category learning tasks, as well as address heterogeneity between studies.

A great deal of research into categorization has focused on the question of how categories are stored within the human mind (Medin & Rips, 2005). In short, the current evidence base suggests that humans rely not on a single type of mental representation but on several: categories may be represented as rules that specify necessary and sufficient conditions for all category members (e.g., Bruner et al., 1956), as bunches of characteristic features whose average constitutes the prototype (e.g., Rosch & Mervis, 1975), or as individual instances, or exemplars, each attached with a category label (e.g., Brooks, 1978). Thus, a comprehensive account of category learning should incorporate more than one type of mental representation, that is, go beyond prototype formation, which is the focus of Patry and Horn’s (2019) and Vanpaemel and Bayer’s (2021) syntheses.

Another line of research addresses the question whether there are different types of categories. A classification of categories widely acknowledged in categorization research is the distinction between isolated (e.g., Bott et al., 2006) and interrelated (e.g., Hetzroni & Shalahevich, 2018) concepts (Medin et al., 2000; a concept can be understood as the mental representation of a category; Markman & Ross, 2003). According to Goldstone (1996), along the isolated-interrelated continuum a category can be said to be the more interrelated the more it is affected by other categories. For instance, color is an example of an isolated category since it can be considered independently from other categories (see Shu et al., 2001). In contrast, the function of an object provides an example of interrelated categories since it is defined by the use of an object for a specific purpose and so cannot be regarded in isolation (see Field et al., 2016). In sum, a comprehensive look at category learning may benefit from addressing more than one type of category, for instance, by covering both isolated and interrelated categories. As opposed to this, the review by Mercado et al. (2020) is confined to perceptual categories. Our literature search suggests that so far no study has directly contrasted the processing of isolated versus interrelated categories in autistic persons. However, individual studies within autism research provide information as to whether isolated or interrelated categories were investigated. Hence, in the present meta-analysis we included type of category (isolated vs. interrelated) as moderator variable varying between studies.

Moderator analyses such as the one implemented for type of category present a specific advantage of meta-analyses since they afford the possibility to not only quantitatively synthesize findings but to also account for heterogeneity between studies, a known characteristic of empirical research on category learning in autistic individuals (e.g., Dovgopoly & Mercado, 2013). Based on empirical findings, several further variables suggest themselves as potential moderators: An important determinant of category learning is the type of task used within research. Ashby and Maddox (2005) differentiate between rule-based, information-integration, and prototype distortion tasks (they also mention the weather prediction task; however, as our literature search suggests that this task type has not been used in autism research, we will not elaborate on it). In rule-based tasks, categories can be acquired through deliberate reasoning. Typically, the ideal classification or learning strategy can easily be verbalized (Ashby et al., 1998). In the Wisconsin Card Sorting Test (WCST; used by, e.g., Park et al., 2014, see also Table 2), for instance, four pictures resembling playing cards are presented. The cards depict geometric shapes varying in color, shape, and number. Trial by trial, participants need to assign an additional card to only one of the four stimulus cards, although frequently more than one match, each on a different dimension, would be possible (e.g., the sorting card matches one of the stimulus cards in shape color, and another stimulus card in shape number). Feedback informs about the accuracy of each sort. This supports a rule-based approach in which participants systematically try out the different sorting dimensions until they can establish a link between sorting dimension and accuracy. This strategy is typically easy to articulate verbally.
In contrast, optimal performance in information-integration category learning tasks is achieved when information from at least two stimulus aspects or dimensions is integrated before a task-relevant decision is made (Ashby & Gott, 1988). Usually, the strategy leading to ideal performance is difficult to describe verbally or cannot be verbalized at all (Ashby et al., 1998). In the study by Plaisted et al. (1998), for example, participants had to learn how to accurately assign eight circle configurations to one of two categories. Each of these categories was defined by two rules, one of which stated that the position of certain circles was fixed, whereas the other rule indicated that the position of the remaining circles was irrelevant for category membership. Optimal performance in this case benefits from predecisional information integration and may hardly be verbalizable (there is also evidence to suggest that articulability supports category acquisition; see Zettersten & Lupyan, 2020). Finally, the categories to be learned within prototype distortion tasks are devised by first producing a stimulus that serves as the prototype (Posner & Keele, 1968, 1970). Subsequently, the remaining category members are constructed by randomly distorting the prototype. An example of this task type was used by Gastgeb et al. (2009): Participants were presented with line-drawn faces that differed from a prototypical face by variations in face length, nose length, nose width, and interocular distance. The relation between type of category and type of category learning task has not received a great deal of attention within research. Hence, looking at type of category and sort of category learning task as separate factors seems justifiable.

Furthermore, two different types of dependent measures are commonly used in research on category learning, namely accuracy (e.g., Potrzeba et al., 2015) and response time (e.g., Rumsey, 1985). It seems therefore worth exploring whether the difference between autistic and nonautistic individuals varies depending on these types of dependent measure.

A set of further potential moderators is related to participant characteristics: language of participants, in view of well-known relationships between category formation, concepts, and language (Bowerman & Levinson, 2001; Hahn & Cantrell, 2012; Sloutsky & Deng, 2019); age of autistic individuals, since category learning is likely to develop over many years (Patry & Horn, 2019); general cognitive ability, which has been proposed as a factor underlying interindividual differences in category learning, also in autistic individuals (see Dovgopoly & Mercado, 2013); and percentage of male research participants, given that autism is more frequent in males than females (Zeidan et al., 2022), so that an unrepresentative distribution of this variable could obliterate the specific characteristics of category learning in autistic individuals. Furthermore, though not a participant characteristic, the year in which studies are published could act as a moderator variable in light of an increase in measured prevalence of autism over the last decades (Zeidan et al., 2022).

Finally, heterogeneity between studies may go back to differences in research methodology. This could be picked up by study quality, for example, by way of validating the ASD diagnosis (Desaunay et al., 2020).

The present meta-analysis

We carried out a meta-analysis on category learning in autistic individuals across all kinds of mental representations and all types of categories to give a thorough overview and to explain the earlier observed between-study heterogeneity. Our primary research question concerned whether and to what extent autistic individuals differ in category learning from nonautistic individuals. Based on earlier reviews (Mercado et al., 2020; Patry & Horn, 2019; Vanpaemel & Bayer, 2021), we predicted that autistic persons would show lower performance levels than nonautistic persons. In addition, we aimed to identify moderator variables accounting for variability between studies. To this end, the following variables were considered: type of category (isolated vs. interrelated), type of category learning task (information-integration vs. prototype distortion vs. rule-based), type of dependent measure (accuracy vs. response time), study language, age of autistic individuals, type of control group (matched vs. not matched on IQ), IQ of autistic individuals, percentage of male research participants, year of publication, and risk of bias (validation via Autism Diagnostic Interview - Revised (ADI-R) and Autism Diagnostic Observation Schedule (ADOS) vs. validation via ADI-R or ADOS vs. other).

Regarding study design, we did not include investigations using a single-subject design due to associated limitations in quantitative data analysis that can impede meta-analytical synthesis (Sandbank et al., 2020), nor studies lacking a comparison group of nonautistic individuals, since these are restricted with regard to autism-specific assertions. Studies were integrated based on effect sizes in terms of the standardized mean difference, as Hedges’ g. Heterogeneity was determined by means of χ² (Q) and Higgins’ I² tests. There was no protocol for this synthesis.

Materials and method

Transparency and openness

We adhered to the MARS guidelines for meta-analytic reporting (Appelbaum et al., 2018). All meta-analytic data, analysis code, and research materials (including our coding scheme) are available online (https://osf.io/gtj2p/). This review project was not preregistered.

Selection criteria

We selected studies with the following inclusion criteria:

  1. The study was published in 1970 or later.

  2. The full text is written in English.

  3. The study compared at least one group of autistic individuals with at least one group of nonautistic individuals.

  4. The study investigated category learning.

The only exclusion criterion we applied was:

  1. The study implemented a single-subject design.

Reports that did not contain sufficient information to judge eligibility were excluded from analysis. If reports did not include sufficient information for analysis, authors with findable current contact details were contacted. In case the necessary details could still not be obtained, reports were excluded.

Search strategy

In order to detect all studies of interest, we performed a literature search using ERIC (via Institute of Education Sciences), PsycInfo (via EBSCOhost), MEDLINE (via PubMed), and Web of Science (via Clarivate). The following expression of search terms was used for each database, with no specific search fields selected: (autis* OR asd OR pdd OR asperg*) AND (categ* OR concept* OR prototype OR schema OR script). This database search was carried out on January 5, 2021, and updated on January 13, 2022. In addition, we conducted a manual search and inspected the reference lists of the reviews by Patry and Horn (2019), Mercado et al. (2020), and Vanpaemel and Bayer (2021) as well as the reference lists of the 10 most recent included articles retrieved from the database search. For five reports where full texts were not accessible, authors were emailed. Two of them responded and supplied their full text. In addition, 14 authors were contacted since full text reports did not state all statistics needed for effect size calculation. Of these, three researchers replied, with one providing the requested details.
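The Boolean search expression above can be illustrated with a minimal Python sketch that applies the same logic to a single title or abstract. This is only an approximation under stated assumptions: the asterisk is treated as right-truncation, bare terms are matched as whole words, and matching is case-insensitive. The actual databases may handle stemming and field mapping differently, and the function names are hypothetical.

```python
import re

# Assumption: "*" means right-truncation; bare terms match whole words only.
AUTISM_TERMS = ["autis*", "asd", "pdd", "asperg*"]
CATEGORY_TERMS = ["categ*", "concept*", "prototype", "schema", "script"]


def term_to_regex(term):
    """Translate one search term into a word-bounded regular expression."""
    if term.endswith("*"):
        return r"\b" + re.escape(term[:-1]) + r"\w*"
    return r"\b" + re.escape(term) + r"\b"


def any_term(terms):
    """Compile an OR-group of search terms (case-insensitive)."""
    return re.compile("|".join(term_to_regex(t) for t in terms), re.IGNORECASE)


AUTISM_RE = any_term(AUTISM_TERMS)
CATEGORY_RE = any_term(CATEGORY_TERMS)


def matches_search(text):
    """True if the text satisfies (autism group) AND (category group)."""
    return bool(AUTISM_RE.search(text)) and bool(CATEGORY_RE.search(text))
```

A text passes only if it contains at least one term from each group, which is why such a broad expression returns many records that must then be screened manually.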

Data extraction

A group of five researchers (three of the authors plus two research assistants) screened titles and abstracts. Each of these researchers initially screened a distinct subset of the database hits. To further assess eligibility of the entries remaining after screening, each full text was read by at least one researcher. Ambiguous cases that could not be excluded with certainty were reassessed by at least one other researcher, and conflicting assessments were resolved through mutual discussion. The final set of articles used for analyses was approved by all authors. Data extraction was split up between three authors. The following information was excerpted: author names and year of publication, types of participant groups, sample size, participant age in years (mean and standard deviation), mental age in years (mean and standard deviation), mean and standard deviation of IQ, secondary disorder, number of male and female participants, type of category under investigation (isolated vs. interrelated), language in which the study was carried out, task used to assess the central dependent variable including its type and the dependent measures derived from it, results (means and standard deviations, or, if these were not provided, alternative statistics). Risk of bias was considered in terms of diagnostic validation (similarly to Desaunay et al., 2020: validation via ADI-R and ADOS [regarded as low risk of bias] vs. validation via ADI-R or ADOS [regarded as medium risk of bias] vs. other [regarded as high risk of bias]). The coding scheme is available online (https://osf.io/gtj2p/). Data extraction and risk of bias ratings were double-checked by the second author. The included studies are summarized in Table 1.
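The three-level risk-of-bias rating described above can be written as a small decision rule. The following Python sketch is illustrative only; the function and argument names are hypothetical and do not come from the published coding scheme.

```python
def risk_of_bias(used_adi_r, used_ados):
    """Rate risk of bias from how the autism diagnosis was validated.

    Both instruments -> "low"; exactly one -> "medium"; neither -> "high".
    """
    if used_adi_r and used_ados:
        return "low"
    if used_adi_r or used_ados:
        return "medium"
    return "high"
```

For example, a study that validated diagnoses with the ADOS alone would be rated as medium risk of bias under this rule.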

Table 1 Characteristics of studies included in the meta-analysis

Coding of moderator variables

Age of autistic participants was coded as the group mean of chronological age in years and implemented as continuous moderator. Similarly, IQ of autistic participants mirrored the continuous group mean of intelligence in units of the IQ scale. Year of publication was incorporated as another continuous predictor in units of whole years. Risk of bias was included as categorical moderator; levels were low, medium, and high (see data extraction). Type of control group was employed as categorical moderator distinguishing between nonautistic groups matched vs. not matched on IQ. A group was deemed to be matched if the original study reported it as such. If only mental age—but not IQ—was reported, we utilized mental age as a proxy of IQ. This means that we considered groups to be matched on IQ if they were reported to be matched on mental age. However, studies reporting only the mental age but not the IQ of their participants were excluded from the moderator analysis on IQ. Type of task was incorporated as a categorical moderator with levels being information-integration, prototype distortion, and rule-based. The strategy leading to optimal performance was easy to verbalize in rule-based tasks and hard or impossible to verbalize in information-integration tasks. Prototype distortion tasks manifested themselves in the way the stimulus material was constructed, namely through first establishing a prototype and subsequently creating random distortions of the prototype. Type of dependent measure was employed as a categorical moderator comparing accuracy with response time. Measures were considered to represent accuracy if the correctness of responses was at their core, whereas measures coded as response time were about the speed with which a response was achieved. Study language was coded as a categorical moderator. Within moderator analyses, a language was included only if it occurred across a minimum of five effect sizes. 
In line with this, languages covered by fewer than five effect sizes were excluded from the moderator analysis on study language. Percentage of male participants was modelled as continuous moderator reflecting the proportion of males within autistic groups.
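The inclusion rule for study language (at least five effect sizes per language) can be sketched as a simple frequency filter. The Python example below is a hypothetical illustration; the function name and the placeholder language labels in the usage note are assumptions, not data from the meta-analysis.

```python
from collections import Counter

# Threshold stated in the text: a language needs at least five effect sizes.
MIN_EFFECT_SIZES = 5


def languages_for_moderator(effect_size_languages):
    """Return the set of languages with enough effect sizes to be analyzed.

    `effect_size_languages` holds one language label per effect size.
    """
    counts = Counter(effect_size_languages)
    return {lang for lang, n in counts.items() if n >= MIN_EFFECT_SIZES}
```

With made-up labels, `languages_for_moderator(["A"] * 6 + ["B"] * 5 + ["C"] * 2)` keeps languages A and B and drops language C.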

Type of category was considered in terms of the isolated vs. interrelated distinction (Goldstone, 1996; see Introduction). A category was deemed isolated when looking at one or several features in isolation was sufficient to determine membership (e.g., Froehlich, 2008, Experiment 3). In contrast, when classification necessitated looking at relations between stimulus components, a category was considered interrelated (e.g., Hetzroni & Shalahevich, 2018). In line with the view that the isolated-interrelated distinction is continuous rather than categorical, there were ambiguous cases. Here, we checked whether the majority of relevant features could be regarded in isolation, in which case the category was termed isolated, or required the inclusion of relations with other features to arrive at correct categorization, in which case the category was termed interrelated. For example, the face stimuli used by Gastgeb et al. (2009) varied in terms of face length, nose length, nose width, and distance between eyes. Since only distance between eyes is a relational feature and the remaining face properties could be considered in isolation, we deemed this study to investigate isolated categories. See Fig. 1 for an illustration of the distinction between isolated and interrelated categories.

Fig. 1 Examples of the distinction between isolated and interrelated categories. Note. A = Category is based on one feature: “Rogs” have lines and “Zips” have dots (Froehlich, 2008; with kind permission of Alyson Froehlich). B = Category is based on features that can be regarded in isolation: face length, nose length, nose width, distance between eyes (Gastgeb et al., 2009; with kind permission of John Wiley and Sons). C = Category is determined by a relation; perceptual similarities like shape are not diagnostic (Hetzroni & Shalahevich, 2018; with kind permission of Springer Nature)
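The majority rule used for ambiguous cases in the coding of type of category can be sketched as follows. This is a hypothetical Python illustration, not the authors' coding procedure. In particular, the handling of ties (coded here as interrelated) is an assumption, since the text does not specify it.

```python
def classify_category(features):
    """Classify a category as "isolated" or "interrelated".

    `features` maps each relevant feature name to True if the feature is
    relational and to False if it can be regarded in isolation.
    Assumption: a tie is coded as "interrelated" (not specified in the text).
    """
    relational = sum(1 for is_relational in features.values() if is_relational)
    isolated = len(features) - relational
    return "isolated" if isolated > relational else "interrelated"


# Face stimuli of Gastgeb et al. (2009) as described in the text:
# only the distance between the eyes is a relational feature.
gastgeb_faces = {
    "face length": False,
    "nose length": False,
    "nose width": False,
    "distance between eyes": True,
}
gastgeb_label = classify_category(gastgeb_faces)
```

Three of the four face features can be regarded in isolation, so the rule returns "isolated", in line with the coding decision reported in the text.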

Statistical analysis

Effect sizes were calculated in terms of the standardized mean difference, as Hedges’ g, using formulas provided in Borenstein (2009) and Lipsey and Wilson (2001), Comprehensive Meta-Analysis software (Biostat Inc.), the online calculator provided by the Campbell Collaboration (https://www.campbellcollaboration.org/escalc/html/EffectSizeCalculator-SMD4.php), and the esc package in R (R Version 4.0.3). All further meta-analytical steps were carried out using the R metafor package (Viechtbauer, 2010). Meta-analytical code can be accessed online (https://osf.io/gtj2p/).
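For readers unfamiliar with the effect size metric, the following Python sketch shows the standard computation of Hedges' g and its sampling variance from group means, standard deviations, and sample sizes (pooled-SD standardized mean difference with small-sample correction, as described in Borenstein, 2009). It is a minimal illustration, not the authors' code. The function name is hypothetical, and group 1 is assumed to be the autistic group, so that negative values indicate lower scores in that group.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Return (g, var_g) for two independent groups.

    Group 1 is assumed to be the autistic group, group 2 the comparison group.
    """
    df = n1 + n2 - 2
    # Pooled within-group standard deviation
    s_pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    # Cohen's d and its sampling variance
    d = (m1 - m2) / s_pooled
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    # Small-sample correction factor J
    j = 1 - 3 / (4 * df - 1)
    return j * d, j ** 2 * var_d
```

For measures where higher values mean poorer performance (e.g., error counts or response times), the sign would need to be reversed so that negative values consistently indicate lower-level performance of the autistic group.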

Effect sizes were weighted by their inverse variance. Negative effect sizes indicated lower-level performance of the autistic group compared with the nonautistic group. Imprecision was estimated through 95% confidence intervals. For each of the effect sizes, we checked data for dependency. Two types of statistical dependency were identified: First, the same participants were included in more than one pairwise group comparison, and second, more than one dependent measure was assessed in the same participants. To account for dependency, we implemented a random effects multilevel meta-analysis (Hox et al., 2002). Individual effect sizes constituted Level 1; these were modelled to be nested within dependent comparisons, which represented Level 2. Dependent comparisons in turn were considered to be nested within independent studies constituting Level 3. We determined heterogeneity by means of χ² (Q) and Higgins’ I² tests. An influence analysis was conducted based on hat values and Cook’s distance (Belsley et al., 1980; Cook, 1977). Potential moderators in terms of risk of bias, type of control group, type of category, study language, type of task, type of dependent measure, age of autistic group, IQ of autistic group, year of publication, and percentage of male participants within autistic group were investigated using a series of meta-regression models. Publication bias was assessed through a funnel plot and Egger’s test (Egger et al., 1997; Light & Pillemer, 1984). We adopted the standard 5% significance level for all inferential tests.
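To make the ideas of inverse-variance weighting, the Q statistic, and I² concrete, the Python sketch below pools effect sizes in a deliberately simplified single-level, common-effect model. This is not the three-level random effects model fitted with metafor in the present study. It ignores the dependency structure described above and serves only to show how the quantities are defined. The function name is hypothetical.

```python
import math


def pool_common_effect(effects, variances):
    """Inverse-variance pooling with Cochran's Q and Higgins' I2.

    Simplified single-level sketch; dependency between effects is ignored.
    """
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    pooled = sum(w * g for w, g in zip(weights, effects)) / total_w
    se = math.sqrt(1.0 / total_w)
    # 95% confidence interval based on the normal approximation
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    # Q: weighted squared deviations from the pooled estimate
    q = sum(w * (g - pooled) ** 2 for w, g in zip(weights, effects))
    df = len(effects) - 1
    # I2: share of variability beyond sampling error, in percent
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "ci": ci, "Q": q, "df": df, "I2": i2}
```

In a multilevel model, I² is additionally partitioned into within-study and between-study components, so values from this simplified formula need not match the multilevel estimates.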

Results

Number of studies

Our literature search detected 26,459 database entries on January 5, 2021, and 2,218 additional documents published between January 5, 2021 and January 13, 2022 (see Fig. 2). After removing duplicates, 19,201 records were left for further inspection. Screening of titles and abstracts led to the exclusion of 18,608 documents that did not meet inclusion criteria, and a further 15 records for which abstracts were not accessible. Full texts were sought for the remaining 578 database hits. Out of these documents, six were excluded since they were residual duplicates; 444 were excluded because they did not deal with the acquisition of categories/prototypes/concepts/schemata; 13 records were excluded since full texts were not available; 19 were excluded because they did not involve a group comparison of autistic individuals with controls; 18 titles were excluded because they did not report original research; 12 were excluded because they reported a single-subject design; 13 were excluded due to missing statistics; and seven records were excluded since the full text was not written in English. In addition, four titles were identified via manual search/search of reference lists. Overall, 50 records, which provided 50 statistically independent comparisons and 112 effect sizes, were included in our meta-analysis.
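The record counts reported above can be checked with simple arithmetic. The following Python sketch recomputes the flow from the figures given in the text; the variable names are hypothetical, and the number of removed duplicates is derived here rather than reported in the text.

```python
# Counts taken from the text
initial_hits = 26_459
update_hits = 2_218
after_deduplication = 19_201
excluded_at_screening = 18_608
abstracts_not_accessible = 15

# Derived (not stated in the text): duplicates removed before screening
duplicates_removed = initial_hits + update_hits - after_deduplication

full_texts_sought = (
    after_deduplication - excluded_at_screening - abstracts_not_accessible
)

# Full-text exclusions, by reason, as listed in the text
full_text_exclusions = {
    "residual duplicates": 6,
    "not about acquisition of categories": 444,
    "full text not available": 13,
    "no autistic vs. control comparison": 19,
    "not original research": 18,
    "single-subject design": 12,
    "missing statistics": 13,
    "not written in English": 7,
}

retained_from_databases = full_texts_sought - sum(full_text_exclusions.values())
identified_manually = 4
included_records = retained_from_databases + identified_manually
```

The computation reproduces the 578 full texts sought and the 50 included records stated in the text.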

Fig. 2 PRISMA flowchart of the literature search

Characteristics of studies and samples

The current pool of studies involved 1,220 autistic and 1,445 nonautistic participants (for a study overview, see Table 1). For individual studies, sample sizes of autistic groups varied between n = 7 (learner-driven condition; McGregor & Bean, 2012) and n = 90 (Minshew et al., 2002), interquartile range (IQR): [16, 27.5], whilst those of nonautistic groups varied between n = 10 (Hoffmann & Prior, 1982) and n = 107 (Minshew et al., 2002), IQR: [16, 29.5]. Mean chronological ages of autistic groups ranged from 2.73 years (Potrzeba et al., 2015) to 49.00 years (Powell, 2016/2017), IQR: [9.70, 23.26], whereas those of nonautistic groups ranged from 1.69 years (Potrzeba et al., 2015) to 48.70 years (Powell, 2016/2017), IQR: [10.10, 24.50]. Regarding participant gender, percentage of male volunteers in autistic groups varied between 37.50% (Maule et al., 2017) and 100.00% (for instance, Froehlich, 2008; Froehlich et al., 2012; Gastgeb et al., 2011, 2012; Hartley & Allen, 2014; Kaland et al., 2008; Meyer, 2014; Molesworth et al., 2015; Rumsey, 1985; Vladusich et al., 2010), IQR: [83.33%, 96.15%]; percentage of male respondents in nonautistic groups ranged from 35.29% (Bott et al., 2006) to 100.00% (for instance, Froehlich, 2008; Froehlich et al., 2012; Gastgeb et al., 2011, 2012; Kado et al., 2020; Kaland et al., 2008; Molesworth et al., 2015; Rumsey, 1985; Vladusich et al., 2010), IQR: [73.30%, 95.20%]. For 75% of comparisons, participant groups were matched on IQ, leaving 25% of comparisons for which groups were not matched on IQ. IQ in autistic groups ranged from 80.00 (Shu et al., 2001) to 114.22 (Meyer, 2014), IQR: [99.16, 108.74]. In terms of the type of categories, 80% of studies addressed isolated categories, whereas only 20% examined interrelated categories. Looking at the type of tasks, 28.57% of group comparisons concerned information-integration tasks, 15.18% addressed prototype distortion tasks, and 56.25% referred to rule-based tasks. 
Table 2 gives an overview of the tasks used within the seven most influential studies based on the nine most highly weighted effect sizes. Risk of bias was low in 30% of studies, medium in a further 30% of studies, and high in 40% of studies. Category learning as dependent measure was indexed by either accuracy (72% of group comparisons) or response time (28% of group comparisons). Accuracy was typically utilized as absolute (e.g., Powell, 2016) or relative (e.g., Sapey-Triomphe et al., 2018) number of correct responses, or defined ex negativo as number of errors (e.g., Williams et al., 2015). Response time was reflected by the number of trials needed to achieve a certain accuracy criterion (e.g., Schipul & Just, 2016; Shu et al., 2001) or by the time needed to respond to a single stimulus (e.g., Nader et al., 2022). Confined to languages for which there were at least five effect sizes, 64% of studies were conducted in English, 6% in Hebrew, 6% in Japanese, and 8% in French.

Table 2 Overview of the tasks used within the seven most influential studies based on the nine most highly weighted effect sizes

Finally, the full data set was available from the authors upon request in one case only (i.e., Tovar et al., 2020) and not available for the remaining studies.

Meta-analysis

Our primary research question addressed whether and to what extent autistic individuals differ in category learning from nonautistic individuals. According to meta-analysis of the 112 effect sizes from 50 records, autistic individuals showed lower-level performance in category learning compared with nonautistic individuals. This effect was medium-sized and statistically significant, g = −0.55, 95% CI [−0.73, −0.38], p < .0001 (Table 3, supplementary Fig. 1, see https://osf.io/gtj2p/). Presence of heterogeneity was indicated by a significant Q statistic, Q(111) = 617.88, p < .0001, which is addressed in the following section (moderator analyses). Furthermore, total I² was 85.14%, indicating a substantial amount of true variance (vs. sampling error) in effect size estimates, the majority of which came from the within-study cluster, I²(Level 2) = 55.37%, compared with the between-study cluster, I²(Level 3) = 29.77%. Together, these results suggest that autism is associated with medium-sized lower-level category learning skills and that the effect sizes differ systematically between studies due to factors varying within-study (e.g., type of control group, type of dependent measure). Robustness of the effect was confirmed across influence analyses: Hat values ranged from .0031 to .0160, thus all scores were below the critical cut-off of 3/k = 3/112 = .0268 (Harrer, 2022). Similarly, Cook’s distance varied between .0000 and .0388, meaning that scores were below the threshold of .45 (Harrer, 2022) that would signal an influential study.
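The influence diagnostics and the variance decomposition reported above can be verified with a few lines of arithmetic. The Python sketch below uses only the values stated in the text; the variable names are hypothetical.

```python
# Number of effect sizes in the meta-analysis
k = 112

# Hat-value cut-off of 3/k and the largest observed hat value
hat_cutoff = 3 / k
max_hat_value = 0.0160
hat_values_ok = max_hat_value < hat_cutoff

# Cook's distance threshold and the largest observed value
cooks_threshold = 0.45
max_cooks_distance = 0.0388
cooks_ok = max_cooks_distance < cooks_threshold

# Total I2 as the sum of the Level 2 and Level 3 components (in percent)
i2_level2 = 55.37
i2_level3 = 29.77
i2_total = i2_level2 + i2_level3
```

The cut-off 3/112 rounds to .0268, and the two I² components add up to the reported total of 85.14%.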

Table 3 Meta-analytic results

Moderator analyses

A series of moderator analyses checked whether the significant amount of between-study heterogeneity can be explained by two types of factors: Firstly, categorical variables including risk of bias (low vs. medium vs. high), type of control group (matched vs. not matched on IQ), type of category (isolated vs. interrelated; see Fig. 3), study language (English vs. Hebrew vs. Japanese vs. French), type of task (information-integration vs. prototype distortion vs. rule-based; see Fig. 3), and type of dependent measure (accuracy vs. response time); and secondly, continuous variables, namely age of autistic group, IQ of autistic group, year of publication, and percentage of male participants within autistic group. Table 3 provides an overview of the overall meta-analytic effect as well as the effect of these moderator variables. For categorical moderators, the level presented first serves as the intercept. In this case, the p value of the corresponding effect size parameter indicates whether the effect of the intercept differs significantly from zero. The effect of each of the subsequent moderator levels is compared to that of the intercept, so that the p values of these effects reflect whether the effect of the moderator level differs significantly from that of the intercept, and hence suggest whether the respective variable exerts a moderating influence. For continuous moderators, the effect size parameter is a regression weight, b, so that p values here indicate whether there is moderation by way of a significant linear relationship between moderator and outcome. As can be seen from Table 3, only one of the variables under investigation was found to be a significant moderator, namely study language. In particular, studies conducted in Hebrew were associated with a more negative effect than studies conducted in English (g = −1.28 vs. g = −0.46, p = .023). In contrast, none of the remaining variables moderated the overall effect (ps > .13).
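The intercept coding described above can be illustrated with the two language estimates quoted in the text. This is a minimal sketch, assuming that g = −0.46 and g = −1.28 are the level-specific estimates for English and Hebrew; the contrast value and the helper function `predicted_g` are derived here for illustration and are not reported in the article.

```python
# Dummy (intercept) coding of a categorical moderator, illustrated with
# the two study-language estimates quoted in the text.

g_english = -0.46    # reference level, serves as the intercept
g_hebrew = -1.28     # level compared against the intercept

intercept = g_english
b_hebrew = g_hebrew - g_english   # contrast Hebrew minus English (derived, not reported)

def predicted_g(is_hebrew: int) -> float:
    """Predicted effect size for a study; is_hebrew is the 0/1 dummy variable."""
    return intercept + b_hebrew * is_hebrew

print(round(b_hebrew, 2))         # size of the contrast tested by the moderator p value
print(round(predicted_g(0), 2))   # English studies
print(round(predicted_g(1), 2))   # Hebrew studies
```

Under this coding, the p value attached to the Hebrew level tests whether the contrast differs from zero, not whether the Hebrew effect itself differs from zero.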

Fig. 3

Visualized effects of types of task and category. Note. Pirate plots reflecting effect size by task and category types, showing raw data points, a horizontal line reflecting the mean, a rectangle representing the 95% confidence interval, and a bean representing a smoothed density

Publication bias

The current pool of studies included published as well as unpublished work, for example, doctoral dissertations. According to the tests we carried out, presence of publication bias could not be ruled out. Visual inspection of the funnel plot revealed a substantial degree of asymmetry, as there were more data points from relatively imprecise studies to the left than to the right of the mean effect (see Fig. 4). That is, studies with smaller samples and correspondingly larger standard errors reported greater negative effects than studies with larger samples and correspondingly smaller standard errors. Consistent with this, the slope of Egger’s regression test for funnel plot asymmetry was significantly negative, b = −6.41, SE = 1.16, t(110) = −5.51, p < .0001, indicating that the precision of the measured effect was significantly linked with the magnitude of the effect. By contrast, in the absence of publication bias, a symmetric distribution of data points around the mean effect alongside a nonsignificant Egger’s test is to be expected (Egger et al., 1997).
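The logic of such a regression test can be sketched as follows. This is a simplified illustration, not the model used in the present analysis: it regresses effect sizes on their standard errors with inverse-variance weights and ignores the multilevel dependency structure. The function name and the toy data are hypothetical.

```python
def weighted_slope(effects, std_errors):
    """Weighted least-squares regression of effect size on standard error.

    Weights are inverse variances (1 / SE^2). Returns (intercept, slope).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    w_sum = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, std_errors)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, effects)) / w_sum
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, std_errors, effects))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, std_errors))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical toy data: less precise studies show more negative effects.
ses = [0.10, 0.15, 0.20, 0.30, 0.40]
gs = [-0.2 - 2.0 * se for se in ses]
intercept, slope = weighted_slope(gs, ses)
print(round(intercept, 3), round(slope, 3))

# The test statistic is the slope divided by its standard error.
t_from_report = -6.41 / 1.16
print(round(t_from_report, 2))
```

A negative slope means that effects become more negative as precision decreases, which is the pattern described above. Dividing the reported slope by its reported standard error gives a value close to the reported t statistic; the small remaining difference is consistent with rounding of b and SE.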

Fig. 4

Funnel plot of the results. Note. Effect sizes (in units of Hedges’ g) on the x-axis are plotted against their standard error on the y-axis. The dotted vertical line represents the mean meta-analytical effect. Within the funnel shape, the white area reflects the 90% CI of the mean effect, the dark-grey area reflects the 95% CI of the mean effect, and the light grey area reflects the 99% CI of the mean effect

Discussion

In the present article, we aimed to obtain a comprehensive research overview of category learning in autistic persons. Firstly, we investigated whether and to what extent autistic individuals differ in category learning from nonautistic individuals. Based on earlier narrative reviews (Mercado et al., 2020; Patry & Horn, 2019; Vanpaemel & Bayer, 2021), we predicted lower performance levels for autistic compared with nonautistic persons. This hypothesis was supported: Within our meta-analysis, results of a multilevel random effects model indicated that overall autistic individuals have lower-level skills of category acquisition compared with nonautistic individuals. This total effect was of medium size (g = −0.55) and statistically significant. Both accuracy (g = −0.49) and response time (g = −0.74) in categorization were affected. In sum, this is the first quantitative synthesis to evidence differences in category learning for autistic individuals. These differences suggest a form of atypical category learning leading to difficulties in fluency with regard to correctness and speed of categorization (e.g., Jimenez et al., 2021).

Secondly, in line with earlier narrative review articles (e.g., Dovgopoly & Mercado, 2013), there was a significant amount of heterogeneity between the effect sizes included in our meta-analysis. Therefore, we examined whether moderator variables would account for the observed heterogeneity. This turned out to be the case for only one of the moderator variables under examination, namely study language: Studies conducted in Hebrew yielded more negative effects than studies carried out in English (g = −1.28 vs. g = −0.46). Since no direct comparison of category acquisition in Hebrew and English has been carried out in the extant literature, this effect is difficult to interpret. There is also a potential confound of study language with type of category. In particular, 18.46% of the effect sizes linked with English studies addressed relational categories, whereas this was the case for 57.14% of the effect sizes linked with Hebrew studies. Beyond this, the number of studies in Hebrew was very low (k = 3). In sum, this pattern should be treated with great caution and points to a need for further research in this area. For the remaining moderator variables, namely age, year of publication, risk of bias (low vs. medium vs. high), type of control group (matched vs. not matched on IQ), IQ of autistic group, percentage of male autistic participants, type of category (isolated vs. interrelated), type of task (information-integration vs. prototype distortion vs. rule-based), and type of dependent measure (accuracy vs. response time), meta-regression models did not detect statistically significant effects. Thus, although we considered a large number of variables based on existing literature, we were not able to explain the variability observed in a fully reliable manner.

The overall quality of original studies was mixed. Risk of bias estimates related to validation of the autism diagnosis were highly variable, ranging from low to high with approximately equal distribution. Further, to ascertain whether differences between autistic and nonautistic groups are specifically related to autism, it is important to closely match participant groups on all relevant characteristics except for presence of autism. To rule out differences in general cognitive potential as explanatory factors, studies within autism research typically match participant groups on IQ (see Jarrold & Brock, 2004; Mottron, 2004). In the current pool of studies, the majority of investigations (i.e., 75%) did rely on participants’ IQ for group matching.

Another factor related to study quality is sample size, as small samples are characterized by instability of the mean estimates they provide (see Bishop et al., 2022; Tversky & Kahneman, 1971). In meta-analyses, this sampling error is acknowledged through study weight. Nevertheless, this weight is relative to the corpus of included studies, and so cannot compensate if sample sizes are low overall. Assuming a two-group design, total sample size of the currently included studies had a median of N = 41. Post hoc power analysis revealed that this sample size has a power of 88% to detect a medium-sized effect of d = 0.5 in a two-tailed t test for paired samples at the standard 5% significance level, and a power of 47% to identify a small-sized effect of d = 0.3 in the same sort of inference test. Thus, the present evidence base did not have sufficient power for detecting small effects and hence may be marked by a certain degree of imprecision. Still, low sample sizes are typical of autism research (Tager-Flusberg, 2004), so this limitation is by no means specific to the current meta-analysis. Furthermore, those indicators of study quality that were amenable to and thus included in moderator analysis, namely risk of bias and type of control group (matched vs. not matched on IQ), were not shown to affect the results. This means that studies of lower quality did not seem to produce results different from those of higher quality.
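The order of magnitude of these power figures can be reproduced with a normal approximation. The sketch below is not the procedure used for the reported values: it replaces the noncentral t distribution with a standard normal distribution, so it returns values slightly above the t-based figures of 88% and 47%. The function name `approx_power_paired` is ours.

```python
from math import sqrt
from statistics import NormalDist

def approx_power_paired(d: float, n_pairs: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-tailed paired-samples test.

    Normal approximation: the test statistic is treated as normally
    distributed with mean d * sqrt(n_pairs) and unit variance.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # two-tailed critical value
    shift = d * sqrt(n_pairs)           # expected value of the test statistic
    # Probability of exceeding the critical value in either tail.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

power_medium = approx_power_paired(0.5, 41)   # medium-sized effect, N = 41
power_small = approx_power_paired(0.3, 41)    # small-sized effect, N = 41
print(round(power_medium, 2), round(power_small, 2))
```

Both approximations lead to the same conclusion as the reported analysis: at a median sample size of 41, power is adequate for medium-sized effects but below 50% for small ones.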

Strengths and limitations

In this article, we presented the first-ever meta-analysis of category learning in autistic individuals. As distinct from earlier reviews, our synthesis aimed to incorporate all kinds of mental representations and all types of categories, based on preceding decades of research into human categorization. In addition, it provided a quantitative summary and used statistical methods to explain heterogeneity between studies. Related to this all-encompassing approach is the relatively large number of original studies and effect sizes included. Carter et al. (2019) demonstrated that meta-analytical methods involving 60 studies have excellent power. The present number of investigations, 50 studies reporting 112 effect sizes, comes comparatively close to this figure. We also used an up-to-date statistical method, that is, a multilevel approach, to account for dependencies between effect sizes and avoid an overestimation of effects by erroneously assuming independence.

Nonetheless, there is reason to assume that the true difference in category learning between autistic and nonautistic individuals is somewhat lower than what is suggested by the total effect, g = −0.55. In particular, inspection of the funnel plot and Egger’s test revealed an asymmetric distribution of effect sizes in the sense that effects suggesting lower-level performance of autistic individuals were overrepresented. As this asymmetry could be traced back to publication bias, future work could produce a more balanced picture if researchers and publication outlets tried to publish findings irrespective of statistical significance and direction of effects. Another limitation is linked with study heterogeneity. Even though checks in terms of hat values and Cook’s distance suggested that our findings were robust, and thus not biased by individual influential studies, effect sizes still showed a significant amount of heterogeneity, and this heterogeneity could not be fully explained by any of our moderator variables. This means that currently unknown moderator variables could account for the heterogeneity. Further work is needed to clarify this issue.

As applicable to many lines of autism research, the present meta-analysis is limited in establishing causal links, here between the presence of autism and category learning skills. This is because, firstly, groups of participants were self-selected. Secondly, although the majority of studies, that is, 75%, matched participant groups on IQ, and many of them also on age and gender, whether these are the only variables critical to category learning is unknown. Thirdly, considering the relatively low sample sizes, it is not guaranteed that these further variables were randomly distributed across participant groups. In sum, the presence of autism is possibly not the only difference between participant groups that is apt to explain variations in category learning.

Given that we reported the first comprehensive meta-analysis on this topic, it is difficult to draw straightforward comparisons with previous syntheses. Still, it is interesting to see whether the present work arrived at conclusions similar to those of earlier reviews. In their article, Patry and Horn (2019) reported small- to large-sized disadvantages for prototype formation, and mostly medium- to large-sized disadvantages for both categorization and schema development. The medium-sized total effect obtained in our meta-analysis generally suggested smaller differences between autistic and nonautistic individuals, and even this effect was probably overstated due to publication bias. Hence, Patry and Horn’s (2019) synthesis is likely to be subject to even greater bias.

Mercado et al. (2020) emphasized that findings on learning perceptual categories in autistic individuals are heterogeneous, whilst they considered the bulk of the evidence to gravitate toward dysfunctional category learning. On the one hand, the findings of the present meta-analysis specify this view to the extent that they quantify effects and corresponding heterogeneity. On the other hand, they extend Mercado et al.’s (2020) view as they demonstrate that the effect goes beyond learning perceptual categories and seems to apply to all category types under investigation.

Finally, Vanpaemel and Bayer’s (2021) conclusions about prototype-based category learning in autism basically resonate with Patry and Horn’s (2019) inferences on prototype formation. In an attempt to explain divergent findings, Vanpaemel and Bayer (2021) focused on task characteristics. They hypothesized that tasks suggesting a prototype-based mental representation would pose greater challenges for autistic individuals than tasks prompting an exemplar-based mental representation. Since the vast majority of studies did not provide evidence about the type of mental representation utilized or built up by participants, we were not able to formally test this assumption within moderator analysis. However, such a test seems to be a promising objective for future work.

To what population and what outcomes do the present results generalize? Most of the included studies worked with older children, adolescents, or young adults, so inferences about these age groups are feasible in principle; in contrast, younger children and older adults were clearly underrepresented. Regarding gender, the majority of study samples involved approximately 83 to 96% male participants. Zeidan et al. (2022) reported a male-to-female ratio of 4.2 in autism, corresponding to roughly 81% males among autistic persons. Therefore, male participants were slightly overrepresented in the current meta-analysis. Furthermore, most studies relied on verbal materials requiring at least basic language skills. Thus, it can be assumed that those studies worked with high-functioning autistic individuals who are not representative of the entire autism spectrum. In support of this, the average IQ of the autistic groups in most cases (referring to the interquartile range) ranged between 99.16 and 108.74. This bias in selection might go back to the frequently observed strategy of matching autistic and typically developing groups on IQ. Although autistic individuals with low levels of intelligence are therefore neglected, recent research demonstrates that this group of individuals may constitute a smaller portion of all autistic persons than thought previously (Billeiter & Froiland, 2023; Katusic et al., 2021; Wolff et al., 2022).
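The conversion from the male-to-female ratio to the share of males quoted above is a one-line calculation; the helper name `male_share` is ours and serves only as an illustration.

```python
def male_share(male_to_female_ratio: float) -> float:
    """Proportion of males implied by a male-to-female ratio of r : 1."""
    return male_to_female_ratio / (male_to_female_ratio + 1)

share = male_share(4.2)       # ratio reported by Zeidan et al. (2022)
print(round(share * 100, 1))  # percentage of males among autistic persons
```

With a ratio of 4.2, the implied share is about 81%, which lies just below the 83 to 96% range observed in most of the included samples.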

Looking at study outcomes, it is striking that 80% of the included studies investigated isolated categories. A more complete picture of autistic individuals’ category learning skills would benefit from a more thorough examination of interrelated categories. Similarly, since the majority of studies (64%) were conducted in English, a greater number of studies carried out in other languages would be desirable. These would then permit conclusions about links between category learning and language in autistic individuals.

In sum, results of the current meta-analysis are generally in line with previous syntheses, but refine them. More precisely, autistic persons on average were found not to reach the level of category learning typically achieved by nonautistic individuals; yet the size of the total effect alongside examinations of publication bias indicated that the group difference might be smaller than suggested by earlier overviews. Beyond this, the present results point to several areas for future research: Firstly, investigation of moderator variables elucidating heterogeneity, for instance, type of mental representation in combination with task characteristics; secondly, examination of downstream effects of suboptimal category learning skills, for example, on academic performance; and thirdly, development and implementation of interventions tailored to the needs of autistic individuals.