Introduction

Autism is a neurodevelopmental disorder which, according to one recent estimate for the United States (Maenner et al. 2020), affects about 1 in 54 children. Autism is diagnosed on the basis of clinical observations of certain behavioural features, including persistent difficulties in social communication and interaction, and repetitive and stereotyped behaviour, interests and activities, which cause clinically significant problems in social, educational or occupational functioning (DSM-5, APA, 2013). There are also some characteristic features of autistic cognition. These include an orientation toward detailed perceptual features (often at the expense of central coherence), impairments in social imagination (for example, in imaginative play) and, most importantly for the present discussion, deficits in communicative behaviours (e.g., Bal et al. 2019; Happé & Frith, 2006; Lobban-Shymko et al. 2017; Pellicano et al. 2006; Rajendran & Mitchell, 2007).

At the same time, starting from the early descriptions of autism (Kanner, 1943), some special skills and talents have also been identified, including strong performance on visuospatial tasks and visual search (e.g., Dawson et al. 2007; Frith, 1989; Wing & Wing, 1976). On verbal tasks, outstanding performance on some complex reasoning and judgment tasks has also been described. In particular, autistic people have been found to be less susceptible to some reasoning biases and memory illusions than typically developing people (e.g., De Martino et al. 2008; Morsanyi et al. 2010; Wojcik et al. 2018), although these findings may be attributable to impairments in the automatic processing of linguistic information in context (Pijnacker et al. 2010).

Nevertheless, not all forms of contextual processing are impaired in autism. Nonverbal analogical reasoning with various materials (e.g., analogies based on perceptual relations, and scene analogies) has been identified as a particular strength. Specifically, autistic people perform at the same level as age- and IQ-matched typically developing individuals on nonverbal analogy tasks, and autistic people with learning difficulties show superior performance when compared to age- and IQ-matched controls (see Morsanyi et al. 2020a for a systematic review and meta-analysis). Analogical reasoning relies on the ability to find and exploit similarities among entities based on relations, rather than on the features of the entities (e.g., Gentner, 2010; Holyoak, 2012; Holyoak & Lu, 2021). In other words, it requires flexibility and abstraction rather than a focus on specific details. Surprisingly, autistic people have been found to be able to perform analogical reasoning even in the presence of salient distractors, and the strategies they use to solve analogical reasoning problems appear very similar to those used by typically developing individuals (Morsanyi & Holyoak, 2010).

Impairments in figurative language understanding have long been considered a defining characteristic of autism (e.g., Happé, 1993; 1995). Nevertheless, language impairments are no longer part of the diagnostic criteria of autism (APA 2013), and there is an ongoing debate about the extent to which impairments in figurative language are present in autism, independent of core language impairments (see Kalandadze et al. 2019; Kalandadze et al. 2018; Morsanyi et al. 2020b; Saban-Bezalel & Mashal, 2018; Vulchanova et al. 2015 for reviews). Additionally, although it is common practice in the literature on figurative language in autism to investigate various forms of figurative language together, the processing requirements of different figures of speech are not the same. For example, appreciating irony and sarcasm requires detecting the incongruity between the state of affairs and the literal interpretation of a verbal message. By contrast, comprehending metaphors has been proposed to build on the ability to find and exploit similarities based on relations among entities (e.g., Tourangeau & Sternberg, 1981), as well as on the activation and integration of semantic and conceptual information (Kintsch, 2000). In a recent meta-analysis of figurative language processing in autism, Kalandadze et al. (2018) reported that the effect size of group differences for irony and sarcasm (Hedges’ g = 0.48) appeared smaller than that for metaphor processing (Hedges’ g = 0.72). This finding suggests that differences in the cognitive mechanisms underlying different figures of speech may have practical consequences for the size of the group differences observed in autism.

In light of the above considerations, we present a systematic review and meta-analysis focusing on idioms and proverbs. Both are conventionalized and often opaque forms of figurative language, which suggests that they are processed in a distinctive way in conversational contexts. These expressions, whose meanings are frequently abstract, are likely to depend more heavily on language abilities and crystallized intelligence, and less on fluid intelligence, than other forms of figurative language such as metaphors. For these reasons, we investigated idiom and proverb processing together in the present study.

Idiom and proverb processing

Idioms are multi-word expressions with a non-literal meaning, which can only be derived by using contextual information. Idiomatic expressions are one of the most commonly used forms of figurative language in everyday conversational situations. Like phrasal verbs, they can consist of a string of words that corresponds to a single semantic unit (Saeed 2016, pp. 56, 444). In most cases, one needs to go beyond their literal meaning in order to understand the essence of what is said (Titone & Connine, 1999). Proverbs are fixed, figurative and traditional expressions, presented in a sentential form and in succinct and formulaic language (Mieder, 2004, p. xi). They typically express social or moral norms and wisdom with regard to different everyday life situations. The origins of both of these language phenomena are diverse and can involve metaphoricity, as well as properties of other figures of speech, such as metonymy and hyperbole.

The debate surrounding idiom processing is far more complex and longer-running than the debate about proverbs. There are two main positions (see Vulchanova et al. 2015), each implying a different view of the nature of idiom processing and placing idioms at a different point in terms of processing complexity (see Titone & Connine, 1999; Vulchanova et al. 2011). According to one group of scholars, mostly linked with the lexical representation hypothesis, idioms are stored as lexical items, but in addition to (usually fast) retrieval, idiom comprehension also involves a more complex, literal compositional computation in which each element is decomposed separately (e.g., Bobrow & Bell, 1973; Chomsky, 1980; Hamblin & Gibbs, 1999; Swinney & Cutler, 1979). The opposing view is based on the configuration hypothesis (Cacciari & Glucksberg, 1991), a more compositional approach that sees idiom comprehension as a dynamic process in which idioms are complex expressions whose constituent parts contribute to the overall meaning of the expression (e.g., Cacciari & Glucksberg, 1991; Cacciari & Tabossi, 1988). Reviewing these positions, Titone and Connine (1999) proposed a middle-ground, hybrid model that combines the compositional and non-compositional aspects of idiom comprehension, in which idioms retain their lexical status while also having a degree of decomposability.

Proverbs are similar to idioms in terms of their origins and degree of conventionality, but they are more closely tied to the notion of causality, as language users are expected to be able to use them as “instructive expressions” (Chahboun et al. 2016) in specific contexts. They are therefore likely to depend on both pragmatic and analogical abilities, because formulaic language elements have to be applied to novel situations that emerge in everyday contexts. On the whole, although both idioms and proverbs are to some degree fossilized linguistic elements, their use goes beyond merely understanding their meaning: in many cases, one needs to be able to apply them to concrete contexts, which is why they are tightly bound to pragmatics. As Gibbs and Beitel (1995, p. 133) put it, “the ability to correctly explain what a proverb means does not necessarily imply that an individual can think abstractly”. This is in line with findings that their comprehension is facilitated by a supportive context (see Vulchanova et al. 2015), and in this respect they are similar to metaphors (see Gildea & Glucksberg, 1983; Ortony et al. 1978; Stamenković et al. 2020). Nevertheless, idioms (which are sometimes referred to as “dead metaphors”) and proverbs differ from metaphors in that the link between their literal and figurative meanings is indirect, opaque or non-existent (cf. Vulchanova et al. 2019).

Figurative language processing in ASD

Problems with figurative language comprehension and production have long been seen as one of the characteristic features of autism, and it has been proposed that they might be linked to more general problems with reading other people’s minds (e.g., Baron-Cohen et al. 2000; Happé, 1993, 1995). Indeed, when processing figurative language, the listener has to go beyond the literal meaning of an expression to derive the speaker’s intended meaning. More recently, however, it has been suggested that a likely cause of figurative language impairments and pragmatic difficulties in autism is a more general language impairment, tied to problems with structural language skills and semantic knowledge (e.g., Gernsbacher & Pripas-Kapit, 2012; Geurts et al. 2020; Norbury, 2005; Saban-Bezalel & Mashal, 2018). Consistent with this claim, in their meta-analysis of several types of figurative language, including idioms and proverbs, Kalandadze et al. (2018) concluded that individuals with autism showed moderately poorer figurative language comprehension than typically developing controls, but that no group differences in figurative language were found in studies in which the ASD and TD groups were matched on verbal ability. However, Kalandadze et al. (2018) did not analyze the results related to idioms and proverbs separately. In fact, the results extracted from each study often combined performance across various forms of figurative language, which makes it difficult to assess the extent to which this conclusion applies to idioms and proverbs.

Morsanyi et al. (2020b) performed a systematic review and meta-analysis of metaphor processing in autism, and also investigated the relations between participants’ age and verbal ability and the effect size of group differences. Unlike the review by Kalandadze et al. (2018), this study included only studies in which the autistic and non-autistic groups were matched on both chronological age and (verbal) intelligence. Morsanyi et al. (2020b) found that, overall, autistic and typically developing participants differed in metaphor processing ability with a medium effect size, even when the groups were matched on age and verbal ability. However, group differences were smaller or non-existent among participants with high levels of intelligence. This finding suggests that matching participants on verbal ability might eliminate group differences in figurative language processing only when the groups are characterized by high levels of intellectual functioning. This meta-analysis, however, was specific to metaphor processing, and its findings might not apply to other figures of speech.

Regarding the cognitive processes underlying figurative language processing in autism, Vulchanova et al. (2015), in a critical review of the literature, explain that besides various types of linguistic abilities (ranging from vocabulary size to syntactic and semantic knowledge), individuals’ knowledge base, their ability to draw inferences, integrate information and suppress irrelevant information, and their mentalizing skills all play a role. In the case of idioms, Vulchanova et al. (2015) highlighted the role of inferencing skills, the ability to integrate contextual information from both verbal and nonverbal sources, and linguistic skills and competences (especially syntax; see Whyte et al. 2014). Some properties of idiomatic expressions also affect their processing demands. These include their transparency or decomposability (i.e., transparent expressions are easier to understand than opaque ones), their familiarity (i.e., more familiar expressions are easier to understand), and the context in which they are encountered (cf. Vulchanova et al. 2015). In terms of presentation format, Mashal and Kasirer (2012) found that children with ASD understood visually presented idioms more easily than verbally presented ones, and Vulchanova et al. (2019) reported that idioms and proverbs were processed more easily in written than in auditory presentation.

In summary, there is good evidence for figurative language impairments in autism, including difficulties with idioms and proverbs. Nevertheless, this deficit might not always be present. In particular, group differences might be absent in the case of high-ability autistic participants who are carefully matched to a control group on age and verbal ability (see Morsanyi et al. 2020b). Although some reviews of the literature have discussed findings related to idiom and proverb processing in autism (Saban-Bezalel & Mashal, 2018; Vulchanova et al. 2015), no meta-analysis so far has focused specifically on idioms and proverbs. Thus, it is not known whether idiom and proverb processing deficits in autism exist beyond impairments in core language abilities (see Kalandadze et al. 2018). It is also possible that participants’ age is related to the magnitude of group differences. In particular, if idiom and proverb processing is developmentally delayed in autism, the effect size of group differences might decrease with age.

Aims and Scope of the Present Review

The present review focuses on idiom and proverb processing in autism. As in Morsanyi et al. (2020a, 2020b), this meta-analysis included only studies in which the ASD and TD groups were matched on both chronological age and IQ (ideally, verbal IQ). Given that autism is a neurodevelopmental disorder, it is important to consider not only the verbal abilities of participants, but also their age (i.e., the samples should ideally be matched on both age and verbal ability). For example, it is not always reasonable to assume that a group of autistic participants with intellectual impairment would perform on a particular cognitive task in the same way as a younger typically developing group matched to them on verbal ability (see Jarrold & Brock, 2004 for a detailed discussion of issues around comparing autistic and non-autistic participants on cognitive tasks). Although this strict criterion eliminates a number of studies from our analyses, it allows us to better understand the potential role of participants’ ability levels and age in the size of group differences. In other words, we can examine whether there is a developmental delay in idiom and proverb processing in autism, and whether group differences disappear in the case of participants with high verbal ability, as appears to be the case for metaphor processing (see Morsanyi et al. 2020b).

Thus, in addition to performing a meta-analysis, we also conducted meta-regressions to determine whether the size of group differences in idiom and proverb processing was related to participants’ level of (verbal) intelligence and their chronological age. We also tested for the presence of publication bias, and evaluated the quality of the studies included in our analyses. Publication bias refers to a tendency to publish studies that report positive (i.e., significant) findings, which can distort the results of meta-analyses and lead to an overestimation of effect sizes. For example, Polanin et al. (2016), based on a review of meta-analyses published in top-tier educational and psychology journals, found that published studies reported larger effect sizes than unpublished studies, with an average difference of 0.18 standard deviations. Our systematic review and meta-analysis was preregistered in the International Prospective Register of Systematic Reviews (PROSPERO), registration number CRD42021235762 (available at: https://www.crd.york.ac.uk/PROSPERO/display_record.php?RecordID=235762).

Method

Search strategy

In designing our systematic review and reporting the results of the meta-analysis, we followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We searched Web of Science, Scopus, and ProQuest Dissertations & Theses Global for all studies published up to February 2021. The search string combined two blocks of terms: (autis* OR ASD OR Asperger*) AND (idiom* OR proverb*). The target fields in the searches included titles, abstracts, keywords, topics, subjects and indexing. There were no restrictions in terms of publication year.
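The Boolean logic of this search string (OR within each block, AND between the two blocks) can be illustrated with the short sketch below. It is a simplified model, assuming that `*` acts as a truncation wildcard and that terms are matched case-insensitively as whole words against a single text field; the real databases apply their own field, stemming and phrase rules, and the helper names (`matches_query` and so on) are hypothetical.

```python
import re

# Hypothetical illustration of the two concept blocks in the search string.
# A trailing "*" is treated as a truncation wildcard (any word ending).
POPULATION_TERMS = ["autis*", "ASD", "Asperger*"]
TOPIC_TERMS = ["idiom*", "proverb*"]


def term_to_regex(term: str) -> re.Pattern:
    """Compile a database-style term into a case-insensitive whole-word regex."""
    if term.endswith("*"):
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.compile(pattern, re.IGNORECASE)


def matches_block(text: str, terms: list) -> bool:
    """OR within a concept block: any one term is enough."""
    return any(term_to_regex(t).search(text) for t in terms)


def matches_query(text: str) -> bool:
    """AND across blocks: (autis* OR ASD OR Asperger*) AND (idiom* OR proverb*)."""
    return matches_block(text, POPULATION_TERMS) and matches_block(text, TOPIC_TERMS)


if __name__ == "__main__":
    print(matches_query("Idiom comprehension in autistic adolescents"))  # True
    print(matches_query("Metaphor processing in Asperger syndrome"))     # False
```

The second example fails because it satisfies only the population block, which is exactly what the AND between blocks is meant to exclude.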

We also covered the grey literature, including studies that were not published in peer-reviewed journals (e.g., conference proceedings) and dissertations, which we found in ProQuest Dissertations & Theses Global. Our aim in searching the grey literature was to minimize the effects of publication bias. We also searched the reference lists of all articles that reached the full-text stage for additional papers to check against our eligibility criteria.

Study inclusion criteria

We selected articles for the meta-analysis on the basis of the following predetermined criteria: (1) Each paper had to report the results of an original research study including an idiom or proverb comprehension or production task, and accuracy scores on the relevant task had to be reported separately from other measures (Footnote 1). Studies were excluded if their data were insufficient to calculate effect sizes and the relevant data could not be obtained from the authors; (2) In each study, participants had to be diagnosed with ASD by experienced clinicians using standard diagnostic criteria: the Diagnostic and Statistical Manual of Mental Disorders (DSM) or the International Classification of Diseases (ICD). This included all diagnostic categories of autism spectrum disorder that had previously been considered separate (i.e., Asperger’s syndrome and pervasive developmental disorder); (3) The study had to include a TD comparison group, matched to the ASD group on chronological age and IQ.

Screening process

The two authors conducted the search independently, using the three databases and their search engines. In the first step, we compared the three lists of results (from Web of Science, Scopus and ProQuest D&T Global) and removed all duplicates. Then, we read the title and abstract of each study. If a study appeared to meet our eligibility criteria, or if the title and abstract provided insufficient information, the full-text article or dissertation was reviewed. The three inclusion criteria described above were then applied to determine the final selection of papers for the meta-analysis. At each stage, any disagreements between the two raters were resolved by discussion. Details regarding the number of papers considered at each stage of the selection process, as well as the reasons for exclusions, are presented in the PRISMA flow diagram (Moher 2009) in Fig. 1.
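The first screening step, merging the three exports and removing duplicates, can be sketched as follows. This is a hypothetical illustration rather than the tool the authors used: it assumes each export has been reduced to a `Record` with a title, year and optional DOI, and treats two records as duplicates if they share a DOI or a normalized title and year.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical, minimal view of one exported search result."""
    title: str
    year: int
    doi: str = ""     # empty when the database export has no DOI
    source: str = ""  # e.g. "Web of Science", "Scopus", "ProQuest"


def dedup_keys(rec: Record) -> set[str]:
    """Keys identifying a study: normalized title + year, plus the DOI if present."""
    normalized_title = re.sub(r"[^a-z0-9]+", " ", rec.title.lower()).strip()
    keys = {f"title:{normalized_title}|{rec.year}"}
    if rec.doi:
        keys.add("doi:" + rec.doi.strip().lower())
    return keys


def merge_and_deduplicate(*exports: list[Record]) -> list[Record]:
    """Concatenate database exports, keeping only the first occurrence of each study."""
    seen: set[str] = set()
    unique: list[Record] = []
    for export in exports:
        for rec in export:
            keys = dedup_keys(rec)
            if keys & seen:  # shares a DOI or a normalized title/year with an earlier record
                continue
            seen |= keys
            unique.append(rec)
    return unique


if __name__ == "__main__":
    # Hypothetical records, not taken from the actual searches.
    wos = [Record("Idiom comprehension in ASD", 2015, "10.0000/example", "Web of Science")]
    scopus = [Record("Idiom Comprehension in ASD.", 2015, "", "Scopus")]
    proquest = [Record("Proverb processing in autism", 2019, "", "ProQuest")]
    print(len(merge_and_deduplicate(wos, scopus, proquest)))  # 2
```

Matching on either key means a record exported with a DOI by one database and without one by another is still caught, as long as the titles agree after normalization.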

Fig. 1

The PRISMA flow diagram describing the search process, the number of papers included at each step, and the reasons for exclusion at the full-text stage

Data extraction

For descriptive purposes, we extracted the following study characteristics: title, authors and publication year, language and country, and the types of tasks used. For the statistical analyses, we extracted the number of ASD and TD participants and the descriptive statistics reported in the papers (means and SDs for performance on the idiom and proverb tasks for both groups). For studies that also collected neuroimaging or similar data, we extracted only the behavioural results. When a study included several tasks relating to idiom and proverb processing (or the results were broken down by type of task), the results were combined into a single composite measure for the study. This was done because computing effect sizes multiple times from the same sample can lead to a misrepresentation of the overall results (see Borenstein et al. 2009). Finally, we extracted the average age and IQ of the participants in the ASD and TD samples, and recorded the approach used for matching the groups on intellectual ability (i.e., verbal, full-scale or non-verbal IQ).
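The text does not state exactly how the composite was formed. One standard option, described by Borenstein et al. (2009), is to average the task-level effect sizes and to compute the variance of that average using an assumed correlation r between tasks. The sketch below implements that formula under this assumption; the function name is hypothetical and the value of r is an input, not something reported here.

```python
import math


def composite_effect(effects: list, variances: list, r: float) -> tuple:
    """Mean of m effect sizes from one sample and the variance of that mean.

    Composite formula (Borenstein et al. 2009):
        mean = (1/m) * sum(Y_i)
        var  = (1/m)**2 * (sum(V_i) + sum over i != j of r * sqrt(V_i) * sqrt(V_j))
    r is the assumed correlation between outcomes (the same for every pair here).
    """
    m = len(effects)
    if m == 0 or len(variances) != m:
        raise ValueError("need at least one effect size and one variance per effect size")
    mean = sum(effects) / m
    total = sum(variances)
    for i in range(m):
        for j in range(m):
            if i != j:
                total += r * math.sqrt(variances[i]) * math.sqrt(variances[j])
    return mean, total / m**2


if __name__ == "__main__":
    # Two hypothetical task-level effects from the same sample.
    for r in (0.0, 0.5, 1.0):
        print(r, composite_effect([0.8, 0.6], [0.09, 0.09], r))
```

With these hypothetical inputs, r = 1 gives a composite variance of 0.09 (no more precise than a single task), whereas r = 0 halves it to 0.045. Treating tasks from the same participants as independent would therefore overstate the precision of the study-level estimate, which is the misrepresentation the single-composite approach avoids.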

Meta-analytic procedures and analysis

The statistical analyses were conducted using the Comprehensive Meta-Analysis (CMA) software, version 3 (Biostat). For each effect size, a 95% confidence interval was computed from the original data reported in the studies. Hedges’ g (a variant of Cohen’s d that corrects for the bias caused by small sample sizes; Hedges, 1981; Hedges & Olkin, 1985) was used as our effect size statistic. Hedges’ g was assigned a positive sign when the TD group had the higher mean, and a negative sign when the ASD group had the higher mean. The overall effect size was estimated by calculating a weighted average of individual effect sizes, based on a random-effects model, which assumes that between-study variation in effect sizes results not only from random error, but also from systematic effects of variables that are likely to vary from study to study (Borenstein et al. 2009).
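The effect-size computation and random-effects pooling described above can be sketched as follows. This is a minimal illustration, not the CMA implementation: Hedges’ g and its variance follow the standard formulas given in Borenstein et al. (2009), with the sign convention used here (positive when the TD mean is higher), and the between-study variance (tau²) is estimated with the DerSimonian-Laird method-of-moments estimator, which we assume for illustration because the estimator used in the analyses is not stated. `GroupStats`, `hedges_g` and `random_effects_pool` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class GroupStats:
    mean: float
    sd: float
    n: int


def hedges_g(td: GroupStats, asd: GroupStats) -> tuple:
    """Hedges' g (positive when the TD mean is higher) and its sampling variance."""
    df = td.n + asd.n - 2
    pooled_sd = math.sqrt(((td.n - 1) * td.sd**2 + (asd.n - 1) * asd.sd**2) / df)
    d = (td.mean - asd.mean) / pooled_sd
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (td.n + asd.n) / (td.n * asd.n) + d**2 / (2 * (td.n + asd.n))
    return j * d, j**2 * var_d


def random_effects_pool(g: list, v: list) -> tuple:
    """DerSimonian-Laird random-effects mean, its 95% CI bounds, and tau^2."""
    w = [1 / vi for vi in v]
    fixed = sum(wi * gi for wi, gi in zip(w, g)) / sum(w)
    q = sum(wi * (gi - fixed) ** 2 for wi, gi in zip(w, g))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(g) - 1)) / c)
    w_star = [1 / (vi + tau2) for vi in v]  # random-effects weights
    mean = sum(wi * gi for wi, gi in zip(w_star, g)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return mean, mean - 1.96 * se, mean + 1.96 * se, tau2


if __name__ == "__main__":
    # Hypothetical summary statistics, not taken from the included studies.
    g, v = hedges_g(GroupStats(mean=10, sd=2, n=20), GroupStats(mean=8, sd=2, n=20))
    print(round(g, 3), round(v, 4))
    print(random_effects_pool([0.0, 1.0], [0.1, 0.1]))
```

Adding tau² to every study’s variance pulls the weights of large and small studies closer together, so when heterogeneity is high the random-effects mean can differ noticeably from a fixed-effect mean and its confidence interval becomes wider.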

Heterogeneity in effect sizes would indicate that factors beyond an ASD diagnosis influenced the size of group differences. We tested the heterogeneity of effect sizes statistically using Cochran’s Q statistic. We also report the I² statistic, which expresses the percentage of variation in effect sizes across studies that can be attributed to systematic effects of study variables rather than chance (Higgins & Thompson, 2002; Higgins et al. 2003). We carried out meta-regression analyses using random-effects models to test for the effects of two possible moderator variables: the average age and the average IQ of ASD participants. A funnel plot was created to identify and evaluate the effect of publication bias on the overall effect size estimate, with the effect sizes of group differences plotted on the horizontal axis against their standard errors on the vertical axis. The classic fail-safe N statistic (Rosenthal, 1979) was also used to test for publication bias, although this result should be treated with caution when a relatively small number of studies are included in the analysis (Lau et al. 2006). This statistic assesses the stability of the meta-analytic result by estimating how many additional studies with non-significant findings would be needed to make the overall result non-significant (Long, 2001).
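For reference, the two heterogeneity statistics can be computed as in the sketch below. Q is the inverse-variance weighted sum of squared deviations from the fixed-effect mean, with df = k - 1 for k studies, and I² = (Q - df) / Q expressed as a percentage and truncated at zero (Higgins & Thompson, 2002). The function names are illustrative and this is not the CMA implementation.

```python
def cochran_q(g: list, v: list) -> float:
    """Cochran's Q: weighted squared deviations from the inverse-variance (fixed-effect) mean."""
    w = [1 / vi for vi in v]
    mean = sum(wi * gi for wi, gi in zip(w, g)) / sum(w)
    return sum(wi * (gi - mean) ** 2 for wi, gi in zip(w, g))


def i_squared(q: float, df: int) -> float:
    """I^2 in percent: share of observed variation beyond what chance (df) predicts."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


if __name__ == "__main__":
    # Q(10) = 55.73 is the value reported in the Results section (k = 11 studies).
    print(round(i_squared(55.73, df=10), 2))
```

Plugging in the reported Q(10) = 55.73 gives about 82.06%, in line with the reported I² of 82.05 once the rounding of Q itself is taken into account.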

Finally, we also assessed the quality of the included studies, using a procedure similar to those of Spain et al. (2018) and Morsanyi et al. (2020b), with the criteria defined in Morsanyi et al. (2020b). The assessment focused on three criteria: (1) sample characteristics, (2) task characteristics, and (3) the procedure and materials used for matching the groups on verbal and/or intellectual ability.

Results

A total of 11 studies (published in 10 papers), involving 235 ASD and 224 TD participants, were included in the analyses. The country and language of each study, the characteristics of the participants in each study (number of participants in the ASD and TD groups, and their mean chronological age and verbal or full-scale IQ), and the type of task(s) used in each study are listed in Table 1. The studies are listed in rank order of effect size (Hedges’ g) for the group difference in performance on the idiom or proverb task(s), from positive (i.e., the TD group performs better) to negative (i.e., the ASD group performs better). A point estimate of each effect size is also included in the table. The studies were conducted in five different countries and four different languages (English, Hebrew, Korean, and Spanish), and included participants spanning a reasonably broad age range: from mid-childhood through early adolescence to adult samples. Participants’ mean IQ scores ranged from average to above average, which is to be expected given that all studies matched the ASD participants to a TD group on IQ. The idiom (n = 6) and proverb (n = 4) tasks used in the studies were diverse in terms of the properties of the idioms and proverbs and the modality of presentation, although most studies used a multiple-choice response format.

Table 1 Participant characteristics, type of idiom or proverb task, and effect size of group differences in the studies included in the meta-analysis

Figure 2 presents the effect sizes of the group differences in idiom or proverb comprehension (Hedges’ g with 95% CIs) between individuals with ASD and matched TD controls. Overall, the results showed a medium-sized group difference in idiom or proverb comprehension (g = 0.69, p < 0.001; 95% CI 0.24–1.12) in favour of the TD group (Footnote 2). Notably, one study showed a large difference in favour of the ASD group (McCrimmon et al. 2012), three studies reported no difference in performance between the groups, and a cluster of six studies reported very similar effect sizes (large group differences of just above 1 in favour of the TD group). Two studies (Lee et al. 2015 and Strandburg et al. 1993) matched the groups on full-scale IQ instead of verbal IQ, and one study (Whyte et al. 2014) matched the groups on non-verbal IQ. Lee et al. (2015) and Strandburg et al. (1993) were among the cluster of studies that yielded the largest group differences. Leaving out these studies slightly reduced the overall effect size, but did not change it substantially (g = 0.60, p < 0.001; 95% CI 0.004–1.19).

Fig. 2

Hedges’ g effect sizes with 95% confidence intervals for group differences in idiom and proverb processing accuracy between individuals with ASD and age- and verbal intelligence-matched controls. The overall mean effect size is presented in the bottom line (and marked by a filled diamond in the figure)

The heterogeneity between studies was significant (Q(10) = 55.73, p < 0.001, I² = 82.05%), indicating that a considerable proportion of the variance in effect sizes (82%) could be attributed to true variance rather than to random noise. To investigate the potential causes of this variation, we conducted two meta-regression analyses. The first investigated the relation between participants’ age and the magnitude of group differences, and included all studies in our meta-analysis (n = 11). The result was non-significant (p = 0.344), providing no evidence that participants’ age was related to the effect size of group differences. The second investigated the relation between participants’ level of (verbal) intelligence and the effect size of group differences. This analysis could include only the 9 studies that reported participants’ (verbal) intelligence. The model was only marginally significant (Q(1) = 3.55, p = 0.059), explaining 36% of the variance in effect sizes between studies. An inspection of the regression plot (Fig. 3) indicated that the mean (verbal) IQ of autistic participants was negatively related to the effect size of group differences (i.e., group differences were smaller in the case of participants with higher verbal ability), although this was only a marginally significant trend.
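A meta-regression with a single moderator, such as the mean verbal IQ of the ASD group, can be sketched as a weighted least-squares fit in which each study is weighted by 1/(v_i + tau²). The sketch takes tau² as given, which is a simplification (software such as CMA estimates it as part of the model), and the function name and inputs are hypothetical. With one moderator, the model Q statistic equals the squared ratio of the slope to its standard error, which is why a Q with 1 degree of freedom is reported.

```python
import math


def meta_regression(g: list, v: list, x: list, tau2: float) -> tuple:
    """Random-effects meta-regression of g on one moderator x, with tau2 held fixed.

    Each study is weighted by 1 / (v_i + tau2). Returns the intercept, the slope,
    the slope's standard error and the model Q statistic (= (slope / se)**2, df = 1).
    """
    w = [1 / (vi + tau2) for vi in v]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    g_bar = sum(wi * gi for wi, gi in zip(w, g)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (gi - g_bar) for wi, xi, gi in zip(w, x, g))
    slope = sxy / sxx
    intercept = g_bar - slope * x_bar
    se_slope = math.sqrt(1 / sxx)  # weights treated as known inverse variances
    q_model = (slope / se_slope) ** 2
    return intercept, slope, se_slope, q_model


if __name__ == "__main__":
    # Hypothetical studies: mean verbal IQ of the ASD group and the observed g.
    iq = [90.0, 100.0, 110.0, 120.0]
    effects = [1.1, 1.0, 0.9, 0.8]
    print(meta_regression(effects, [0.1] * 4, iq, tau2=0.05))
```

A negative slope, as in these hypothetical inputs, corresponds to the pattern seen in Fig. 3: studies whose autistic participants had higher verbal IQ tended to show smaller group differences.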

Fig. 3

Plot presenting the regression of Hedges’ g effect sizes for group differences in idiom and proverb comprehension scores on the average verbal IQ of the ASD participants in each study

Figure 4 presents a funnel plot assessing the possible impact of publication bias. This analysis encompassed all studies (n = 11) included in our meta-analysis. The funnel plot presents the effect sizes of group differences on the horizontal axis and the standard error of effect sizes (a sample-size dependent statistic) on the vertical axis. Larger studies (i.e., studies with smaller standard errors) appear toward the top of the graph and, in general, are expected to cluster around the mean effect size. Smaller studies (i.e., studies with larger standard errors) appear toward the bottom, and are expected to be more dispersed. In the absence of publication bias, studies should be symmetrically distributed on each side of the overall mean effect size (Cooper et al. 2009). If publication bias is present, a higher concentration of studies is expected on one side of the mean, toward the bottom of the plot. A visual inspection of the plot showed some asymmetry, with the McCrimmon et al. (2012) study appearing as an outlier. Nevertheless, in the case of a random-effects model, it can be difficult to interpret the funnel plot visually (Lau et al. 2006). For this reason, we also conducted statistical analyses relating to the potential presence of publication bias. Using the fail-safe N statistic, our analysis revealed that 117 missing studies with a null finding would be needed to change the overall result of the meta-analysis to not significantly different from zero. Rosenthal (1979) proposed that results can be considered stable if the number of missing studies needed to change the result to non-significant is above 5k + 10, where k is the number of studies in the meta-analysis. With k = 11, this critical value is 5 × 11 + 10 = 65. As 117 is well above this critical value, the results can be considered stable (i.e., this test provided no evidence of publication bias).
This method is, however, less reliable when the number of studies included in the analysis is low (Lau et al. 2006). As an additional check for publication bias, we also computed the correlation between sample sizes and effect sizes. In this analysis, we entered the effect size of McCrimmon et al. (2012) as a positive value, as our interest was in determining whether larger studies tended to yield smaller effect sizes than smaller studies. The correlation was weak, negative and non-significant (r(8) = −0.07, p = 0.839), indicating that smaller studies were not systematically associated with larger effect sizes.
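The two publication-bias checks described above can be sketched as follows. `rosenthal_fail_safe_n` assumes a one-tailed alpha of .05 (z = 1.645), as in Rosenthal’s original proposal; the tail and alpha settings used in CMA for this analysis are not stated here, so z_alpha is left as a parameter. `pearson_r` is a plain Pearson correlation of the kind used for the sample-size check; the study-level inputs it would take (sample sizes, and effect sizes with McCrimmon et al. 2012 entered as a positive value) are not reproduced here, and all function names are hypothetical.

```python
import math


def rosenthal_fail_safe_n(z_values: list, z_alpha: float = 1.645) -> float:
    """Classic fail-safe N: number of null (z = 0) studies needed to bring the
    Stouffer combined z, sum(z) / sqrt(k + N), down to z_alpha."""
    k = len(z_values)
    return (sum(z_values) / z_alpha) ** 2 - k


def rosenthal_threshold(k: int) -> int:
    """Rosenthal's tolerance level 5k + 10 for judging a fail-safe N."""
    return 5 * k + 10


def pearson_r(x: list, y: list) -> float:
    """Pearson correlation, e.g. between study sample sizes and effect sizes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    k = 11  # number of studies in the meta-analysis
    print(rosenthal_threshold(k))        # 65, as in the text
    print(117 > rosenthal_threshold(k))  # True: the reported fail-safe N exceeds it
```

The fail-safe N formula follows directly from solving sum(z) / sqrt(k + N) = z_alpha for N, which also makes clear why it says nothing about whether missing studies might have effects in the opposite direction.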

Fig. 4

Funnel plot to assess the potential impact of publication bias

In terms of the quality of the papers (Table 2), all of them were published in peer-reviewed journals, and we judged all of them to be of at least moderate quality, which suggests that, in general, their methodology can be considered robust. In addition to intelligence, the studies usually matched the groups on at least age and gender, and several papers used additional measures as well. About half of the studies gave detailed explanations of the inclusion and exclusion criteria they used. We also listed the specific diagnostic categories of the participants in Table 2 (although in the DSM-5, all of these categories have been merged into a single autism spectrum disorder diagnostic category). Most studies used idiom and proverb tasks created by the authors, but these tasks were typically piloted with independent samples, indicating that they were appropriate for the relevant age and ability groups. Although the studies did not generally report the reliability of the idiom and proverb tasks, the number of items tended to be reasonably large. All studies matched the participants on a standardized test of verbal ability or intelligence (usually a version of the Wechsler intelligence scales), although the instrument used by Lee et al. (2015) was not described in detail. Some studies assessed language skills with multiple tests, but the most common strategy was to match the groups on the Vocabulary subtest of the Wechsler Intelligence Scale.

Table 2 Sample characteristics, design and properties of the idiom and proverb tasks, materials used for matching the groups on (verbal) IQ, and publication status of the studies included in the meta-analysis, with global quality assessment

Discussion

This meta-analysis focused on idioms and proverbs, two special forms of figurative language that are more opaque and conventionalized than metaphors and, for this reason, rely more heavily on the processing of contextual cues and might also be more difficult to process (cf., Vulchanova et al. 2015). Idioms and proverbs are also likely to be more closely tied to language than metaphors, whose processing, in some contemporary views, goes beyond language (see Holyoak, 2019; Holyoak & Stamenković, 2018).

Our analysis revealed a medium-sized group difference in idiom and proverb processing in autism, with a point estimate of g = 0.69. A detailed evaluation of the methods presented in the papers suggested that the quality of the studies was generally good. Moreover, there was no evidence of publication bias, which suggests that our estimate of the overall effect size may be considered reliable. The overall effect size in our study is very similar to the effect sizes of group differences for metaphors reported in recent meta-analyses (g = 0.72 in Kalandadze et al. 2018, and g = 0.76 in Morsanyi et al. 2020b). This suggests that although the cognitive processes involved in understanding metaphors might differ from those involved in processing idioms and proverbs, and idioms and proverbs might be more closely bound to verbal skills, the difference in their levels of difficulty (at least as reflected in the size of group differences) is not substantial. Indeed, the heterogeneity of group differences reported by the individual studies (ranging from non-significant to large, and including differences in favour of both the TD and the ASD groups) suggests that factors other than the type of figurative language (such as the characteristics of the participants and the properties of the idiom and proverb processing tasks) might be responsible for much of the variance.
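For readers less familiar with the metric, the sketch below shows how a bias-corrected standardized mean difference (Hedges' g), the type of effect size summarized above, is commonly computed for a single study. It assumes that a study reports group means, standard deviations and sample sizes; the function name and example values are hypothetical, and some studies may instead require conversion from test statistics.

```python
import math


def hedges_g(mean_td, sd_td, n_td, mean_asd, sd_asd, n_asd):
    """Bias-corrected standardized mean difference (Hedges' g).

    Positive values indicate higher scores in the TD group, matching the
    sign convention used in the text (positive = TD advantage).
    """
    df = n_td + n_asd - 2
    # Pooled within-group standard deviation.
    pooled_sd = math.sqrt(((n_td - 1) * sd_td**2 + (n_asd - 1) * sd_asd**2) / df)
    d = (mean_td - mean_asd) / pooled_sd  # Cohen's d
    j = 1 - 3 / (4 * df - 1)              # small-sample correction factor
    return j * d


# Hypothetical accuracy scores for two groups of 20 participants each.
print(round(hedges_g(10.0, 2.0, 20, 8.0, 2.0, 20), 3))
```

With these hypothetical values, the uncorrected difference is exactly one pooled standard deviation, and the small-sample correction shrinks it slightly, to about 0.98.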

In our analyses, we only included studies where the autistic and TD samples were matched on both age and (preferably verbal) intelligence. Several theorists have proposed that impairments in figurative language are related to more general difficulties with language in autism (e.g., Brock et al. 2008; Gernsbacher & Pripas-Kapit, 2012; Geurts et al. 2020; Norbury, 2005), and that when the groups are well matched in terms of their language skills, group differences should disappear. Two studies (Lee et al. 2015 and Strandburg et al. 1993) matched the groups on full-scale IQ instead of verbal IQ, and one study (Whyte et al. 2014) matched the groups on non-verbal intelligence. The effect sizes reported by these studies were among the largest, indicating that matching strategy might affect the size of group differences. Nevertheless, excluding these studies only slightly reduced the overall effect size estimate (from 0.69 to 0.60), which suggests that matching strategy (in terms of whether verbal, non-verbal or full-scale IQ was used) did not have a substantial effect on the results.
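A sensitivity analysis of this kind re-estimates the pooled effect after dropping selected studies. The sketch below assumes a DerSimonian-Laird random-effects model; the text here does not state which estimator was used, and the function names and study values are hypothetical, so this illustrates the general procedure rather than reproducing the reported 0.69 and 0.60 estimates.

```python
def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects estimate of the mean effect size."""
    k = len(effects)
    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sw
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi**2 for wi in w) / sw
    # Between-study variance (tau^2), truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]  # random-effects weights
    return sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)


def pool_excluding(effects, variances, exclude):
    """Re-estimate the pooled effect after dropping the studies in `exclude`."""
    drop = set(exclude)
    kept = [(g, v) for i, (g, v) in enumerate(zip(effects, variances)) if i not in drop]
    return random_effects_pool([g for g, _ in kept], [v for _, v in kept])


# Hypothetical effect sizes and sampling variances (not the studies in Table 2).
gs = [0.4, 1.2, 0.6, 1.5, 0.3, 0.8, -0.9, 0.5, 1.4, 0.7]
vs = [0.08, 0.10, 0.06, 0.12, 0.05, 0.09, 0.11, 0.07, 0.10, 0.06]
print(round(pool_excluding(gs, vs, []), 2), round(pool_excluding(gs, vs, [1, 3, 8]), 2))
```

The second call drops three (hypothetical) studies with large effects, mirroring the logic of excluding studies that used a different matching strategy.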

We also conducted a meta-regression analysis to investigate the potential relation between participants’ level of intelligence and the effect size of group differences. This analysis yielded a marginally significant result, indicating a trend for smaller group differences among participants with higher (verbal) ability levels. This suggests that group differences might be reduced or absent in participants with particularly high verbal abilities. Although this result was not statistically significant (most likely due to a lack of statistical power, as we could only include 9 studies in this analysis), there are two reasons to believe that this finding might be meaningful. First, the effect of (verbal) ability on the size of group differences was moderate (R² = 0.36). Second, this result replicated the findings of Morsanyi et al. (2020b) in relation to metaphor processing. These results suggest that closely matching the autistic and TD groups on verbal skills might not be enough to eliminate group differences, unless participants have very high verbal ability. This finding contrasts with claims in the literature that matching ASD and TD participants closely on verbal ability eliminates group differences (e.g., Brock et al. 2008; Gernsbacher & Pripas-Kapit, 2012; Geurts et al. 2020; Norbury, 2005), although we should note that most studies did not use an extensive range of tasks to match the groups on verbal ability.
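The core of a moderator analysis like this one can be sketched as an inverse-variance weighted regression of study effect sizes on a study-level moderator (here, mean verbal IQ; the same approach applies to mean age). This is a simplified fixed-effect version under stated assumptions: a full mixed-effects meta-regression would also estimate residual between-study variance and the R² analogue (typically the proportion of between-study variance explained) reported above, which this sketch does not do. The function name and all values are hypothetical.

```python
def weighted_meta_regression(moderator, effects, variances):
    """Inverse-variance weighted least-squares fit: effect = b0 + b1 * moderator."""
    w = [1 / v for v in variances]  # more precise studies get more weight
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, moderator)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, effects)) / sw
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, moderator, effects))
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, moderator))
    if sxx == 0:
        raise ValueError("the moderator must vary across studies")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical mean verbal IQ per study, effect sizes and sampling variances.
viq = [92, 98, 101, 104, 107, 110, 113, 116, 120]
gs = [1.1, 0.9, 1.0, 0.7, 0.8, 0.5, 0.6, 0.3, 0.2]
vs = [0.10, 0.08, 0.09, 0.07, 0.06, 0.08, 0.05, 0.07, 0.06]
b0, b1 = weighted_meta_regression(viq, gs, vs)
print(f"slope per IQ point: {b1:.3f}")
```

A negative slope corresponds to the trend described above: smaller group differences in studies whose participants had higher (verbal) ability.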

One study included in our meta-analysis (McCrimmon et al. 2012) reported a large group difference in proverb explanation in favour of the ASD group (comprising participants diagnosed with Asperger’s syndrome, with no history of a language delay). The findings of this study stand in contrast with the other studies in our meta-analysis, which all reported a group difference in favour of the TD participants, although this difference was negligible in some cases. The findings of this study are even more surprising given that previous reviews suggested that verbally explaining the meaning of figures of speech is particularly challenging for autistic participants (cf., Kalandadze et al. 2019). A potential reason for this seemingly anomalous finding is that, as idioms and proverbs are conventionalized and their representations are stored in the mental lexicon (Conklin & Schmitt, 2012), explaining their meaning is not too dissimilar to explaining the meaning of individual words, which tends to be a peak ability in Asperger’s syndrome (see Soulières et al. 2011). In contrast to McCrimmon et al. (2012), Tzuriel and Groman (2017) found a large difference in favour of the TD group on a proverb explanation task (see Fig. 2). Although in both studies the materials were presented verbally by the experimenter, the latter study was likely more similar to an everyday conversational situation, as its materials required taking a story context into account. This could have increased the processing demands of the task for the ASD group.

The finding that participants with ASD do not always have problems with figurative language processing, and sometimes even outperform TD participants, fits well with suggestions that pragmatic impairments in autism are neither global nor uniform (Geurts et al. 2020), and also with recent changes in the DSM diagnostic criteria for autism, which no longer include language impairments (although impairments of non-verbal communication still feature prominently). Overall, our findings suggest that autistic participants with high verbal ability are able to perform idiom and proverb processing tasks with a similar level of accuracy to age- and verbal intelligence-matched TD participants (and sometimes even better).

Relating to the above point, it should be noted that an absence of group differences in accuracy does not necessarily imply an absence of differences in the strategies that participants use when processing idioms and proverbs. For example, Vulchanova et al. (2019) reported no group differences in accuracy on a sentence-picture matching task involving idioms and proverbs, in either children or adults. Nevertheless, eye-tracking and mouse-tracking results showed that participants with autism spent more time considering the literal (as well as the figurative) interpretations of the idioms and proverbs presented to them. This might suggest that they were less able to suppress a literal interpretation of these figures of speech (see also Gold & Faust, 2010; Melogno et al. 2017; Vulchanova et al. 2019), or could simply point to a processing style in which participants with autism take longer to make decisions and avoid “jumping to conclusions” (Brosnan et al. 2014).

Saban-Bezalel and Mashal (2015) also reported no difference in the performance of autistic and non-autistic participants on a lexical decision task related to idiom processing. However, they did find evidence for differences in the hemispheric processing of these expressions (with a right hemisphere advantage in the TD group, but bilateral processing in the ASD group). A potential explanation is that the ASD group relied on a compensatory mechanism. If so, this might also explain why ASD participants with higher levels of intelligence are more likely to show no impairments in figurative language processing.

In a separate meta-regression analysis, we also investigated the relation between participants’ age and the effect size of group differences. In this analysis, we were able to include all studies from our meta-analysis. It has been suggested in the literature that figurative language processing is developmentally delayed in autism (cf., Saban-Bezalel & Mashal, 2018; Vulchanova et al. 2015). If this were the case, we would expect the effect size of group differences to decrease with age. Idiom processing shows a protracted development from around the age of five to adolescence (see Hattouti et al. 2016 for a review). The samples included in our meta-regression covered a broad age range from mid-childhood to young adulthood, which made it possible to look at developmental differences during the period that is critical for the development of idiom and proverb processing. The meta-regression relating to the effect of participants’ age yielded a non-significant result, which suggested that the effect size of group differences neither increased nor decreased with age. This finding speaks against a general developmental delay in idiom and proverb processing in autism.

Limitations

The most important limitation of this study is the relatively small number of studies that we were able to include in our analyses. Although our overall estimate for the average effect size of group differences is likely to be reasonably accurate, we experienced a problem with statistical power in our meta-regression analysis relating to the effects of (verbal) ability. Another factor that limits our ability to advance current debates in the field about the role of specific language skills in figurative language processing in autism is that most studies in our review used a vocabulary test (typically the Wechsler Vocabulary subtest) to match the groups. This subtest is considered to provide a good measure of an individual’s expressive vocabulary, verbal knowledge, and crystallized and general intelligence, and it also draws heavily on memory, learning ability, and concept and language development (Sattler, 1988). Nevertheless, it does not provide a good measure of syntax, which is considered to play an important role in the processing of idioms, at least in real-life conversational situations (cf., Whyte et al. 2014). Relating to this point, it is notable that most studies in our analysis presented short idioms and proverbs in written format on a computer screen, without any requirement to process contextual information, which is very different from how idioms and proverbs are typically encountered in everyday situations (where pragmatics plays a more important role).

Future directions

Our study suggests that, in general, there is a medium-sized difference in idiom and proverb processing between autistic and TD participants, in favour of the TD group. Nevertheless, the studies included in our meta-analysis yielded highly heterogeneous results, which calls for further investigation of the circumstances in which autistic people might perform well or struggle. Given that only one study found an advantage in favour of the ASD group, it would be important to follow up on this finding and see whether it replicates in a more diverse ASD group, including participants who experienced a delay in their language development.

Another interesting future avenue could be to examine the role of context in understanding idioms and proverbs in autism. So far, no study has manipulated the presence or absence of context. The presence of supporting context facilitates the processing of the figurative meaning of idioms and proverbs in TD individuals (e.g., Vulchanova et al. 2015), helping them to avoid a literal interpretation. Nevertheless, contextual cues might be less helpful for autistic individuals when they have to suspend the usual interpretation of statements (e.g., Happé, 1997; Jolliffe & Baron-Cohen, 1999; López & Leekam, 2003; Morsanyi & Handley, 2012; but see e.g., Giora et al. 2012). The investigation of contextual cues could involve the presentation of proverbs and idioms embedded in a text or a story, compared to a decontextualized presentation format. Additionally, the role of ecologically valid contextual cues (such as prosody and facial expressions) could also be investigated. For example, it has been noted that the auditory presentation format might be difficult for autistic participants (Vulchanova et al. 2019). Nevertheless, within the auditory modality there are likely to be important differences between an experimenter presenting idioms and proverbs to participants and an auditory presentation of the same materials by a robotic voice. The inclusion of context would make it possible to evaluate the role of pragmatic abilities in both idiom and proverb comprehension in autism. The results of these investigations may also contribute to the debate about competing models of idiom comprehension.

Our decision to consider idioms and proverbs together, due to their shared features of conventionality and opacity and their differences from metaphor, should not be interpreted as a claim that the processing of these two forms of figurative language is exactly the same. Nevertheless, we expected that the patterns of impairment in ASD might be similar for idioms and proverbs. Indeed, when we consider the effect sizes of group differences between ASD and TD participants in studies that included idioms vs. proverbs (or both), we can see that the results overlapped, without an apparent separation between the two types of figurative language. Evaluating differences in idiom and proverb comprehension in both ASD and TD populations might be an interesting direction for future figurative language research.

Another future direction could be to extend the systematic investigation of figurative language processing in autism to other figures of speech. Recent systematic reviews and meta-analyses have investigated figurative language processing in autism in general (Kalandadze et al. 2018), as well as the processing of metaphors (Kalandadze et al. 2019; Morsanyi et al. 2020a, 2020b). An interesting future direction could be to review the studies related to irony, a figure of speech that not only communicates the opposite of what is said, but is also more dependent on pragmatics than metaphors, idioms or proverbs. This is especially true for sarcasm, a subtype of irony that is directed at a person with the intent to criticize. These phenomena depend on both relevant background knowledge (which allows us to see how an ironic expression contradicts what is usual or expected) and prosodic features and intonation patterns that make them easier to understand.